Some people believe that a multi-dimensional data classification system is sufficient and that labels add little real value. Others believe that a label system makes big data easier to extract and analyze, provides profiling capabilities, and enables accurate recommendations, and therefore must be standard equipment for the data middle platform.
If the "labeling system" is indeed controversial, then the function introduced today must not be controversial, it is definitely the real standard configuration of the data middle platform, it is - data service.
What is a data service?
There is a wide range of data service types, such as data transmission services, data storage services, data analysis services, and data security services.
These are all called data services, but they emphasize capabilities; the more accurate term for them is "data as a service". That is not the data service of the data middle platform we are going to talk about today.
What exactly is the data service of the data middle platform?
My understanding: different systems interact with each other by consuming services. Data services provide a "bridge of communication" between data and applications, and this bridge exists in the form of APIs.
Think of it like an electrical outlet: all your hair dryer needs is a matching plug, and once it is plugged in, the current flows to the hair dryer the same way data flows to your data application.
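To make the "plug it in" idea concrete, here is a minimal sketch of what consuming such a data API might look like from the application side. The endpoint URL, parameters, and response fields are hypothetical, and the Python `requests` library is only an assumption for illustration.

```python
# A sketch of a consumer calling a data API; the endpoint, parameters,
# and response fields are hypothetical.
import requests

API_URL = "https://data-platform.example.com/api/v1/sales/daily"

resp = requests.get(
    API_URL,
    params={"store_id": "SH001", "date": "2023-09-01"},
    headers={"Authorization": "Bearer <token>"},  # access is granted by authorization
    timeout=10,
)
resp.raise_for_status()

# The consumer only sees the returned rows, never which database or table they came from.
for row in resp.json()["data"]:
    print(row)
```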
Why does the data middle platform need data services?
Many articles on the Internet like to use "a chef cooking" as a metaphor for the data middle platform. A chef generally cooks in a few steps: buying the ingredients, washing and selecting them, drawing up a menu, and stir-frying and plating. In the data processing flow of the data middle platform, these steps are called data collection, data cleaning, data modeling, data analysis, and data application.
Data collection
Just as even the most skilled chef cannot cook without ingredients, a few good dishes require raw materials first. Data collection and data access are the process of buying the ingredients.
Data cleansing
Data cleaning is the process of picking and washing the vegetables: the dirty data has to be cleaned out.
Data modeling
The vegetables are picked and washed, but which dishes to cook depends on the menu the guest orders. Data modeling is like drawing up that menu for the guest: which dish does the guest like, fish-flavored shredded pork or kung pao chicken? Should the flavor be sweet, spicy, or light? All of this must be described clearly and passed on to the "chef". Data modeling translates the needs of data consumers into a language that computers can understand.
Data analysis and data application
According to the customer's "menu" requirements, stir-fry and plate.
At this point, some of you can't help but ask: with all that said, where does the data service come in? Why does the data middle platform need data services?
In this process of "the chef cooking", there is a role that cannot be ignored; I don't know whether you have noticed it. That role is the "waiter". His job is to help customers order and to bring the finished dishes to the guests' tables. The "waiter" of the data middle platform is the data service, also known as OneService.
Imagine you are sitting at a restaurant table with a menu to choose from. Without a waiter, the critical link that communicates your order to the kitchen and brings your food back to the table is missing. This is where the "waiter" (the data service) comes in: it takes the data consumer's request, tells the system what to do, and delivers the data to the data consumer in the form of an API.
Throughout this process, the data service plays another role: it shields the technical details of the underlying data. Data consumers do not need to care which database or table the data comes from or what type of database it is; they only need to care whether the data meets their needs.
Just like eating at a restaurant, you don't have to worry about where the ingredients were bought, who prepped them, or who cooked the dish; you only care whether the dish suits your taste.
What problems can data services solve?
In traditional data integration solutions, data is usually exported from one system and imported or copied into another. As the scale of enterprise data applications keeps growing, data integration has to span dozens or even hundreds of systems, and the traditional approach becomes harder and harder and exposes more and more problems.
1. Data inconsistencies caused by data "moving".
Traditional data integration requires data to be replicated from one system to another, and data is "lost" in the process due to networks, interfaces, programs, tasks, and other uncertain factors, resulting in data inconsistency.
In most cases, data delivered through the data APIs of the data middle platform does not need to "land" in the downstream system; the emphasis is on the right to use rather than the right to own, which greatly reduces the inconsistency caused by data flowing downstream.
2. Diverse data storage, low integration efficiency
The data middle platform designs different data access and storage solutions according to the type of enterprise data, the volume of data, and the application requirements. For example, MySQL or Oracle can hold data with relatively small volumes, Greenplum can hold large volumes of data that require multi-dimensional analysis, HBase can store large amounts of key-value data, and Elasticsearch can build indexes to improve data query efficiency. In this situation, exposing the data separately for each storage method is undoubtedly very complicated.
By encapsulating the various types of data into a unified data API and providing an external interface, the data middle platform shields the integration complexity and low efficiency caused by the diversity of data storage.
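As an illustration of this idea, here is a minimal sketch (not any specific product's API) of how one unified query entry point might route requests to different underlying stores. The store clients are stand-in stubs rather than real drivers, and all dataset names are invented.

```python
# Stand-in stubs for the real store clients; a production service would use the
# MySQL / HBase / Elasticsearch drivers here.
def fetch_from_mysql(params):
    return [{"source": "mysql", **params}]

def fetch_from_hbase(params):
    return [{"source": "hbase", **params}]

def fetch_from_elasticsearch(params):
    return [{"source": "elasticsearch", **params}]

# The registry records which store backs each logical dataset;
# the caller of the unified API never needs to know.
DATASET_REGISTRY = {
    "customer_profile": fetch_from_mysql,
    "user_events": fetch_from_hbase,
    "product_search": fetch_from_elasticsearch,
}

def query_dataset(name, **params):
    """Unified entry point: one API, several storage engines behind it."""
    return DATASET_REGISTRY[name](params)

print(query_dataset("user_events", user_id="u123", limit=10))
```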
3. It is impossible to monitor which applications access the data
In traditional data projects, even with tools such as metadata management, full-link lineage across data collection, aggregation, cleaning, processing, and application cannot be achieved. In particular, the link from the data platform to the data application is almost entirely broken: the data platform delivers data to applications by exporting, importing, or copying it, and once the data enters a downstream system, the platform can no longer monitor how it is used.
The unified data service API provided by the data middle platform builds a bridge between data applications and the data middle platform. Data APIs can only be accessed with authorization. When data applications are authorized and access the data APIs, the access link can be reported to the metadata center in the form of "tags", opening up the link from the data middle platform to the data application and forming full-lifecycle data lineage.
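The toy sketch below illustrates the idea in miniature: an authorized application calls a data API, and the call is also recorded as a lineage "tag" for the metadata center. The names and structures are illustrative assumptions, not a specific product's interface.

```python
# Toy versions of the authorization check and the lineage "tag".
from datetime import datetime, timezone

AUTHORIZED_APPS = {"marketing-dashboard": {"sales_daily"}}  # app -> datasets it may read
LINEAGE_LOG = []  # stands in for the metadata center

def call_data_api(app_id, dataset):
    if dataset not in AUTHORIZED_APPS.get(app_id, set()):
        raise PermissionError(f"{app_id} is not authorized for {dataset}")

    # Record the access as a lineage "tag" before returning data.
    LINEAGE_LOG.append({
        "app": app_id,
        "dataset": dataset,
        "accessed_at": datetime.now(timezone.utc).isoformat(),
    })
    return [{"dataset": dataset, "rows": "..."}]  # placeholder payload

call_data_api("marketing-dashboard", "sales_daily")
print(LINEAGE_LOG)
```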
4. Changes in upstream data affect downstream data applications
In many data projects, it is also common for data applications to directly call the data platform's database to access data. As a result, changes to upstream data can have a significant impact on downstream data applications.
The data middle platform provides a unified data API for data applications to call, decoupling the data middle platform from the data applications. Within the data service, a mapping to each data source is established; if the upstream data changes, only the mapping in the data service needs to be adjusted, and the data applications are not affected.
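Here is a minimal sketch of that mapping idea, with hypothetical field and column names: the API exposes stable fields, and a per-source mapping translates them into whatever the upstream table currently uses, so an upstream rename only touches the mapping.

```python
# The API exposes stable field names; the mapping translates them into whatever
# the upstream table currently uses (all names here are hypothetical).
FIELD_MAPPING = {
    # API field    ->  upstream column
    "order_id":    "ord_no",
    "amount":      "pay_amt",
    "created_at":  "gmt_create",
}

def to_api_row(upstream_row):
    return {api_field: upstream_row[src_col] for api_field, src_col in FIELD_MAPPING.items()}

# If the upstream renames "pay_amt", only FIELD_MAPPING changes; the API contract does not.
upstream_row = {"ord_no": "A1001", "pay_amt": 59.9, "gmt_create": "2023-09-01 10:00:00"}
print(to_api_row(upstream_row))
```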
What should a data service have?
In the data middle platform architecture, the data service layer sits at the top of the platform. It connects to the data consumption layer and provides the integrated data to data consumers in the form of services, for better performance and experience. The data service layer has the following capabilities:
Cross-source data services
The diversity of the data stored in the data middle platform means that its technical architecture is composed of multiple big data components, such as Hive, HBase, Greenplum, Elasticsearch, Redis, MySQL, and Oracle, and a business use case may span several of these databases. The cross-source data service provided by the data service layer shields the technical differences of the underlying data sources, extracts data from different sources, and orchestrates it according to business needs, forming a unified API for external sharing.
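The sketch below illustrates cross-source orchestration in miniature: data is pulled from two stubbed sources and merged into one response, the way a single cross-source API might. The sources and fields are invented for illustration.

```python
# Stubbed sources standing in for a relational store and a search index.
def fetch_order_counts_from_mysql():
    return [{"user_id": "u1", "order_count": 3}]

def fetch_profiles_from_elasticsearch():
    return [{"user_id": "u1", "city": "Shanghai"}]

def user_summary_api():
    """One API response assembled ("orchestrated") from two different sources."""
    orders = {row["user_id"]: row for row in fetch_order_counts_from_mysql()}
    return [
        {**profile, **orders.get(profile["user_id"], {})}
        for profile in fetch_profiles_from_elasticsearch()
    ]

print(user_summary_api())
```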
Subject data services
Data is organized into unified data APIs by business subject. The data middle platform inherits the subject-oriented idea of the data warehouse: data belonging to the same business subject but stored in different data sources is consolidated, the multiple sources and physical tables are shielded, and a standard data service is formed for external use. For example, the sales subject needs to gather the company's sales data from wholesale, retail, online, offline, and other channels.
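A small sketch of the subject idea, with made-up channel functions: sales rows from several channel sources are consolidated into a single "sales subject" API response.

```python
# Stubbed channel sources; a real subject service would read the physical
# tables behind each channel.
def wholesale_sales():
    return [{"channel": "wholesale", "amount": 1200.0}]

def retail_sales():
    return [{"channel": "retail", "amount": 800.0}]

def online_sales():
    return [{"channel": "online", "amount": 950.0}]

def sales_subject_api():
    """One 'sales subject' API consolidating every channel behind it."""
    rows = wholesale_sales() + retail_sales() + online_sales()
    return {"rows": rows, "total_amount": sum(r["amount"] for r in rows)}

print(sales_subject_api())
```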
One-stop query
The data service ultimately translates the API the user calls into underlying access to the various data sources, realizing one-stop querying of the data in the data middle platform. It provides data retrieval, online analysis, real-time query, and so on, improving the efficiency of data queries.
Full-link connectivity
Data services not only provide the ability to connect data and applications, but also write the access status of data APIs to the metadata center in real time through functions such as service authorization and access monitoring, forming a complete data lineage.
Subscription-based delivery
Once a data API is built, data consumers do not need to build integration channels over and over again; through an authorized "subscription", they can quickly use the data via the interface.
API Gateway Service
The API gateway uses cloud-native technologies to provide unified management and monitoring capabilities for the service APIs, including service registration, automatic service discovery, authentication and authorization, flow control, circuit breaking, security control, monitoring, and analysis.
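As a rough illustration of two of these gateway responsibilities, authentication and flow control, here is a toy sketch in plain Python. Real cloud-native gateways provide these as configuration rather than hand-written code, and the token, application name, and limits here are invented.

```python
import time

VALID_TOKENS = {"token-abc": "marketing-dashboard"}  # token -> application
RATE_LIMIT = 5        # max calls allowed...
WINDOW_SECONDS = 60   # ...per rolling window
_call_times = {}      # application -> timestamps of recent calls

def gateway_check(token):
    # Authentication: map the token to a known application.
    app = VALID_TOKENS.get(token)
    if app is None:
        raise PermissionError("invalid token")

    # Flow control: a simple sliding-window rate limit.
    now = time.time()
    recent = [t for t in _call_times.get(app, []) if now - t < WINDOW_SECONDS]
    if len(recent) >= RATE_LIMIT:
        raise RuntimeError(f"rate limit exceeded for {app}")
    recent.append(now)
    _call_times[app] = recent
    return app

print(gateway_check("token-abc"))
```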
How to build data services in the data middle platform?
Granularity issues
The finer the service splitting, the better the reusability; but if only reuse is considered, a large number of fine-grained services becomes hard to manage and inevitably affects overall performance. Service design needs to weigh business needs, management difficulty, performance, and other factors together.
Standardization issues
Services are developed as RESTful APIs, which are clearly structured, easy to understand, and easy to extend, with standardized interface conventions: whether the front-end application is written in Java, .NET, C, or PHP, it can call them. It is just like designing a socket: it must be universal, so that your hair dryer plug fits whether it follows the American, European, or Chinese national standard.
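As a small illustration, here is a minimal RESTful endpoint sketch. Flask and Python are assumptions made only for the example; the point of standardizing on REST is that any language could implement or call such an interface, and the route and response fields are hypothetical.

```python
# A minimal RESTful data API sketch; the route and fields are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/v1/sales/daily", methods=["GET"])
def daily_sales():
    store_id = request.args.get("store_id", "all")
    # In a real service this payload would come from the mapped data sources.
    return jsonify({
        "store_id": store_id,
        "data": [{"date": "2023-09-01", "amount": 950.0}],
    })

if __name__ == "__main__":
    app.run(port=8080)
```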
DataOps
DataOps extends the DevOps concept to the world of data and provides a way to operate data services continuously. The API gateway registers and manages services, enabling dynamic discovery, automatic deployment, and automatic monitoring of data services. Based on the service operation monitoring data, data services can be governed effectively, including iterative optimization, service orchestration, automated testing, and service decommissioning.
Final thoughts
The data service layer (OneService) changes the traditional way data is integrated and delivered: all the data integrated into the data middle platform is provided through data services. The data service exposes interfaces rather than the data itself, and data consumers do not fetch the data directly but obtain it through the interface service.
Data services are not simply about exposing an API. At the functional level, data services also include cross-source data services, subject data services, one-stop query, subscription-based delivery, and full-link connectivity. At the technical level, data services adopt cloud-native technologies and provide dynamic service discovery, automatic deployment, automatic monitoring, and service governance.