Aloudata
This case study was submitted by Aloudata as an entry in the selection for the "Data Ape Annual Golden Ape Planning Activity - 2023 Big Data Industry Annual Innovative Service Enterprise List Award".
Since the 18th National Congress of the Communist Party of China, China has continuously increased its support for fintech innovation and expanded the scope of fintech innovation pilots in the capital market. In October 2020, the China Securities Regulatory Commission (CSRC) issued the "14th Five-Year Plan for the Development of Science and Technology in the Industry", which named promoting the industry's digital transformation as one of its two major themes and emphasized strengthening both the technology governance system and the data governance system. Data governance is the cornerstone for applying digital transformation to business support scenarios, and digital transformation is a key part of fintech innovation. Fintech innovation has, in turn, become an inevitable choice for the company's stable operation and development.
While keeping its business systems running smoothly, strengthening compliance management, and focusing on risk prevention, Capital ** has continued to increase its investment in fintech, using steady improvements in its digital capabilities to support the execution of its business strategy and relying on fintech to ensure high-quality business growth. To meet business development needs, Capital ** has built a large number of information application systems. However, an internal review found that much data was still being collected and processed manually. As a result, the same data item was defined inconsistently and processed repeatedly by different parties, which consumed considerable manpower, was inefficient, and could not guarantee the quality of the data ultimately used. Capital ** therefore set out to break down the data silos between its application systems, organize its metadata and master data, standardize its data standards, build data models, provide data quality assurance, and maximize the value of its financial data.
To address these problems and further raise the digital level of business analysis, risk management, and regulatory reporting, Capital ** decided to build a new type of company-level data middle platform. Starting from data applications rather than from the traditional data warehouse side, the project built a data processing platform, a metrics middle platform, and data applications, providing an end-to-end data governance solution that ensures data quality and security and realizes the value of financial data.
Implementation timeline:
Project start: July 2023.
Key intermediate milestone: November 2023.
Project completion: January 2024.
1. High-quality business strategy decision-making
By connecting the data silos between Capital **'s businesses and application systems, the platform enables centralized analysis of overall operations. Operating data can be compared horizontally across businesses and vertically within a single business, and management can track the progress of business assessment indicators in real time, giving it strong data support for adjusting business strategy promptly.
2. Data asset management
Business staff usually need to process basic data before they can apply it to business decisions. The data middle platform provides a data development platform, the metrics center, which defines metric calculation logic in one place, assigns an owner to each metric's data, and ensures the quality of output data. With data permission controls, a metric developed once can be used by many people, which avoids duplicate development, brings data assets under management, and improves the efficiency of data use.
3. Data sharing
By building a company-level data middle platform, Capital ** identified and settled the content, sources, and responsible positions for its master data to ensure data quality. Once the responsible position maintains a piece of master data, other application systems can retrieve it through the data middle platform's interfaces. This keeps the same data consistent across application systems and reduces the uncertainty that arises when the same data is maintained manually at several points along one business chain. The data middle platform collects the core data of each application system and exposes it through interfaces, so each system obtains other systems' data from one place; this reduces the complexity of sharing data between application systems and improves data security.
The inventory identified the following urgent problems and scenarios:
1. Severe data silos: tens of thousands of data tables are scattered across more than 10 different business systems, databases, and platforms.
2. Inconsistent metric definitions: development pipelines are inconsistent and metric definitions lack effective management, so different data tables or services return different values for the same business metric.
3. Difficulty tracing definitions and assessing impact: traditional data analysis solutions struggle to provide end-to-end data lineage, which makes it hard to trace how a metric is defined and, when a data pipeline is changed, hard to see the impact downstream.
4. Insufficient efficiency in data use and analysis: investment managers increasingly need differentiated analysis of different products, but find it difficult to extract data from data warehouse tables themselves, so the pain points of data use are obvious.
5. Lack of flexibility and agility: the market changes rapidly and analysis strategies must be adjusted quickly, which the existing data system cannot support.
Against this background, Capital **'s data platform team evaluated a variety of data warehouse solutions and concluded that the traditional "data warehouse + BI" approach could not meet its needs for efficient data management and intelligent analysis.
To address these problems, Capital ** and Aloudata (Daying Technology) jointly developed a more efficient, unified, and intelligent solution:
1. An agile data warehouse solution based on the data fabric architecture: a modern data platform that adapts to the new needs of the digital era.
A data fabric is a new approach to data management and integration that ties the complex components of a data ecosystem together into a complete and cohesive whole. Unlike a data lake, which moves data to a centralized location, a data fabric relies on powerful data virtualization technology and data governance policies to unify data management. By breaking through the limitations of earlier generations of data processing technology, such as traditional data warehouses and data lakes, the data fabric approach unlocks the productivity of data.
This solution therefore abandons the traditional data warehouse architecture (source-aligned ODS layer -> detail layer -> aggregation layer). Following the NoETL concept, it builds a virtual detail layer that supports cross-source queries, builds the aggregation layer intelligently according to how downstream consumers use the data, and materializes data automatically to improve application performance. The result is a new form of data warehouse with a shorter data development chain and lower infrastructure and operations costs for the data middle platform.
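To make the idea of a virtual detail layer concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a data virtualization engine. This is an assumption for illustration only, not how Aloudata Air is implemented. Two attached in-memory databases play the role of separate business systems, and all table, column, and view names are hypothetical. The detail-layer view joins the two sources at query time, so no rows are copied into an ODS layer.

```python
import sqlite3


def build_sources(conn: sqlite3.Connection) -> None:
    """Create two hypothetical, independent 'business databases'."""
    conn.execute("ATTACH DATABASE ':memory:' AS trading")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("CREATE TABLE trading.trades (client_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.clients (client_id INTEGER, branch TEXT)")
    conn.executemany("INSERT INTO trading.trades VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0), (2, 200.0)])
    conn.executemany("INSERT INTO crm.clients VALUES (?, ?)",
                     [(1, "Beijing"), (2, "Shanghai")])


def create_virtual_detail_layer(conn: sqlite3.Connection) -> None:
    # SQLite only lets TEMP views reference tables in other attached
    # databases. The view stores query logic, not data: the join runs
    # against the sources every time the view is read.
    conn.execute("""
        CREATE TEMP VIEW dwd_trades AS
        SELECT t.client_id, c.branch, t.amount
        FROM trading.trades AS t
        JOIN crm.clients AS c ON c.client_id = t.client_id
    """)


conn = sqlite3.connect(":memory:")
build_sources(conn)
create_virtual_detail_layer(conn)

rows = conn.execute(
    "SELECT branch, SUM(amount) FROM dwd_trades GROUP BY branch ORDER BY branch"
).fetchall()
print(rows)  # [('Beijing', 150.0), ('Shanghai', 200.0)]

# A new trade in the source system is visible immediately, with no ETL job.
conn.execute("INSERT INTO trading.trades VALUES (2, 25.0)")
after = conn.execute(
    "SELECT SUM(amount) FROM dwd_trades WHERE branch = 'Shanghai'"
).fetchone()[0]
print(after)  # 225.0
```

A production engine would push such queries down to MySQL, Oracle, or object storage rather than to SQLite, but the principle of defining the detail layer as logic over the sources is the same.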
In terms of design, this agile data analysis solution works as follows:
1. External collected data, business database data (MySQL, Oracle, SQL Server, etc.), and object storage data are mapped into the agile data warehouse as PDSs (physical datasets, i.e., mappings of business database source tables), with no need for one-to-one data replication or the ODS layer of a traditional data warehouse.
2. New VDSs (virtual datasets, i.e., the retrieval logic of a data view) are defined on top of PDSs or other VDSs. Developers do not need to deal with details such as data storage or compute scheduling, no data is physically copied, and VDSs can be nested in multiple layers until a virtual dataset suitable for the target scenario is defined.
3. Unified models and metrics are defined on top of virtual datasets and connected to external reporting or analysis tools through open API and JDBC interfaces, or exported to external databases or files through JDBC to share data with external systems.
4. Projection (acceleration) policies are configured according to users' access requirements, and the system also builds acceleration policies intelligently from users' query history so that external business queries get a fast response. Based on the nested dependencies between VDSs, projections automatically build their data refresh pipeline, so projection data is scheduled and produced automatically.
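The nesting in steps 2 and 4 implies a dependency graph: a projection on a VDS can only be refreshed after the datasets it reads from are up to date. The sketch below, using the standard-library graphlib module (Python 3.9+), shows one way such a refresh pipeline could be derived automatically. The dataset names and the rule "refresh every accelerated dataset downstream of a changed PDS" are illustrative assumptions, not the product's actual scheduling logic.

```python
from graphlib import TopologicalSorter

# Hypothetical dataset graph: each dataset maps to the datasets it reads from.
# PDS entries (mapped source tables) have no upstream dependencies.
DEPENDS_ON = {
    "pds_trades": set(),
    "pds_clients": set(),
    "vds_trade_detail": {"pds_trades", "pds_clients"},
    "vds_daily_branch_volume": {"vds_trade_detail"},
    "vds_monthly_branch_volume": {"vds_daily_branch_volume"},
}


def downstream_of(changed, depends_on):
    """Return every dataset that directly or indirectly reads from `changed`."""
    consumers = {}
    for dataset, upstreams in depends_on.items():
        for upstream in upstreams:
            consumers.setdefault(upstream, set()).add(dataset)
    affected, stack = set(), [changed]
    while stack:
        for consumer in consumers.get(stack.pop(), ()):
            if consumer not in affected:
                affected.add(consumer)
                stack.append(consumer)
    return affected


def refresh_plan(changed, depends_on, accelerated):
    """Order the accelerated (materialized) datasets that must be rebuilt."""
    affected = downstream_of(changed, depends_on) & accelerated
    # static_order() yields each dataset only after all of its upstreams.
    order = TopologicalSorter(depends_on).static_order()
    return [dataset for dataset in order if dataset in affected]


# Projections exist only on the detail and monthly datasets; the daily VDS
# stays purely virtual and is read through when the monthly one is rebuilt.
plan = refresh_plan("pds_trades", DEPENDS_ON,
                    {"vds_trade_detail", "vds_monthly_branch_volume"})
print(plan)  # ['vds_trade_detail', 'vds_monthly_branch_volume']
```

Deriving the order from the graph, rather than maintaining a hand-written schedule, is what lets the pipeline update itself whenever a VDS is added or redefined.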
In the application stage, the solution successively completed the data virtualization engine and the metrics service platform, all of which are IT application innovation (Xinchuang) technologies and products, covering data collection, management, analysis, and display, and will achieve stable operation of versions and services by the end of 2023. Every downstream application system obtains data from this platform, which improves data consistency. Metric data is defined and developed on this platform, which speeds up metric production, keeps metric data consistent, and avoids duplicate development work. Overall, the platform's data warehouse architecture reduces data storage costs, improves data development efficiency, and meets the need for real-time business data analysis in an agile and efficient way.
The NoETL agile data analysis solution built on the Aloudata Air logical data platform and the Aloudata CAN automated metrics platform offers significant advantages in cost reduction and technological innovation, and has delivered significant benefits and demonstration value:
1. More than double the efficiency of data operations: this solution redefines how data work is done. There is no need to wait for data synchronization or lengthy ETL scheduling; everyone can discover trusted data on their own and explore and prepare data globally at any time, bringing enterprise data operations to a new level of agility.
2. Up to 100x data lake analysis performance: compared with open-source solutions such as Presto and Impala, this solution delivers more than twice the query performance, and its intelligent acceleration technology can improve performance by up to 100x for an interactive data analysis experience.
3. Storage cost savings of more than …%: this solution builds a data lake on object storage and materializes data on demand, saving nearly 2/3 of costs compared with the open-source HDFS solution, and further reduces storage costs through automatic ** of useless data and automatic merging of similar data stores.
4. Data management cost savings of more than …%: this solution makes data management metadata-driven, intelligent, proactive, and continuous, effectively making it "automatic" and saving substantial investment in data governance and risk response.
In terms of technological innovation, the NoETL concept lets this solution reduce redundant data storage, improve ETL efficiency, and lower the complexity of data application development, which effectively speeds up data application development and greatly reduces the human and material costs of turning data into assets. The solution also provides full-link lineage analysis, giving IT staff and business users an efficient and convenient data development and application experience and improving communication and collaboration between them. This supports the company's business development and brand building, accumulates intangible assets, and helps drive its business transformation and market competitiveness. The solution achieved the following major technological breakthroughs:
1. Data Fabric architecture practices
Federated queries: using virtual data warehouse technology, Capital **'s business data, scattered across many systems, is managed and defined in a unified way without copying the original data (no ODS layer), and the detail layer (DWD layer) is built directly, reducing construction complexity and storage costs.
Through virtualization, the solution provides a consistent view of data and supports analysis, reporting, AI, and other scenarios through a single query language.
Intelligent materialization acceleration: with intelligent acceleration driven by user query behavior and business metadata, data query and analysis run nearly 100x faster than with traditional query engines such as Presto and Impala. Intelligent, automated data production fully replaces the data integration, development, and operations work that previously had to be done manually, reducing costs and improving efficiency.
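The selection step behind intelligent materialization can be illustrated with a deliberately simple heuristic: materialize the datasets that are queried often and that consume the most total query time. The sketch below assumes a query log of (dataset, elapsed seconds) pairs and hypothetical thresholds (`min_hits`, `max_projections`); it stands in for, and is much simpler than, the metadata-driven acceleration the solution describes.

```python
from collections import defaultdict


def choose_projections(query_log, min_hits=3, max_projections=2):
    """Pick datasets to materialize from a log of (dataset, seconds) entries.

    Only datasets queried at least `min_hits` times qualify; they are ranked
    by the total query time spent on them, a rough proxy for the time a
    projection could save.
    """
    hits = defaultdict(int)
    total_seconds = defaultdict(float)
    for dataset, seconds in query_log:
        hits[dataset] += 1
        total_seconds[dataset] += seconds
    candidates = [d for d in hits if hits[d] >= min_hits]
    candidates.sort(key=lambda d: (-total_seconds[d], d))
    return candidates[:max_projections]


query_log = (
    [("vds_branch_volume", 4.0)] * 5      # frequent, moderately slow
    + [("vds_client_pnl", 9.0)] * 3       # less frequent, but slow
    + [("vds_adhoc_check", 60.0)]         # slow but one-off: never qualifies
    + [("vds_code_lookup", 0.1)] * 10     # frequent but cheap: ranks last
)
print(choose_projections(query_log))  # ['vds_client_pnl', 'vds_branch_volume']
```

Ranking by total time rather than by hit count alone avoids spending storage on datasets that are popular but already fast.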
2. Metrics: definition as production, definition as service
The solution provides highly flexible, declarative metric definition. Relying on automated data production technology, defining a metric triggers the automatic production of its data, which is then delivered to various data consumption scenarios through channels such as JDBC, APIs, and Excel plug-ins.
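A minimal sketch of "definition as production" follows: a metric is declared once (its source, aggregate expression, and the filters that belong to its definition), and SQL for any requested dimensions is generated from that single definition, so every consumer gets the same logic. The `Metric` class and `compile_metric` function are hypothetical, and SQLite stands in for the query engine. The SQL is assembled from strings, which is acceptable here only because metric definitions are assumed to be maintained by trusted owners rather than supplied by end users.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A hypothetical declarative metric definition."""
    name: str
    source: str            # dataset the metric is computed from
    expression: str        # aggregate expression over the source
    filters: tuple = ()    # predicates that are part of the definition


def compile_metric(metric, dimensions=()):
    """Generate SQL for `metric`, grouped by any requested dimensions."""
    columns = [*dimensions, f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(columns)} FROM {metric.source}"
    if metric.filters:
        sql += " WHERE " + " AND ".join(metric.filters)
    if dimensions:
        group = ", ".join(dimensions)
        sql += f" GROUP BY {group} ORDER BY {group}"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (branch TEXT, product TEXT, amount REAL, status TEXT)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", [
    ("Beijing", "bond", 100.0, "settled"),
    ("Beijing", "stock", 50.0, "settled"),
    ("Shanghai", "bond", 200.0, "settled"),
    ("Shanghai", "bond", 70.0, "cancelled"),
])

# Defined once; the "settled only" rule travels with the metric everywhere.
settled_volume = Metric("settled_volume", "trades", "SUM(amount)",
                        ("status = 'settled'",))

by_branch = conn.execute(compile_metric(settled_volume, ["branch"])).fetchall()
by_product = conn.execute(compile_metric(settled_volume, ["product"])).fetchall()
print(by_branch)   # [('Beijing', 150.0), ('Shanghai', 200.0)]
print(by_product)  # [('bond', 300.0), ('stock', 50.0)]
```

Because the cancelled trade is excluded by the definition itself, no consumer can accidentally compute a different "settled volume" by forgetting the filter.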
3. Full-link lineage
The solution provides end-to-end, column-level lineage across the full chain from reports to metrics to the agile data warehouse to the original business databases, giving a reliable basis for tracing metric definitions and assessing the impact of changes.
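Column-level lineage can be modeled as a directed graph whose nodes are columns. Tracing a metric's definition means walking the graph upstream from a report column, and assessing a change means walking downstream from a source column. The sketch below assumes the edges have already been extracted (in practice from SQL parsing, which is the hard part) and uses hypothetical column names.

```python
from collections import defaultdict, deque

# Hypothetical column-level edges: (upstream column, downstream column).
EDGES = [
    ("trading.trades.amount", "vds_trade_detail.amount"),
    ("crm.clients.branch", "vds_trade_detail.branch"),
    ("vds_trade_detail.amount", "metric.settled_volume"),
    ("vds_trade_detail.branch", "metric.settled_volume"),
    ("metric.settled_volume", "report.branch_dashboard.volume"),
]


def build_graphs(edges):
    """Index edges in both directions: downstream for impact, upstream for tracing."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(start, graph):
    """Breadth-first search returning every column reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


downstream, upstream = build_graphs(EDGES)
# Tracing: which columns feed the dashboard figure?
print(sorted(reachable("report.branch_dashboard.volume", upstream)))
# Impact assessment: what is affected if trading.trades.amount changes?
print(sorted(reachable("trading.trades.amount", downstream)))
```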
Capital **
Capital ** Co., Ltd. was established in February 2000 with registered capital of RMB 27300 million. On December 22, 2022, the company was listed on the Shanghai Stock Exchange (stock code: 601136). It is headquartered in Beijing; its controlling shareholder is Beijing Capital Group, and its actual controller is the Beijing State-owned Assets Supervision and Administration Commission.
After more than 20 years of steady development, the company has become a comprehensive, first-class firm with a full set of business licenses, a balanced business structure, and distinctive strengths. Its business covers asset management, proprietary investment and trading, investment banking, brokerage, wealth management, credit financing, research and consulting, private equity management, alternative investment, and other fields, providing professional financial service solutions to corporate, institutional, retail, and high-net-worth clients. It has developed distinctive strengths and brand advantages in asset management and fixed-income investment and trading. Over the years, the company has maintained sound development momentum, standardized operations and management, and good asset quality.
Aloudata
With the mission of "making data ready at any time", Aloudata is committed to removing the bottlenecks in data management technology, raising the level of ETL engineering automation, and helping enterprises upgrade smoothly to next-generation big data infrastructure.
Aloudata's self-developed Aloudata Air logical data platform supports the logical integration and querying of heterogeneous data, and uses adaptive materialization acceleration and automatic ** technology to deliver second-level query response and save more than 50% of storage and computing costs. Built on world-first operator-level lineage analysis technology, the Aloudata Big active metadata platform makes complex data pipelines visible, manageable, and controllable, enabling more refined and intelligent data management. Aloudata CAN is an automated metrics platform that changes the traditional model in which IT develops to business requirements; it lets business users analyze metrics flexibly at any granularity and along any dimension, realizing "definition as development, definition as service" for metrics.
Aloudata's products have now been deployed in the complex data environments of many leading enterprises, and a number of data fabric best practices have been successfully delivered.