2024 iAnalytics Lakehouse Market Vendor Evaluation Report: Kojie Technology

Mondo Cars Updated on 2024-03-06

Scope of the study

Driven by national and local Xinchuang (information technology innovation) policies, parts of China's Xinchuang market are moving from the "pilot verification" stage to "large-scale rollout." As Xinchuang replacement deepens, iAnalytics observes that on the demand side, enterprise requirements for Xinchuang products are increasingly incorporating richer business needs and future digital-intelligence planning, shifting from "like-for-like replacement" to "iterative upgrading"; on the supply side, Xinchuang products have moved past "available" and "usable" and are entering a stage of competition on being "easy to use." Enterprises' specific needs for the iterative upgrading of Xinchuang products are as follows:

1. Software infrastructure

Databases are the focus of basic-software replacement. Enterprises' requirements go beyond matching IOE databases in function and performance: they also need cloud migration, elastic resource scaling, hybrid transaction/analytical processing, and multi-model data management and query. Accordingly, enterprises have begun to weigh features such as cloud native design, storage-compute separation, HTAP, and hyper-convergence when replacing databases.

At the same time, on the data-architecture side, enterprises need to reduce the difficulty of data development and O&M across heterogeneous data sources, and the lakehouse architecture is emerging as a new direction for replacing big data platform architectures.

2. Application

OA is the core office software of the enterprise and ranks first among comprehensively replaced software. Taking document management in OA systems as an example, when replacing with domestic products, enterprises also add business scenarios such as building a knowledge system, improving employee office efficiency, and automating office processes.

For customer information management, enterprises have traditionally relied on systems such as Oracle and SAP. In this round of localization replacement, beyond reproducing customer information management functions, enterprises also weigh business growth and future digital-intelligence needs for interconnecting business systems, so CDPs (customer data platforms) with platform characteristics have attracted attention.

In this report, iAnalytics divides the Xinchuang market into five layers according to IT architecture, from bottom to top: basic hardware, basic software, the technical support layer, the data layer, and upper-layer application software. Basic hardware includes chips, servers, PCs, printers, and storage; basic software includes operating systems, databases, and middleware; the technical support layer includes low-code platforms, data science and machine learning platforms, privacy-preserving computing, Xinchuang cloud, cloud native, and security; the data layer includes data middle platforms, big data platforms, data warehouses, and lakehouses; upper-layer applications are divided into general application software and industrial software, covering subdivided scenarios such as office, management, R&D and design, and manufacturing.

This report is intended for company decision-makers, heads of digital and IT departments, and business leaders, and provides a reference for localization planning and vendor selection by defining the needs of each specific market and interpreting the capabilities of representative vendors.

This evaluation report focuses on the lakehouse market within the data layer, where iAnalytics evaluates the capabilities of lakehouse vendor Kojie Technology.

Market Definition:

Based on the lakehouse architecture, a data lakehouse provides unified storage, management, and computing of multi-model heterogeneous data; supports data application scenarios such as BI, data science, AI/ML, and real-time analysis; enables the free flow and sharing of data; and reduces the complexity of data development and O&M.

Party A's End Users:

Enterprise data departments and IT departments.

Party A's core requirements:

The development of cloud computing, big data, IoT, and other technologies has driven explosive growth in enterprise data volume and variety. Enterprises have new requirements for storing, processing, and applying semi-structured and unstructured data, which data warehouses or data lakes alone struggle to meet. Against this background, the lakehouse architecture combines the advantages of data warehouses and data lakes and has become a new direction for the evolution of enterprise data architecture. Enterprises' needs for lakehouse solutions are as follows:

Unify the storage of massive heterogeneous data and the development paradigm for batch and stream processing, reducing storage, compute, and O&M costs. In past data platform construction, enterprises often ended up with coexisting data warehouses and data lakes, plus parallel "offline computing" and "real-time computing" links. Moving and invoking data between the warehouse and the lake creates storage redundancy, while storing, cleaning, and transforming data in both offline and real-time links adds further storage and compute redundancy. The dual links and separate lakes also make enterprise data architecture extremely complex, multiplying O&M work such as system monitoring, performance tuning, and troubleshooting. In addition, traditional data warehouses and big data platforms couple storage and compute resources, which easily leads to redundant storage and insufficient compute on large datasets; queries can take hours or longer, limiting big data analysis.

Unify the management of multi-model heterogeneous data and improve data quality. On one hand, a data lake lacking data quality controls and governance easily degrades into a data swamp, reducing data availability. On the other hand, in architectures where data warehouses and data lakes coexist, moving and invoking data between them requires multiple engines, which is operationally complex, hard to make reliable, and prone to data consistency problems.

Support workloads such as data analysis, data mining, machine learning, and RPA simultaneously, and adapt to global data fusion and analysis scenarios. Taking an e-commerce platform as an example of joint analysis of global data: the platform needs to jointly analyze unstructured data such as **, comments, and ** together with structured data such as product sales and user behavior. Typically, the data warehouse uses SQL ** to process structured data, which suits BI analysis scenarios, while the data lake uses non-SQL ** to process unstructured data, which suits scenarios such as machine learning and knowledge graphs.

Meet the localization requirements of ** units, state-owned enterprises, and fields such as finance. The lakehouse architecture should connect to infrastructure such as servers, chips, operating systems, databases, and middleware, and support localized adaptation to meet enterprises' needs for independence and controllability.

Vendor Capability Requirements:

Unified storage and management of multiple types of heterogeneous data. The bottom layer of the lakehouse architecture should support automatic hot/cold tiered storage of multi-model data such as structured, time-series, document, and image data, and should store multi-model data in one or more of the three data lake table formats, Apache Hudi, Delta Lake, and Apache Iceberg, on top of the storage layer, so as to achieve unified metadata management and support data management functions such as ACID transactions and version control, allowing multiple compute engines to share unified data storage.
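
The hot/cold tiering mentioned above can be illustrated with a minimal sketch. This is not any vendor's actual implementation; the tier names and day thresholds are hypothetical, and real platforms typically also weigh access frequency, not just recency.

```python
from datetime import datetime, timedelta

def assign_tier(last_access: datetime, now: datetime,
                hot_days: int = 7, warm_days: int = 90) -> str:
    """Assign a storage tier to a data file based on access recency."""
    age = now - last_access
    if age <= timedelta(days=hot_days):
        return "hot"      # fast storage / cache layer for frequent queries
    if age <= timedelta(days=warm_days):
        return "warm"     # standard object storage
    return "cold"         # archival tier, lowest cost per GB

now = datetime(2024, 3, 6)
print(assign_tier(datetime(2024, 3, 5), now))   # accessed yesterday -> hot
print(assign_tier(datetime(2023, 1, 1), now))   # over a year old -> cold
```

A policy like this, run automatically in the background, is what lets a lakehouse keep frequently queried data on fast media while pushing stale data to cheap storage.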

Batch-stream unification technology. Vendors should support a single development paradigm for both stream and batch computing on big data, reducing the difficulty of data development and O&M. For data collection, vendors should reduce the complexity of configuring batch-stream collection tasks: after one configuration, the system should automatically collect both batch and streaming data. For data analysis, the vendor's lakehouse solution should provide streaming analysis capabilities to support real-time business decisions.
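
The "one paradigm, two execution modes" idea behind batch-stream unification can be sketched as follows. The function and field names are hypothetical illustrations, not a real engine's API: the point is that one transformation definition is reused unchanged for both batch and streaming execution.

```python
def transform(record: dict) -> dict:
    # Shared business logic, written once: normalize a record.
    return {"id": record["id"], "amount": round(record["amount"], 2)}

def run_batch(records):
    # Batch mode: materialize all results at once.
    return [transform(r) for r in records]

def run_stream(source):
    # Stream mode: yield results one record at a time as they arrive.
    for r in source:
        yield transform(r)

data = [{"id": 1, "amount": 3.14159}, {"id": 2, "amount": 2.71828}]
print(run_batch(data))
print(list(run_stream(iter(data))))   # same logic, streaming execution
```

In a real lakehouse engine the two modes share not just the transformation code but also the storage layer, which is what removes the dual-link redundancy described earlier.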

Storage-compute separation for low-cost storage of massive data. The architecture should separate storage from compute and elastically scale each on demand. The resource scheduling system should incorporate machine learning algorithms to make intelligent resource allocation decisions based on factors such as task priority, resource demand, and system health, and improve resource utilization through flexible task scheduling.
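
A priority-driven scheduler of the kind described can be sketched in a few lines. This is a toy model under stated assumptions: the task names, core counts, and ordering rule (higher priority first, then smaller jobs) are hypothetical, and a production scheduler would also consider the system-health and learned-demand signals mentioned above.

```python
import heapq

def schedule(tasks, available_cores):
    """Admit tasks into an elastic compute pool: higher priority first;
    among equal priorities, smaller jobs first to pack the pool tightly."""
    heap = [(-t["priority"], t["cores"], t["name"]) for t in tasks]
    heapq.heapify(heap)
    running = []
    while heap and available_cores > 0:
        _neg_prio, cores, name = heapq.heappop(heap)
        if cores <= available_cores:
            available_cores -= cores
            running.append(name)
    return running

tasks = [
    {"name": "bi_dashboard", "priority": 3, "cores": 4},
    {"name": "ml_training", "priority": 1, "cores": 16},
    {"name": "etl_nightly", "priority": 2, "cores": 8},
]
print(schedule(tasks, available_cores=12))   # ml_training must wait
```

Because compute is decoupled from storage, a pool like this can be grown or shrunk without moving any data.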

Support for diverse workloads. The lakehouse architecture should support integrating general-purpose data processing engines such as batch, stream, interactive query, interactive analysis, and machine learning engines, or support multiple workloads with one unified engine, so that data analysts can perform fusion analysis of multi-model heterogeneous data in a single language.

Domestic Xinchuang adaptation capability. Vendors should conform to Xinchuang standards to enable domestic substitution, and should be compatible with mainstream domestic software and hardware, including but not limited to localized chips, servers, operating systems, and middleware, to meet enterprises' localization needs.

Description of Inclusion Criteria:

1. Meet all of the vendor capability requirements for this market;

2. From 2023Q1 to 2023Q4, have no fewer than 5 paying customers in this market;

3. From 2023Q1 to 2023Q4, have contract revenue in this market of no less than 10 million yuan.

Manufacturer introduction:

Founded in 2019, Kojie Technology is a leading provider of big data basic software in China, committed to the development and application of independent and controllable big data base products and to helping enterprises fully realize the transformation into data-driven organizations. Its self-developed lakehouse data intelligence platform, KeenData Lakehouse, is cloud native, batch-stream unified, and high-performance, providing organizations with a one-stop, full-process data capability solution integrating data management, development and mining, and O&M.

Product Service Introduction:

KeenData Lakehouse, Kojie Technology's core product, is a data base product independently developed on cloud native technology, providing end-to-end, one-stop big data basic software solutions. The upper layer integrates technologies such as data fabric, active metadata management, and data mesh to provide a series of products and functions covering the data lifecycle, including but not limited to data development and management, data synchronization, real-time computing, data standards, data quality, data assets, and data services.

Figure: Schematic diagram of KeenData Lakehouse, Kojie Technology's lakehouse data intelligence platform.

Vendor Evaluation:

KeenData Lakehouse has clear advantages in query performance, ease of use, and storage-compute separation. In addition, Kojie Technology's Xinchuang ecosystem is well developed: its products are fully adapted to domestic software and hardware, and the company has accumulated rich case experience in central state-owned enterprises, energy, industry, and other sectors.

KeenData Lakehouse provides an enhanced lakehouse engine with efficient query performance. Its batch-stream unification flexibly supports scenarios such as batch processing, real-time computing, batch analysis of real-time data streams, and batch-stream linkage and conversion. Beyond flexibility, KeenData Lakehouse also optimizes lakehouse query performance. For example, to address query slowdowns caused by an excess of small files in the real-time link, it automatically triggers asynchronous compaction, merging, and cleanup of small files according to predefined policies. For offline queries, it provides an automatic index-building service that prioritizes indexing frequently queried columns. For multi-dimensional aggregation analysis, Kojie redistributes files via pre-computation to accelerate multi-dimensional queries.

A unified SQL query engine built on a unified metadata service, simple to use and lowering the bar for developers. On the basis of ACID properties and guaranteed metadata consistency, Kojie Technology provides a unified metadata service whose metadata engine can connect to heterogeneous data sources such as Oracle, MySQL, SQL Server, Elasticsearch, and NoSQL databases, and is compatible with multiple data processing engines such as Spark, Presto, and Flink. A federated metadata view enables unified management of data lakes, data warehouses, and external data sources. On top of unified metadata, KeenData Lakehouse supports cross-source federated queries through a unified SQL query engine, lowering the barrier to use and helping users achieve global analysis of global data.
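
The role of the metadata catalog in federated queries can be sketched as follows. All names and data here are hypothetical illustrations: a real engine would push query fragments down to each source system, whereas this toy routes a lookup through a catalog and joins the results in one place.

```python
# Hypothetical catalog: logical table name -> physical source and rows.
catalog = {
    "orders": {"source": "mysql",
               "rows": [{"id": 1, "user": "a", "total": 50}]},
    "clicks": {"source": "elasticsearch",
               "rows": [{"user": "a", "page": "/home"}]},
}

def query(table: str):
    """Resolve a logical table through the catalog; callers never need to
    know which backend actually holds the data."""
    entry = catalog[table]
    return entry["source"], entry["rows"]

def federated_join(left: str, right: str, key: str):
    """Join rows from two sources on a shared key."""
    _, l_rows = query(left)
    _, r_rows = query(right)
    return [{**l, **r} for l in l_rows for r in r_rows if l[key] == r[key]]

print(federated_join("orders", "clicks", key="user"))
```

This is the payoff of unified metadata: one SQL-like interface over sources that would otherwise each require their own client and query dialect.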

KeenData Lakehouse supports a storage-compute separation architecture for low-cost, simple storage. Data can be stored in HDFS, S3, or OSS, and Kojie Technology provides a unified resource identifier that makes the underlying storage fully transparent to users, who can then work with data resources intuitively. To maintain performance under storage-compute separation, Kojie mitigates metadata issues such as slow renames through metadata caching, and improves access to underlying data through data caching. The architecture also supports elastic scaling of compute resources and hot/cold tiered data storage, reducing data storage costs.
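
The unified-resource-identifier idea can be sketched with standard URI parsing. The scheme-to-backend mapping below is a hypothetical illustration, not Kojie's actual implementation; it shows how one logical path can hide whether data lives on HDFS, S3, or OSS.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to storage backend.
BACKENDS = {"hdfs": "Hadoop HDFS", "s3": "Amazon S3", "oss": "Alibaba OSS"}

def resolve(uri: str):
    """Split a unified resource identifier into (backend, bucket/host, path)."""
    parsed = urlparse(uri)
    backend = BACKENDS.get(parsed.scheme)
    if backend is None:
        raise ValueError(f"unsupported scheme: {parsed.scheme}")
    return backend, parsed.netloc, parsed.path

print(resolve("s3://warehouse/sales/2024/part-0001.parquet"))
```

With such an abstraction, queries and pipelines reference logical paths only, so data can be migrated between backends without rewriting user code.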

Experienced and widely recognized in the industry. Building on leading technical architectures such as storage-compute separation and lakehouse unification, Kojie Technology integrates the concepts of DataOps and data fabric and proposes a three-dimensional framework for enterprise data capability building, including a multi-architecture unified lakehouse engine, data engineering construction, a service system of data self-governance, centralized control, and decentralized empowerment, and data-driven organization, helping enterprises achieve data-driven transformation and upgrading. To date, Kojie Technology has accumulated rich cases across central state-owned enterprises and the ** energy, industrial, financial, and retail sectors, including China Unicom, Sinopec, China FAW, State Grid, China Life, and China Aerospace. Kojie Technology was also among the first vendors to pass the China Academy of Information and Communications Technology's special evaluation of cloud-native lakehouse capabilities.

A mature Xinchuang ecosystem, fully adapted to localized software and hardware. Kojie Technology adheres to independent R&D and has filed more than 150 software copyrights and patents in big data fields around the KeenData Lakehouse data intelligence platform. It also continues to build out its Xinchuang ecosystem: it has completed technical compatibility certifications with Kylin Software, Feiteng (Phytium), Renmin Jincang, and other companies, and has passed full-stack Xinchuang certifications for Kunpeng chips, Kunpeng cloud, and Kunpeng technology. Notably, KeenData Lakehouse products have passed the Ministry of Industry and Information Technology's authoritative "credible excellence" certification for five software products, underscoring Kojie Technology's achievements in big data R&D and in product security and reliability, as well as its determination and strength in advancing the coordinated development of the Xinchuang industry chain.

Typical Customers:

CICC, China FAW, Sinopec Exploration Institute.
