In "The Twelve Trends of the Future", K argues that we are living in an era of data flow: all business is the business of data, and at the end of the day every company is dealing with data.
Indeed, as data becomes the new core factor of production, data analysis becomes one of the most important production tools, determining how productive an enterprise can be in the digital era. In recent years a large number of data analysis companies have emerged, from Snowflake and Databricks abroad to StarRocks and PingCAP in China, all aiming to meet the growing demand for data analysis and help enterprises fully unleash the productivity of their data.
Among them, StarRocks is a rising star in the field of data analysis. In just a few years, StarRocks has earned more than 6,300 stars on GitHub, becoming the fastest-growing open source database project of its kind. It was officially donated to the Linux Foundation at the end of 2022, attracting developers and users from all over the world to help build its community.
As Zhang Youdong, a member of the StarRocks TSC and CTO of Jingzhou Technology, said, StarRocks hopes to simplify the data technology stack through technological innovation and to realize the vision of "One Data, All Analytics", serving every scenario with a single engine.
At present, digital technologies such as artificial intelligence, big data, and the Internet of Things continue to improve enterprise productivity, but they also continue to add complexity. This complexity is especially evident in the data field, where data technology and business scenarios are becoming ever more tightly intertwined, and it plagues many enterprises undergoing digital transformation.
Complexity is first reflected in the data itself, which is rapidly growing in both volume and variety. In the past, an enterprise tended to focus on structured data, and the data scale was usually in the terabytes. Now, unstructured data such as text, trajectory data, and logs has increased significantly, and petabyte-level data volume is becoming the norm for more and more enterprises.
Second, enterprise business scenarios are becoming more and more complex, and with them comes a massive increase in data-stack technologies, tools, and products. Where there was once a single data warehouse, there are now metric platforms, interactive analytics, real-time analytics, stream computing, and more. The data stack that enterprises face is far more complex than before, and this complexity keeps growing as AI-related technologies are integrated.
Third, data consumption needs have become significantly more complex. In the past, data consumption was the privilege of a few people in management. Now, "data for everyone" has become the goal of many enterprises. At some cutting-edge Internet and financial enterprises, for example, even ordinary business employees are data consumers who run data analysis at any time in their daily work.
Therefore, now that massive data volumes are an established fact, the complexity challenges in the data space are a hurdle that enterprises must clear in their digital transformation. In Zhang Youdong's view, "One Data, All Analytics" is the key to resolving the complexity of data analysis, and the launch of StarRocks 3.0 is a big step toward that goal.
As we all know, data analytics products have a long history. Before the rise of big data, traditional data warehouses such as Teradata and Greenplum occupied the mainstream market position. With the rise of big data, big data platforms represented by Hadoop quickly became the basic platform for data analysis. Nowadays, the rise of technologies such as cloud native and the lakehouse is accelerating innovation in data analysis products.
At present, there are many companies working on data analysis, but StarRocks has attracted particular attention from the industry with its outstanding performance. Since its official open source launch in September 2021, StarRocks has grown into a star project in the open source field and has been recognized by developers around the world. In the author's opinion, the key to StarRocks' early success in such a short period lies in the product's iteration speed and capacity for innovation.
Since going open source, StarRocks has gone through three major iterations: version 1.0 focused on performance, version 2.0 revolved around unification, and version 3.0 now revolves around lakehouse innovation. Along the way, StarRocks has become a phenomenal product in the field of data analysis.
Take data warehouse architecture as an example, where the separation of storage and compute is the general trend. With the rapid development of cloud native and related technologies, a storage-compute separation architecture lets computing and storage resources scale elastically to match what the business actually uses, optimizing both cost and efficiency. StarRocks 3.0 also adopts a storage-compute separation architecture. The design is highly abstract and minimalist, does not rely on complex components, and offers strong scalability and elasticity. In addition, it supports multi-warehouse deployment: multiple warehouses share a single copy of the data, different warehouses serve different workloads, computing resources can be physically isolated, and each warehouse can scale elastically on demand.
"The storage-compute separation architecture brings two major benefits: lower cost with higher efficiency, and elastic scaling. For example, at the storage level, StarRocks 3.0 can reduce overall storage cost by 80%. And because the computing nodes are stateless, the availability of compute can be improved through rapid elasticity and cross-AZ deployment, while computing resources can be physically isolated and scaled independently on demand," Zhang Youdong said.
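As a rough illustration of the storage-compute separation described above, the following Python sketch models one shared copy of the data queried by several warehouses of stateless compute nodes. It is a toy model only, not StarRocks code: the class names `SharedStorage` and `Warehouse`, the warehouse names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Single copy of the data, standing in for shared object storage."""
    tables: dict = field(default_factory=dict)

    def write(self, table, rows):
        self.tables.setdefault(table, []).extend(rows)

    def scan(self, table):
        return list(self.tables.get(table, []))


@dataclass
class Warehouse:
    """A named group of stateless compute nodes (hypothetical model)."""
    name: str
    storage: SharedStorage
    nodes: int = 1

    def scale_to(self, nodes):
        # Nodes hold no data, so resizing never moves or copies storage.
        if nodes < 1:
            raise ValueError("a warehouse needs at least one node")
        self.nodes = nodes

    def total(self, table, column):
        # Split the scan across nodes, then merge the partial sums.
        rows = self.storage.scan(table)
        partials = [
            sum(r[column] for r in rows[i::self.nodes])
            for i in range(self.nodes)
        ]
        return sum(partials)


storage = SharedStorage()
storage.write("orders", [{"amount": 10}, {"amount": 25}, {"amount": 7}])

# Two workloads, two warehouses, one copy of the data.
bi = Warehouse("bi_reports", storage, nodes=1)
etl = Warehouse("etl_jobs", storage, nodes=2)
etl.scale_to(4)  # scaling one warehouse leaves the other untouched

print(bi.total("orders", "amount"), etl.total("orders", "amount"))
```

Both warehouses return the same answer because they read the same stored rows; only the amount of compute applied to the query differs, which is the property that allows each workload to be isolated and sized on its own.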
In addition, lakehouse integration is an important trend in data analysis products. Data warehouses tend to offer high data quality, excellent performance, and strong real-time analysis, while data lakes can store many types of data and are highly scalable and open. Combining the respective advantages of data warehouses and data lakes has therefore become the direction of the industry's efforts.
There is no shortage of lakehouse-related solutions in the industry today. For example, when query performance on the lake is unsatisfactory, a warehouse is built on top of the lake to accelerate queries. Another approach is to extend the data warehouse so that it can query external data lakes.
Zhang Youdong said bluntly that these are really combined solutions and do not truly achieve lakehouse integration: "Lakehouse integration means that one architecture meets the needs of all data analysis, that is, One Data, All Analytics."
Take StarRocks 3.0 as an example. Its lakehouse architecture unifies data storage and management, so that a single copy of the data serves as the single source of truth. On top of that single copy, its analysis engine can meet the query requirements of scenarios such as BI reports, interactive analysis, real-time analysis, and ETL data processing. More importantly, it can accelerate data processing and queries on demand.
"In the future, data analysis will definitely evolve toward lakehouse integration. Users will not need to care whether they are building a lake or a warehouse; the core goal is to solve data analysis problems at low cost and with high efficiency," Zhang Youdong added.
In addition, with the significant increase in data volume and business complexity, ETL has become extremely laborious and usually consumes a great deal of manpower and energy. In this regard, StarRocks 3.0 is also aiming toward "no ETL": reducing the ETL workload across data management, using materialized views so that users notice ETL as little as possible, and simplifying the ETL pipeline end to end.
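To make the idea concrete, here is a minimal Python sketch of how a materialized view can stand in for a hand-written ETL job: the aggregate is declared once and then kept up to date as rows arrive. This is a conceptual illustration, not the StarRocks implementation or API; `BaseTable`, `SumByKeyView`, and the sample data are hypothetical names introduced for the example.

```python
from collections import defaultdict


class BaseTable:
    """Raw table that notifies attached views on every insert."""

    def __init__(self):
        self.rows = []
        self._views = []

    def attach(self, view):
        # Backfill the view from existing rows, then keep it in sync.
        self._views.append(view)
        for row in self.rows:
            view.apply(row)

    def insert(self, row):
        self.rows.append(row)
        for view in self._views:
            view.apply(row)


class SumByKeyView:
    """Hypothetical view equivalent to SUM(value_col) GROUP BY key_col."""

    def __init__(self, key_col, value_col):
        self.key_col = key_col
        self.value_col = value_col
        self.totals = defaultdict(int)

    def apply(self, row):
        # Incremental maintenance: fold one new row into the stored result.
        self.totals[row[self.key_col]] += row[self.value_col]

    def query(self, key):
        return self.totals.get(key, 0)


orders = BaseTable()
orders.insert({"city": "Beijing", "amount": 10})

view = SumByKeyView("city", "amount")
orders.attach(view)  # existing rows are backfilled here

orders.insert({"city": "Beijing", "amount": 5})
orders.insert({"city": "Shanghai", "amount": 8})

print(view.query("Beijing"), view.query("Shanghai"))
```

The user only ever inserts into the base table and reads from the view; no separate pipeline has to be scheduled to recompute the aggregate, which is the sense in which materialized views reduce the visible ETL work.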
There is no doubt that the launch of StarRocks 3.0 is a key milestone in the development of the StarRocks project. It means that StarRocks has achieved an important breakthrough in product strength, helping users unify their data analysis architecture across all scenarios and opening up a broader market for itself.
With the emergence of a large number of data-driven applications, the demand for data analysis and data consumption has risen as well. Gartner believes that data analytics has become a core capability that enterprises must build in their digital transformation. The data analysis market therefore has an extremely bright future.
Judging by its community development, user base, and business ecosystem, StarRocks is undoubtedly in a period of rapid development, and there is more to look forward to in the future.
First, thanks to its commitment to open source, the StarRocks open source community has remained very active, bringing vitality to its subsequent development. At present, community development is led by Jingzhou Technology, which contributes more than 70% of the core code. In addition, leading companies such as Alibaba Cloud, Tencent, Volcano Engine, and Didi Chuxing actively participate in the community and continue to contribute many important features, such as materialized views and CN elastic nodes.
Second, thanks to the active participation of leading customers and continued product innovation, StarRocks has been tempered in the complex business scenarios of leading users across industries such as finance, retail, logistics, manufacturing, and the Internet. It is reported that more than 300 large enterprises, each with a market value above $1 billion, are currently running StarRocks in production, covering scenarios such as BI reports, interactive exploratory analysis, real-time analysis, and lakehouse analysis. These deployments are expected to keep driving product innovation and rapid iteration.
Third, StarRocks attaches great importance to building its business ecosystem. Beyond adoption by leading users in each industry, StarRocks currently cooperates with major domestic cloud service providers and is committed to commercializing the open source project with the help of the cloud computing ecosystem, so that the product can reach a wider market and grow through competition.
"Compared with developed markets such as North America, the Chinese market still has huge potential for data analysis, and StarRocks hopes to help more users achieve One Data, All Analytics through technological innovation," Zhang Youdong concluded.