I initially considered the headline "DevOps is dead, platform engineering is the future", but that wording seemed too absolute. In the end I settled on the word "" to describe DevOps, although it is not the most polite way to put it. This article revisits DevOps and platform engineering, examines each concept in turn, and focuses on some of the core ideas that platform engineering advocates. I also hope it offers some food for thought to colleagues who work on internal developer platforms (IDPs).
The concept of DevOps was introduced in 2009, with a strong emphasis on team collaboration, automation tools, and process improvement to raise the speed and quality of software development and deployment. Nearly 15 years later, however, the approach has not achieved its goals as expected. Take deployment and release tooling as an example: whether it is J-One, JDOS, or the current Xingyun deployment, developers still pay a real cost for everyday deployments and releases, and this does not appear to be a problem at the tool level alone.
DevOps itself is a philosophy that emphasizes teamwork and close collaboration between development and operations teams. Although it stresses the importance of automation and tools, it does not clearly indicate which direction to take. This is where the concept of platform engineering comes in. It is impossible to verify who first proposed it, but in July 2022 a tweet reading "DevOps is dead, long live platform engineering" spread quickly through DevOps circles in China and abroad and drew a wide response.
Platform engineering is a new approach to O&M which holds that the internal developer platform should give developers the ability to serve themselves. One of its core ideas is to provide developers with flexible toolchains and workflows while shielding them from the complexity of the infrastructure. Developers can then use the platform's basic capabilities to solve problems on their own, without depending on the platform team to step in, which lets development teams work more efficiently and improves the speed and quality of software delivery.
"Platform engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations in the cloud-native era. Platform engineers provide an integrated product most often referred to as an 'Internal Developer Platform' covering the operational necessities of the entire lifecycle of an application." (Definition from platformengineering.org.)
There are many definitions of platform engineering, but most of them mean the same thing: they advocate self-service in order to reduce the complexity and uncertainty of the underlying support tools, shorten the workflow, and lower the cognitive load on end users, thereby improving the user experience and increasing productivity.
Platform engineering and DevOps are both concepts from the world of software development and operations, and both aim to improve the efficiency and quality of software development and deployment, but they differ in focus and approach. Platform engineering focuses on building a reusable platform architecture that offers scenario-based capabilities and a self-service experience. DevOps, on the other hand, focuses on team collaboration, automation tools, and process improvement.
Gartner ranked platform engineering among its top strategic technology trends for 2023. In its recently released Top 10 Technology Trends for 2024, Gartner lists platform engineering again and gives it more prominence, which indicates that the industry's recognition of platform engineering is still growing.
Over the past few years, DevOps adoption has mostly been driven from a capability maturity perspective, while quantitative assessment of inputs and outputs has remained relatively vague. Platform engineering proposes ways to measure the value it delivers, such as the quality of the self-service experience and the reduction in manual effort. By concentrating on self-service and scenario-based capabilities, it aims to deliver a platform whose value can be demonstrated.
Back to the title of this article, let's talk about why developers are reluctant to take on operations.
DevOps emphasizes teamwork and encourages developers to take on some operational work. However, in reality, why is this often difficult to achieve? I think there are several main reasons:
Focus on core development tasks: Developers naturally lean toward day-to-day software development, and they rarely have the time and energy to spare for anything else without delaying that work.
Unfamiliar or uninterested: Developers may not have enough experience to handle operations, or may simply not be interested in it, so they lack the motivation to take it on.
The blame carried by O&M is too heavy and the work too complicated: O&M touches the production environment, so its scope of responsibility and impact is large. Any operational mistake can have serious consequences such as system failure, service interruption, or data loss. Taking on operational work is therefore an additional burden and responsibility for developers. In addition, O&M usually includes many trivial and tedious tasks, as well as 24/7 on-call duty.
Lack of useful tools and platform support: Without easy-to-use, efficient automation tools and platforms, operations work becomes more manual, which increases its cost and complexity.
These are some of the likely reasons why developers are reluctant to take on operational work. So what is the essence of operations?
The focus of O&M is to keep the system running safely and stably. That means not only monitoring the stability of the online environment 24/7 but also handling a range of routine O&M tasks, such as resource management, routine inspections, troubleshooting and repair, and ticket handling.
Recently, several major companies have suffered serious online stability failures, which drew a lot of attention across the industry.
These failures are a loud wake-up call for the entire industry, because every business faces the same online stability challenges.
Production safety, the alarm bell is ringing: When dealing with online problems, we must not simply chase speed and convenience; every online operation deserves a sense of awe.
Production safety is everyone's responsibility: Whether it is faulty logic written by a developer or a botched upgrade by an operator, the result can be immeasurable damage to the company.
The hardest part of keeping a production environment stable is not technology but getting countless details right. Ensuring stability requires a lot of investment, yet the biggest problem is that the work is hard to get recognized: how do you measure a job well done? There is an old joke on the Internet that goes roughly: "Those who write code without bugs often go unnoticed and may even be let go, while those who often write bugs stay busy fixing them every day and thrive." In short, developers are unwilling to take on O&M because the responsibility for online stability is heavy, the O&M workload is heavy, and suitable tools and platforms are lacking.
Platform engineering was proposed as a new concept that aims to solve these problems and improve the software delivery process. Let's talk about the key success factors of platform engineering compared with DevOps.
As a relatively new concept, platform engineering has been recognized by Gartner for two years in a row, putting it at the forefront of our attention. In order to promote platform engineering within the company, I believe that the following aspects need to be clarified:
Platform Scope: There are many internal tools. The first step is to designate authoritative or certified tools and keep investing in and iterating on them, rather than letting each team develop its own, so as to avoid duplicated work and wasted cost.
Platform Culture: Who is the platform built for, and whom does it serve? Developers are our customers. Build a platform culture centered on serving developers, while also meeting the needs of company management.
Platform Goals: The core goal is to build scenario-based tools that developers can use on the platform in a self-service way.
Platform Owner: The IDPs within an enterprise cannot all sit in a single department, so it is critical to identify the owner of each specific scenario and eliminate unclear boundaries of responsibility.
Requirements: Everything starts from developer needs, and the developer experience must be taken into account. Avoid large, sweeping version upgrades and changes that force developers to migrate systems and resources and add to their cost of use.
Platform API: An internal platform should naturally offer a rich API to meet the needs of internal developers, along with detailed documentation for them to use.
The above covers how to promote platform engineering internally from an overall perspective. Next, let's look at what characteristics the tools built under platform engineering should have.
Personally, I believe internal tools matter even more than consumer-facing products. Consumers can choose a different product, but internal users have little choice: at most they complain a few times and then keep using the tool anyway. To create a tool that satisfies internal users, I feel it needs at least the following key attributes:
Productization: An internal tool platform must be built as a product and positioned to serve the whole group, not just a few people or a few dozen people in one department. Only when all developers in the group are the target users can the tool be done well.
User Experience: Pay attention to user experience. Beyond providing a basic GUI, an API, and similar capabilities, hide the complex back-end logic to reduce the cost of use.
Integration: Integration here does not just mean bringing various tools into the platform through a tool marketplace, as Xingyun Taishan does today. That is only the first step. The goal is to build around usage scenarios and treat the application as the workspace. At release time, for example, the platform should bring together observability views such as monitoring, logs, contingency plans, and alarms, so that users can cover the whole scenario in one place.
Self-service: Users can complete every function without help from the platform team. Take withdrawing money from a bank as an example: at the counter, a withdrawal needs the assistance of bank staff, but at an ATM the same withdrawal is entirely self-service.
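The self-service idea above can be illustrated with a minimal sketch. Everything in it is hypothetical and not taken from any real platform: the names `DeployRequest`, `DeployPlan`, `ENVIRONMENT_DEFAULTS`, and `plan_deployment`, as well as the cluster names and release steps, are assumptions made for illustration. The point is only that the developer supplies application-level choices, while the platform fills in the infrastructure details from defaults it owns.

```python
from dataclasses import dataclass, field


@dataclass
class DeployRequest:
    """What a developer supplies: only application-level choices."""
    app: str
    version: str
    environment: str  # e.g. "test" or "production"


@dataclass
class DeployPlan:
    """What the platform derives; developers never fill this in."""
    app: str
    version: str
    cluster: str
    replicas: int
    steps: list[str] = field(default_factory=list)


# Hypothetical platform-owned defaults, keyed by environment.
ENVIRONMENT_DEFAULTS = {
    "test": {"cluster": "test-cluster", "replicas": 1},
    "production": {"cluster": "prod-cluster", "replicas": 3},
}


def plan_deployment(request: DeployRequest) -> DeployPlan:
    """Expand a self-service request into a full plan using platform defaults."""
    if request.environment not in ENVIRONMENT_DEFAULTS:
        raise ValueError(f"unknown environment: {request.environment}")
    defaults = ENVIRONMENT_DEFAULTS[request.environment]
    steps = ["build image", "run health checks", "shift traffic"]
    if request.environment == "production":
        # Production gets an extra guarded step, chosen by the platform.
        steps.insert(1, "canary release")
    return DeployPlan(
        app=request.app,
        version=request.version,
        cluster=defaults["cluster"],
        replicas=defaults["replicas"],
        steps=steps,
    )
```

A developer would call something like `plan_deployment(DeployRequest("order-service", "1.4.2", "production"))` and never need to know which cluster or replica count is involved, much like using the ATM instead of the counter.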
In the context of platform engineering, internal platform teams commonly run into the following four problems:
Productization: Requirements for internal tools are especially hard to control, so the tools drift toward customization. After a while they may turn into bespoke products for one person or one small department.
Priority: The team frequently receives high-priority requests from "a certain C-X boss".
Recognition: Because the work is disconnected from the business, its value is hard to measure, and over time the value of the team's output may be called into question.
Duplicate construction: Duplicated building of internal tools and platforms is a serious problem.
Personally, I think the internal platform team should insist on doing the following things:
Continue to collect user needs and plan a long-term roadmap for the platform.
Improve user manuals and best practices to improve user experience.
Keep an open mind and be sure to provide an API.
Keep promoting and operating the platform it is responsible for. (Like a child, a platform has to be raised after it is born.)
To address duplicate construction, strengthen cooperation and joint development, and avoid building small, self-congratulatory "personal departmental tools."
The value of a platform should be measured and evaluated along a few metric dimensions. If a platform or tool has been under construction for a year, yet the team knows nothing about how it is actually used and only focuses on feature development, how can anyone measure the value the platform brings? I think the most important thing is to find a key metric, which can come from a business, product, or organizational dimension. Here are a few dimensions, offered for reference only:
User dimension (the primary one: improving the experience as seen by users)
The number of weekly active users.
The number of businesses empowered.
Net Promoter Score (NPS).
Product dimension
Access efficiency.
Execution efficiency.
Execution success rate.
Organizational dimension
xx cycles.
xx time.
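As a rough sketch of how a few of these metrics could be computed, consider the following. The `UsageEvent` record and the function names are hypothetical, invented for illustration rather than taken from any existing platform. The NPS calculation uses the standard definition on a 0 to 10 survey scale: the percentage of promoters (scores 9 and 10) minus the percentage of detractors (scores 0 to 6).

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class UsageEvent:
    """One platform operation by one user (hypothetical record shape)."""
    user: str
    day: date
    succeeded: bool


def weekly_active_users(events, week_start):
    """Count distinct users with at least one event in the 7 days from week_start."""
    week_end = week_start + timedelta(days=7)
    return len({e.user for e in events if week_start <= e.day < week_end})


def execution_success_rate(events):
    """Fraction of operations that succeeded; 0.0 when there are no events."""
    if not events:
        return 0.0
    return sum(e.succeeded for e in events) / len(events)


def net_promoter_score(ratings):
    """NPS: percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    if not ratings:
        return 0.0
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)
```

Whichever metric is chosen as the key one, tracking it week over week matters more than the absolute number, since the trend shows whether the platform is actually getting better for its users.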
As for the future development of platform engineering, the current situation in China and abroad is as follows:
Overseas, many companies such as Google, Spotify, Netflix, and Walmart are actively promoting platform engineering inside their organizations. In November, CNCF officially released a capability maturity model for platform engineering that defines 4 levels across 5 dimensions. The dimensions of the CNCF model are fairly coarse-grained: they evaluate mainly the team and its people, platform adoption, user experience, self-service, and platform iteration, and they do not break down the platform's functional dimensions in detail.
In China, platform engineering is gradually attracting attention, especially among those who were originally responsible for DevOps tools, who are watching closely the new ideas and directions it brings. The China Academy of Information and Communications Technology is also organizing industry experts to jointly draft a platform engineering capability requirements standard suited to the current situation in China, which will spell out the functional dimensions of platform engineering. I am currently involved in some of this work, so if you are interested, please feel free to contact me.
Finally, Gartner predicts that by 2026, 80% of software engineering organizations will establish platform teams as internal providers of reusable services, components, and tools for application delivery, and that 75% of them will include developer self-service portals.
Clearly, platform engineering isn't just a trend; it's the future of software delivery.
Author: Jingdong Retail Jing Liangliang.
Source: JD Cloud Developer Community. Please indicate the source when reprinting.