Lessons learned from 17 tech companies on measuring developer productivity


Author | Rafiq Gemmail

Translator | Hirakawa

Curated by | Tina

Gergely Orosz (author of The Pragmatic Engineer newsletter) and Abi Noda (CEO of DX and one of the creators of the DevEx framework) published an article on Pragmatic Engineer titled "Measuring Developer Productivity: Real-World Examples". InfoQ reported on the results of Noda's survey of the engineering metrics used at 17 well-known tech companies. Noda found that top teams were not adopting frameworks like DORA or SPACE wholesale, but were instead using a mix of qualitative and quantitative metrics tailored to their organization. Working backwards from the outcomes sought by the teams implementing these metrics, Noda and Orosz offer recommendations for defining such metrics.

Noda writes that he "interviewed teams at 17 well-known tech companies that measure developer productivity." In the article, Noda and Orosz focus on four sizes of organization: Google, with around 100,000 employees; LinkedIn, with around 10,000; Peloton, with fewer than 10,000; and Notion and Postman, with fewer than 1,000. The metrics used range from typical PR and CI metrics to the systematically selected metrics used at Google.

Noda observes that in practice "the DORA and SPACE metrics are used selectively" rather than adopted wholesale. He writes that while the survey shows "every company has its own tailored approach," he believes "organizations of any size can adopt Google's holistic philosophy and approach." Google's approach, Noda writes, requires metrics to be selected across three dimensions: speed, ease, and quality. There is a "tension" between these three dimensions, he writes, that "helps to reveal potential trade-offs."

Noda writes that Google calculates metrics from "both qualitative and quantitative measures" because this provides "the most comprehensive picture possible." He cites a range of data-collection methods used at Google, from satisfaction surveys to "metrics from usage logs." He writes:

Whether for a tool, a process, or a team, Google's Developer Intelligence team shares the view that no single metric can measure productivity. Instead, they look at productivity in terms of speed, ease, and quality.

Similarly, Noda and Orosz describe how LinkedIn combines quarterly developer satisfaction surveys with quantitative metrics. In the article, Noda lists a series of metrics used by LinkedIn's Developer Insights team, designed to reduce "friction in key developer activities." The team's metrics include CI stability, deployment success rate, P50 and P90 build times, code review response time, and the time it takes a commit to get through the CI pipeline. Noda describes how teams can back up these quantitative measures with qualitative insights, giving the example of comparing build times against "developer satisfaction with builds." LinkedIn also reduces noise in its quantitative metrics by using the winsorized mean.

Winsorizing the mean means finding the 99th percentile and capping every data point above it at that value, instead of discarding those points. If the 99th percentile is 100 seconds and you have a data point of 110 seconds, you replace the 110 with 100, and the mean you then calculate is a much more useful number.
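As an illustration, here is a minimal sketch of a 99th-percentile winsorized mean in Python; the function name and the simulated build times are illustrative, not LinkedIn's actual tooling:

```python
import numpy as np

def winsorized_mean(values, upper_percentile=99):
    """Cap every value above the given percentile at that percentile,
    then take the plain mean. Outliers still count, but no longer dominate."""
    arr = np.asarray(values, dtype=float)
    cap = np.percentile(arr, upper_percentile)
    return float(np.clip(arr, None, cap).mean())

# Illustrative data: ~1,000 builds of around 60 s, plus a few pathological ones.
rng = np.random.default_rng(0)
build_times = np.concatenate([rng.normal(60, 5, 995), [900, 1200, 1500, 2000, 3000]])

print(f"plain mean:      {build_times.mean():.1f} s")
print(f"winsorized mean: {winsorized_mean(build_times):.1f} s")
```

Unlike simply dropping outliers, capping them keeps every build in the denominator, so the metric still registers that slow builds happened without letting a handful of extreme values swamp the average.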

Peloton, representing organizations of 3,000 to 4,000 employees, has evolved from its initial "qualitative insights through developer experience surveys" to also incorporate quantitative metrics, Noda writes. For example, lead time and deployment frequency serve as objective measures of speed. Peloton's metrics also include qualitative engagement scores, service recovery times, and quality, measured by the "percentage of PRs under 250 lines, line coverage, and change failure rate."
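Two of those quality measures are simple proportions. The sketch below shows one plausible way to compute them; the data classes and helper names are hypothetical, not Peloton's actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    lines_changed: int

@dataclass
class Deployment:
    caused_failure: bool

def small_pr_rate(prs, limit=250):
    """Share of PRs under the size limit; smaller PRs are easier to review."""
    return sum(pr.lines_changed < limit for pr in prs) / len(prs)

def change_failure_rate(deploys):
    """Share of deployments that resulted in a failure in production."""
    return sum(d.caused_failure for d in deploys) / len(deploys)

prs = [PullRequest(120), PullRequest(480), PullRequest(90)]
deploys = [Deployment(False), Deployment(True), Deployment(False), Deployment(False)]
print(f"PRs under 250 lines: {small_pr_rate(prs):.0%}")           # 67%
print(f"change failure rate: {change_failure_rate(deploys):.0%}")  # 25%
```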

Turning to smaller, scaling organizations like Notion and Postman, Noda writes that these typically focus on measuring "movable metrics": metrics that the team implementing them can move, positively or negatively, through its own work. One example is "ease of delivery." Noda writes that this metric reflects "cognitive load and feedback loops" and captures how easy developers feel it is to get their work done. Another common movable metric is the percentage of developer time lost to obstacles and friction. Noda describes the value of this metric:

The biggest benefit of this metric is that it can be translated into money, which makes lost time easy for business leaders to understand. For example, if an organization with $10 million in engineering payroll costs reduces lost time from 20% to 10% through an initiative, that saves $1 million.
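The conversion is straightforward arithmetic. A minimal sketch, assuming (as Noda's example does) that lost time translates linearly into payroll cost:

```python
def savings_from_reduced_time_loss(payroll, loss_before, loss_after):
    """Dollar value of reclaimed engineering time, assuming lost time
    scales linearly with payroll (the simplification in Noda's example)."""
    return payroll * (loss_before - loss_after)

# Noda's example: $10M engineering payroll, lost time cut from 20% to 10%.
print(savings_from_reduced_time_loss(10_000_000, 0.20, 0.10))  # 1000000.0
```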

Given how context-dependent such engineering metrics are, Noda recommends that organizations define their metrics in the following steps:

1. Define your goals in a mission statement that answers the question: "Why does your developer productivity team exist?"

2. Working from that goal, define top-level metrics across the three dimensions of speed, ease, and quality.

3. Define "operational metrics" tied to "specific projects or key results," for example the adoption rate of a particular productivity-enhancing service.

Using examples, Noda shows how the chosen metrics should cover dimensions such as speed, ease, and quality. For example, if the goal is to make it easy for developers to "deliver high-quality software," the metrics could include "perceived delivery speed" (speed), "ease of delivery" (ease), and "incident frequency" (quality), he says.
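One way to picture the three-step structure is as a small hierarchy from mission to top-level to operational metrics. The sketch below echoes Noda's example names; the schema itself is hypothetical, not a format the article prescribes:

```python
# Goal -> top-level -> operational hierarchy, as described in the three steps above.
metrics_program = {
    "mission": "Make it easy for developers to deliver high-quality software",
    "top_level_metrics": {
        "speed":   "perceived delivery speed (survey)",
        "ease":    "ease of delivery (survey)",
        "quality": "incident frequency (incident tracker)",
    },
    "operational_metrics": [
        # Tied to specific projects or key results, for example:
        "adoption rate of a productivity-enhancing internal service",
    ],
}
```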

This article by Orosz and Noda follows an earlier piece, "Measuring Developer Productivity? A Response to McKinsey," in which Orosz partnered with Kent Beck to challenge McKinsey's article "Yes, you can measure software developer productivity." The McKinsey article proposes what it calls "opportunity-focused" metrics, intended to determine how product delivery can be improved and where value can be added. It builds on DORA and SPACE, and includes recommendations that encourage leaders to optimize individual developers' efficiency, citing "non-coding activities (e.g., design meetings)" as an example. The metrics it proposes include tracking "individual contributions" and measuring "talent capability scores."

Beck warns of the risks of measuring individual productivity rather than delivered outcomes, sharing his own experience of seeing such metrics tied to money and status as incentives for improvement. While this can drive "behavioural change," it also invites gamification, creating incentives to "improve these metrics in creative ways," he says. Beck and Orosz encourage leaders to measure "impact" rather than "effort." In particular, Beck recommends that such measurements be used only in a continuous-improvement feedback loop for the thing being measured, and for nothing else. He also warns that misusing metrics to measure individuals can undermine safety:

Be clear about why you are asking, and about the power relationship between you and the person being measured. When those with power measure those without it, the results are distorted... Analyze data at whatever level it is collected, to avoid perverse incentives. I can analyze my own data, and my team can analyze its own aggregated data.

Noda also cautions that when someone at the CTO, VP of Engineering, or Director of Engineering level is asked to provide developer productivity metrics, it is worth making sure the reporting happens at an appropriate level of aggregation. He recommends choosing metrics that represent business impact, system performance, and the development efficiency of the engineering organization as a whole, such as project-level user NPS and weekly time lost. Noda advises senior leaders:

In this case, I would suggest that it is better to reframe the problem. Your leadership team doesn't want perfect productivity metrics; they want reassurance that you are a good steward of their engineering investment.

In their response to the McKinsey report, Orosz and Beck caution that "people optimize what is measured." They cite Goodhart's law: "When a measure becomes a target, it ceases to be a good measure."

