Cloud-native architectures generate more data, increasing the cost of observability. However, there are better ways to manage these expenses.
Translated from "Real Talk: Why Is Datadog So Expensive?" by Rachel Dines.

I've seen a lot of posts on X (formerly Twitter), Reddit, and Hacker News lately discussing Datadog's high costs. It's a hot topic, and engineers are blogging about how aggressively they're cutting back their metrics to rein in the bill. But how did we get here? Why are these costs so high? Why do companies pay more for observability than for the production infrastructure it monitors? There's a lot of arguing, and plenty of accusations of vendor lock-in and corporate greed, which are certainly justified to some extent. But the deeper problem is the fundamental architectural shift to containerized infrastructure and microservices applications. If we don't understand and solve that problem, history will repeat itself.

Full disclosure: I work for Chronosphere, a company that competes with Datadog, but I promise this article isn't a pitch for our products. Datadog is a formidable competitor, and I've watched it build an amazing business over the years. My former company was a close partner of Datadog's from 2015 to 2018, and its exponential growth during that period was something we were eager to emulate. At the same time, I've watched Datadog's customers grow increasingly unhappy with soaring, seemingly unavoidable costs, while feeling like they can't leave.
This trend was one of the reasons I joined Chronosphere in 2021. Before entering the space, I did some market sizing and analysis and found that observability is one of the biggest drivers of infrastructure spending: for every $1 spent on the public cloud, you might spend $0.25 to $0.35 on observability. That convinced me this was a market worth disrupting.
The root cause of the problem is simple: there is far more observability data (metrics, logs, traces, and events) than these tools were designed to handle. Their architectures weren't built for this volume of data, and their pricing doesn't scale sensibly with it. There are many reasons we end up generating so much data.

Business Drivers:
Digital transformation: As technology penetrates more areas of the business, it naturally brings more data with it, to monitor system health and ensure the overall system runs smoothly.
Heightened customer expectations and higher stakes: According to one 2023 reliability report, the average American will switch to a competitor after fewer than four instances of unreliability or outages on an app or website. Running high-performance, high-availability services that deliver exceptional customer experiences requires more detailed observability data.

Data hoarding: When you're ingesting enormous amounts of data every minute, it's hard to know which of it is useful. Without the right tools to parse it, you can fall into the "I never know when I'll need this data" trap and keep far more than necessary.
Technology Drivers:
Containers and microservices generate more telemetry: Cloud-native environments (i.e., containers and microservices) have significant advantages, but monitoring the health of each individual component and service naturally generates more data. Each container and microservice now produces roughly as much observability data as a virtual machine (VM) or monolithic application used to. But instead of a few dozen VMs and a handful of applications, you now have thousands of containers and dozens of microservices.

The sheer scale of cloud-native environments: Cloud-native is decentralized by design, and engineering teams can spin up new components quickly, so the number of services and containers grows exponentially, generating enormous amounts of data.
This data growth has driven the spike in observability spending. If a vendor doesn't change its pricing model or software to accommodate the growth, and keeps basing prices on traditional monitoring assumptions, cloud-native architectures suddenly become prohibitively expensive to observe. Why hasn't Datadog adapted? I suspect two reasons:
Shareholder value: Datadog's stock has performed exceptionally well over the past few years. Lowering prices would immediately affect revenue, which would affect reported earnings, which would in turn drag down the share price.
Cost of goods sold: Datadog has gone through three generations of architecture; its latest, Husky, was released in 2022. This re-architecture focused primarily on efficiency, but customers don't appear to have received the savings, so I think it mainly reduces Datadog's cost of goods sold (COGS) and keeps margins healthy. Since Datadog probably won't make another re-architecting investment anytime soon, it's unlikely to risk its margins by lowering prices.

If you don't want to pay for Datadog, there are a few options.
An attractive-sounding alternative is to run your own observability stack internally on open-source tools. The good news is that, at least for metrics and tracing, open-source tooling has come a long way and is coalescing into industry-accepted standards: Prometheus and OpenTelemetry, paired with various time-series database backends (Mimir, Thanos, or M3), are viable alternatives to Datadog. But note that this usually won't save you money in real dollars; it simply trades one category of spend (a vendor's operating expense line) for another (the capital and people costs of running it yourself). The manpower and infrastructure required to operate these systems are considerable, and if you try to cut corners, you may regret it. I recently spoke with a friend who migrated his company from an expensive commercial SaaS solution to in-house open-source tools. He admits the company isn't actually saving money, once you account for the roughly 8% of its developers now dedicated to running the system.
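To give a concrete sense of what the open-source route above involves, here is a minimal sketch of a Prometheus configuration that scrapes its own metrics and forwards samples to a long-term storage backend such as Mimir, Thanos Receive, or M3. The remote endpoint URL is a placeholder, and a production setup would add many more scrape jobs, relabeling rules, and authentication:

```yaml
# prometheus.yml — minimal sketch; the remote_write URL is a placeholder
global:
  scrape_interval: 30s   # a longer interval means fewer samples and lower storage cost

scrape_configs:
  # Prometheus scraping its own metrics endpoint
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

# Forward samples to a long-term store (e.g., Mimir, Thanos Receive, or M3)
remote_write:
  - url: "http://mimir.example.internal/api/v1/push"
```

Even this tiny fragment hints at where the operational cost lives: every scrape job, retention policy, and backend component is now yours to configure, scale, and keep running.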
This isn't the part where I sell you my company's product. It's the part where I point out that some tools were designed around this data growth from the beginning, keeping the cost of the solution in the customer's hands so there are no surprise overruns.
Just as Datadog, New Relic, and similar tools displaced the previous generation of SolarWinds, BMC, and CA Technologies, a new generation of observability vendors is starting to make a splash. Talk to these vendors and ask how they address the problem of too much observability data, and whether they can offer better unit economics.
Datadog's high bills and vendor lock-in have somehow come to feel like a necessary evil: you know you need observability, but you're not sure about all the options. Despite the issues with Datadog's billing model and proprietary approach, it has been around long enough that it still looks like the safe choice. But it doesn't have to be this way.
As more observability companies enter the space, you'll find options designed from the start to handle the growth of high-cardinality data. These options give you a more flexible infrastructure, more control over your data, and clearer visibility into your monthly bills, ultimately creating a more sustainable and cost-effective operating model for your observability team.