As more and more companies enter the field of machine learning, how will the large language model (LLM) API market evolve? The origins of this market can be traced back to OpenAI's release of ChatGPT, which quickly reached $1.3 billion in annualized revenue. At this time last year, however, there was little competition in the LLM API market: Bard didn't exist yet, let alone Claude, and Gemini was just a glimmer in the eye of Google CEO Sundar Pichai. OpenAI held a monopoly position and captured essentially all of the value in the market.
But over the past year, we've seen that large language models don't seem to form a moat anywhere except at the very top of the market. GPT-4 is the only model without a direct competitor, but rivals are already closing in: Gemini Ultra, Llama 3, and a yet-to-be-released, larger-than-medium Mistral model. At the GPT-3.5 level, meanwhile, you already have a variety of hosting options and can even host a model yourself. That necessarily limits what any company can charge.
Generally, companies enter a new market when they believe they can earn more than some minimum required profit threshold, and the larger the company, the smaller that threshold. For example, if I, as an individual, want to start offering fine-tuning services for large language models, I need a fairly high profit margin, because I have only a small customer base over which to spread my costs. As the business grows, I have a larger customer base to spread costs over and more money to invest in optimization, so I can offer LLM services at lower cost by:
quantizing the model; buying chips instead of renting them; distilling the model down to a smaller one; and eventually making my own chips.
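The cost-spreading argument above can be made concrete with some arithmetic. The sketch below uses entirely made-up numbers (fixed costs, volumes, margin targets are illustrative assumptions, not real figures) to show why a larger provider can charge less per token:

```python
# Hypothetical numbers: how fixed costs spread over a growing customer base
# lower the price an LLM provider must charge per token.

def required_price_per_million_tokens(
    fixed_costs_per_month: float,    # engineers, optimization work, etc.
    gpu_cost_per_month: float,       # serving hardware
    tokens_served_per_month: float,  # total demand across all customers
    target_margin: float,            # e.g. 0.2 = 20% gross margin
) -> float:
    """Minimum price per 1M tokens that covers costs at the target margin."""
    total_cost = fixed_costs_per_month + gpu_cost_per_month
    cost_per_million = total_cost / (tokens_served_per_month / 1e6)
    return cost_per_million / (1 - target_margin)

# A small provider: few customers to spread fixed costs over.
small = required_price_per_million_tokens(50_000, 10_000, 1e9, 0.2)
# A large provider: same fixed costs, 100x the serving volume.
large = required_price_per_million_tokens(50_000, 1_000_000, 100e9, 0.2)
print(f"small provider: ${small:.2f} / 1M tokens")   # $75.00
print(f"large provider: ${large:.2f} / 1M tokens")   # $13.13
```

The large provider spends far more on GPUs in absolute terms, but its fixed costs are diluted across 100x the volume, so its break-even price per token is much lower.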
Every optimization and process efficiency increases your profit: each token served brings in more margin. Sounds great! But that's not exactly how it plays out. In an idealized, spherical-cow world it would be true, but while you're investing to serve tokens more efficiently, your competitors are doing the same, eating into your profits. In the words of the well-known investor Ben Horowitz: you have to run as hard as you can just to stay in place.
This suggests that in a homogeneous large language model market, there will be a relentless competition for efficiency, with companies racing to see who can accept the lowest return on invested capital.
The classic business strategy book "The Innovator's Dilemma" contains a classic case study of how technological disruption happens; the passage below is taken from The New Yorker's profile of its author, Clayton Christensen:
"For those who don't have much experience making steel, there have historically been two main ways to make it," he said. "The first is the large integrated steel company, from which most of the world's steel is produced; the other is the minimill, which melts scrap metal in electric furnaces. The most important advantage of minimills is that their steelmaking costs are 20% lower than those of the large integrated mills. Imagine you are the CEO of a large integrated steel company: in the best of years, your net margin is only 2%-4%. Faced with a technology that reduces the cost of steelmaking by 20%, wouldn't you adopt it? Curiously, however, not a single large integrated steel company in the world invested in minimills, and today all but one of them are bankrupt. So there are things that make perfect sense that even smart people cannot do."

In steel manufacturing, most steel was historically produced in large, integrated mills, which turned out high-quality steel at reasonable margins. Then electric minimills appeared, able to produce the lowest-quality steel at much lower cost. The large steelmakers shrugged this off and continued to focus on high-quality steel at (relatively) high margins. Over time, the minimill operators gradually mastered the technology to produce higher-quality steel, moved upmarket, and eventually wiped out the large integrated mills: U.S. Steel, once the 16th-largest company in the U.S. by market capitalization, was removed from the S&P 500 in 2014.
The analogy to large language models is intuitive. Large labs focus on developing the highest-performing models, which are expensive but superior to everything else; to pay their engineers $900,000 a year, they need healthy margins. At the other end of the market, an open-source community led by Meta and r/LocalLLaMA is developing high-quality models and exploring how to serve them on ultra-low-powered machines. Open-weights models are expected to drive down costs (on a quality-adjusted basis), putting pressure on the big labs' margins. For example, Together has launched a hosted version of Mixtral priced 70% lower than Mistral's own offering.
We should therefore expect a bifurcated market: at the high end, more expensive, higher-quality models; at the low end, cheaper, lower-quality ones. For open-weights models, we can expect prices to converge on the cost of GPUs and electricity, and as competition in the GPU market intensifies, perhaps eventually on the cost of electricity alone.
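The "price floor converges on GPUs and electricity" claim can be sketched with back-of-the-envelope numbers. Every figure below (rental rate, power draw, electricity price, throughput) is an illustrative assumption, not a real quote:

```python
# Back-of-the-envelope price floor for serving an open-weights model:
# if competition drives margins to zero, price converges on GPU rental
# cost, and for an owned, depreciated GPU, on electricity alone.

GPU_RENT_PER_HOUR = 2.00         # $/hr to rent one accelerator (assumed)
POWER_KW = 0.7                   # draw of one GPU under load (assumed)
ELECTRICITY_PER_KWH = 0.10       # $/kWh (assumed)
THROUGHPUT_TOK_PER_SEC = 3000    # batched serving throughput (assumed)

tokens_per_hour = THROUGHPUT_TOK_PER_SEC * 3600
rented_floor = GPU_RENT_PER_HOUR / tokens_per_hour * 1e6
owned_floor = (POWER_KW * ELECTRICITY_PER_KWH) / tokens_per_hour * 1e6

print(f"floor renting GPUs: ${rented_floor:.3f} / 1M tokens")
print(f"floor owning GPUs:  ${owned_floor:.4f} / 1M tokens (electricity only)")
```

Under these assumptions, the electricity-only floor is more than an order of magnitude below the rental floor, which is why intensifying GPU-market competition could push prices so much further down.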
So what does a potential buyer of these APIs look like? If we ranked the tasks a large language model can perform by economic value, how many would actually require a highly capable model? GPT-4 may be necessary in some cases, but it's hard to imagine that threshold staying where it is. Open-weights models will continue their steady rise, squeezing the big labs' margins. And thanks to tools that make it easy to switch between model APIs, developers will choose the cheapest model that can complete their task. Do you really need the largest, most advanced model for routine text completions? Probably not!
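The "pick the cheapest model that can do the job" behavior can be sketched as a tiny cost-aware router. The model names, prices, and capability scores below are placeholders I invented for illustration, not real offerings or benchmarks:

```python
# Sketch of a cost-aware model router: given a task's required capability,
# pick the cheapest model that clears the bar. All entries are hypothetical.

MODELS = [
    # (name, price per 1M tokens in $, rough capability score 0-100)
    ("small-open-7b",   0.20, 55),
    ("mid-open-8x7b",   0.60, 70),
    ("frontier-model", 30.00, 95),
]

def cheapest_capable(required_capability: int) -> str:
    """Return the lowest-priced model meeting the capability requirement."""
    candidates = [m for m in MODELS if m[2] >= required_capability]
    if not candidates:
        raise ValueError("no model is capable enough for this task")
    return min(candidates, key=lambda m: m[1])[0]

print(cheapest_capable(50))  # routine completion -> small-open-7b
print(cheapest_capable(90))  # complex reasoning  -> frontier-model
```

As switching costs fall, this selection pressure pushes every task whose capability bar sits below the frontier onto the cheapest adequate model.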
In addition, companies that succeed in the consumer market may ultimately balk at handing large margins to model companies and start training their own models. We're already seeing early GPT-4-powered companies like Harvey and Cursor recruit research scientists and engineers, stocking up on the talent needed to train their own base models. Since API fees are probably the biggest expense for these companies, it's only natural that they will do everything they can to drive costs down.
If you develop your own model, you can raise a round of funding to support it, and that one-time capital expenditure can improve your overall margins. Google's TPU program is an example: by spending billions of dollars on custom chips, Google avoided paying Nvidia's hefty margins.
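The one-time-capex argument is a simple breakeven calculation. The figures below are made-up assumptions chosen only to show the shape of the tradeoff:

```python
# Breakeven sketch for the "raise capital, build your own" argument:
# compare ongoing API fees against a one-time capex plus lower running
# costs afterwards. All figures are illustrative assumptions.

api_cost_per_month = 400_000   # ongoing fees to a frontier-model provider
capex = 5_000_000              # one-time cost to train/serve your own model
own_cost_per_month = 150_000   # inference + maintenance after the buildout

monthly_savings = api_cost_per_month - own_cost_per_month
breakeven_months = capex / monthly_savings
print(f"breakeven after {breakeven_months:.0f} months")  # 20 months
```

If the breakeven horizon is shorter than the company's planning horizon, raising capital to fund the buildout looks attractive, which is exactly the logic behind projects like Google's TPUs.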
The conclusion, then: for any task simple enough to be solved by open-weights models, the LLM API market will trend toward the lowest possible cost. If a task is complex enough to require the best model, you will have to pay OpenAI. For everything else, a fine-tuned Mistral 7B is a good choice.
Original author: Finbarr Timbers. Original address: