The 2024 LLM competition has officially kicked off, and Anthropic, the large-model company known as OpenAI's "strongest competitor", has made its entrance with the Claude 3 series. Its rivals, meanwhile, are otherwise occupied: OpenAI is busy squabbling with Musk, while Google is wrestling with the fallout from Gemini's excessive political correctness.
For a while, phrases like "comprehensively crushing GPT-4" and "the world's strongest model has changed hands" flooded the screen, as if the boom OpenAI kicked off a year ago had finally been handed over to Anthropic.
But let's not forget: GPT-4 is already a year-old product, and GPT-5 is still on the way; and Google is less Anthropic's "opponent" than its second-largest financial backer. In December, Google expanded its investment in Anthropic to $2 billion.
And in this "you chase me" game, there is also the Mistral Large that has been released by Microsoft's "youngest son" Mistral AI, and Llama 3, which is listed as a key task by Meta in 2024, and so on. It's just that both Mistral AI and Meta are currently being pressed and beaten by GPT-4, and only Anthropic, which was "born" from OpenAI, has come up with a "killer weapon" that can fight with it.
This also shows that today's rankings are merely a matter of who got there first, and this year's LLMs will bloom across the board. The screen-filling "OpenAI has been caught up with" narrative is therefore just a gimmick for Anthropic's product launch; the real focus should be the commercialization path this AI company has chosen. While Inflection, Character.AI, and even OpenAI are pushing further into To C consumer use cases, Anthropic is charging headlong into To B. That choice is reflected in both the benchmark performance and the pricing strategy of the newly released Claude 3 series.
The Claude 3 series consists of three models – Opus, Sonnet, and Haiku – in descending order of capability.
According to the technical report published by Anthropic, Opus outperformed GPT-4 on a series of benchmarks, including the knowledge test MMLU, the reasoning test GPQA, and the grade-school math test GSM8K. Sonnet's performance is roughly on par with GPT-4, while Haiku is slightly behind it. However, the recently updated GPT-4 Turbo and Gemini 1.5 Pro were not included in this comparison.
It is worth noting that benchmarks such as MMLU (undergraduate-level general knowledge), GSM8K (grade-school math), and HumanEval (coding) are severely saturated, with almost all frontier models scoring about the same. The real differentiators are MATH (mathematical problem solving) and GPQA (graduate-level, domain-expert questions), and the latter in particular reflects a model's ability to serve enterprises.
Claude 3 has reportedly staked out finance, law, medicine, and philosophy as areas of expertise. On GPQA, Opus reaches roughly 60% accuracy, close to the 65%-75% achieved by human PhDs in the same field with internet access; Sonnet reaches 40.4% and Haiku 33.3%, while GPT-4 manages only 35.7%.
On this point, Jim Fan, a senior AI scientist at NVIDIA, commented: I recommend that all LLM model cards follow this practice, so that different downstream applications know what to expect.
At the same time, given that enterprise customers have to handle large volumes of PDFs, PPTs, and flowcharts, the Claude 3 series has improved in visual understanding, accuracy, long-context input, and safety.
Take accuracy as an example. Anthropic uses a large set of complex factual questions that target known weaknesses in current models, classifying responses as correct answers, incorrect answers (hallucinations), or admissions of uncertainty. Accordingly, Claude 3 will say that it does not know the answer rather than supply incorrect information. Beyond more accurate responses, Claude 3 can even "cite its sources", pointing to the precise sentences in the reference material that back up its answers.
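To make that three-way classification concrete, here is a minimal, hypothetical Python sketch of how such a factual-accuracy evaluation could bucket model answers. The function names, the refusal markers, and the simple substring-matching grader are illustrative assumptions, not Anthropic's actual evaluation code:

```python
from collections import Counter

# Hypothetical grader: buckets each model answer as "correct",
# "hallucination" (confident but wrong), or "unsure" (admits not knowing).
REFUSAL_MARKERS = ("i don't know", "i do not know", "i'm not sure")

def grade_answer(model_answer: str, reference: str) -> str:
    answer = model_answer.strip().lower()
    if any(marker in answer for marker in REFUSAL_MARKERS):
        return "unsure"
    return "correct" if reference.lower() in answer else "hallucination"

def evaluate(qa_pairs, ask_model) -> Counter:
    """qa_pairs: iterable of (question, reference_answer); ask_model: callable question -> answer."""
    return Counter(grade_answer(ask_model(q), ref) for q, ref in qa_pairs)

# Example: evaluate(dataset, ask_model) might return
# Counter({"correct": 61, "unsure": 24, "hallucination": 15})
```

Under a scheme like this, a model that says "I don't know" more often trades a little raw accuracy for fewer hallucinations, which is exactly the behavior described above for Claude 3.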
In terms of pricing strategy, use GPT-4 Turbo at $40 per 1M tokens and GPT-3.5 Turbo at $2 per 1M tokens as the points of comparison.
The most powerful, Opus – $90 per 1M tokens – is aimed at the most cutting-edge enterprises and institutions. Its near-human comprehension suits scenarios that demand high intelligence and complex task handling, such as enterprise automation, market analysis and strategy development, complex data analysis and financial forecasting, biomedical R&D, and more.
The most cost-effective, Sonnet – $18 per 1M tokens – is suited to large-scale use by most enterprise customers, and consumer-facing customers can afford it too. Its plain-text performance is comparable to Opus, and it fits medium-complexity work such as data processing, content generation, personalized marketing, and parsing.
The fastest, Haiku – $1.5 per 1M tokens – targets consumer-facing customers. It responds almost instantly, performs well on most text-only tasks, and includes multimodal (vision) capabilities for real-time user interactions, content management, logistics and inventory management, text translation, and more.
Overall, Claude 3's high-end Opus is more expensive than OpenAI's GPT-4 Turbo, while its low-end Haiku is cheaper than OpenAI's GPT-3.5 Turbo; a rough cost comparison is sketched below.
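To get a rough sense of what these list prices mean at scale, here is a minimal Python sketch. The per-1M-token figures are the combined estimates quoted above (actual Anthropic and OpenAI pricing bills input and output tokens at separate rates), and the 500M-token monthly workload is a made-up assumption for illustration:

```python
# Rough cost comparison using the per-1M-token figures quoted in this article.
# Real pricing separates input and output token rates; treat this as illustrative only.
PRICE_PER_1M_TOKENS = {
    "claude-3-opus": 90.0,
    "claude-3-sonnet": 18.0,
    "claude-3-haiku": 1.5,
    "gpt-4-turbo": 40.0,
    "gpt-3.5-turbo": 2.0,
}

def workload_cost(model: str, tokens: int) -> float:
    """Estimated USD cost of processing `tokens` tokens with `model`."""
    return PRICE_PER_1M_TOKENS[model] * tokens / 1_000_000

if __name__ == "__main__":
    monthly_tokens = 500_000_000  # hypothetical enterprise workload: 500M tokens/month
    for model in PRICE_PER_1M_TOKENS:
        print(f"{model:16s} ~${workload_cost(model, monthly_tokens):>10,.2f}/month")
```

At that hypothetical volume, Sonnet works out to roughly $9,000 a month versus about $20,000 for GPT-4 Turbo, which is the cost gap the next paragraph turns on.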
As a result, success or failure seems to hinge on the mid-tier Sonnet. If "fewer hallucinations", "more domain expertise", and "better value for money" do prove more attractive to enterprise customers, then GPT-4 Turbo will sit in an awkward position until GPT-5 breaks the deadlock.
Currently, users can try the mid-tier Sonnet for free, the most powerful Opus is available only to Claude Pro subscribers ($20 per month), and the lighter-weight Haiku is coming soon.
PS: Shidao ran the same prompt on Poe, asking Opus, Sonnet, and GPT-4 Turbo to do a simple news translation. Sonnet performed best and even handled the abbreviations correctly; Opus lagged far behind, and GPT-4 simply dropped the subject of the sentence...
In short, after this combination of punches, it is just as Anthropic's co-founders, the Amodei siblings, put it: "Anthropic is more of an enterprise company than a consumer company."
Currently, Claude's customers include the technology companies GitLab, Notion, Quora, and Salesforce (an investor in Anthropic); financial giant Bridgewater and enterprise software giant SAP; as well as legal and business research provider LexisNexis, telecom operator SK Telecom, and the Dana-Farber Cancer Institute.
According to Anthropic executive Eric Pelz, among early beta users of Claude 3, productivity software maker Asana saw a 42% reduction in initial response time, and software company Airtable said it has integrated Claude 3 Sonnet into its own AI tools to help speed up content creation and data aggregation.
Foreseeably, with Claude 3 released, Anthropic's To B commercialization path will become even clearer, taking a different route from leading large-model companies such as OpenAI, even if the two roads may ultimately converge.
"Earn more, spend more" is a true portrayal of the head large model company. In fact, Anthropic's path to B is both a voluntary choice and a necessity of the situation.
As of December 2023, OpenAI's ARR had exceeded $1.6 billion, up from $30 million in 2022, an astonishing growth rate.
There is no published figure for Anthropic's 2023 ARR, but in October 2023 the company said it would reach $200 million in ARR by year-end, or nearly $17 million in monthly revenue. And according to Anthropic's latest forecast, its ARR will reach at least $850 million by the end of 2024.
Indeed, thanks to this rapid revenue growth, Anthropic raised billions of dollars in 2023 at a valuation of more than $15 billion.
But according to The Information, two people familiar with the matter said that after paying for customer support and AI server costs, Anthropic's gross margin in December 2023 was 50%-55%, far below the 77% average gross margin of cloud software companies reported by Meritech Capital.
Another major shareholder forecasts that Anthropic's long-term gross margin will be around 60%, and even that figure does not reflect the server costs of training AI models, since those are booked under Anthropic's R&D expenses.
According to Sam Altman, a single model can cost up to $100 million to train. But Altman himself can hardly laugh: OpenAI's gross margin may be even lower, since ChatGPT's free tier burns through a pile of servers for nothing.
All of this suggests that even for companies as strong as OpenAI or Anthropic, AI startups' profit margins may generally end up lower than those of today's SaaS companies.
For now, though, the problem has not surfaced: large models are still riding the wave, and investors care more about their astonishing growth rates. These AI startups also raise capital on the strength of optimistic revenue forecasts, at valuations of 50-100 times their projected revenue for the coming year.
Of course, as long as AI startups can keep up this growth momentum, investors can overlook the losses – until revenue growth slips to 30%-40%. One VC partner said that at that point, a company with negative operating cash flow that cannot convert at least 10% of its revenue into cash flow in the short term will struggle to attract new investors.
According to Meritech Capital, publicly traded software companies trade at a median of about 6 times forward revenue. In other words, it will become increasingly difficult for startups to sustain their much higher revenue multiples over time.
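For a back-of-the-envelope sense of those multiples, here is a small Python sketch. The $15 billion valuation and $850 million forecast ARR are the figures cited earlier in this article, and the comparison is purely illustrative:

```python
# Back-of-the-envelope forward-revenue multiples, using figures cited above.
valuation_usd = 15_000_000_000        # Anthropic's reported 2023 valuation (more than $15B)
forecast_2024_arr_usd = 850_000_000   # Anthropic's forecast ARR for the end of 2024

implied_multiple = valuation_usd / forecast_2024_arr_usd
print(f"Implied forward multiple: ~{implied_multiple:.1f}x")  # ~17.6x

# For comparison, the article cites 50-100x for hot AI startups
# and a ~6x median for publicly traded software companies.
print("AI-startup range: 50x-100x; public software median: ~6x")
```

Even on its own optimistic forecast, Anthropic's implied multiple sits well above the public-market median, which is why sustaining growth matters so much here.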
In the case of Anthropic and OpenAI, the growth and margins of both companies are partly dependent on major cloud service providers.
Google and Amazon, for example, have poured billions of dollars into Anthropic and offer Anthropic's models to their own cloud customers. It is unclear what share of those sales the cloud vendors take, but if Anthropic sold the models directly to customers instead, its margins could be higher.
Microsoft leases cloud servers to OpenAI at a reduced margin, but OpenAI must hand over part of the revenue from its direct customer sales to Microsoft. And when Microsoft sells OpenAI's software to its own cloud customers, it also keeps the bulk of that revenue.
So, for these "aggrieved" AI startups, there are two routes to higher gross margins. One is to cut operating costs through technical upgrades, as OpenAI has already done. The other is Anthropic's "Tian Ji horse racing" strategy of playing to selective strengths: pick the right segment, focus on enterprise customers, generate as much revenue as possible, and keep growth high.
According to Forbes, Anthropic recently raised $750 million, and the company plans to add features such as interpretation, search, and source citations in the coming months. "We will continue to scale our models and make them smarter, while also working to make smaller, cheaper models smarter and more efficient," said its founders, the Amodei siblings. "There will be updates of varying degrees throughout the year."
To better understand Anthropic's commercialization roadmap, Shidao has excerpted and translated the "commercialization" portion of an interview with founder Dario Amodei, compiled below.
Dwarkesh Patel: Do you think current AI products will have enough time in the market to generate large-scale, long-term, stable revenue? Or could they be replaced at any moment by a more advanced model? Or will the whole industry landscape end up looking completely different?
Dario Amodei: It depends on how you define "large-scale". There are already several companies with annual revenues of $100 million to $1 billion, but reaching tens of billions, or even a trillion, a year is difficult, because it hinges on many uncertainties. Some companies are now adopting innovative AI "at scale", but that does not mean this is the best way to start. Moreover, even generating revenue is not fully equivalent to creating economic value; coordinated development across the whole industry chain is a long-term process.
Dwarkesh Patel: From Anthropic's perspective, if LLMs are progressing this quickly, shouldn't the company's valuation, in theory, also grow very quickly?
Dario Amodei: Even though we focus on model safety research rather than direct commercialization, it is clear in practice that the technology is improving exponentially. For companies that treat commercialization as their primary goal, progress is certainly faster than ours. (A thinly veiled dig at OpenAI, lol.)
Although we acknowledge that LLMs are advancing rapidly, relative to how deeply they have been absorbed into the broader economy, the technology is still at a low starting point. The future will be decided by a race between the two: the speed at which the technology itself advances, and the speed at which it is effectively integrated, applied, and brought into the real economy. Both are likely to move fast, but how they combine, and small differences between them, can lead to very different outcomes.
Dwarkesh Patel: Tech giants could invest up to $10 billion in model training over the next two to three years. What would that mean for Anthropic?
Dario Amodei: Scenario 1 – If cost keeps us from staying at the cutting edge, we will not insist on developing state-of-the-art models; instead, we will look at how to extract value from previous generations of models.
Scenario 2 – Accept being held in check. I think the positive impact of these situations may be greater than it seems.
Scenario 3 – When model training reaches that magnitude, new risks may begin to emerge, such as AI misuse.
In summary, while Dario is convinced that LLM capabilities will improve rapidly and substantially, they may be constrained by social factors and by how efficiently innovations are adopted, which would ultimately slow "large-scale" adoption and keep LLMs from realizing their true potential.
Against this backdrop, Anthropic's To B path looks all the more secure. On the one hand, it leverages its "safety" strengths to break into finance, law, and medicine; on the other, it seeks enterprise customers focused on applying the technology who can commit to long-term partnerships, minimizing as much as possible the uncertainty of C-end consumer adoption.
With that said, let us venture a bold question: if "the speed of social adoption really is slower than the speed of model development", will a group of large-model companies end up building applications themselves? Especially in the domestic (Chinese) market.