"In the endless sea of technology, every wave could herald the awakening of a new behemoth. Today, we are witnessing the rise of one: Claude 3." Anthropic has just officially announced that Claude 3 is here. As OpenAI's strongest competitor, it has released a new model family headlined by its most powerful version, Claude 3 Opus, which Anthropic says achieves "near-human" levels of comprehension.
In reasoning, math, coding, multilingual comprehension, and vision, it comprehensively surpasses other large models, including GPT-4: the kind of result that directly resets the industry's benchmarks.
Even a quick glance at the list of results is eye-catching.
On several math evaluations, Claude 3's 0-shot scores surpass GPT-4's 4-shot to 8-shot results.
On top of that, Claude, long known for its large context windows, offers a 200K-token context window across the entire model family, and can accept inputs of more than 1,000,000 tokens.
Gemini 1.5 Pro: Huh?
Currently, the second-strongest model, Sonnet, can be tried for free, while the strongest version, Opus, is reserved for Claude Pro subscribers; it can also be used for free in the large-model Chatbot Arena. As a result, netizens have started playing with it like crazy. _(doge)_
In addition, Opus and Sonnet are already open to API access, so developers can use them immediately.
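For developers, access goes through Anthropic's Messages API. As a minimal sketch, here is how a request body could be assembled; the model ID is the launch-era name and the endpoint comes from Anthropic's public docs, so verify both against current documentation before use:

```python
# Sketch: assembling a Claude 3 Messages API request body.
# Assumptions: launch-era model ID "claude-3-opus-20240229" and the
# public endpoint https://api.anthropic.com/v1/messages.

def build_claude_request(prompt: str,
                         model: str = "claude-3-opus-20240229",
                         max_tokens: int = 1024) -> dict:
    """Return the JSON body that would be POSTed to /v1/messages."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
```

The body can be sent with any HTTP client, with the API key passed in the `x-api-key` header per Anthropic's documentation.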
Someone directly @-mentioned Sam Altman: OK, you can release GPT-5 now.
However, Altman may still be preoccupied with Musk's lawsuit…
The Claude 3 family consists of three models: the small Haiku, the medium Sonnet and the large Opus, with increasing cost and performance.
First of all, in terms of performance, Claude 3 has improved comprehensively across the board. Among the three models, Opus leads all other models on evaluation benchmarks such as MMLU, GPQA, and GSM8K.
In terms of visual capabilities, it can handle a variety of visual formats, including photos, charts, graphs, and technical diagrams.
Some professionals have weighed in on these results.
For example, Yao Fu, a PhD student at the University of Edinburgh and one of the proposers of C-Eval, the Chinese large-model knowledge evaluation benchmark, said that benchmarks like MMLU, GSM8K, and HumanEval are severely saturated: all models perform roughly the same on them.
He argues that the benchmarks that really differentiate model performance are MATH and GPQA.
In addition, Claude 3 has taken a big step forward on unnecessary refusals: it is significantly less likely to refuse to answer harmless prompts.
In terms of context and recall, Anthropic used the Needle In A Haystack (NIAH) evaluation to measure a model's ability to accurately recall information from a large body of text.
Not only did the results show near-perfect recall, exceeding 99% accuracy, but in some cases the model even recognized that the "needle" sentence appeared to have been artificially inserted into the original text, thereby identifying a limitation of the evaluation itself.
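The NIAH setup itself is easy to picture in code; a toy sketch follows, where the filler and needle sentences are invented for illustration and the model call is replaced by a simple string check:

```python
import random

def build_haystack(filler: str, needle: str,
                   n_filler: int = 200, seed: int = 0) -> str:
    """Bury one 'needle' sentence at a random position among filler sentences."""
    rng = random.Random(seed)
    sentences = [filler] * n_filler
    sentences.insert(rng.randrange(n_filler + 1), needle)
    return " ".join(sentences)

def needle_recalled(answer: str, needle: str) -> bool:
    """Stand-in for grading the model's answer: did it reproduce the needle?"""
    return needle in answer

NEEDLE = "The best thing to do in San Francisco is eat a sandwich in the park."
HAYSTACK = build_haystack("The sky stayed a uniform grey all afternoon.", NEEDLE)
```

The real evaluation scales the haystack toward the full 200K context and varies the needle's insertion depth; the sketch only shows the construction step.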
Advances have also been made in biology-related and cyber-related knowledge, but under Anthropic's responsible scaling commitments the models remain at AI Safety Level 2 (ASL-2).
Secondly, in terms of response time, Claude 3 is drastically faster, approaching real time.
Officially, the upcoming small model, Haiku, can read and understand an arXiv paper of about 10K tokens, charts included, in under three seconds.
The mid-size Sonnet delivers a higher level of intelligence while being twice as fast as Claude 2 and Claude 2.1, and is especially good at tasks that require a quick response, such as knowledge retrieval or sales automation.
The large Opus offers the highest level of intelligence with undiminished speed, running at approximately the same speed as Claude 2 and Claude 2.1.
Officially, the three models also have clear positioning:
- Large Opus: smarter than any other model; suited to complex task automation, R&D, and strategy formulation.
- Medium Sonnet: more affordable than other models of comparable capability and better suited to scale; ideal for data processing, RAG, and saving time in medium-complexity workflows.
- Small Haiku: faster and more affordable than comparable models; ideal for real-time user interaction and cost savings in simple workflows.

In terms of pricing, the cheapest model, Haiku, starts at $0.25 per million input tokens, while the most expensive, Opus, costs $15 per million input tokens and $75 per million output tokens.
Compared to GPT-4 Turbo, Opus is indeed priced quite a bit higher, which also shows how confident Anthropic is in this model.
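Working from per-million-token rates, the cost of a request is straightforward arithmetic. A small sketch: the Haiku and Opus rates match the figures above, while the Sonnet rates are taken from Anthropic's published launch pricing and may have changed since:

```python
# USD per 1M tokens, as (input_rate, output_rate); launch-era pricing.
PRICING = {
    "claude-3-haiku":  (0.25, 1.25),
    "claude-3-sonnet": (3.00, 15.00),
    "claude-3-opus":   (15.00, 75.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at the rates above."""
    in_rate, out_rate = PRICING[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
```

At these rates, a 10K-token prompt with a 1K-token reply costs $0.225 on Opus versus $0.00375 on Haiku, a 60x difference.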
In that case, let's try it out for free.
The official page has now been updated; Claude demonstrates the ability to understand and process images, including recommending style improvements, extracting text from images, converting UI designs into front-end code, understanding complex equations, transcribing handwritten notes, and more.
After the release of Claude 3, netizen @op7418 tried Claude 3 Opus for the first time and did three tests.
He first tested Claude 3 Opus's translation ability with a challenging, complex English text. The results show that Opus's translations are not only well organized but also well segmented and formatted, making for a much better reading experience. In terms of fluency and accuracy, however, GPT-4 still holds a slight edge.
He also used a screenshot of a complex design draft to test Opus's ability to reproduce the details. Once explicitly asked to match the style, Opus accurately captured the design elements, performing better overall than GPT-4.
Image multimodality is another capability worth watching in Opus. It can not only grasp the essence of academic papers but also present its analysis clearly. Compared with GPT-4, however, Opus still seems to have some room to grow in terms of information richness.
Netizen @mlpowered fed the API a two-hour transcript along with selected screenshots of key frames, and it successfully produced a content-rich blog post in HTML format.
Netizen @7oponaut had Opus and GPT-4 play tic-tac-toe; unfortunately, Opus could not draw the grid properly, while GPT-4 completed the task successfully.
We also ran some of our own tests on Claude 3, such as identifying a recipe from a picture,
explaining an equation from an image,
and extracting a JSON file from an image.
Even blurry, aged documents can be accurately OCR'd:
It reads: "You're using their second most intelligent model, Claude 3 Sonnet."
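All of these image tasks go through the same Messages API: the image is sent as a base64 content block alongside the text prompt. A minimal sketch of the message structure, with field names following Anthropic's public vision docs and placeholder image bytes:

```python
import base64

def build_vision_message(image_bytes: bytes, media_type: str,
                         question: str) -> dict:
    """One user message pairing a base64-encoded image with a text question."""
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,  # e.g. "image/png"
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                },
            },
            {"type": "text", "text": question},
        ],
    }
```

Recipe identification, equation explanation, JSON extraction, and OCR all reduce to changing the `question` string while keeping this structure.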
However, probably because too many people were using it, several attempts all came back "failed".
Still, netizens have posted some test results of their own, such as having Sonnet solve puzzles.
Give it a few examples and ask it to find the relationship between the numbers; for instance, "1 dimitris 2 q 3" means that 3 is the result of adding 1 and 2.
As a result, Sonnet successfully worked it out: 61 plus 8 equals 69, so the value of "x" should be 69:
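The trick is purely mechanical once the two made-up words are mapped to "+" and "="; a toy solver, reusing the token names from the example above:

```python
def solve_word_puzzle(puzzle: str, plus_word: str, equals_word: str) -> int:
    """Solve puzzles of the form 'a PLUS_WORD b EQUALS_WORD x' for x."""
    a, plus, b, eq, unknown = puzzle.split()
    if plus != plus_word or eq != equals_word or unknown != "x":
        raise ValueError("puzzle does not match the expected pattern")
    return int(a) + int(b)
```

For example, `solve_word_puzzle("61 dimitris 8 q x", "dimitris", "q")` returns 69; the interesting part of the test is that the model infers this mapping from examples rather than being told it.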
Some netizens found that Sonnet can now also read ASCII code, and exclaimed: this is GPT-4++ level!
As for programming tasks, never mind who writes better code for now; at the very least, Claude 3 won't be as lazy as GPT-4.
Some players who tried Opus deliberately set a trap for the model in code, but Opus dodged it perfectly and wasn't fooled:
At first glance, it all feels pretty good. Now it's time to @ OpenAI: where's GPT-5?
Jim Fan, a senior scientist at Nvidia, is already looking forward to the appearance of GPT-5.
Well, if you're interested, you can click the link below to try it out.