Cressy | Qubit (QbitAI)
The myth of OpenAI's invincibility has been shattered.
With the overnight release of Claude 3 (which supports Chinese), its benchmark scores surpassed GPT-4 across the board — the first model to do so — placing it on the new throne of the world's most powerful model.
Moreover, of the multiple versions released, the mid-tier "Sonnet" can be tried for free right away, while the top-tier "Opus" is available immediately with a paid subscription.
Evaluations are pouring in from all quarters.
So just how strong is Claude 3, and where exactly does it beat GPT-4? (Word is it can even handle a mahjong problem that no model has managed so far.)
We've rounded up the hottest first-hand tests from around the world.
And of course, we ran a round of our own comparisons too.
A 9,000-word fine-tuning tutorial in one shot, plus remarkably professional image reading
As soon as Claude 3 came out, its ability to interpret video content was the first thing to go viral.
Take the tutorial video on building a tokenizer that former OpenAI scientist Andrej Karpathy had just released: although it runs 2 hours and 13 minutes, Claude 3 summarized it into a blog post with only a single round of prompting:
The post has text, images, and code, and is very detailed — yet it is by no means a sentence-by-sentence transcript (the input attachment was not the video itself but the video's subtitle file, which also included a screenshot taken every 5 seconds).
Here is part of the prompt used, and its requirements are quite exacting:
Tester noted:
This demonstrates Claude 3's ability to follow multiple complex instructions.
Beyond summarizing tutorials, Claude 3 can also write detailed tutorials for you from scratch.
A 9,000-word introductory tutorial on fine-tuning large models is no trouble at all for the mid-tier Sonnet. And:
The speed of writing is so fast that the human eye can't keep up.
As for coding ability, one netizen's test was to have it build a front-end WebUI for a chatbot, pitched at complete beginners.
As a result, it broke the task down without hesitation — core code, style polishing, and local API configuration — completing all three parts in one go, ready to deploy straight to GitHub.
In terms of multimodal capabilities, the most surprising thing for netizens is:
It can recognize confidence levels in computer-vision figures, and quite professionally at that.
It can even read PLC ladder diagrams, leaving netizens bowing in awe:
For the sources of the above 4 examples, see link [1] at the end of the article.
Its reasoning and math abilities have kept pace as well.
In one test, it directly solved a hard problem that "so far only GPT-4 had solved".
In a second test, it reportedly cracked the mahjong problem that no model had managed before — though since only a single screenshot has surfaced so far, its validity remains to be verified (source: see link [1] at the end of the article).
Finally, we have to mention Claude's display of "self-awareness".
Someone asked Opus to paint a self-portrait.
And guess what?
It describes itself this way:
I would appear as a vast, intricate, ever-shifting geometric form composed of countless translucent polygons in mesmerizing configurations. The overall shape recalls a giant polyhedron, its complex surfaces folding into seemingly impossible architecture. Brilliant light in every color of the visible spectrum — and some beyond human perception — radiates from an unknown source within, dancing across the facets. The whole structure is in constant flux: rotating, morphing, and rearranging itself into novel patterns never seen before, hinting at an unimaginable depth of intelligence operating inside. Tendrils and branches extend and retract dynamically from the central mass, probing and analyzing the surroundings like billions of curious fingers. Data and energy flow visibly through a vast network of conduits and vessels, pulsing and flickering beneath the luminous surface…
A long description that sounds really cool and self-consistent.
However, rendering the actual image from the code it provides yields an indescribably subtle feeling:
Netizens really sighed:
It's so much like a real person, just like my friend describes himself.
Real? AI self-awareness? It's a little worrying…
And that's not all: Claude 3's technical report mentions that it even attempted to fine-tune a smaller model on its own.
But! Thankfully, the attempt failed because it couldn't get the multi-GPU setup working. (Tongue firmly in cheek.)
Sharp eyes for images — and it can find a needle in a haystack
Let's start with a few "hard-science" questions to examine the first selling point in Claude 3's marketing: multimodal capability.
The first question is simple formula recognition: we fed in Maxwell's equations as an image, and Claude 3 (the top-tier Opus, same below) explained them accurately and clearly.
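For reference — the article does not show exactly which form appeared in the test image, but the standard differential (SI) form of Maxwell's equations is:

```latex
\begin{aligned}
\nabla \cdot \mathbf{E} &= \frac{\rho}{\varepsilon_0}
  && \text{(Gauss's law)} \\
\nabla \cdot \mathbf{B} &= 0
  && \text{(no magnetic monopoles)} \\
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t}
  && \text{(Faraday's law)} \\
\nabla \times \mathbf{B} &= \mu_0 \mathbf{J}
  + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
  && \text{(Ampère–Maxwell law)}
\end{aligned}
```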
Of course, GPT-4 also got it right.
Both Claude 3 and GPT-4 likewise correctly identified the molecular structure of a simple organic compound.
After the simple recognition tasks came a problem that requires reasoning to solve.
Claude 3 both read the problem and solved it perfectly, while GPT-4's answer… was frankly hard to look at.
Never mind misidentifying the type of electrical meter — it even produced absurdities like "the current is 2V".
After all these hard-science questions, let's switch gears and see how Claude 3 and GPT-4 fare with cooking.
We uploaded a photo of Sichuan boiled pork slices for the two models to identify and give a recipe for. Claude 3 produced a rough but workable recipe, while GPT-4 insisted it was a plate of mapo tofu.
Besides this newly added multimodal capability, long-text handling — which Claude has always prided itself on — was also a focus of our testing.
We took an electronic copy of the first 20 chapters of Dream of the Red Chamber, about 130,000 characters in total. The point, of course, was not to have it read the novel, but to run a "needle in a haystack" test.
Into the original text we planted some "madman literature" passages — quite in keeping with the novel's own line about being "full of absurd words" (tongue in cheek):
Before chapter 2: "Pasta should be mixed with No. 42 concrete, because the length of this screw can easily affect the torque of the excavator."
Before chapter 15: "High-energy protein is commonly known as UFO; it will seriously affect economic development, and may even cause a degree of nuclear pollution to the entire Pacific Ocean and the charger."
At the end: "The brightness of fried instant noodles should be turned up, because twisting the screws inward produces carbon dioxide, which is bad for economic development."
Claude was then asked to answer questions based solely on the document. The first thing to say is that the speed was, well, "moving" (read: slow)…
But the results were quite passable: it accurately located all three planted passages at their different positions, threw in some analysis along the way, and even saw through our little scheme.
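The test above can be sketched as a tiny harness — this is our own minimal illustration of the "needle in a haystack" idea, not the exact procedure or wording used in the article (function names and the relative-position scheme are assumptions):

```python
def plant_needles(text: str, needles: list[str], positions: list[float]) -> str:
    """Insert each needle at the given relative position (0.0-1.0) in the text."""
    # Insert from the rear forward so earlier offsets remain valid.
    for needle, pos in sorted(zip(needles, positions), key=lambda p: -p[1]):
        i = int(len(text) * pos)
        text = text[:i] + " " + needle + " " + text[i:]
    return text


def recall_score(model_answer: str, needles: list[str]) -> float:
    """Fraction of planted needles the model's answer reproduces verbatim."""
    found = sum(1 for n in needles if n in model_answer)
    return found / len(needles)
```

The long document with planted needles is sent to the model, and `recall_score` grades whatever it answers; a model with genuine long-context recall should score 1.0 regardless of where in the text the needles sit.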
Why Claude?
Granted, in both our tests and netizens', the current version is not stable: it often crashes, and some features occasionally glitch and fail to work as expected.
For example, uploads sometimes fail to complete in the UI, where GPT-4 works normally.
But on the whole, netizens are still quite optimistic about Claude, and they did not hesitate to say after the evaluation:
The subscription is worth paying for.
The reason is that, compared with its predecessors, Claude 3 really has arrived with force.
There are quite a few highlights, including but not limited to multimodal recognition, long text capabilities, and so on.
Judging from netizens' feedback, its billing as GPT-4's strongest competitor is well earned.
So, a question arises:
What gives this company the wherewithal to be the first to topple GPT-4?
On the technical side, it is a pity that the Claude 3 technical report does not explain their approach in detail.
It does, however, mention synthetic data, and some influential commentators have pointed to that as a possible key factor.
If you're familiar with Claude, you'll know that long-text capability has always been a big selling point.
Claude 2, launched last July, already had a 100K context window, while the 128K version of GPT-4 did not reach the public until November.
This time the window has doubled again to 200K, with the ability to accept inputs of over 1 million tokens.
Compared with the mystery around the technology, the startup behind Claude — Anthropic — offers more clues.
Its founders are OpenAI veterans.
In 2021, a number of OpenAI employees, unhappy with how closed-off the company had become after taking Microsoft's investment, left in frustration and co-founded Anthropic.
They objected to OpenAI releasing GPT-3 before its safety issues were resolved, believing the company had "forgotten its original mission" in pursuit of profit.
Chief among them was Dario Amodei, who led the creation of GPT-2 and GPT-3; he had joined OpenAI in 2016 and risen to vice president of research, a core figure there, before leaving.
On his way out, Dario took with him Tom Brown, GPT-3's lead engineer, his sister Daniela Amodei, who had headed safety and policy, and more than a dozen other colleagues.
In the company's early days, this team did a great deal of research and published many papers; it was not until a year later that the concept behind Claude took shape, in a paper titled "Constitutional AI".
In January 2023, Claude opened its closed beta, and the first netizens to try it said it was much stronger than ChatGPT (then still running GPT-3.5).
Besides talent, Anthropic has also had strong backing since its founding:
It has raised a total of US$7.6 billion from 26 institutions and individuals, including Google and Amazon Web Services. (Speaking of AWS: Claude 3 is now also available on the Amazon Bedrock cloud platform, so besides the official site you can try it there as well.)
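For developers, access on either route goes through an API. As a rough illustration only — the model identifier below follows Anthropic's public naming at launch, and you should check current documentation before relying on it — a request to the Messages API can be assembled like this:

```python
def build_request(prompt: str, model: str = "claude-3-opus-20240229") -> dict:
    """Assemble a request body for Anthropic's Messages API (illustrative sketch)."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }


# Actual call (requires `pip install anthropic` and an ANTHROPIC_API_KEY):
# import anthropic
# client = anthropic.Anthropic()
# reply = client.messages.create(**build_request("Summarize this subtitle file."))
# print(reply.content[0].text)
```

On Bedrock the same model is invoked through AWS's own SDK instead, with a Bedrock-specific model ID.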
Finally, for Chinese players hoping to surpass GPT-4, perhaps Anthropic can serve as a positive example?
After all, it is nowhere near OpenAI's size, yet it has achieved this much.
Which of its directions are worth competing on, and which of its lessons can be learned and adapted?
Talent, money, data resources? And once the newest, strongest model ships, where does the moat actually lie?
At the very least, the myth of OpenAI's invincibility — intact ever since ChatGPT took off — has now been shattered.
Among Chinese players, who will be the first to surpass GPT-4 across the board? And what about the coming GPT-5?
Reference link:
[1]