The classic "Bitter Lesson" that OpenAI engineers live by turns out to have had a prototype more than 20 years ago

Updated on 2024-02-23

Reported by the Heart of the Machine.

Editor: Du Wei

The ability to learn from massive amounts of data has, in the end, exceeded people's imagination.
It has been a week since OpenAI unveiled the video generation model Sora, and the buzz has not died down; the team behind it keeps releasing eye-catching videos. One example is a movie trailer about a band of adventurous puppies exploring ruins in the sky: Sora generated it in one pass and even handled the editing itself.

Of course, these vivid, realistic AI-generated videos make people wonder why OpenAI was the first to build Sora and is able to push through the entire AGI technology stack. The question has sparked lively discussion on social media.

Among the responses, in a Zhihu post, @siyZ, who holds a Ph.D. in computer science from the University of California, Berkeley, analyzes part of OpenAI's methodology for success. He argues that OpenAI's methodology is a path to AGI, and that it is built on several important axioms, including The Bitter Lesson, the scaling law, and emergent properties.

Zhihu original post: The Bitter Lesson refers to the classic 2019 essay of the same name by machine learning pioneer Rich Sutton, which reviews the detours artificial intelligence has taken over recent decades. The core point he makes is: if AI is to keep improving over the long run, leveraging powerful computation is what ultimately matters, and "computation" here implies large amounts of training data and large models.

Original link: Therefore, @siyZ argues that, in a sense, general AI algorithms backed by massive computation are the real path to AGI and the direction of genuine progress in AI. With large models, large compute, and big data, The Bitter Lesson constitutes a necessary condition for AGI; the scaling law supplies the sufficient condition, ensuring that algorithms can turn larger models, more compute, and more data into better results.
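To make the "sufficient condition" concrete: scaling laws are usually stated as an empirical power law relating loss to model size and data. The particular form and symbols below are a common convention in the literature, not something spelled out in the Zhihu post:

L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

Here L is the pretraining loss, N the number of model parameters, D the number of training tokens, and E, A, B, \alpha, \beta are constants fitted to experiments. Because loss falls predictably as N and D grow, "bigger model, more compute, more data" becomes a plannable bet rather than a gamble.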

Coincidentally, OpenAI researcher Jason Wei, whose daily work schedule went viral this week, also mentioned Rich Sutton's The Bitter Lesson in that schedule. Evidently, many people in the industry treat The Bitter Lesson as a guiding principle.

At the same time, in another discussion about whether large language models (LLMs) can act as verifiers of their own outputs, some argued that LLMs are simply not accurate enough to verify their own results, and that attempting to do so leads to worse performance while also incurring extra API costs.

In response, a Twitter user noticed that this point had already been anticipated in a blog post Rich Sutton wrote more than 20 years ago.

Original link: Here is what the blog post says:

Consider any AI system and the knowledge it holds. It could be an expert system or a large database like CYC; it could be a robot that knows the layout of a building, or one that knows how to react in a variety of situations. In all of these cases, we can ask whether the AI system can verify its own knowledge, or whether it requires human intervention to detect errors and unforeseen interactions and to correct them. In the latter case, we will never be able to build truly large knowledge systems: they will always be fragile and unreliable, and their scale will be limited to what people can monitor and understand themselves.
Unexpectedly, Rich Sutton himself replied, saying that this half-finished blog post was the prototype of The Bitter Lesson.

In fact, not long after OpenAI released Sora, many people realized that The Bitter Lesson had played an important role in it.

Others place The Bitter Lesson on a par with the Transformer paper "Attention Is All You Need".

To close this article, we revisit the full text of Rich Sutton's "The Bitter Lesson".

The 70-year history of AI research teaches us that general methods that harness computation are ultimately the most effective. The underlying reason is Moore's Law, or rather its generalization: the cost per unit of computation continues to fall exponentially. Much AI research has been conducted as if the computation available to an agent were constant (in which case leveraging human knowledge would be the only way to improve performance); however, over a timescale only slightly longer than a typical research project, vastly more computation inevitably becomes available.

To improve in the short term, researchers draw on specialized human knowledge of the domain; to improve in the long run, harnessing computation is what matters. The two need not be opposed, but in practice they often are: time spent on one is time not spent on the other, and methods built on human knowledge tend to become complicated in ways that make them ill-suited to taking advantage of computation. There are many examples of AI researchers recognizing these lessons too late, so it is worth recalling some prominent ones.

In computer chess, the methods that defeated world champion Kasparov in 1997 were based on massive, deep search. At the time, most computer-chess researchers viewed this with dismay; their approach had been to exploit human understanding of the special structure of chess. When the simpler, search-based approach, with its dedicated hardware and software, proved far more effective, these human-knowledge-based researchers refused to concede gracefully. They argued that while brute-force search had won this time, it was not a general strategy, and in any case it was not how humans played chess. These researchers had hoped that methods based on human input would win, and the result left them disappointed.

A similar pattern of research progress played out in computer Go, only about 20 years later. Initially, researchers tried to avoid search by using human knowledge or the special features of the game, but all those efforts proved irrelevant once search was applied effectively at scale. Also important was the use of self-play to learn a value function (as in many other games and even in chess, although it played little role in the program that first beat a world champion in 1997). Learning by self-play, and learning in general, is like search in that it lets enormous amounts of computation be brought to bear. Search and learning are the two most important classes of techniques in AI research for utilizing massive amounts of computation. In computer Go, as in computer chess, researchers first tried to reach their goal through human understanding (so that less search was needed), and only later achieved great success by embracing search and learning.
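To make the phrase "use self-play to learn a value function" concrete, here is a minimal illustrative sketch, not part of Sutton's essay: tabular TD(0) learning of a state-value table from epsilon-greedy self-play on tic-tac-toe. The constants ALPHA and EPSILON and all helper names are invented for this example.

```python
# Sketch: learn a value function V(state) for player 'X' purely from self-play,
# using tabular TD(0). No human strategy is built in beyond the rules of the game.
import random
from collections import defaultdict

V = defaultdict(float)          # board state (tuple of 9 cells) -> estimated value for 'X'
ALPHA, EPSILON = 0.1, 0.1       # illustrative learning rate and exploration rate

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    return [i for i, cell in enumerate(board) if cell == ' ']

def play_one_game():
    board = [' '] * 9
    player = 'X'
    prev_state = None
    while True:
        candidates = moves(board)
        if random.random() < EPSILON:           # explore
            move = random.choice(candidates)
        else:                                   # greedy: X maximizes V, O minimizes V
            def score(m):
                nxt = board[:]
                nxt[m] = player
                v = V.get(tuple(nxt), 0.0)
                return v if player == 'X' else -v
            move = max(candidates, key=score)
        board[move] = player
        state = tuple(board)

        w = winner(board)
        if w or not moves(board):               # terminal: back up the final reward
            reward = 1.0 if w == 'X' else (-1.0 if w == 'O' else 0.0)
            if prev_state is not None:
                V[prev_state] += ALPHA * (reward - V[prev_state])
            V[state] = reward
            return
        if prev_state is not None:              # TD(0) backup toward the new state's value
            V[prev_state] += ALPHA * (V[state] - V[prev_state])
        prev_state = state
        player = 'O' if player == 'X' else 'X'

for _ in range(20000):
    play_one_game()
print("learned value estimates for", len(V), "board states")
```

Both sides pick moves from the same value table (X maximizes it, O minimizes it), so the table is trained entirely on games the learner plays against itself; the only knowledge supplied by humans is the rules of the game.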

In the field of speech recognition, DARPA sponsored a competition back in the 1970s. Entrants used a range of special methods that exploited human knowledge: knowledge of words, of phonemes, of the human vocal tract, and so on. On the other side were newer methods based on hidden Markov models, which were more statistical in nature and did far more computation. Once again, the statistical methods won out over the methods based on human knowledge. This led to a major shift in natural language processing, where statistics and computation have gradually come to dominate over the past few decades. The recent rise of deep learning in speech recognition is the latest step in this direction.

Deep learning methods rely even less on human knowledge, use even more computation, and are coupled with learning from very large training sets, producing far better speech recognition systems. Just as in the games, researchers kept trying to make systems work the way they thought their own minds worked, trying to put that knowledge into their systems, but the effort usually proved counterproductive and a huge waste of researchers' time once, thanks to Moore's Law, massive computation became available and a way was found to put it to good use.

A similar pattern holds in computer vision. Early approaches conceived of vision as searching for edges or generalized cylinders, or as depending on SIFT features. Today, all of these methods have been abandoned. Modern deep neural networks, using only the notions of convolution and certain kinds of invariance, achieve much better results.

This is a big lesson, and as a field we still have not fully absorbed it, because we keep making the same kinds of mistakes. To see this, and to effectively avoid repeating the mistakes of the past, we must understand why these mistakes lead us astray. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson rests on the following historical observations:

AI researchers have often tried to build knowledge into their agents. This is usually helpful and satisfying in the short term, but in the long run it plateaus and even inhibits further progress, and breakthrough progress eventually arrives by the opposite approach, one based on search and learning at massive computational scale. The eventual success is tinged with bitterness and often incompletely digested, because it is not achieved through the favored, human-centric approach.

One thing we should learn from the bitter lesson is the great power of general-purpose methods: methods that continue to scale as computation increases, even when the available computation becomes very large. Search and learning seem to be the two methods that scale arbitrarily in this way.

Richard S. Sutton, the godfather of reinforcement learning, is currently a professor at the University of Alberta in Canada.

The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously and irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All of these are part of an arbitrary, intrinsically complex external world.

They are not what should be built in, because their complexity is endless; instead, we should build in only the meta-methods that can find and capture this arbitrary complexity. What is essential to these methods is that they can find good approximations, but the search for those approximations should be done by our methods, not by us.

We want AI agents that can discover as we do, not agents that merely contain what we have already discovered. Building in our discoveries only makes it harder to see how the process of discovery can be carried out.
