Author: Yu Yang.
Editor: Peach.
With the hype around Sora, OpenAI's introduction materials call it a "world simulator", and the term "world model" has come back into view; yet few articles explain what a world model actually is.
This post reviews what a world model is, then discusses whether Sora is a world simulator.
What is a world model?
When the words "world" and "environment" are used in AI, they usually serve to distinguish the surroundings from the agent.
The two fields most concerned with agents are reinforcement learning and robotics.
Accordingly, the terms "world model" and "world modeling" appeared earliest and most often in robotics.
Perhaps the most influential appearance of the term today is Jurgen Schmidhuber's 2018 arXiv paper "World Models" (written with David Ha), eventually published at NeurIPS 2018 under the title "Recurrent World Models Facilitate Policy Evolution".
The paper does not define what a world model is; instead, it draws an analogy to the "mental model" of the human brain from cognitive science, citing literature from 1971.
A mental model is the brain's internal mirror of the world around it.
Wikipedia's description of the mental model makes clear that it may be involved in cognition, reasoning, and decision-making, and that it consists of two main parts: mental representations and mental simulation.
In Wikipedia's words, a mental model is "an internal representation of external reality, hypothesized to play a major role in cognition, reasoning and decision-making. The term was coined by Kenneth Craik in 1943, who suggested that the mind constructs 'small-scale models' of reality that it uses to anticipate events." This is still rather vague, but the structural diagram in the World Models paper explains clearly what a world model is.
In the figure, the vertical V -> z path is a low-dimensional representation of the observation, implemented with a VAE; the horizontal M -> h -> M -> h path predicts the representation at the next time step, implemented with an RNN. These two parts together make up the world model.
In other words, a world model mainly comprises a state representation and a transition model, which correspond exactly to mental representations and mental simulation.
Looking at this figure, you may wonder: isn't every sequence prediction model then a world model?
In fact, anyone familiar with reinforcement learning can see at a glance that this structure is wrong (incomplete). The real structure is the diagram below: the RNN's input is not only z but also the action, which is exactly what sets it apart from ordinary sequence prediction. (Does adding an action make that much of a difference? Yes: actions let the data distribution be changed at will, which is a huge challenge.) A minimal code sketch of this structure follows.
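To make the structure concrete, here is a minimal PyTorch sketch of the two components, assuming the setup described above: a VAE-style encoder that compresses an observation into z, and an action-conditioned RNN that simulates the next step. All class names, dimensions, and interfaces are illustrative, not taken from the paper's released code.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """V: compress a high-dimensional observation into a low-dimensional z.

    A full VAE would also learn a decoder and optimize the ELBO; this sketch
    keeps only the mean/log-variance heads and the reparameterization step.
    """
    def __init__(self, obs_dim: int, z_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)

    def forward(self, obs):
        h = self.backbone(obs)
        mu, logvar = self.mu(h), self.logvar(h)
        # reparameterization trick: z = mu + sigma * noise
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

class TransitionRNN(nn.Module):
    """M: predict the next latent state from (z_t, a_t).

    The key point is that the action is part of the input, which is what
    distinguishes this from plain sequence prediction.
    """
    def __init__(self, z_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRUCell(z_dim + action_dim, hidden_dim)
        self.predict_z = nn.Linear(hidden_dim, z_dim)

    def forward(self, z, action, h=None):
        h = self.rnn(torch.cat([z, action], dim=-1), h)
        return self.predict_z(h), h
```

The only difference from a plain sequence model is the `action` argument in the transition, but as noted above, that single argument changes the learning problem fundamentally.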
Jurgen's article belongs to the field of reinforcement learning.
You might ask: reinforcement learning already has plenty of model-based RL work, so what is the difference between that "model" and the "world model"? The answer is that there is no difference; they are the same thing. Jurgen heads this question off with a paragraph early in the paper.
Its gist: no matter how much model-based RL work there is, I am the pioneer of RNNs, I used RNNs to build models long ago, and that is exactly what I am going to do here.
An earlier version of Jurgen's paper also noted that much model-based RL work, although it learns a model, does not train the policy entirely inside that model.
Not training entirely inside the model is not a real difference from the models of model-based RL; it reflects a long-standing frustration of the model-based direction: the learned model is not accurate enough, and a policy trained entirely inside it performs poorly. This problem has only begun to be resolved in recent years.
The astute Sutton realized long ago that learned models would not be accurate enough. In 1990 he proposed the Dyna framework in "Integrated Architectures for Learning, Planning and Reacting Based on Approximating Dynamic Programming" (published at ICML, in the year it first changed from a workshop to a conference), calling the model an "action model" to emphasize that it predicts the consequences of actions.
In Dyna, RL learns from real data (line 3 of the algorithm) and from the action model (line 5) at the same time, so that an inaccurate model does not wreck the learned policy. A sketch in this spirit appears below.
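As a concrete illustration, here is a hedged sketch of tabular Dyna-Q in the spirit of Sutton's framework. Each real step both updates Q directly and refreshes the learned action model, after which several planning updates replay transitions from the model. The environment interface (`reset`, `step`, `actions`) is an assumption made for this sketch.

```python
import random
from collections import defaultdict

def dyna_q(env, episodes=100, n_planning=10, alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)   # Q[(state, action)] -> estimated value
    model = {}               # the learned "action model": (s, a) -> (r, s', done)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda x: Q[(s, x)])
            r, s2, done = env.step(a)
            # learn from the real transition (the "line 3" update)
            target = r + (0.0 if done else gamma * max(Q[(s2, x)] for x in env.actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # record the transition in the action model
            model[(s, a)] = (r, s2, done)
            # planning: n extra updates on model-simulated transitions (the "line 5" update)
            for _ in range(n_planning):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                ptarget = pr + (0.0 if pdone else gamma * max(Q[(ps2, x)] for x in env.actions))
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q
```

Mixing real updates with model-replayed updates is precisely the hedge against model inaccuracy: the real data keeps the policy anchored even when the model drifts.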
As you can see, the world model is extremely important for decision-making. If you can obtain an accurate world model, you can find the best real-world decision by trial and error inside the model.
This is the core role of a world model: counterfactual reasoning, that is, inferring the outcome of a decision inside the world model, even for decisions never seen in the data.
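To make "trial and error inside the world model" concrete, here is a minimal sketch of the simplest planning scheme, random shooting: imagine many candidate action sequences inside the learned model and keep the best one. The `model` and `reward_fn` interfaces are illustrative assumptions, not anything from the original article.

```python
import random

def plan_by_random_shooting(model, reward_fn, action_space, state,
                            horizon=20, n_candidates=100):
    """Pick the first action of the best imagined trajectory.

    Assumed interfaces: `model(s, a) -> s'` and `reward_fn(s, a) -> float`.
    Nothing here is executed in the real world; every rollout is imagined.
    """
    best_return, best_action = float("-inf"), None
    for _ in range(n_candidates):
        s, total = state, 0.0
        actions = [random.choice(action_space) for _ in range(horizon)]
        for a in actions:
            total += reward_fn(s, a)
            s = model(s, a)  # a counterfactual step, taken only in imagination
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action
```

If the model is accurate, the imagined returns match real ones and the chosen action transfers to reality; if it is not, planning inside it fails in exactly the way the model-based RL literature has long struggled with.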
Those familiar with causal inference will recognize the term counterfactual reasoning. In his popular-science book The Book of Why, Turing Award winner Judea Pearl draws a ladder of causation. The lowest rung is association, which is what most of today's models mainly capture; the middle rung is intervention, of which exploration in reinforcement learning is a typical example; the top rung is counterfactuals, answering what-if questions through imagination. Judea's illustration for counterfactual reasoning shows a scientist imagining an experiment in his head, much like the diagram Jurgen uses in his paper.
Left: schematic of the world model in Jurgen's paper. Right: the ladder of causation from Judea's book.
At this point we can conclude: AI researchers' pursuit of the world model is the pursuit of the ability to go beyond the data, reason counterfactually, and answer what-if questions. Humans possess this ability naturally, while current AI does it poorly. Once a breakthrough occurs, AI decision-making will improve greatly, enabling scenarios such as fully autonomous driving.
Sora is not a world simulator
The word "simulator" is used more in engineering fields, and it serves exactly the purpose of a world model: running the high-cost, high-risk trial and error that is difficult to carry out in the real world. OpenAI seems to want to coin a new phrase, but the meaning is the same.
The videos Sora generates can only be steered with vague prompts and are hard to control precisely. So it is more of a tool, and it is difficult to use it for counterfactual reasoning that precisely answers what-if questions.
It is even hard to evaluate how strong Sora is, because it is not at all clear how different the demo videos are from its training data.
More disappointing still, the demos show that Sora has not learned the exact laws of physics. I have seen people point out cases where Sora's output violates the laws of physics [OpenAI releases the text-to-video model Sora: AI can understand the physical world in motion. Is this a world model? What does it mean?].
My guess is that OpenAI backed these demos with very ample training data, possibly even including CG-generated data. Yet even so, Sora has not mastered laws of physics that can be written as equations over just a few variables.
OpenAI believes Sora demonstrates a route toward "simulators of the physical world", but simply stacking data does not appear to be the path to more advanced intelligence.