Text: Understand Finance and Economics

If the large model is Thor's hammer, then where are the nails it fits? Over the past year, everyone has been searching desperately for an answer to this question. Humanoid robots are one of the few points of consensus among technology companies.
Recently, according to foreign media reports, Microsoft and OpenAI are in talks to participate in a new round of financing for the humanoid robot company Figure, a round that may reach as much as $500 million.
Figure is the second humanoid robot company OpenAI has invested in, and OpenAI is not an isolated case: almost every person and company building large models has shown a strong passion for humanoid robots and poured in research resources, from Nvidia to Google, from Amazon to Meta, and even Musk.
As a frontier technology that has been through several booms and busts, humanoid robots have had the market's enthusiasm for them thoroughly "reignited" by AI large models. As many expect, humanoid robots are an indispensable hardware carrier on generative AI's road to AGI (artificial general intelligence).
Humanoid robots are returning to the familiar spotlight.
This is not the first time OpenAI has invested in a humanoid robot company. In March last year, 1X Technologies (1X for short), a humanoid robot startup from Norway, received $23.5 million in a funding round led by the OpenAI Startup Fund, becoming the first hardware company OpenAI has invested in.
The OpenAI Startup Fund focuses its investments on companies best positioned to put AI technology into practice, which is also the key logic behind OpenAI's investment in 1X. An OpenAI executive has publicly stated that "the timing of investing in 1X is that its robot hardware has matured and, with the blessing of AI, can open up a wider labor market."
Now another robot company has been added to the portfolio, which shows how optimistic OpenAI is about the humanoid robot track. As leading startups in the field, both 1X and Figure have launched humanoid robot products.
Among them, 1X has two robot products: the work robot EVE and the home bipedal robot NEO, which is still under development. Figure released its first general-purpose humanoid robot, Figure 01, last year, and has now reached an agreement with BMW to deploy "general-purpose humanoid robots" in automobile manufacturing.
In fact, OpenAI's enthusiasm for humanoid robots goes back further than most people realize: OpenAI once even had an in-house robotics department.
As early as 2017, OpenAI's robotics team released Roboschool, open-source software for simulating robot control. By 2018, a robotic arm developed by the team could already manipulate wooden blocks with considerable dexterity. The team's most famous work, though, was a robotic hand that could solve a Rubik's Cube single-handedly.
Of course, OpenAI is not the only large-model company with a soft spot for humanoid robots. An interesting phenomenon: almost every company that has made a name in large models has, to some degree, moved into humanoid robots.
Take Google: in October last year, DeepMind released the RT-X robot model and opened up the training dataset Open X-Embodiment. In January this year, the Google DeepMind team and a Chinese team at Stanford University jointly developed Mobile ALOHA, a general-purpose robot that can stir-fry and do housework.
As for Musk, he founded an artificial intelligence company called xAI in July last year, and at the end of last year Tesla unveiled a new generation of its humanoid robot Optimus, which is expected to be delivered next year.
In the process of putting large AI models to work, what role do humanoid robots play that makes them so favored?
Before discussing the relationship between large models and humanoid robots, we first need to understand what a humanoid robot is.
To be clear, robots are nothing new. Before humanoid robots appeared, industrial robots were already a market worth more than 50 billion, widely used in the automotive, 3C electronics, textile, packaging and other industries.
However, industrial robots have an obvious drawback: a lack of versatility. A traditional industrial robot cannot be used out of the box; it must be integrated by a system integrator. System integration requires not only heavy customization grounded in an understanding of the customer's process, but also relies heavily on engineers' experience. This confines industrial robots to large-scale, repetitive production work.
In this sense, industrial robots are more automation equipment than robots. Humanoid robots are undoubtedly closer to what we imagine a robot to be.
Humanoid robots, as the name suggests, are robots whose form is closer to a human's. But the external form is just the surface; at their core they are defined by intelligence and versatility. The "humanoid" shape was chosen only because the world we move through was built for the human body: the human form can operate every tool and has the widest adaptability.
Physically, a humanoid robot consists of three modules: the "limbs", the "cerebellum" and the "brain". The "limbs" comprise hardware such as dexterous hands and sensors; the "cerebellum" is responsible for motion control; and the "brain" handles environmental perception, reasoning, decision-making and language interaction.
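The three-module split above can be sketched in code. This is a minimal, hypothetical illustration, not any real robot stack: every class and method name below is invented, and the "brain" here is a stand-in for what would actually be a large model.

```python
# Illustrative sketch of the "brain / cerebellum / limbs" split.
# All names are hypothetical; a real system would replace Brain.decide
# with a large model and Cerebellum.control with a motion controller.

class Brain:
    """'Brain': environment perception, reasoning, language interaction."""
    def decide(self, observation: str, instruction: str) -> list:
        # Fake plan generation; a large model would generalize far beyond this.
        if "cup" in observation and "fetch" in instruction:
            return ["walk_to_table", "grasp_cup", "return_to_user"]
        return ["idle"]

class Cerebellum:
    """'Cerebellum': turns each high-level step into motor commands."""
    def control(self, step: str) -> dict:
        # Placeholder joint targets for a 7-DoF arm.
        return {"command": step, "joint_targets": [0.0] * 7}

class Limbs:
    """'Limbs': dexterous hands, legs, and sensors (the hardware layer)."""
    def execute(self, command: dict) -> str:
        return "executed " + command["command"]

def run(observation: str, instruction: str) -> list:
    brain, cerebellum, limbs = Brain(), Cerebellum(), Limbs()
    plan = brain.decide(observation, instruction)
    return [limbs.execute(cerebellum.control(step)) for step in plan]

print(run("a cup on the table", "fetch the cup"))
# → ['executed walk_to_table', 'executed grasp_cup', 'executed return_to_user']
```

The point of the split is that each layer can evolve independently: the article's claim is that large models upgrade only the top box.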
The arrival of large models brings semantic understanding, reasoning and generation capabilities, which is equivalent to giving humanoid robots a new "brain". The improvement falls mainly into two areas: perception, and thinking and decision-making.
First, the strong fitting ability of large models lets humanoid robots achieve higher accuracy on tasks such as object recognition, obstacle avoidance, 3D reconstruction and semantic segmentation. For example, AI can already recognize obstacles, but if someone by the roadside holds up a sign reading "the bridge ahead is broken, please take a detour", AI has historically struggled to understand the situation, whereas a humanoid robot built on large models has a chance of recognizing and understanding that information.
Thinking and decision-making ability means the humanoid robot has good knowledge completeness and can decompose an instruction into multiple subtasks and sub-instructions, completing varied tasks across different scenarios. Say "heat this up" to such a robot, and it knows to head for the microwave.
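To make the decomposition idea concrete, here is a hypothetical sketch of an instruction being broken into subtasks. The lookup table stands in for what a large model would generate on the fly; the instruction strings and steps are invented for illustration.

```python
# Hypothetical sketch of instruction decomposition ("heat up the meal"
# becomes a microwave subtask sequence). The hard-coded SKILLS table is a
# stand-in for a large model's generated plan; nothing here is a real API.

SKILLS = {
    "heat up the meal": [
        "locate the meal",
        "open the microwave",
        "place the meal inside",
        "close the door and set 2 minutes",
        "take the meal out when done",
    ],
}

def decompose(instruction: str) -> list:
    # A large model generalizes to unseen instructions; this lookup cannot,
    # which is precisely the gap the article says large models close.
    return SKILLS.get(instruction, ["ask user to clarify: " + instruction])

for step in decompose("heat up the meal"):
    print(step)
```

The interesting part is the fallback branch: without a large model, anything outside the table fails, which is why rule-based robots never achieved this kind of versatility.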
Companies such as Google and NVIDIA have explored large models plus robots in depth, and the positive effects above have been confirmed. In December 2022, Google released the RT-1 model, an end-to-end model trained on robot data: the input is a short image sequence plus a task described in text, and the output is action instructions spanning 7 dimensions of arm movement, 3 dimensions of base movement, and 1 dimension of mode switching.
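The 11-dimensional action layout just described (7 arm + 3 base + 1 mode) can be written down as a small data structure. The field groupings follow the description above; the exact component names and discretization details of the real RT-1 are paraphrased, not quoted.

```python
# The 11-dimensional RT-1-style action layout described above:
# 7 arm dimensions, 3 base dimensions, 1 mode-switch dimension.
# Field names are paraphrased for illustration.

from dataclasses import dataclass

@dataclass
class RT1Action:
    arm: tuple   # 7 dims: end-effector (x, y, z, roll, pitch, yaw, gripper)
    base: tuple  # 3 dims: mobile base (x, y, yaw)
    mode: int    # 1 dim: switch between arm control, base control, or stop

    def as_vector(self) -> list:
        v = list(self.arm) + list(self.base) + [self.mode]
        assert len(v) == 11, "RT-1 actions span 11 dimensions in total"
        return v

a = RT1Action(arm=(0.1, 0.0, 0.2, 0.0, 0.0, 0.0, 1.0),
              base=(0.0, 0.0, 0.0), mode=0)
print(len(a.as_vector()))  # → 11
```

Packing arm, base, and mode into one flat vector is what lets a single model output be decoded into whichever subsystem the mode dimension selects.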
The researchers had the robot perform more than 700 tasks; the results showed that a robot running the RT-1 model achieved a higher success rate in previously seen scenes, previously unseen scenes, scenes with distractors, and scenes with changed backgrounds.
It is worth noting that this was the first time a robot demonstrated generalization from large, diverse, task-agnostic data, performing tasks it had never seen before.
In July last year, Google released RT-2, a vision-language-action model, raising the parameter count from RT-1's 35M to as much as 55B. The researchers tested RT-2 the same way as RT-1, and the results show that RT-2's comprehension, reasoning, and generalization to unknown scenarios are significantly better than RT-1's.
It can be said that AI technology based on large models has made generalized humanoid robots possible, and as a hardware carrier, humanoid robots have brought generative AI ever closer to the goal of AGI.
The introduction of large models gives robots "common sense" and a degree of generalization, addressing the two big problems of natural-language understanding and task planning; more and more companies are releasing humanoid robots, and it all seems ever closer. But we must be soberly aware that humanoid robots still have a long way to go before they actually land.
The breakthrough large models bring to humanoid robot intelligence is gratifying, but it does not solve the hardware problem. A humanoid robot's manipulation and locomotion abilities must be realized through the hardware body and its algorithms, and in fifty years of humanoid robot research this difficulty has never been fully overcome.
For example, a robot vacuum glides forward on its chassis, and a quadruped robot dog can rely on four limbs for stability and balance, but a humanoid robot has only two legs and, mid-stride, must keep the entire body balanced on one. Large models offer limited help with this kind of physical movement.
Another big challenge for humanoid robot companies is data. Training humanoid robot models requires large amounts of decision-making data; with insufficient data, outputs drift easily and success rates suffer.
At present, teleoperation is an important way for robots to collect data: first learn and decompose how a person does the task, then map it to how the robot should do it. Because it uses real-world data, its quality is the highest, but collection is costly. Google's robot data for training RT-1 and RT-2, for example, was collected over 17 months on 13 robots. When OpenAI disbanded its robotics team, the difficulty of obtaining and collecting such data was an important reason.
This is also why large-model companies are deploying humanoid robots: through the robot's hardware body, AI gains more contact with the outside world, and that data can in turn feed back into the AI algorithms. Similarly, the data Tesla's self-driving cars accumulate on the road feeds the FSD system and provides its most basic data foundation.
Despite all these problems, more and more large-model companies keep rushing into the humanoid robot track. Many call last year the first year of real industrialization for humanoid robots. But few remember the humanoid robot craze around 2016; when that tide receded, very few of the so-called humanoid robot companies survived.
When a technology beyond imagination begins to be commercialized, the initial shock fades quickly, and the participating companies must withstand the tests of business while driving the technology to maturity. The personal computer and the Internet withstood such tests and in turn reshaped the world. Humanoid robots have broken through several times and stagnated several times; now, on the tailwind of large models, a new batch of companies has reached this stage again.