Seven major domestic players take on Tesla! Inside the melee over intelligent driving large models

Mondo Culture Updated on 2024-01-29

After a year of shouting, where do intelligent driving large models stand now?

Author | janson

Editor | juice

The slogan "large models + autonomous driving" has been shouted for a year. What progress has been made? After large models took off at the beginning of the year, industry players began exploring how to combine large-model technology with autonomous driving. Tesla was the first to introduce an end-to-end technical architecture based on Transformer + BEV, and then the first in the industry to introduce occupancy network technology. In an interview, a Tsinghua University professor said bluntly that Tesla leads the industry by three years.

Domestic players including Huawei, Xpeng, Momo Zhixing, Zhijia Technology, and TiRE1 all showed their latest progress at the end of the year. After a year of development, however, that progress does not look dramatic. Neither ADS 2.0 nor XNGP has yet delivered the "universal" intelligent driving performance that large models were supposed to bring. Each company still restricts its system, to a greater or lesser degree, to specific cities or specific scenarios, which leaves a gap to truly universal intelligent driving. Compared with text, the volume of data that intelligent driving has to handle can be said to grow exponentially.

Autonomous driving built on big data has to process inputs from many different sensors, such as lidar, millimeter-wave radar, ultrasonic radar, high-definition cameras, and GNSS. These data streams have different spatiotemporal properties, and the link between hardware degradation and data reliability also has to be taken into account.

In addition, in-vehicle large models require a large amount of scene data, including traffic signs and lane markings, traffic flow, and behavior models. This makes the threshold for developing and training in-vehicle large models quite high.

Tesla's 3D vision mode.

At the same time, the vision solution has iterated from early CNNs to BEV and then to today's mainstream Transformer + BEV, and every step tests the technical depth of the R&D team.

This year, companies led by Tesla brought out the occupancy network and added it to the skill tree, once again raising R&D difficulty, R&D investment, and the technical baseline to a new high.

Compared with the beginning of the year, however, the OEMs and solution providers that once shouted about "putting large models in the car" seem to have gone quiet. They no longer rely on simple "first to ship" publicity and have turned to improving usability and reliability. Behind this year-end "silence" lies an unresolved technical contest among manufacturers in the large-model "melee". Who stands out in the next round can only be decided by technical performance and products.

A note on our annual event: on December 19, the 2023 Global Autonomous Driving Summit will be held in Shenzhen. The main venue will host the opening ceremony and three special sessions on high-end intelligent driving, large models, and computing power. The sub-venues will host the Shenzhen Nanshan Intelligent Connected Vehicle Government-Enterprise Exchange Meeting, the Autonomous Driving Analyst Forum, and the Autonomous Driving BEV Perception Technology Forum.

Speakers include Professor Deng Zhidong of Tsinghua University; Li Hongyang, author of UniAD; Huang Guan, founder of Jijia Technology; Sun Qi, founder and CEO of Shengqi Technology; Yu Xu, founder and CEO of Kaiwang Data; and Yin Wei, senior manager of intelligent driving software at Zhiji Automobile. They will give talks and join discussions on topics such as large vision-language models, end-to-end autonomous driving, world models, closed-loop data, automatic co-pilot, and mass production and delivery of in-vehicle large models.

01. Large models stay hot and become a must-have for intelligent driving players

OEMs such as Xpeng and Li Auto, as well as solution providers such as Huawei, Momo Zhixing, and Zhijia Technology, began switching to the BEV+Transformer technical route this year and have launched, to varying degrees, products or solutions that run in the car. Their current progress offers a glimpse of where domestic intelligent driving large models stand. In terms of technical route, domestic manufacturers have basically all switched to BEV+Transformer.

The technical route of domestic mainstream intelligent driving players.

In terms of speed of deployment, the companies using BEV+Transformer are represented by Xpeng, with XNGP and its BEV visual perception system XNet, and Huawei, with ADS 2.0. Both have essentially completed training and put on-board large models into cars. Xpeng's XNet outputs 4D dynamic information (such as vehicle speed and motion) and 3D static information (such as lane line positions) from the BEV perspective, which helps the system make better decisions. Xpeng has also begun to gradually introduce occupancy network technology in XNet 2.0. Huawei's ADS 2.0 adds its self-developed GOD network, built on a visual fusion algorithm, and relies on sensors such as lidar to give the overall system richer information.

Xpeng XNGP

Li Auto and NIO have both chosen to add occupancy network algorithms on top of the BEV+Transformer architecture, so their deployment is slightly slower than that of the previous two. Whether it is the maturity of Li Auto's recognition of complex traffic environments or NIO's multi-modal neural network large model, rollout has been relatively slow or the scope of application remains limited. The current progress shows that choosing the occupancy network raises the requirements for R&D capability and information processing to a higher level. It is undeniable, however, that once companies that chose the occupancy network get their products into cars, they will hold a "half-generation" lead over companies that rely on the BEV+Transformer architecture alone.

Momo Zhixing's MANA perception architecture, Nume's MAXdrive integrated driving and parking solution, and MAXIPILOT 2.0 from Zhijia Technology (MAXIEYE) all build on the BEV+Transformer architecture to offer more versatile large-model intelligent driving solutions. They can be adapted to both pure-vision and vision-fusion setups, helping car companies build intelligent driving solutions at different cost levels.

MANA perception architecture.

By the end of this year, every company had more or less come up with solutions or actual products for its intelligent driving large model. The extent to which they have actually reached cars, however, is still far from ideal. In terms of actual coverage, Huawei's urban NCA is currently officially confirmed in only six cities: Shanghai, Guangzhou, Shenzhen, Chongqing, Hangzhou, and Beijing. Although Huawei claimed at the launch of the new M7 that urban NCA would be available nationwide by the end of the year, there is still a big gap between that goal and reality.

Xpeng has made considerable progress with urban NGP, pushing the function to 25 cities across the country, the largest number of cities with urban intelligent driving of any company in China. It should be noted, however, that some cities on the list of 25, such as Changshu, Taicang, and Kunshan, fall under Suzhou administratively. The smallest unit in Xpeng's second batch of city rollouts is therefore the county-level city, so the scale is somewhat smaller than the headline number suggests.

Li Auto has changed its goals repeatedly, from the initial urban NOA (navigation-assisted driving), to commuting NOA, and then to full-scenario intelligent driving NOA. The "official version" of Li Auto's full-scenario NOA in December is to cover highways and ring roads nationwide as well as 100 cities, but the specific rollout has not yet been announced.

Some other players have also scaled back their goals to varying degrees, and their urban NOA functions have not been pushed to users on a large scale. Although the ultimate goal of in-vehicle large models is "universal" intelligent driving assistance, even the less general functions limited to specific cities or specific scenarios are still not widely available.

It is not hard to see that, in developing intelligent driving technology based on large models, manufacturers are on the one hand cautious about applying and popularizing new technology, out of a sense of responsibility. On the other hand, developing and applying intelligent driving large models is still technically difficult, and it is unrealistic to expect these difficulties to be overcome in a short time. Professor Deng Zhidong of Tsinghua University said in an interview that Tesla has been pushing this field of intelligent driving since 2020 and, as a leading new energy vehicle manufacturer, has accumulated the world's richest data resources. China only began to accelerate its efforts in this field after March this year, so there is a gap of at least three years between China and Tesla, and surpassing Tesla in a short period is a challenge. On the road to domestic intelligent driving large models, rushing neither works nor accords with objective laws; progress has to come one step at a time.

02. Tesla has a clear advantage

Developing in-vehicle large models along the BEV+Transformer route and introducing occupancy network algorithms are important technical directions in which car companies compete in autonomous driving. First, the BEV+Transformer route is versatile and flexible, and can adapt to the needs of autonomous driving in different scenarios. In addition, this route reduces dependence on high-precision maps, which in turn reduces the dependence on the surveying and mapping qualifications of OEMs or solution providers and eases data security requirements. Through real-time perception and data processing, the vehicle can better adapt to changes in the road environment, improving driving safety and reliability.

Tesla's decision-making logic.

Second, this route improves the perception capability of autonomous driving. With the occupancy network, the system can better handle occlusion and interaction in complex scenes and produce more accurate perception results. The occupancy network can also lower the cost of the autonomous driving system: compared with the traditional lidar + high-precision map scheme, the BEV+Transformer route offers better cost-performance in perception. Removing high-definition maps and lidar helps reduce vehicle cost and promotes wider adoption of autonomous driving technology.

Finally, the BEV+Transformer route and the occupancy network are important research directions in autonomous driving perception, and working on them helps companies build technical reserves and iterate in the technology race. In the current competition among OEMs and solution providers, whoever seizes the initiative will bring functions to market a step sooner and gain a potentially profitable ticket in this "melee".

In autonomous driving, Tesla is certainly one of the most forward-looking companies. Since 2015, Tesla has been building in-house autonomous driving software and hardware, making self-developed algorithms and chips its development focus in the years since. In 2020, Tesla released FSD Beta and took the lead in upgrading its algorithm from the original 2D+CNN route to the BEV+Transformer route.

The first question is what advantages the BEV+Transformer route offers. The Transformer is a deep learning neural network architecture whose strength is feature extraction with a global view of the input, which improves the stability and generalization ability of the model. Through positional encoding, it can better handle position information in sequence data and so understand the relationships between elements in the sequence more accurately. When a CNN processes sequence data, the data often has to be recast into an image-like form first, which can lose position information.

BEV stands for Bird's Eye View, a method of projecting three-dimensional environmental information onto a two-dimensional plane so that objects and terrain are shown from a top-down perspective. Compared with traditional small models, BEV+Transformer improves the perception and generalization capabilities of intelligent driving, which helps alleviate the long-tail problem.

In terms of perception, BEV unifies the viewpoint by fusing multi-modal data from lidar, radar, and cameras into the same plane. This provides a global view and reduces occlusion and overlap between data sources, improving the accuracy of object detection and tracking. Self-attention in the Transformer computes the interactions between all elements in a few large matrix operations rather than step by step, which lends itself to parallel computation and improves computational efficiency. A convolution in a CNN, by contrast, relates only neighboring elements within its kernel, so its view of the input is built up locally.

In terms of generalization, the Transformer extracts features with a global view through self-attention, which helps it find the internal relationships of the things it observes, so that the intelligent driving system learns to generalize and summarize rather than learn by rote.
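The BEV projection described above can be illustrated with a minimal sketch. This is not Tesla's or any vendor's implementation: the function names `to_bev` and `to_voxels`, the 1-meter cell size, and the sample points are all assumptions made for illustration. The first function bins 3D points into a top-down 2D grid, which is the basic idea behind a BEV representation. The second keeps the height axis as well, giving a simple voxel occupancy set of the kind occupancy-style methods reason about.

```python
from collections import defaultdict


def to_bev(points, cell_size=1.0):
    """Project (x, y, z) points onto a top-down grid.

    Returns a dict mapping (col, row) grid cells to the number of points
    that fall into them. The z coordinate is discarded.
    """
    grid = defaultdict(int)
    for x, y, _z in points:
        cell = (int(x // cell_size), int(y // cell_size))
        grid[cell] += 1
    return dict(grid)


def to_voxels(points, cell_size=1.0):
    """Return the set of occupied (col, row, level) voxels, keeping height."""
    return {
        (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        for x, y, z in points
    }


# Hypothetical sample points, in meters, in the vehicle's coordinate frame.
points = [
    (2.3, 5.1, 0.2),   # low obstacle
    (2.7, 5.8, 3.4),   # overhanging object above the same ground cell
    (-4.5, 1.2, 0.9),  # another object behind and to the side
]

bev = to_bev(points)
voxels = to_voxels(points)
print(bev)
print(sorted(voxels))
```

In the BEV grid the low obstacle and the overhanging object land in the same cell, while the voxel set keeps them apart. This is a toy version of the information that is lost when a 3D scene is flattened into 2D.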

Schematic diagram of the Transformer model algorithm.
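As a rough illustration of the self-attention mechanism, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplification and an assumption-laden toy: queries, keys, and values are all taken to be the input tokens themselves, with no learned projection matrices, no multiple heads, and no positional encoding. The helper names `softmax` and `self_attention` are chosen for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Score the query against every token in the sequence, near or far.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of all value vectors.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, tokens)) for j in range(d)])
    return outputs, weights


# Three hypothetical 2-dimensional tokens; the first and last are identical.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

The first token attends to the third token exactly as strongly as to itself, even though they are not adjacent. Every pair of positions is compared directly, regardless of distance.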

At the same time, the Transformer can consider all elements of the input sequence at once, which lets it capture long-distance dependencies in sequence data. A CNN has to capture local features step by step through convolution operations, so long-distance dependencies may be lost.

On this basis, in 2022 Tesla introduced a temporal network into the algorithm and upgraded BEV to an occupancy network. An occupancy network is a deep-learning-based 3D perception method that captures the position and shape of objects in three-dimensional space more faithfully, addressing at the model level the information lost in going from three dimensions to two.

Looking at progress in China, CNNs are clearly being phased out and no longer merit much discussion. In terms of perception algorithms, the industry as a whole has gradually upgraded to the BEV+Transformer route since 2022. Tesla can therefore be counted among the first companies to adopt BEV+Transformer technology.

03. Intelligent driving large models face many difficulties, with many problems still to be solved

Autonomous driving large models require considerable resources and investment to support their operation. At the perception level, autonomous driving systems need to process data from different sensors, such as lidar, millimeter-wave radar, ultrasonic radar, high-definition cameras, and GNSS. These data have different spatiotemporal properties, and how to fuse them effectively while improving the efficiency and accuracy of data processing is the first problem an intelligent driving large model has to solve.
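One small piece of the fusion problem is that sensors sample at different rates, so their readings must be matched in time before they can be combined. The sketch below shows one simple approach, nearest-timestamp matching with a tolerance. It is an illustrative assumption rather than a description of any production pipeline: the function names, the sample rates, and the 20 ms tolerance are all made up for this example.

```python
from bisect import bisect_left


def nearest_index(sorted_ts, t):
    """Index of the timestamp in sorted_ts that is closest to t."""
    i = bisect_left(sorted_ts, t)
    if i == 0:
        return 0
    if i == len(sorted_ts):
        return len(sorted_ts) - 1
    # Pick whichever neighbor is closer in time.
    return i if sorted_ts[i] - t < t - sorted_ts[i - 1] else i - 1


def align(camera_ts, lidar_ts, max_gap):
    """Pair each camera frame with the nearest lidar sweep.

    Frames whose nearest sweep is more than max_gap seconds away are
    dropped. Returns (camera_index, lidar_index) pairs.
    """
    pairs = []
    for ci, t in enumerate(camera_ts):
        li = nearest_index(lidar_ts, t)
        if abs(lidar_ts[li] - t) <= max_gap:
            pairs.append((ci, li))
    return pairs


# Hypothetical timestamps in seconds: a ~30 Hz camera and a 10 Hz lidar.
camera_ts = [0.000, 0.033, 0.066, 0.100, 0.133]
lidar_ts = [0.010, 0.110]

pairs = align(camera_ts, lidar_ts, max_gap=0.02)
print(pairs)
```

Only the camera frames that fall within 20 ms of a lidar sweep are kept here. A real system would more likely interpolate or motion-compensate than discard data, but the matching step has the same shape.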
In addition, the amount of data in intelligent driving is growing exponentially. How to store, process, and analyze this mass of data efficiently, so that large models can make more accurate predictions and decisions, places higher demands on companies and researchers. In intelligent driving technology, the three mountains on this track are the limitations of model training, insufficient on-board computing power, and problems in connected applications.

In model training, first, high-quality data is expensive to collect and data for certain driving scenarios is hard to obtain, which leaves models with weaknesses in generalization and accuracy. Second, deep learning models rely on large amounts of labeled data, and manual labeling is not only time-consuming and laborious but can also introduce errors. Third, with limited training data, models are prone to overfitting, meaning that performance degrades on new data in real-world use.

In on-board computing power, vehicle hardware is limited compared with servers. Balancing on-board compute against cost usually calls for more efficient recognition and decision-making algorithms. To some extent this can ease the limitation that intelligent driving large models can only run on high-end, high-compute vehicles, and make such models more widely applicable.

In connected applications, intelligent driving relies on large volumes of data transmission, including vehicle-to-vehicle and vehicle-to-cloud communication. Existing transmission technologies, however, can suffer from network latency and data loss. The connected nature of intelligent driving may also bring security risks, so ensuring data security and privacy has become a top priority. Finally, the lack of unified standards makes it hard for different vehicles and devices to exchange data, which limits the wider rollout of connected intelligent driving.

At the same time, in-vehicle large models need powerful computing and storage, while current on-board hardware is still limited in computing power and power consumption. How to deploy and optimize a large model under these constraints is a key problem for intelligent driving large models. As for the limitations of the models themselves, some domestic scholars have proposed the concept of a general model as an alternative to the large model.

The UniAD concept proposed by Li Hongyang's team.

In May this year, the team of Li Hongyang, a young scientist at the Shanghai Artificial Intelligence Laboratory, published a paper that proposed for the first time a general model of autonomous driving integrating perception and decision-making, and won the CVPR 2023 Best Paper Award. It was also the first time in the 40-year history of CVPR, a top conference, that the best paper award went to work in the field of autonomous driving. The team proposed a goal-oriented autonomous driving algorithm (UniAD) whose design concept is an end-to-end architecture that takes planning as the ultimate goal and integrates all autonomous driving modules. Li Hongyang said the difference between this solution and MTL or Tesla's approach is that the latter try to achieve the best performance on every task, while his team's solution focuses on the planning result.

In addition, according to reports, beyond the technical presentation at the conference, Li Hongyang also made a pointed remark: "I think there is no large model of autonomous driving in this industry right now. We define the UniAD work as a general model of autonomous driving, not a large model. If the autonomous driving large model eventually develops into a perception large model, that is imperfect; it could be done in general vision." This also reflects a new view in parts of domestic academia on intelligent driving large models: avoid reinventing the wheel, improve generality, and make up for shortcomings in algorithms and resources. However, this only addresses the limitations of the model. The other widespread problems of intelligent driving large models still need continued technical development and time to solve.

04. Conclusion: a long road ahead for large models in the car

Intelligent driving large models face many challenges in technology R&D, data collection and processing, hardware, and more. With continued technological progress, however, all kinds of players have joined the competition and pushed the technology forward, and it is foreseeable that intelligent driving large models getting into cars is an inevitable trend. Even so, there is still a long way to go before large models are widely applied in autonomous driving and other fields. On the one hand, future development has to overcome the current technical difficulties and bring large-model technology to maturity and commercial application. On the other hand, "marketing thinking" has to be abandoned in a technology-heavy field, and data processing, perception and decision-making, and computing power have to be carefully balanced, so that in-vehicle large models that truly fit the trend of intelligent driving become available as soon as possible.
