Gemini ignites the multimodal AI concept, and momentum in data elements rises

Mondo Technology Updated on 2024-01-29

(Report Producer: Huaxi**).

First, Gemini ignites the multimodal AI concept, and multimodal development accelerates

On December 6, local time, Google announced the release of Gemini 1.0, its most capable artificial intelligence model to date. Gemini is a natively multimodal large model built on a Transformer decoder and is available in three versions: Gemini Ultra, the most capable; Gemini Pro, the best model for scaling across a wide range of tasks; and Gemini Nano, for on-device use. The Pixel 8 Pro became the world's first smartphone equipped with Gemini Nano.

Gemini Ultra is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), and it achieved state-of-the-art results on 30 of 32 benchmarks. It generalizes across different types of information and can seamlessly understand, combine, and operate on them, recognizing and understanding five types: text, images, audio, video, and code.

According to Cailian Press, Google's AI assistant Bard has already incorporated part of the Gemini 1.0 model's technology, and full integration is expected early next year. Multimodal technology can improve the efficiency and quality of human-computer interaction, letting users interact with computers more naturally and intuitively. Robotics is an important application scenario for multimodal technology, which can help robots achieve more accurate, efficient, and coordinated motion control and thereby improve their work efficiency and quality.

Previously, robots driven by Google's large model PaLM-E were already able to carry out long-horizon tasks, perform planning, and tell jokes about a given image. Gemini is expected to improve how robots handle the task layer in practice and to accelerate the commercialization of humanoid robots and other sub-industries, and companies in related fields are expected to see new opportunities.

Second, data-element policy support keeps strengthening, opening a new stage for the digital economy

As data elements grow in national importance, top-level policy design is expected to accelerate toward the end of the year. Since it began operating, the National Data Administration has been conducting intensive research while recruiting staff. Management measures to guide the pricing of public data are expected to be introduced soon, and the circulation of data elements should accelerate substantially.

Policy guidance and support for data elements have moved from shallow to deep, showing a trend of "from point to area, from exploration and experimentation to implementation and deeper development". Many localities have introduced policies and plans for the circulation and trading of data elements, improved the framework of the data asset trading system, and raised the level of on-exchange trading activity to build a vibrant data-element market.

The National Data Administration has vigorously promoted the development and utilization of public data resources and accelerated the reform of market-oriented data allocation. On December 8, the Data Resource Entry Service Consortium released the country's first "data resource entry ***", pooling collective expertise to promote the recording of data resources on balance sheets.

2. Gemini ignites the multimodal AI concept, and momentum in data elements rises

2.1. Gemini ignites the multimodal AI concept, and multimodal development accelerates

A few days ago, Google released Gemini 1.0, the most capable multimodal AI model in the company's history. On the news, Google's share price rose by more than 5%, taking its total market capitalization above US$1.7 trillion, and set off a rally in multimodal AI concept stocks. As of December 7, the ** Nasdaq Technology ETF had risen by more than 2.27%. As of noon on December 8, Suzhou Keda and Neta Software had hit their daily limit-up, Yuncong Technology had risen by more than 9%, Kunlun Wanwei, Shengxun Shares, and 360 had risen by more than 8%, and Jingyeda, iFLYTEK, Tors, and other stocks had also surged. The moves reflect the market's close attention to breakthroughs in multimodal technology.

(1) Gemini 1.0: native multimodal AI whose language understanding surpasses human experts for the first time

Unlike earlier multimodal large models, which merely stitched together separate text-only, vision-only, and audio-only models, Gemini was trained on different modalities from the start, so it can seamlessly understand and reason over inputs of all kinds. This also means that Gemini can understand the world in a way closer to how humans do, taking in any type of input and producing any type of output, whether text, audio, images, or other forms.

In the MMLU (Massive Multitask Language Understanding) test, Gemini Ultra scored 90.0%, surpassing human experts for the first time. MMLU covers 57 subjects, including mathematics, physics, history, law, medicine, and ethics, and is designed to test world knowledge and problem-solving ability. Across these more than 50 subject areas, Gemini performed on a par with the best experts in each field. Google's new approach to the MMLU benchmark lets Gemini reason more carefully before answering difficult questions, a significant improvement over relying on its first impression alone.

On the new MMMU benchmark, Gemini Ultra also achieved a high score of 59.4%. This test consists of multimodal tasks spanning different domains that require deliberate reasoning. On image benchmarks, Gemini Ultra also outperformed the previously leading models, and did so without the help of an OCR system. These tests indicate that Gemini has strong multimodal capabilities and considerable potential for more complex reasoning.

(2) Multimodal AI applications are being released one after another, and multimodal large models lead the development of the AI field

Even before the release of Gemini 1.0 drew wide attention, many companies in China and abroad had launched multimodal AI applications, and these showed good performance in hands-on testing. On November 29, Pika 1.0 was officially released. It is an AI video generation platform positioned against Runway Gen-2. Its functions include text-to-video and image-to-video generation, where entering a simple text description, or uploading an image with accompanying text, is enough to create a high-quality video; style conversion; and content editing, which lets users change or add elements within the video and adjust the aspect ratio.

On November 24, Stability AI released its latest generative model, Stable Video Diffusion (SVD), and announced that it is open source. It supports not only text-based and image-based video generation but also provides a strong multi-view 3D prior: the model generates multiple views of an object in a feed-forward manner, requires relatively little computing power, and outperforms image-based methods. Stable Video Diffusion is released as two image-to-video models, capable of generating 14 and 25 frames respectively at customizable frame rates of 3 to 30 fps. Judging from the officially published external evaluation results, users preferred SVD over Runway and Pika in both configurations.
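To make the frame and frame-rate figures concrete, the short Python sketch below works out how long a generated clip would last. It assumes that "14 and 25" are frame counts per clip and that the 3 to 30 fps range applies to both models; the function name and constants are illustrative and are not part of any Stability AI API.

```python
def clip_seconds(frames: int, fps: float) -> float:
    """Return the playback length in seconds of a clip with `frames` frames at `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


# Figures taken from the paragraph above (assumed to be frames per clip).
FRAME_COUNTS = (14, 25)
FPS_RANGE = (3, 30)  # slowest and fastest customizable frame rates

# For each model: (shortest clip at 30 fps, longest clip at 3 fps).
durations = {
    frames: (clip_seconds(frames, FPS_RANGE[1]), clip_seconds(frames, FPS_RANGE[0]))
    for frames in FRAME_COUNTS
}

for frames, (shortest, longest) in durations.items():
    print(f"{frames} frames: {shortest:.2f} s to {longest:.2f} s")
```

Under these assumptions, both models produce clips only a few seconds long, which is consistent with their positioning as short-clip generators.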

On November 23, Adobe announced that it had completed the acquisition of Indian AI startup Rephrase AI, whose expertise in generative AI audio technology and text-to-video generation tools is expected to extend Adobe's generative capabilities. On November 21, Runway released Motion Brush. To use it, you simply drag the mouse over a static image to "brush" an area, and the specified area starts to move, producing an animated video. Users can freely select non-contiguous areas and can also control the direction and state of each element's movement. In hands-on testing, the stability and coherence of the animation still leave considerable room for improvement. Also released were Style Presets, a set of 26 filters that can be applied without complicated prompts, and camera-motion controls that work with Motion Brush.

On November 18, ByteDance released PixelDance, a video generation model based on the diffusion model that combines text and image instructions, aiming to enrich scenes and improve the quality of details and motion. Its instruction paradigm is "text, first-frame image, last-frame image": the first-frame image instruction describes the main scene of the video, and the last-frame image instruction is used in both training and inference. According to the evaluation results, PixelDance outperforms other T2V methods on both FVD and CLIPSIM on the MSR-VTT dataset, reaching a leading level.

On November 16, Meta released Emu Video, a text-to-video model based on the diffusion model. It generates in two steps: first an image from the text prompt, then a video conditioned on both the image and the text, so that the model focuses on how the image will evolve over time. Compared with direct text-to-video generation, this approach preserves the visual diversity, style, and quality of the text-to-image model, and so surpasses T2V methods trained under the same pre-training conditions. According to evaluation under JUICE (a robust evaluation scheme designed by the researchers), Emu Video significantly outperforms all previous work, including commercial solutions such as Pika and Runway Gen-2, in both quality and faithfulness to the text.

(3) Pocket-sized models: the on-device AI innovation cycle has begun

Google's latest high-end flagship, the Pixel 8 Pro, is the world's first device to be equipped with Gemini Nano. On December 8, the well-known open-source generative AI platform Stability AI open-sourced the 3-billion-parameter large model StableLM Zephyr 3B on its official website. Designed for mobile devices such as phones and laptops, Zephyr 3B has a small parameter count, strong performance, and low computing power consumption, and can automatically generate text, summarize content, and so on. In benchmarks such as MT Bench and AlpacaEval, StableLM Zephyr 3B performed well, showing a strong ability to generate contextually relevant, fluent, and linguistically accurate text. In testing, it competes with several larger models such as falcon-4b-instruct, wizardlm-13b-v1, llama-2-70b-chat, and claude-v1.

The development of generative AI and on-device AI is forcing upgrades in the hardware performance of smart devices. Leading chipmakers are racing to launch processors that support generative AI, and Intel and Qualcomm have launched chips designed specifically for AI mobile terminals. In Apple's next-generation M3 series, the M3 Max supports the development of AI models with billions of parameters. At the launch events for the Huawei Mate 60, Xiaomi 14, and vivo X100, the manufacturers focused on AI functions, and the Gauss AI model is expected to be a headline feature when the Samsung S24 and other high-end phones are launched in 2024. Leading PC brands have also positioned themselves for AI PCs: Lenovo's first AI PC has been launched, Apple is actively pushing to bring 5G chips to the MacBook Pro line to meet the needs of the AI PC era, and AI notebooks from HP, Acer, and other brands are expected to launch one after another within the year.

2.2. Data-element policy support keeps strengthening, opening a new stage for the digital economy

According to the "2023 China Data Exchange Market Research and Analysis Report", the market size of China's data transaction industry grew from 61.76 billion yuan in 2021 to 87.68 billion yuan in 2022, an annual growth rate of about 42%. According to the Shanghai Data Exchange**, the market is expected to keep growing steadily, reaching 204.6 billion yuan by 2025 and 515.59 billion yuan by 2030, a compound growth rate of about 20.3%. Over the next decade, the compound annual growth rate of China's data transaction market is expected to be well above that of the global data transaction market, which reflects the strong development potential of China's data-element market.
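The growth figures quoted above can be checked with a few lines of Python. This is a minimal sketch that reads the market sizes as 61.76 and 87.68 billion yuan (2021 and 2022) and 204.6 and 515.59 billion yuan (2025 and 2030); those readings, and the helper names `growth_rate` and `cagr`, are assumptions made for illustration.

```python
def growth_rate(start: float, end: float) -> float:
    """Simple growth rate between two values, e.g. 0.42 for 42%."""
    return end / start - 1.0


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate over `years` annual periods."""
    return (end / start) ** (1.0 / years) - 1.0


# Market sizes in billions of yuan (assumed reading of the report's figures).
market_2021 = 61.76
market_2022 = 87.68
forecast_2025 = 204.6
forecast_2030 = 515.59

yoy_2022 = growth_rate(market_2021, market_2022)
cagr_2025_2030 = cagr(forecast_2025, forecast_2030, 2030 - 2025)

print(f"2021 -> 2022 growth: {yoy_2022:.1%}")
print(f"2025 -> 2030 CAGR:   {cagr_2025_2030:.1%}")
```

Under this reading the two results come out at roughly 42% and 20.3%, which matches the rates stated in the text.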

As a new driving force for the growth of the digital economy, data elements rest on theoretical, practical, real-world, and historical logic, and constitute a basic strategic resource of the country.

From a theoretical point of view, in the Solow model y = da·f(k, l), or in its labor-augmenting form y = f(k, da·l), the steady-state growth rate of per capita output (and of capital per capita) depends on the technological progress da enabled by data elements.

From the perspective of practical logic, data elements can effectively improve the development quality of the primary, secondary, and tertiary industries. In agriculture, data can make agricultural management decisions more scientific, assist the development of precision agricultural production, and support the traceability of agricultural product quality and safety. In manufacturing, data can drive enterprises to continuously optimize and adjust production through real-time monitoring and analysis of production, manufacturing, and other links, and can support faster adoption of new models and new business formats. In services, producer services can improve the precision and quality of their offerings by analyzing user information and feedback, while consumer services can rely on data to improve the scale, efficiency, and quality of service supply.

From the perspective of real-world logic, giving full play to the role of data elements is an inevitable requirement of high-quality development. In April 2020, the "Opinions of the CPC Central Committee and the State Council on Building a More Complete System and Mechanism for the Market-Based Allocation of Factors" was officially released, establishing data as the fifth major factor of production. Compared with other factors of production, data is characterized by virtual enablement, unlimited convergence, intelligence and immediacy, and ubiquitous empowerment, and it can promote high-quality development by driving indirect capital investment, combining flexibly with other factors, and enhancing their productive value.

From the perspective of historical logic, according to the "China Data Element Market Development Report (2021-2022)", the contribution of data elements to current GDP has risen steadily; as of 2021 it had reached 14.7%, up about 2.7 percentage points.
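The theoretical argument based on the Solow model can be spelled out in a few lines. The sketch below uses the labor-augmenting form and writes the document's da as d_a. The saving rate s, depreciation rate \delta, labor growth rate n, the growth rate g of d_a, and the constant-returns assumption on f are standard textbook assumptions and do not come from the report.

```latex
% Assumptions (not in the source): f has constant returns to scale;
% s = saving rate, \delta = depreciation rate, n = growth rate of l,
% g = growth rate of d_a (data-enabled technological progress).
\begin{align*}
y &= f(k,\, d_a \cdot l), \qquad
\tilde{k} = \frac{k}{d_a\, l}, \qquad
\tilde{y} = \frac{y}{d_a\, l} = f(\tilde{k},\, 1) \\
\dot{\tilde{k}} &= s\, f(\tilde{k},\, 1) - (n + g + \delta)\, \tilde{k} \\
\dot{\tilde{k}} = 0 \;&\Rightarrow\; \tilde{k} = \tilde{k}^{*}, \quad
\tilde{y} = \tilde{y}^{*} \quad \text{(both constant)} \\
\frac{y}{l} &= d_a\, \tilde{y}^{*}
\;\Rightarrow\;
\frac{d}{dt} \ln \frac{y}{l} = \frac{\dot{d}_a}{d_a} = g
\end{align*}
```

In the steady state, output and capital per unit of effective labor are constant, so output per capita y/l and capital per capita k/l both grow at the rate g of d_a. This is the sense in which long-run per capita growth depends on the technological progress that data elements enable.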

China's central leadership and relevant departments attach great importance to this and have launched a series of measures to accelerate the construction of a market for the trading and circulation of data elements. In March 2023, the decision was made to establish the National Data Administration, and on October 25 it was officially inaugurated. Its mission is to promote the compliant and efficient circulation and use of data and to empower the real economy. Its work focuses on building a basic system suited to the characteristics of data, covering property rights, circulation, distribution, and governance of data elements. This includes improving the basic data system; promoting data circulation, trading, development, and utilization; advancing the construction of data infrastructure; strengthening data security governance; and promoting scientific and technological innovation and international cooperation in the field of data. As the digital economy develops, it places new demands on data infrastructure, which must promote the circulation and use of data and bring its value into full play, spanning technologies and facilities for networks, computing power, data circulation, and data security.

On November 23, 2023, at the 2nd Global Digital Expo, the National Industrial Information Security Development Research Center announced the release of the group standard "Technical Requirements for Public Data Authorization Operation Platform" (T/CECC 024-2023), which is administered by the China Electronics Chamber of Commerce. The standard covers the whole process of data registration, authorization, and circulation, and sets technical requirements in five areas: function, performance, operation and maintenance, security, and interconnection. Its introduction is expected to help unlock the value of public data and promote the healthy development of the digital economy. According to the center, it will promote the standard and continue to work on the public data standard system, the formulation of management norms, and integration with industry applications, in support of the development of the data-element market.

On November 23, 2023, the "Work Plan for Supporting Beijing to Deepen the Construction of a Comprehensive Demonstration Zone for the Expansion and Opening-up of the National Service Industry" was approved, supporting Beijing in actively creating a pilot zone for basic data systems. This includes promoting the establishment and improvement of systems for data property rights, circulation and trading, income distribution, and governance; formulating data transaction standards and negative lists; and strengthening the opening of public data and its development and use by diverse third parties. The plan also covers reducing the cost of data reprocessing, building a world-class intellectual property database, conducting security assessments for outbound data transfers, and exploring classification guidelines and important-data catalogs for industries such as autonomous driving and biological genetics. Through measures such as fintech innovation regulatory tools and full-chain "sandbox supervision", the plan seeks to bring digital technology and data elements into full play and to enhance fintech's capacity for innovation and its benefits to people and enterprises.

On November 25, 2023, Liu Liehong, director of the National Data Administration, announced at the 2023 Global Digital Business Conference that the administration expects to focus on promoting the "Data Element X" initiative, working with relevant departments to bring the multiplier effect of data elements into full play. The initiative will work on both supply and demand, using scenario demand as the driver in key areas such as intelligent manufacturing, commercial circulation, transportation and logistics, financial services, and healthcare. It aims to break down barriers to data circulation, improve the quality of data supply, and promote the combination of data elements with other factors, so as to give rise to new industries, business formats, models, applications, and governance methods. Through this initiative, Liu Liehong hopes to promote the effective development, circulation, and application of data, turn China's advantage in basic data resources into a new advantage in economic development, and let data exert its multiplier effect across scenarios.

On November 25, 2023, China officially launched the data transaction chain, jointly initiated by seven provincial-level data trading institutions, including the Shanghai Data Exchange and the Zhejiang Big Data Exchange Center. Provincial and municipal trading institutions in Zhengzhou, Hunan, Qingdao, Suzhou, and elsewhere have joined as the second batch to jointly advance the "trusted circulation" plan for the data-element market. Through jointly developed systems and standards, the plan aims to standardize data transactions, interconnect the ecosystem, and promote the circulation and trading of data nationwide. We believe that the official launch of the data transaction chain, together with the future rollout of the "Data Element X" initiative, will strengthen scenario-driven demand in the data-element industry, break through circulation barriers, improve the quality of supply, and promote the combination of data elements with other factors, giving rise to new industries, formats, models, applications, and forms of governance.

On December 7, 2023, Liu Liehong, director of the National Data Administration, and his delegation visited the Guangzhou Data Exchange and attended an exchange meeting with industry digital businesses. He pointed out that the key to data elements is to promote circulation and bring them into socialized production. Attention should be paid to cultivating digital businesses, and the multiplier effect of data elements should be realized through data circulation and trading, with effort at both the supply and demand ends, in order to achieve coordination along the industrial chain, reuse across fields, and the integration of diverse data, and thus move from quantitative to qualitative change. He emphasized that data exchanges should actively carry out exploratory work, create a trusted circulation environment, and solve the problems of on-exchange compliance, supply-demand matching, and keeping transactions controllable, manageable, and traceable.

On December 8, 2023, the 3rd Sichuan Provincial Digital Intelligence Craftsman Talent Competition and Data Element Development Promotion Conference was held in Suining. At the conference, the Data Resource Entry Service Consortium released the country's first "data resource entry into the table". It has established an ecological alliance of accounting firms, law firms, data security vendors, big data vendors, data asset evaluation institutions, and related research institutes, drawing on each member's strengths to provide customers with professional one-stop solutions for recording data resources on the balance sheet. Relying on the alliance, the work is divided by business line and by process to help enterprises record their data resources in a compliant manner.

On December 8, 2023, at the opening ceremony of the 2nd Digital Construction Summit, Liu Liehong, director of the National Data Administration, said that the administration will vigorously promote the development and utilization of public data resources and accelerate the reform of market-oriented data allocation. He set out three lines of work:

First, implement a system of separated property rights, clarify the compliance policies and management requirements for the authorized operation of public data, clarify the rights, responsibilities, and obligations involved in supplying, using, and managing data, and explore mechanisms for forming public data products and services, so that public data can be "supplied".

Second, accelerate the construction of secure and trusted data infrastructure, develop data spaces and high-speed data networks, promote the application of privacy computing technologies such as anonymization, federated learning, and secure multi-party computation, as well as blockchain, and strengthen the ability to use data in a trusted, controllable, and measurable way, so that public data can "flow".

Third, targeting the pain points of industry development, implement the "Data Element X" action plan, form a number of typical application scenarios that serve economic and social development, and bring the multiplier effect of public data into full play, so that public data is "used well".

This article is for informational purposes only and does not constitute investment advice. The material is compiled and shared for reading and personal study only; please refer to the original report for any other use. The full report runs to 30 pages; due to space limitations, only part of its contents is presented here.
