Reports from overseas today say that Chinese internet giant ByteDance has been quietly using OpenAI's technology to develop its own large language model (LLM), in effect taking a shortcut. OpenAI has suspended the company's account for violating its terms of service. The incident has drawn attention and discussion across the industry.
Internal documents show that ByteDance relied on OpenAI's API (application programming interface) at almost every stage of developing its foundational large language model, codenamed "Project Seed". This is considered a direct violation of OpenAI's terms of service.
ByteDance spokesperson Jodi Seth responded that GPT-generated data had been used to annotate the model in the early development of "Project Seed", but was removed from ByteDance's training data in the middle of this year. Seth said ByteDance is licensed by Microsoft to use the GPT API. ByteDance uses GPT to power products and features in non-Chinese markets, but uses its self-developed model to power Doubao in the Chinese market.
OpenAI spokesperson Niko Felix confirmed that OpenAI has suspended ByteDance's account because its actions were not in line with company policy. Felix emphasized that all API customers must adhere to OpenAI's usage policies to ensure that the technology is used for good. "Although ByteDance has made little use of our API, their account has been suspended while we investigate further. If we find that their use does not comply with these policies, we will ask them to make the necessary adjustments or terminate their account."
Although it is rarely discussed publicly, it is common for smaller companies to use proprietary AI models, especially OpenAI's, to develop AI products that compete with them. Because neither OpenAI nor Microsoft has yet made an example of a violator, the practice remains in a legal gray area. "A lot of startups are taking that risk right now," said Naveen Rao, VP of generative AI at Databricks.
However, it is extremely rare for a large, well-resourced tech giant like ByteDance to engage in such behavior. This suggests that the Project Seed team is under tremendous pressure to deliver results quickly. "I often get recruiting emails from ByteDance, and I usually ignore them," said an AI researcher at a big U.S. tech company. "But this incident made me want to mark these emails as spam."
Other companies have faced similar problems involving the use of GPT model output to build competing models. For example, a Google researcher chose to quit because some colleagues tried to use data that contained ChatGPT conversations. That incident did not involve misuse of OpenAI's API, but it caused a lot of embarrassment internally, and the employees involved were lightly disciplined.
Since ByteDance launched Project Seed about a year ago, it has been a high-priority, highly secretive effort. Employees involved are required to sign special non-disclosure agreements, and access to information within the project has become increasingly siloed. Zhang Yiming, the founder of ByteDance, has been following the project closely.
Project Seed is currently developing two main products. One is the AI chatbot Doubao, which has been launched in the Chinese market. The other is a chatbot platform for business users, which is still under development and is planned to be sold to business customers through ByteDance's cloud services division.
ByteDance's ultimate goal in launching Project Seed is to develop artificial general intelligence (AGI), as OpenAI aims to do. At the same time, it seems more intent on becoming the Chinese version of ChatGPT as soon as possible. The project team has been instructed to match the performance of GPT-3.5 by the end of the year and that of GPT-4 by mid-2024. The Seed model currently has about 200 billion parameters, while GPT-3.5 has 175 billion. (OpenAI has not announced the parameter count of GPT-4, which is rumored to be in the trillions.)
Project Seed is not affiliated with ByteDance's TikTok and is mainly developed on servers in China. Most of the team members are based in China, though some are based in the United States. The project is led by Zhu Wenjia, head of ByteDance's search department, who reports to the company's senior engineering leader, Yang Zhenyuan. Other key leaders of the project include Qiao Mu, who reports to Zhu Wenjia, and Xiang Liang, who is responsible for the Applied Machine Learning team.
One interesting question is how the quality of information on the web will change when large language models (LLMs) are widely used to build other large language models. We don't know yet. Because such models are themselves trained on synthetic, machine-generated data, using them to build more large language models may further amplify the spread of misinformation. As Databricks' Naveen Rao puts it, "This can ultimately lead to a disconnect from the real world."
That this story about ByteDance surfaced overseas reflects the fierce competition and complex dynamics of the AI field. As a Chinese person, I am glad to see ByteDance actively joining the race to build large language models. ByteDance has many strong advantages in building large AI models. Going forward, it will also need to sustain long-term, large-scale investment to independently develop leading models, along with applications and services that are highly competitive worldwide.
In 2022, ByteDance's revenue reached $85.2 billion, a year-over-year increase of more than 38%. In the first quarter of 2023, revenue was close to $24.5 billion, up nearly 34% year over year; in the second quarter of 2023, revenue was $29 billion, up about 40%. Even assuming a 30% year-over-year growth rate, ByteDance's revenue for the whole of 2023 will exceed $100 billion for the first time, and possibly $110 billion.
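The arithmetic behind this estimate can be checked with a short sketch. It assumes only the 2022 revenue figure quoted above and a flat growth rate applied to the full year; the function name `project_revenue` is hypothetical and used purely for illustration.

```python
def project_revenue(base_billion: float, growth_rate: float) -> float:
    """Return next-year revenue (in billions of USD) for a flat year-over-year growth rate."""
    return base_billion * (1.0 + growth_rate)


REVENUE_2022 = 85.2  # billions of USD, as quoted in the article

# The article's assumption: 30% year-over-year growth for 2023
projected_2023 = project_revenue(REVENUE_2022, 0.30)

# Minimum growth rate needed for 2023 revenue to reach $100 billion
growth_needed_for_100b = 100.0 / REVENUE_2022 - 1.0

print(f"2023 revenue at 30% growth: ${projected_2023:.1f}B")
print(f"Growth needed to reach $100B: {growth_needed_for_100b:.1%}")
```

Under these assumptions, 30% growth gives roughly $110.8 billion, and growth of about 17.4% would already be enough to cross the $100 billion mark.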
In addition, many of ByteDance's platform products have large numbers of active users, and user stickiness is generally high. For example, Douyin has more than 700 million daily active users, TikTok has more than 1.1 billion monthly active users, Toutiao has about 400 million monthly active users, and CapCut may have two to three hundred million active users. Other products, such as Fanqie Novel, Xigua Video, Fanqie Changting, Pipixia, Dongchedi, Qishui Music, Xingfuli, Zhuxiaobang, and Feishu (Lark), range from one or two hundred million active users at the high end down to millions or tens of millions.
At present, ByteDance has launched several standalone AI chatbot products for consumers, including Doubao, Xiao Wukong, Cici, and ChitChop. Doubao and Xiao Wukong are aimed at Chinese users, while Cici and ChitChop are aimed at overseas users. In addition, ByteDance products such as Toutiao, Jianying, and Feishu are also improving the user experience by integrating AI technology. Judging from how ByteDance is entering generative AI, it appears to be targeting the global market, not just the Chinese market.
Not long ago, there were rumors online that ByteDance is developing an open platform that will let users create their own AI chatbots, expected to roll out as a public beta by the end of December this year. OpenAI's ChatGPT already lets users create AI agents that can be easily customized to their own needs and preferences: an AI that focuses on Chinese-English translation, an AI that is proficient in financial investment, an AI that is proficient in various kinds of programming, an AI that is particularly good at creating certain types of high-quality images, an AI that specializes in running social media content, an AI that provides legal consulting, an AI that helps with medical consultations and prescriptions, and so on. The list is long. ByteDance is also developing an AI image generation tool similar to Midjourney, the leading tool in AI image generation.
In other words, a tech giant like ByteDance already holds a strong hand, and if it commits at a strategic level to building large language models well, the results will be very interesting. Once ByteDance's large language model reaches or exceeds GPT-4 in performance, ByteDance can hope to occupy a very important position in the global competition over large AI models.
According to data from a Feishu document, the top five AI chatbots in China in November this year were Wenxin Yiyan, iFLYTEK Xinghuo, Alibaba's Tongyi Qianwen, ByteDance's Doubao, and Zhipu Qingyan. This alone hints that, although many domestic companies are building large models at this early stage, only a handful of domestic players are likely to prove viable. In the long run, it would be very good for China to have two or three companies like OpenAI that provide foundation models and ultimately lead toward AGI. ByteDance is quite likely to be one of these two or three companies.