By Shenxiang | Zhao Feiyu, Wu Hongjian

During the Spring Festival of the Year of the Dragon, the biggest news in the tech industry was none other than SORA.
The performance of OpenAI's SORA in video generation is astonishing and needs no repeating here. What matters is that SORA marks another raising of the AIGC ceiling. It is not only a matter for the tech industry; it also affects the many practitioners in film, television, and advertising, much as AI image generation earlier impacted the artist community.
Technology has led us to a "new world". Here the old rules wear thin, and many areas of ambiguity wait to be mapped. For example, if I use the style of a well-known director or photographer to generate a video, who owns the copyright of that video? Me? The AIGC platform? Or should the director also take a share?
Industries are also facing the problem of old models being disrupted: The New York Times sued OpenAI and Microsoft for allegedly using millions of New York Times articles to train artificial intelligence; the stock image provider Getty Images alleges that Stability AI copied more than 12 million images from its library, along with the associated captions and metadata; Hollywood screenwriters went on strike to limit the use of AI in the screenwriting process. All of these disputes were triggered by technological change.
However, the changes brought about by the industrial revolution and the Internet wave have fully demonstrated that technological change is irreversible. Rather than a blunt "boycott", what is needed is to divide rights and responsibilities and formulate new rules. Focusing on the following three key questions, Shenxiang spoke with relevant legal professionals:
Who owns the copyright of the content produced by users using the AIGC platform?
What is the focus of the dispute over copyright issues?
How are stakeholders responding to change? How will the industry ecosystem evolve?
Who owns the copyright must be analyzed case by case

AIGC brings great convenience: even a user who is not a professional painter or photographer can, with some skill at writing prompts, "stand on the shoulders of giants" and create professional-looking content. This raises the question: who should own the copyright of AI-generated content? And who is liable if a dispute arises?
There are already legal cases on similar issues. According to the verdict in China's "first AI text-to-image case", the plaintiff, Li Yunkai, learned AI painting out of interest and began posting AI-generated images on social media in November 2022. Some self-media accounts used those images without permission or attribution, and Li Yunkai took one of these bloggers to court.
As a lawyer, Li Yunkai hoped to explore the boundaries of AIGC through a judicial decision: is content created by users with AI a "work", and does the copyright belong to the user?
The judgment held that the images Li Yunkai generated using the open-source model Stable Diffusion constitute works of art, and that Li Yunkai is the copyright owner. The defendant's unauthorized use of the images and removal of the watermark infringed Li Yunkai's right of information network dissemination and right of authorship, and the defendant was ordered to compensate him 500 yuan and issue an apology statement.
It is important to note that such precedents cannot be applied directly to all situations. Mi Xinlei, a partner at Jincheng Tongda & Neal Law Firm, told Shenxiang that the basis for judging copyright ownership is the proportion of human intelligence and originality across the entire process of AI generation: the more precise and selective the user's prompting and adjustment, the greater the likelihood of ultimately obtaining copyright in the generated content.
Another situation prone to infringement disputes is when a user uses AI to generate stylized images (e.g., drawing a cat with a model in the style of Hayao Miyazaki). I learned from people in the legal profession that copyright law does not protect style, and in many court judgments imitation of style is not necessarily considered infringement, so a user "drawing a cat with a model in the style of Miyazaki" generally does not constitute infringement.
However, if the user directly "feeds" the AI a picture by Hayao Miyazaki and tells the AI to generate images in that style, this constitutes infringement, because the user is not authorized to use Hayao Miyazaki's work.
There is also a copyright ownership issue between the AIGC platform and the user. In practice, however, this conflict exists mostly "in theory": the AIGC platform is more interested in expanding its user base and earning more membership fees than in copyright ownership. To that end, platforms are even willing to cover potential litigation costs for their users.
Microsoft, Google, OpenAI and many other companies have promised that if users face third-party infringement claims due to the use of the AIGC products or services they provide, the company will bear the corresponding liability for compensation. Taking Microsoft as an example: on September 7, 2023, Microsoft issued a copyright commitment for Copilot, stating that if a business user is sued by a third party for copyright infringement due to the use of Copilot or its generated output, Microsoft will defend the user and pay the resulting damages, provided that the user has used the guardrails and content filters built into the product and complied with the other terms.
Clarifying the ownership of rights and responsibilities through user agreements is one way platforms reduce potential conflicts. OpenAI made clear in its terms of service that its indemnification clause applies only to paid users, including API users and ChatGPT enterprise users, while the remaining hundreds of millions of free users are not covered by it.
The real controversy is between the platform and the data source

In copyright disputes, user-side behavior tends to receive more attention, but the more fundamental conflict actually occurs between the AIGC platform and its data sources.
As we all know, AIGC involves three stages: first, data collection; second, model training; third, prompt input. Data collection and model training are the upstream steps carried out by platform developers, while prompt input is done by users.
Without enough data to train and debug models, it is difficult to build a generative AI platform that is smart enough. Mi Xinlei told Shenxiang: "For AIGC, collecting a large amount of data and using it to train and debug the model is the most critical stage."
Ideally, the AI platform should sign a licensing agreement with the parties that own the data, pay adequate fees, and then use the interface the other party provides to collect the data. In reality, current AI technology is developing rapidly, but its data sources are often a "black box", making their legitimacy difficult to judge.
This is where legal disputes arise. Galleries like Getty Images generate revenue from licensing copyrighted images; if an AIGC platform scrapes that data directly, it threatens the interests of the copyright holder. Similarly, the business models of news media, book publishers, and film and television companies are built on copyright. For these enterprises, effective management and use of copyright resources drives content monetization and core competitiveness. The emergence of AIGC technology has broken the framework of the traditional copyright model.
The core of the problem is that all parties need to find ways to ensure the legitimacy of the AI model training data and avoid infringing copyright or personal privacy. This is subject to the improvement of laws such as the Data Security Law, Personal Information Protection Law, and Anti-Unfair Competition Law.
Until everything is clarified, practitioners may still get into trouble due to the uncertainty of the platform's data sources.
For example, in commercial applications, some brands try to train proprietary models on their own assets. In principle, as long as the copyright in the training material belongs to the brand, the content produced by the proprietary model will not constitute infringement. In practice, however, the proprietary model is trained on top of a large foundation model, and it is difficult for outsiders to know whether that foundation model's training data is compliant.
Resolving these problems requires the law to be refined. In the process, data sources and AIGC platforms will keep contending with each other, and new industry rules will emerge.
Litigation is not the end; cooperation is

At present we have seen conflicts between the beneficiaries of the copyright model and AIGC platforms, and related lawsuits will continue to arise. But the conflict is less a fierce confrontation between old and new forces than the path toward a new industry order.
As the AIGC wave surges, traditional giants and tech upstarts are fighting lawsuits largely to win themselves a seat at the table where the future of the industry will be defined. Mi Xinlei believes that as AIGC develops, a new ecosystem will form across the industry, and litigation will drive new cooperation. On the whole, the new rules and boundaries of cooperation in the AIGC industry will be clarified in the course of its development.
At present, the domestic AIGC field is still in its infancy, and many problems have not yet fully emerged; the relevant legal disputes are still at the public-interest stage. In contrast, because the United States has more relatively mature AIGC products, the contest within its industry ecosystem is already more visible.
In Mi Xinlei's view, the lawsuits brought by Getty Images and The New York Times may be cases of "fighting to promote talks": the purpose is not entirely about winning or losing in court, but about using legal action to push the parties toward agreement on copyright use, data licensing and other issues, and to promote cooperation between platforms and content producers.
This practice is not uncommon in the content industry. A typical example is the contest between China's short-video and long-video platforms: it was less that the long-video platforms wanted to prohibit fan "re-creation" on platforms such as Douyin and Bilibili than that they wanted to establish a cooperative relationship with the short-video platforms.
As for legal provisions on AIGC, Mi Xinlei believes that the direction in most countries is still "continuous optimization": issuing suggestions and guidance on the use of AIGC, or making local adjustments, while on the whole still encouraging the industry's development. Practitioners should focus on the lawfulness of data use, personal information protection, copyright issues, and compliance with AIGC-specific regulatory requirements. These areas are usually where hidden legal risks are most concentrated, and they are also the underlying logic of regulators' supervision.
The following is a partial transcript of the conversation between Shenxiang and Mi Xinlei, a partner at Jincheng Tongda & Neal Law Firm:
Q: As a legal practitioner, how does your view of AI-generated content differ from the concerns of ordinary people? What legal risks come to mind first?
A: I've been studying copyright law for a long time and have been paying attention to this area, so my first reaction was about the fair use of copyrighted content. The principle involves three steps: the first step is data collection; after the data is collected, model training is carried out and tuned according to user needs; finally, the user inputs keywords, continuously adjusting and optimizing the prompts until a product is generated.
The first step is the core. If the data is scraped and collected without the other party's consent, it is actually a bit of a gray area: the data volume is large, the collection is extensive, and everything is kneaded together, so how do you prevent infringement?
Q: How do platform developers ensure that the data they use to train AI models is legally authorized? Is there an express legal provision to protect this right and interest?
A: Data is a very important asset. Under normal circumstances, you should obtain the other party's authorization, sign a licensing agreement, pay a licensing fee, and then the other party may provide you with an interface through which you collect the data. But when the data is not publicly available and you forcibly scrape it, or obtain it through illegal means, you may violate the Data Security Law, the Personal Information Protection Law, or the Anti-Unfair Competition Law, as well as the relevant provisions of the contract section of the Civil Code. So at this level there is already a series of laws and regulations that can regulate it.

Q: In the future, as AIGC flourishes, will countries produce laws and regulations that clearly require you to disclose your data sources?

A: Not necessarily. At present, from the perspective of industry development, the development of the AIGC industry is generally encouraged, because it may be a revolutionary technology. Judging from the current laws and regulations of various countries, I think they are constantly optimizing, making suggestions and guidance on certain usage methods, or making local adjustments. But in terms of the general trend, there are no fundamental blocking policies, such as forcing your database to be open-sourced or disclosed. In the era of the information revolution, data is a core asset and a battleground, and it is unlikely that all of it will enter the free public domain.
Q: How should the copyright of users be protected when using AIGC? Does the user have to accept that it could be propagated and used by others as a result?
A: Putting an article on the Internet doesn't mean others can use it directly. Many works, including many images, are now published online, but you can't say that because they are public, others' use of them is not infringement. Although platforms will have similar exemption clauses, from a legal point of view it is ultimately still infringement. As for users: content that I generate with these AIGC platforms and upload online is also copyrighted.

Q: Is there a game of interests between the platform and AIGC users? For example, who owns the copyright of the generated product? Who is responsible if it violates the law?
A: Theoretically, yes, but I don't think that's exactly the case from a practical point of view.
The platform makes an AIGC tool intended for users to use, and its purpose is commercial: either it gets traffic, or it gets users, and then more revenue. It is a service provider. It has little incentive to fight with users over the copyright of user-generated products; that would be meaningless to it. From this point of view, they are not opposed, and the division can be made clear in the user agreement: the final product generated according to your (the user's) tuning and training belongs to you, and if any liability arises, it is also the user's. What matters more to the platform is getting more users to use its app, since that is how it makes money, not by selling the generated content itself.
In particular, OpenAI has even undertaken that if a user faces a lawsuit, the platform will pay, and has launched such offerings so that users need not worry. Moreover, the platforms surely want to minimize such situations, and they are constantly optimizing to reduce the risk of infringement in the final output.
Q: Have you encountered such cases in the AI field?

A: Although AI is currently quite popular in the industry, it actually involves few cases; there may be only three or four in China. The first case was brought by a law firm in Beijing, and the plaintiff in the "first AI text-to-image case" decided this year is himself a lawyer, so both cases have something of a public-interest-litigation character: they aim to shape industry rules through the form of a landmark legal case.
Since foreign industries are developing a little ahead of ours, they expose a few more problems. In some foreign cases, the plaintiff claims that the defendant scraped data to train its own model for profit, while the products it generates are highly similar to the original works. Two types of plaintiffs sue in such cases. The first type is the party whose data was scraped. For example, in June 2023, a large number of consumers submitted a nearly 160-page complaint to the federal court in San Francisco, accusing OpenAI's popular ChatGPT and DALL-E of stealing private information from hundreds of millions of Internet users, including children, without permission; Microsoft, which invested $10 billion in OpenAI, was also listed as a defendant.
The second type is the owner of rights in the works, such as the Authors Guild and The New York Times, who directly produce content. If some of the material in an author's books is used directly in this (AIGC) process, it is effectively a kind of content laundering, and the same goes for news reports. Then there is Getty Images, the largest image library in the United States, from which media outlets buy images for articles and news reports. If AI can now scrape those images for free and mash them up to generate a new image for you, it is equivalent to directly taking away its interests.
At the same time, the U.S. Copyright Office has become stricter in reviewing copyright registrations: it asks how the work was generated, and if you say it was generated by AI, it will refuse registration. That is the attitude at the American administrative level.
Therefore, these lawsuits in the United States come closer to the real contest within the industry ecosystem, because they genuinely touch the interests of the giants. The U.S. will also signal a judicial attitude through case judgments.
Judging from foreign experience, the purpose of these lawsuits may be to promote negotiations. The point is not necessarily that the lawsuit must be fought to the end, but that through the lawsuit you obtain my authorization, which amounts to paying me a license fee, and in the end we establish a cooperative relationship. This is a bit like the battle between China's short-video and long-video platforms a couple of years ago. Douyin and Bilibili have many uploaders who use film resources for re-creation, which is actually infringement, so long-video platforms such as iQiyi, together with some content producers and old film companies, jointly spoke out, threatening to sue or demanding copyright fees. In fact, they also wanted to establish a cooperative relationship between the short-video and long-video platforms.

Q: What are the possible development trends of AIGC-related laws in the future?
A: In the legislative plan, a draft Artificial Intelligence Law is actually being drafted, but it is not yet known when it will take effect. Legislation inevitably lags: the industry is still developing, and many problems have not yet appeared. When the industry's future is not yet clear, hasty legislation is not necessarily a good thing.
Although AIGC is related to artificial intelligence, the disputes it gives rise to are in essence about copyright, intellectual property, data security, personal information protection, and so on. For these, the existing legal framework, including the Copyright Law, the Personal Information Protection Law, and the Data Security Law, is sufficient to deal with them for now, so there is no rush to roll out new legislation.
We need to watch these new cases cautiously, but that does not mean we must rush to legislate and box them in, which would inhibit the development of technology.

Q: For practitioners in the AIGC field, what legal advice can help them avoid hidden legal risks as much as possible?
A: In fact, the bigger risks are concentrated on the platform side.
The first suggestion is to pay close attention to the policies of domestic regulators, especially the regulations on generative AI, which were jointly issued by six or seven departments; the regulatory intensity is relatively high and many departments are involved. Once regulation tightens, failing to operate compliantly easily leads to problems.
The second is to pay attention to compliance, as there are already some specific requirements for AIGC. For example, AI-generated content must be labeled so that everyone knows it was generated by AI, such as digital human anchors; platforms like Douyin and Bilibili already display such reminders, so be mindful of this compliance risk. The main focus is on data, personal information, and copyright, which constitute the underlying regulatory logic of the industry.