Who owns works generated by large models? Will my data be used as training "nourishment"?
Author: IT Times reporter Shen Yibin
Edited by Qian Lifu and Sun Yan
Recently, the first-instance judgment in China's first AIGC case was announced. The court ruled that the plaintiff held the copyright at issue in the case, and that the defendant's conduct constituted infringement for which the defendant should bear the corresponding legal responsibility.
At the beginning of this year, the plaintiff, Li, used the open-source software Stable Diffusion to generate an image by entering prompt words and posted it on social platforms. Later, Li found that Liu had used the image in an article on Liu's Baijiahao account, so he filed a lawsuit.
Stable Diffusion leaves the copyright entirely with the user, which is one of the reasons the plaintiff was able to win the lawsuit. However, not all large model companies are willing to hand over the copyright in generated works to users. So who exactly owns the copyright in user-generated content?
The turmoil doesn't stop there. Recently, a clause in WPS's privacy policy stating that "documents and materials uploaded by users are used as basic materials for AI training after desensitization" attracted public attention. Under pressure, WPS subsequently changed its privacy terms. However, similar provisions remain in the privacy policies of many large models, raising concerns among users that their data will be used as "nourishment" for training large models.
Who owns the copyright in the generated work? When the "IT Times" reporter put this question to them, the large models gave different answers.
The Zhipu Qingyan model replied: "The copyright of the content you generate based on Zhipu Qingyan is maintained by you and used after independent judgment. If a copyright problem causes losses to the user, Zhipu Huazhang Company will not be responsible, but the company has the right to seek recovery from the user if the user causes losses to Zhipu Huazhang Company."
Wenxin Yiyan gave a "Tai Chi style" answer that deflected the question: "At present, there is no unified answer in legal academic and judicial circles, and it is recommended to consult legal professionals." The same is true of SenseTime, which said that "copyright depends on the specific agreement or terms of use between the user and SenseTime."
iFLYTEK Xinghuo and Minimax answered clearly: "The copyright belongs to the company."
The answers differ widely, and the differences trace back to how each large model's terms treat the intellectual property rights in AI-generated content.
Article 2.2 of iFLYTEK Xinghuo's "User Experience Rules" states: "You do not have the right to copy, distribute, transfer, rent, lend, license, transfer, make available to others, or use in any commercial way the content generated by the Service (Output Content) without our written consent." Article 4.2 also stipulates: "Unless there is proof to the contrary, your use of the services of this platform to upload, publish or transmit content means that the user irrevocably grants iFLYTEK and its affiliates a non-exclusive, unrestricted, permanent and free license to use such content (including storage, use, copying, revision, editing, publishing, displaying, translating, commercial or non-commercial use, such as distributing the above content or making derivative works) and the right to sublicense the use of such content to third parties, as well as the right to obtain evidence and bring lawsuits against third parties for infringement in their own name."
In other words, the text, images and other information that users upload to iFLYTEK Xinghuo can be used by the company for commercial or non-commercial purposes. However, when users want to use the content they themselves generated, they must obtain iFLYTEK's written consent.
The relevant terms and conditions of Wenxin Yiyan state that the intellectual property rights in the content provided in the Wenxin Yiyan service (including but not limited to software, technology, programs, user interfaces, etc.) belong to the operator of the service, and that the copyright in content entered by the user is not transferred by uploading or publishing it. However, ownership of the copyright in the output is not specified.
Zhipu Qingyan states in its terms: "The copyright of the content you generate based on Zhipu Qingyan shall be maintained by you and used after independent judgment."
"I lean toward the view that the copyright belongs to the user," said Liu Songhui, an intellectual property lawyer at Beijing Haotian Law Firm, when asked about the differing user agreements of the large models.
In Liu Songhui's view, whether the copyright belongs to the large model company or the user needs to be analyzed case by case. The key point is whether the product is original and the result of human intellectual effort. "As far as originality is concerned, if you simply ask the model to write a poem or plan a trip, the result does not meet these requirements and is not protected by law. If the creator provides a sufficiently detailed content outline or template, and repeatedly adjusts and polishes the text returned by the large model, so that the AI-generated product reflects the creator's personalized choices, judgment, arrangement and skill, then it may meet the originality requirement for a work and can be protected by the relevant laws," Liu Songhui said.
Zhu Wei, deputy director of the Communication Law Research Center of China University of Political Science and Law, said in an interview that, in determining the copyright in AI-generated content, a company's user agreement is written from the company's own point of view and does not affect the judgment of the law. He believes that we are still in the era of weak artificial intelligence, that existing copyright law can accommodate content generated by large models, and that greater copyright controversy will arise only in the era of strong artificial intelligence.
Just a few days before the first-instance result in the first AI-generated image copyright infringement case was made public, WPS's apology statement, issued late at night on November 18, attracted attention.
The reason for the apology is that WPS had previously stipulated in its privacy policy: "We will use the document materials uploaded by you as the basic materials for AI training after desensitization."
The clause caused an uproar. WPS subsequently issued an apology letter and updated its privacy policy, promising that user documents will not be used for any AI training purposes or in any scenario without the user's consent, and that an independent third-party agency will regularly review its privacy policy compliance to ensure the promise is kept.
Although WPS has changed its privacy terms, the reporter found that many large model companies use similar language in their privacy policies: "Where the law applies, we will de-identify personal information through technical means, and may use the data for model algorithm training."
The "Data Processing Instructions" in iFLYTEK Xinghuo's "iFLYTEK Open Platform Privacy Agreement" state: "In accordance with applicable laws and regulations, iFLYTEK Xinghuo may carry out technical processing of users' personal information, and conduct anonymized or de-identified academic research or statistical analysis of the processed information (including the use of anonymized or de-identified voice information for model algorithm training) to better improve product functions and service capabilities."
Wenxin Yiyan states in its Personal Information Protection Rules: "After personal information is de-identified through technical means, the de-identified information will not be able to identify the subject. In this case, Wenxin Yiyan has the right to use the de-identified information; on the premise of not disclosing the user's personal information, Wenxin Yiyan has the right to analyze the user database and use it commercially."
In addition, Doubao, Zhipu Qingyan, Baichuan Intelligence, SenseTime and others all mention de-identification and anonymization, as well as the right to use the processed data for large model training and commercialization where the law applies.
So can these de-identified, anonymized, desensitized and other processed data be used by enterprises without the consent of the individual?
"From a legal point of view, de-identified and anonymized personal information and data are no longer owned by individuals because the authenticity of the data cannot be guaranteed," He Siyuan, a lawyer at Qiabao Law Firm, explained to the "IT Times" reporter. According to the Personal Information Protection Law of the People's Republic of China, personal information is all kinds of information related to identified or identifiable natural persons recorded electronically or by other means, excluding anonymized information. According to the Data Security Law, data processing includes collection, storage, use, processing, disclosure, etc., and processors are not allowed to steal data or obtain it in other illegal ways while processing information.
However, in practice, users do not know whether the information processing is clean and legal.
You Yunting, an intellectual property lawyer at Shanghai Dabang Law Firm, said in an interview that the data used to train large models is a black box: users may not know that their rights have been infringed, and even if they do, it is difficult for them to collect and present evidence. He therefore believes that large model platforms should design an automatic filtering mechanism that filters out personal information at the stage when users upload it. This should also be the platform's responsibility and legal obligation.
Among the user agreements and privacy policies of the large models currently online, the only one that mentions a filtering mechanism is Minimax's. It states: "Minimax will not bind or establish any association between the user's conversation and the personal information provided when registering or applying for the open platform, and will improve the filtering mechanism of the service; in particular, conversation content containing personal information will be filtered, deleted and not saved."
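To make the idea of an upload-stage filter concrete, here is a minimal sketch in Python. It is not Minimax's mechanism or any platform's actual implementation; the function name `redact` and the three patterns (email address, 18-character ID number, 11-digit mobile number) are assumptions chosen purely for illustration, and a real filter would need far broader coverage.

```python
import re

# Hypothetical patterns, chosen only for illustration. A production filter
# would also need to handle names, addresses, account numbers, and more.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 17 digits followed by a digit or X, not embedded in a longer digit run
    "ID_NUMBER": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    # 11 digits starting with 1 and a second digit in 3-9
    "PHONE": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a placeholder and count what was removed."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


if __name__ == "__main__":
    message = "Call 13812345678 or mail li@example.com."
    cleaned, removed = redact(message)
    print(cleaned)
    print(removed)
```

Running the filter before a conversation is stored means only the placeholder text would ever reach logs or training pipelines. Pattern matching of this kind catches only well-structured identifiers, which is one reason filtering alone cannot guarantee that no personal information is retained.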
In this way, the privacy policies of large models require users to "agree to the service, otherwise they will not be able to enjoy the service", while reserving for the companies themselves the possibility of "using the processed data for large model training and commercialization". Faced with such an asymmetrical agreement, users who want to keep their personal information out of training data can only disclose as little private information as possible in conversations with large models; they can also ask companies to stop using their data in accordance with the provisions of the privacy policy.
Typesetting: Ji Jiaying