According to foreign media reports, ChatGPT developer OpenAI recently said that it uses copyrighted material when developing ChatGPT and other artificial intelligence tools, and that without such material it would be "impossible" to build tools of this kind. OpenAI made the statement in a written submission to the UK House of Lords Communications and Digital Select Committee, which is examining large language models.
AI models such as ChatGPT and the image generator DALL-E gain their capabilities from training on large amounts of content, part of which is scraped from the public internet without the permission of copyright holders. OpenAI says that, in its own case, some of the training content is licensed. This kind of unrestricted scraping has long been routine practice in academic machine learning research, but it has come under scrutiny with the recent commercialization of deep learning and AI models.
"Because copyright today covers almost all human expression, including blog posts, forum posts, software snippets and files, it would be impossible to train AI models as cutting-edge as they are today without the use of copyrighted material," OpenAI wrote in its submission to the Lords.
In addition, OpenAI writes that restricting training data to public-domain books and drawings "created more than a century ago" would not yield AI systems that "meet the needs of today's citizens."
Last month, The New York Times filed a lawsuit against OpenAI and Microsoft, a significant investor in OpenAI, alleging that the companies illegally used NYT content in their products. OpenAI responded on its blog on Monday, saying the lawsuit is without merit and reaffirming its support for journalism and its partnerships with news organizations.
OpenAI's argument rests on fair use, the legal principle that allows limited use of copyrighted content without the owner's permission in specific circumstances. The company claims that copyright law does not prohibit the use of such material to train AI models.
"The use of publicly available internet material to train AI models is fair use, supported by a long-standing and widely accepted precedent," OpenAI wrote in a blog post on Monday. "We believe this principle is fair to creators, necessary for innovators, and essential to improving the competitiveness of AI."
This isn't the first time OpenAI has claimed fair use for its AI training data. In August, a similar situation was reported, in which OpenAI defended its use of publicly available material as fair use in response to a copyright lawsuit involving comedian Sarah Silverman.
OpenAI claims that the authors who brought that lawsuit "misunderstood the scope of copyright and failed to take into account limitations and exceptions, including fair use, that leave appropriate space for innovations such as large language models at the frontiers of artificial intelligence."