The New York Times alleges that OpenAI's GPT-4 output heavily duplicates its work

Mondo International Updated on 2024-01-31

Reported by Jiqi Zhineng (机器之能).


Defending one's rights amid the rising tide of artificial intelligence.

As artificial intelligence continues its rise, alongside the powerful technology that astonishes onlookers come doubts about the technology itself and a host of regulatory questions.

What is used as training data? Is it licensed? Does the generated content infringe? These have become unavoidable questions on the road to AI development, and the cases that raise them will guide judicial practice in the future.

According to Bloomberg, The New York Times has sued Microsoft and OpenAI for copyright infringement and the illegal use of The New York Times' content in AI development. The lawsuit has forced people to face up to the relationship between the news media and disruptive technology.

According to the New York Times' complaint, the tech companies used millions of unauthorized copyrighted articles to train chatbots like ChatGPT, which are becoming a source people rely on for information and, in turn, a competitor to the news outlets themselves.

The New York Times did not specify an amount, but noted that the defendants should be held responsible for "billions of dollars in statutory and actual damages" for the unlawful copying and use of The New York Times' uniquely valuable works, and demanded that the two companies destroy any chatbot models and training data that use the Times' copyrighted material.

At a time when most newspapers and magazines are struggling as readers flock to the Internet, The New York Times is one of the few that has successfully built an online business model for news. But in the era of generative AI, traditional media face new challenges.

In the year since ChatGPT debuted, there has been plenty of criticism and skepticism about it scraping text from the web as training data. In September, the Authors Guild accused OpenAI of "systematic theft on a mass scale" through ChatGPT. The New York Times' complaint is the first time OpenAI has been challenged by a major American media organization. OpenAI had been seeking authorization from copyright holders, as had Google and Meta (Facebook's parent company). The New York Times reportedly approached Microsoft and OpenAI in April but was unable to reach an agreement.

"If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission," a New York Times spokesperson said in an emailed statement. "They have not done so."

In a statement, an OpenAI spokesperson said: "We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models. Our ongoing conversations with The New York Times have been productive and moving forward constructively, so we are surprised and disappointed by this development." Microsoft declined to comment.

In July, OpenAI signed an agreement with the Associated Press to license part of the news organization's archive. In December, OpenAI signed a three-year agreement with Axel Springer SE to use the German media company's work.

On Wednesday, an OpenAI spokesperson said: "We hope to find a mutually beneficial way to work together, as we do with many other publishers."

Even so, OpenAI has been the target of multiple lawsuits from content producers complaining that their work was improperly used for AI training. The company faces class-action lawsuits from cultural figures such as comedian Sarah Silverman, "Game of Thrones" author George R.R. Martin, and Pulitzer Prize-winning author Michael Chabon.

The New York Times' odds

Cecilia Ziniti, a lawyer who has served as general counsel at several tech companies, summed up the points in the New York Times' favor. She called the lawsuit the strongest example to date of a copyright infringement claim against generative AI.

First, the complaint clearly lays out the elements of infringement: the defendant had access to the original works, and there is substantial similarity between the plaintiff's works and the defendant's output. These two points are key to determining whether infringement occurred. The New York Times is the largest proprietary data source in the Common Crawl corpus used to train GPT, which establishes access, and ChatGPT's near-verbatim reproductions of Times articles establish substantial similarity.
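As a rough illustration of the "access" prong (not something taken from the complaint), one can check whether nytimes.com pages appear in a Common Crawl snapshot through Common Crawl's public index query endpoint. The crawl ID below is only an example, and the use of the requests library is an assumption for the sketch.

```python
import requests

# Minimal sketch: query one Common Crawl index for archived captures of
# nytimes.com pages. "CC-MAIN-2023-50" is an example crawl ID; each returned
# JSON line describes one capture (original URL, timestamp, and the WARC
# file where the page content is stored).
resp = requests.get(
    "https://index.commoncrawl.org/CC-MAIN-2023-50-index",
    params={"url": "nytimes.com/*", "output": "json", "limit": "5"},
    timeout=30,
)
resp.raise_for_status()
for line in resp.text.strip().splitlines():
    print(line)
```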

Second, the complaint provides evidence of plagiarism that can be understood at a glance. In the exhibit, text shown in red is an exact match with the Times article and text in black is newly generated by GPT, so the extent of the verbatim overlap is obvious. Ziniti argues that OpenAI has little room to defend itself unless it substantially overhauls the way GPT is trained, or spends heavily on legal arguments to explain the technical details. Settling would be wiser than continued confrontation.
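As a hedged sketch of how that kind of exhibit could be produced (an illustration, not the method actually used in the complaint), verbatim overlaps between an article and a model's output can be located with Python's difflib; the matched spans would be the ones colored red.

```python
import difflib


def exact_overlaps(article: str, generated: str, min_len: int = 20) -> list[str]:
    """Return substrings of at least min_len characters that the generated
    text shares verbatim with the original article."""
    matcher = difflib.SequenceMatcher(None, article, generated, autojunk=False)
    return [
        article[m.a:m.a + m.size]
        for m in matcher.get_matching_blocks()
        if m.size >= min_len
    ]


# Hypothetical usage with made-up text: anything returned here would be
# highlighted in red in an exhibit; the rest of the output would stay black.
article_text = "Regulators ignored years of warnings about risky lending practices in the industry."
model_output = "Reports say regulators ignored years of warnings about risky lending practices in the industry, among other issues."
print(exact_overlaps(article_text, model_output))
```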

The New York Times' cleverness also lies in highlighting the original reporting behind a news story. Behind one in-depth report on taxi loans was the tireless work of reporters who traveled widely and conducted more than 600 interviews. Although copyright law does not protect the laborer's "sweat of the brow" alone, it does protect the originality and creativity of authors. Compared with the class action against GitHub Copilot, brought on behalf of more than 10 million programmers but citing only a few reproduced lines of open-source code, the Times' demonstration of creative investment is far more persuasive. (Note: "sweat of the brow" is a well-known doctrine in copyright law; in the Feist case the U.S. Supreme Court held that labor alone does not merit copyright protection, only original expression does, reasoning that rewarding mere effort would undermine the basic principles of copyright law.)

In addition, after negotiations with The New York Times broke down in April, OpenAI obtained authorization from Politico and other media outlets, which damaged the Times' interests. As OpenAI's valuation grows and more examples of plagiarism emerge, refusing to settle with the Times could cost OpenAI dearly. Ziniti ventured a guess about the April talks: OpenAI's side thought it could settle for millions or tens of millions of dollars, while The New York Times wanted more, including ongoing royalties.

Analysts also note that OpenAI's "CloseAI" image, a nickname mocking its shift away from openness, will hurt it as it faces these accusations. The New York Times portrays OpenAI as a for-profit, closed-source organization, in stark contrast to journalism that serves the public interest. A ruling will need to weigh the social benefits of copyright protection against those of technological innovation. In copyright cases, the tension between justice and commercial interest has always been the focus of controversy, and the narrative that claims the side of justice tends to play better in court. The complaint also mentions the boardroom drama between OpenAI's directors and Sam Altman; whether that saga has tarnished OpenAI's image remains to be seen.

Finally, public anxiety about large models' hallucination problem could make the case even more fraught. The New York Times accused Bing of attributing to it an article claiming that "orange juice causes lymphoma," which the Times never actually published. This undoubtedly puts OpenAI at an even greater disadvantage.

The lawsuit could be a turning point in the field of AI and copyright.

Aftermath

OpenAI is currently in talks with investors over a new round of funding at a valuation of $100 billion or more, which would make it the second-most-valuable startup in the United States, Bloomberg reported last week.

Microsoft is OpenAI's biggest backer and has deployed the startup's AI tools across several of its products. In the lawsuit, The New York Times alleges that Microsoft reproduced the paper's articles verbatim in its Bing search engine and used OpenAI's technology to add a trillion dollars to its market value.

Since ChatGPT's debut in November 2022, Microsoft's stock price has risen 55% and its market capitalization has grown to $2.8 trillion. On Wednesday, Microsoft shares were little changed, trading at $374.07 in New York. It remains unclear whether Microsoft's stock price will move significantly from here.

Abacus.AI CEO Bindu Reddy posted that the biggest beneficiary in the end may be a product like Grok. After all, by allowing users to post content on its platform, a company gains the right to train AI models on that content.

Similar problems exist in text-to-image generation. Some users have even said they no longer want to see recommendations of AI-generated images, describing AI painting as "corpse stitching."

In fact, the text-to-image tool DALL·E 3 was released with special attention to safety and copyright issues. To avoid lawsuits like those facing Stability AI and Midjourney, OpenAI allows artists to remove their artwork from its text-to-image models and exclude it from training: creators can submit a copy of a work they hold copyright to and request its removal. However, whether such measures can fully protect creators' rights from infringement remains questionable.
