Can OpenAI make Zhang Yiming's road work?

Mondo Health Updated on 2024-02-01

The spat between OpenAI and The New York Times is getting more and more interesting.

On January 9, local time, OpenAI finally broke nearly two weeks of silence and published a long post responding to The New York Times' accusations. On December 27 last year, The New York Times sued ChatGPT maker OpenAI and its partner Microsoft in the United States, accusing them of using millions of its articles to train AI without permission.

This time, OpenAI's response was not a bland PR statement. It argued pointedly that The New York Times had not told the full story, that the paper appeared to have deliberately manipulated ChatGPT's answers, and that the entire lawsuit was groundless.

On one side is ChatGPT, representing new technology; on the other is The New York Times, representing the old guard of news organizations. A courtroom confrontation between the two was always destined to be written into the history of technology. OpenAI's headstrong response only adds fuel to the fire.

Looking back, every new medium, whether radio, television, the Internet, or new media, has clashed with the interests of content copyright holders, and with journalism in particular.

Exactly 10 years ago, in China, the fast-rising Toutiao was likewise sued by "Guangzhou**". A number of news organizations and portals then followed suit, and for a while it looked as if the whole industry was ganging up on it. The conflict at the heart of that episode was the same as the one between AI and news today.

The dispute finally subsided as Toutiao bought up copyrights in bulk; "cooperation" was the path Zhang Yiming chose. Two years later, Toutiao's deals covered more than 3,700 companies, and it was spending more than 1.5 billion yuan a year on copyright purchases.

Coincidentally, OpenAI is also holding high the banner of "cooperation". Alongside its blunt response to The New York Times, it emphasized the opt-out principle and a strong willingness to work with news organizations.

But this time, The New York Times will only be more cautious. To this day, social media platforms like Facebook, search engines like Google, and the news industry have not settled their differences: news organizations want more from the platforms, and the platforms are reluctant to give in.

OpenAI is dangling the "pie" of cooperation, but The New York Times may not bite so easily.

Both OpenAI and The New York Times have clenched their fists.

Since the launch of ChatGPT at the end of 2022, OpenAI has faced a string of copyright lawsuits. In September last year, more than a dozen writers filed a lawsuit against OpenAI, and a few months later, in December, 11 more American writers sued OpenAI and Microsoft in Manhattan federal court in New York.

But The New York Times' complaint carries a different weight. First, The New York Times is one of the largest and most established mainstream news organizations in the West. Second, its lawsuit is an aggressive one.

In suing OpenAI, The New York Times submitted 22,000 pages of attachments and pleadings to the court in one go, including as many as 100 key pieces of evidence of ChatGPT's alleged infringement, showing that ChatGPT's output is highly similar to The New York Times' own content.

In a typical exhibit, the output of GPT-4 is on the left and the original New York Times text is on the right, with the overlapping text shown in red. It resembles the side-by-side "color palette" comparison charts that the Chinese Internet uses whenever someone sets out to prove plagiarism.

According to the complaint, The New York Times' articles alone make up the largest single proprietary data source in the Common Crawl dataset used to train GPT (Common Crawl is an organization that has been archiving much of the web for 16 years). The New York Times demanded that OpenAI and Microsoft destroy the models and training data containing the infringing material. It did not specify the amount of its claim, but said the defendants should be held liable for "billions of dollars in statutory and actual damages" associated with the unlawful copying and use of The New York Times' uniquely valuable work.

The New York Times also pointed out that, because of AI "hallucinations", ChatGPT sometimes attributes fake news and rumors to The New York Times, damaging its reputation.

The New York Times came prepared and punched hard. On the day it filed suit, it also ran its own high-profile report on the case, catching OpenAI off guard. OpenAI later said that it had still been in talks with The New York Times on copyright issues in December, only to have the other side turn around and sue, which felt like a slap in the face.

When it came time to state its position again, OpenAI did not hold back. Its long post made four key points: 1. OpenAI is willing to cooperate with news organizations and create new opportunities; 2. Training AI models on publicly available Internet materials is fair use, but OpenAI still provides an opt-out mechanism; 3. "Regurgitation" is indeed a rare bug, and OpenAI is working to reduce it to zero; 4. The New York Times did not tell the full story, and its lawsuit is baseless.

The "regurgitation" mentioned here refers to the AI "spitting out" its training material as-is. In the examples listed by The New York Times, the AI's answers match New York Times articles almost verbatim. OpenAI's position is that regurgitation does occur, but that OpenAI has reduced it to a very low level, so it is highly suspicious that The New York Times could produce around a hundred examples of it at once.

OpenAI therefore voiced its suspicion: "Interestingly, the regurgitations The New York Times induced appear to come from articles from many years ago that have circulated widely on multiple third-party websites. They seem to have deliberately manipulated the prompts, often including lengthy excerpts from articles, in order to get our model to regurgitate. Even with such prompts, our models do not typically behave the way The New York Times implies, which suggests they either instructed the model to regurgitate or cherry-picked examples from many attempts."

To sum up: my child is a thief? More likely you slipped the goods into his hands and framed him, right?

Beyond this, two other points in OpenAI's response are worth dwelling on.

First, OpenAI emphasized the opt-out mechanism, noting that The New York Times had already opted out back in August last year. In fact, many mainstream news outlets, including The New York Times, Reuters, and CNN, have blocked OpenAI's GPTBot web crawler since last year to stop it from continuing to access their sites' content.

Second, OpenAI twisted the knife by denying the importance of The New York Times to ChatGPT's training: "Because the model learns from the enormous aggregate of human knowledge, any one sector (including news) is only a tiny slice of the overall training data, and any single data source (including The New York Times) is not significant for the model's intended learning."

"It wasn't me, I didn't do it, don't talk nonsense": the line fits OpenAI perfectly.
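
For readers curious how the opt-out mentioned above usually works in practice: a site typically blocks a crawler by adding a rule for that crawler's user agent to its robots.txt file. The sketch below is purely illustrative. The robots.txt content, the example.com URL, and the helper name crawler_allowed are made up for this example; only the GPTBot name comes from the article. It uses Python's standard urllib.robotparser module to read such a rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks GPTBot everywhere, allows all other crawlers.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/some-article"  # placeholder URL
    print("GPTBot allowed:", crawler_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", url))
    print("OtherBot allowed:", crawler_allowed(SAMPLE_ROBOTS_TXT, "OtherBot", url))
```

Note that robots.txt is a request that well-behaved crawlers honor voluntarily; it restricts future crawling and says nothing about material collected before the rule was added, which is part of why the opt-out alone does not settle the dispute.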

Since AI is clearly the trend of the future, and OpenAI is willing to cooperate, why is The New York Times going to such lengths?

"30% of artificial intelligence is journalism. Let's not make the same mistake again and give everything away for free. Our content is being stolen, and we have to say: not this time," reads the Innovation World Report 2023.

"Don't make the same mistake": similar wording was heard when OpenAI's CEO, Sam Altman, testified before the US Congress. At that hearing, members of Congress expressed regret several times, saying they could not repeat the mistakes of the social media era. In that era, regulation lagged far behind the development of technology: when Zuckerberg first sat before a congressional hearing in 2018 over the Cambridge Analytica scandal, Facebook had already been online for 14 years.

In a sense, OpenAI really is standing on the shoulders of giants: with the lessons of the past still fresh, ChatGPT drew vigilance from all sides the moment it became famous.

The New York Times does not want to repeat the mistakes of the past, either. In the era when search engines and social networks became the gateways to traffic, traditional media struggled to transform. They too reached "cooperation" deals with the big technology platforms, but later came to feel that the deals were not worth it.

Facebook began cooperating with traditional media very early on, and The New York Times was among the first outlets to sign up. The cooperation model at the time was profit sharing, with distribution happening on Facebook's platform. But with the parent companies of Facebook and Google taking 60% of U.S. digital advertising revenue in 2018, news organizations began to feel that too much was being taken from them and too little was coming back.

In 2019, The New York Times reported that the annual digital advertising revenue of the entire U.S. news industry was $5.1 billion, while Google's digital advertising revenue from aggregating news was $4.7 billion.

News publishers are fighting for a bigger share in many countries and regions. In 2020, Australia became the first country to require Facebook and Google to pay for news content. In 2023, Canada passed its Online News Act, after which Google reached an agreement with the authorities to pay $74 million to Canadian news publishers. Meta, the parent of Facebook, refused to compromise and simply blocked news content in Canada. In the U.S., the Journalism Competition and Preservation Act was also pushed in Congress, but it never made it to a full vote.

Juan Señor, founder of Innovation Consulting Group and author of the Innovation World Report 2023, put it bluntly in a speech: "We can't build our own business on someone else's platform. Whether it's Facebook or Google, big tech companies don't care about our interests. They have their own interests, so why expect them to look after ours? There is plenty of show, but far too little income."

Bear in mind that The New York Times is itself a model of how print media can be reborn in an age of decline. After the 2008 subprime mortgage crisis, it mortgaged its headquarters building to borrow money and even courted potential buyers. Through a major digital transformation and the introduction of a paid subscription model, The New York Times eventually returned to profit. In 2022, more than 60 percent of its revenue came from paid subscriptions.

From this, it is not hard to see why The New York Times is prepared to fight OpenAI to the bitter end. "Cooperation" is easy to say, but how can cooperation ensure that The New York Times' existing interests are not infringed and that new business opportunities are not taken away? There are many question marks and few answers.

Free-riding on The New York Times' huge investment in reporting and hitching a ride on the news industry: the paper's resentment is not aimed only at the "fledgling" ChatGPT.

For OpenAI, this is destined to be an uphill battle.

Beyond the copyright battles breaking out on many fronts, the European Parliament voted to approve the draft AI Act in June last year. Under the bill, vendors such as OpenAI are required to disclose a list of the copyrighted data used to train their models.

Although OpenAI's statement stressed that The New York Times is "not significant", copyrighted content as a whole remains essential to OpenAI's large-model training.

In a recent submission to the UK House of Lords Communications and Digital Select Committee's inquiry into large language models, OpenAI admitted that the development of AI tools like ChatGPT is inseparable from copyrighted material, and that without such material GPT would never have been born: "Because copyright today covers virtually every form of human expression, including blog posts, forum posts, software snippets and documents, it would be impossible to train today's leading AI models without using copyrighted content."

While fighting The New York Times, OpenAI is also actively promoting "cooperation" with the news industry, and it has achieved some results.

In December, shortly before The New York Times sued OpenAI, OpenAI reached a partnership with the German news and publishing giant Axel Springer. Springer is Europe's largest digital publishing company, with well-known news brands including Business Insider and Die Welt.

The two parties signed a multi-year agreement under which ChatGPT can include summaries of Springer's news outlets in its replies, together with the original source and a link, so that the outlets still get traffic. At the same time, Springer's content will be used by OpenAI to train its models. The Information quoted people familiar with the matter as saying that the deal is worth tens of millions of dollars.

This is the second major collaboration between OpenAI and a news organization. OpenAI reached a similar deal with the Associated Press in July of the same year, for an undisclosed amount.

Competition will also drive up the price of news content further. In December, it was reported that Apple was negotiating with a number of major publishers to license their news content for training AI models. According to the report, Apple has approached NBC News, IAC and other organizations with proposed deals worth at least $50 million.

A mere crook of the finger and the promise of "advertising share" was enough to make mainstream media rush in; that kind of golden age belonged to social media and search engines. Today's OpenAI will have to offer a much bigger and more tempting pie.
