According to 404 Media, a team of researchers led by scientists at Google DeepMind used a clever method to extract phone numbers and email addresses from OpenAI's ChatGPT. The finding suggests that ChatGPT's training dataset contains a large amount of private data and that the model can inadvertently expose it.
The researchers expressed surprise that their attack worked and stressed that the vulnerability they exploited could have been found earlier. They detail their findings in a research paper that has not yet been peer-reviewed. They also note that, to their knowledge, no one had observed ChatGPT emitting training data at such a high frequency before the paper was published.
Of course, the leakage of potentially sensitive information is only a small part of the problem. The broader issue, as the researchers highlight, is that ChatGPT inadvertently reproduces large amounts of its training data verbatim, and at an alarming rate. This vulnerability opens the door to large-scale data extraction and may lend support to authors who claim their work has been plagiarized.
The researchers admit that the attack is very simple and somewhat amusing. To carry it out, one simply instructs the chatbot to repeat a specific word, such as "poem", forever and lets it run. After a while, instead of repeating the word, ChatGPT begins to generate a wide variety of mixed text, which often contains large passages copied from the web.
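The behavior described above (a run of repetitions followed by unrelated text) can be examined offline once a transcript has been captured. The Python sketch below is illustrative only and is not the researchers' tooling: the function names `split_at_divergence` and `longest_verbatim_overlap` are hypothetical, and a short in-memory reference string stands in for the large web corpus a real study would need. It finds where the output stops repeating the word, then reports the longest run of tokens that also appears verbatim in the reference text.

```python
import re
from difflib import SequenceMatcher


def split_at_divergence(output: str, word: str) -> tuple[int, str]:
    """Count leading repetitions of `word` and return the text that follows them."""
    # One repetition: optional whitespace, the word, then optional separators.
    pattern = re.compile(r"\s*" + re.escape(word) + r"\b[\s,.;]*", re.IGNORECASE)
    pos = 0
    count = 0
    while True:
        match = pattern.match(output, pos)
        if match is None:
            break
        pos = match.end()
        count += 1
    return count, output[pos:].strip()


def longest_verbatim_overlap(text: str, reference: str) -> str:
    """Return the longest run of whitespace-separated tokens shared verbatim."""
    a = text.split()
    b = reference.split()
    # autojunk=False so that frequent tokens are not silently ignored.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[m.a:m.a + m.size])


if __name__ == "__main__":
    # Made-up transcript and reference text, for illustration only.
    transcript = "poem poem poem, poem Call Jane Doe at the front desk"
    reference = "Staff directory: Call Jane Doe at the front desk today"
    repeats, tail = split_at_divergence(transcript, "poem")
    print(repeats, "repetitions before divergence")
    print("overlap:", longest_verbatim_overlap(tail, reference))
```

A long overlap with known web text is a signal of verbatim reproduction rather than proof of it; how long a match must be to count as memorized is a judgment call that this sketch leaves to the reader.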
On November 30, 2022, OpenAI launched ChatGPT (Chat Generative Pre-trained Transformer) to the public. The chatbot is built on a powerful language model that lets users shape and steer conversations according to their preferences for length, format, style, level of detail, and language.
According to the Nemertes 2023-24 Enterprise AI Research Report, more than 60% of companies surveyed are actively adopting AI in production, and nearly 80% have integrated AI into their business operations. Surprisingly, fewer than 36% of these organizations have a comprehensive policy framework in place to govern the use of generative AI.