Security experts have created a novel AI worm in an isolated test environment that has the ability to propagate autonomously across multiple generative AI**s, potentially stealing data and spreading spam.
As generative AI systems such as OpenAI's ChatGPT and Google's Gemini continue to evolve and put into practical use, these systems are becoming increasingly intelligent and efficient. Startups and tech giants are building artificial intelligence** and ecosystems designed to help people with everyday chores, such as automating booking schedules and assisting with shopping decisions. However, as the authority of these tools expands, so do the potential security threats they face.
Recently, a team of researchers succeeded in creating what is claimed to be the first generative AI worm that can replicate itself. The worm's ability to spread between systems and potentially steal data or deploy malware along the way highlights the risks inherent in connected, autonomous AI ecosystems. "This marks the birth of a new type of cyberattack that has never been seen before," said Professor Ben Nassi of Cornell University of Science and Technology, the leader of the study.
Professor Nassi, together with his colleagues St** Cohen and Ron Bitton, developed the worm, called Morris II, as a tribute to the original Morris computer worm that caused a riot on the Internet in 1988. In a study** made available exclusively to Wired magazine, researchers show in detail how AI worms can attack generative AI email assistants, stealing data from emails and sending spam, while disrupting some of ChatGPT's and Gemini's security mechanisms.
Although the study was conducted in a controlled test environment and did not target publicly available email assistants, several experts warn that this is a security hazard for startups, developers and tech giants that cannot be ignored.
Most generative AI systems perform tasks by receiving prompts from user input that instruct tools to answer questions or generate images in the form of text instructions. However, these messages can also be exploited to attack the system. For example, with well-crafted prompts, an attacker can get the system to ignore its security rules and output content that is toxic or inflammatory. In addition, an instant injection attack can send secret instructions to the chatbot, causing the system to perform unauthorized actions.
To create AI worms with the ability to reproduce, the researchers employed a technique called "adversarial self-replication cues." Put simply, this is a prompt that triggers the generative AI model to output another prompt. AI systems are programmed to generate a further set of instructions in their responses, which has similarities with traditional SQL injection and buffer overflow attacks.
To demonstrate how the worm works, the researchers built a system that uses generative AI to send and receive emails, and integrates ChatGPT, Gemini, and other open-source large language models (LLMs) such as L**A. They found two ways to take advantage of the system: one by utilizing a text-based self-replicating prompt, and the other by embedding a self-replicating prompt in an image file.
In one case, the researcher composed as an attacker to compose an email containing an adversarial text prompt. This prompt "pollutes" the Email Assistant's database, causing it to pull additional data from outside the system during the retrieval and augmentation generation (RAG) process. When the email was used in response to a user's query and was sent to GPT-4 or Gemini to generate a response, it "jailbroken" the Genai service and stole data from the email. "The generated response, which contains sensitive user data, is then used to send back and forth emails to the new client and stored in the new client's database, infecting the new host," Professor NASSI explained. ”
In the second method, the researchers said, embedding images with malicious prompts made the email assistant send the message to someone else. "By encoding the self-replication prompt into the image, any type of image that contains spam, abusive material, or promotional content can be further sent to the new client after the initial email has been sent," Professor Nassi said.
In the study presentation, you can see the e-mail system multiple times. The researchers also noted that they were able to extract various data from emails. "This data can be names, ** numbers, credit card numbers, social security numbers, anything else that is considered confidential," Professor Nassi added.
While the study revealed some security vulnerabilities in systems such as ChatGPT and Gemini, the researchers highlighted that it was a warning about "poor architectural design" in the broader AI ecosystem. However, they have reported their findings to Google and OpenAI. "They seem to have found a way to exploit the Instant Injection vulnerability, which relies heavily on unchecked or filtered user input," said a spokesperson for OpenAI, adding that the company is working to improve the "resilience" of the system and advising developers to "take measures to ensure that harmful input is not used." Google declined to comment on the study, but Professor Nassi revealed that the company's researchers had requested a meeting to discuss the topic.
Although the worm demonstration was conducted in a relatively controlled environment, several of the security experts involved in the study said that the future risks of AI worms should not be ignored and that developers should take them very seriously. The risks are especially significant when AI applications are authorized to perform tasks such as sending emails or booking appointments on behalf of individuals, and they may be connected with other AI** to accomplish these tasks together.
Recently, security researchers from Singapore and China demonstrated a way to "jailbreak" a million LLMs in five minutes. Sahar Abdelnabi, a researcher at the CISPA Helmholtz Center for Information Security in Germany, participated in the Instant Injection Demonstration for LLMs in May 2023 and highlighted the potential for worm transmission. He points out that worms can spread when AI models get data from the outside or when AI can work autonomously. "I think it's very reasonable to generalize the concept of instant injection," Abdelnabi said, "but it's all about what kind of application these models are used for." Abdelnabi further noted that while this attack is currently a simulation, it may not be too long before it actually happens.
In an article about what they found, NASSI and other researchers, in the next two to three years, we will likely see spawnable AI worms in the real world. "The GenAI ecosystem is being developed at scale by many companies in the industry to integrate GenAI features into their cars, smartphones, and operating systems. *.
However, when it comes to creating innovative AI systems, there are ways to defend against potential worms, including the use of traditional security methods. "There are a lot of these problems, and proper security application design and monitoring is part of the solution," says Adam Swanda, a threat researcher at Robust Intelligence, an AI enterprise security firm. ”
Swanda also highlighted that human involvement is a key mitigation measure that can be implemented. "You don't want an LLM who is reading your email to be able to send an email at will. There should be a clear line between the two. For companies like Google and OpenAI, if a prompt is repeated thousands of times in their systems, Swanda said, it creates a lot of "noise" and is easy to detect.
Naxi and this study reiterate many of the same mitigation methods. Ultimately, NASSI said, the people who create AI assistants need to be aware of these risks. "You need to understand these risks and look at whether your company's ecosystem and application development are basically following one of these approaches," he said. Because if they do, they need to take those risks into account. ”