New hack uses prompt injection to corrupt Gemini’s long-term memory

In the nascent field of AI hacking, indirect prompt injection has become a basic building block for inducing chatbots to exfiltrate sensitive data or perform other malicious actions. Developers of platforms such as Google's Gemini and OpenAI's ChatGPT are generally good at plugging these security holes, but hackers keep finding new ways to poke through them again and again. On Monday, researcher Johann Rehberger demonstrated a new way to override prompt injection defenses Google developers have built into Gemini—specifically, defenses that restrict the invocation of Google Workspace or other sensitive tools when processing untrusted data, such as incoming emails or shared documents. The result of Rehberger’s attack is the permanent planting of long-term memories that will be present in all future sessions, opening the potential for the chatbot to act on false information or instructions in perpetuity. Incurable gullibility More about the attack later. For now, here is a brief review of indirect prompt injections: Prompts in the context of large language models (LLMs) are instructions, provided either by the chatbot developers or by the person using the chatbot, to perform tasks, such as summarizing an email or drafting a reply. But what if this content contains a malicious instruction? It turns out that chatbots are so eager to follow instructions that they often take their orders from such content, even though there was never an intention for it to act as a prompt. AI’s inherent tendency to see prompts everywhere has become the basis of the indirect prompt injection, perhaps the most basic building block in the young chatbot hacking canon. Bot developers have been playing whack-a-mole ever since. Last August, Rehberger demonstrated how a malicious email or shared document could cause Microsoft Copilot to search a target’s inbox for sensitive emails and send its secrets to an attacker. With few effective means for curbing the underlying gullibility of chatbots, developers have primarily resorted to mitigations. Microsoft never said how it mitigated the Copilot vulnerability and didn't answer questions asking for these details. While the specific attack Rehberger devised no longer worked, indirect prompt injection still did.

New hack uses prompt injection to corrupt Gemini’s long-term memory

Summary

Quick Actions

Share

Related Articles

New hack uses prompt injection to corrupt Gemini’s long-term memory

Summary

Quick Actions

Share

Related Articles

Critical Vulnerability Discovered in Popular AI Development Framework

3 takeaways from red teaming 100 generative AI products | Microsoft Security Blog

New Defense Against Adversarial Attacks Demonstrates 90% Effectiveness

Using ChatGPT to make fake social media posts backfires on bad actors

AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt