AI Security

New hack uses prompt injection to corrupt Gemini’s long-term memory

Ars Technica Unknown April 10, 2025 0.3
New hack uses prompt injection to corrupt Gemini’s long-term memory
In the nascent field of AI hacking, indirect prompt injection has become a basic building block for inducing chatbots to exfiltrate sensitive data or perform other malicious actions. Developers of platforms such as Google's Gemini and OpenAI's ChatGPT are generally good at plugging these security holes, but hackers keep finding new ways to poke through them again and again. On Monday, researcher Johann Rehberger demonstrated a new way to override prompt injection defenses Google developers have built into Gemini—specifically, defenses that restrict the invocation of Google Workspace or other sensitive tools when processing untrusted data, such as incoming emails or shared documents. The result of Rehberger’s attack is the permanent planting of long-term memories that will be present in all future sessions, opening the potential for the chatbot to act on false information or instructions in perpetuity. Incurable gullibility More about the attack later. For now, here is a brief review of indirect prompt injections: Prompts in the context of large language models (LLMs) are instructions, provided either by the chatbot developers or by the person using the chatbot, to perform tasks, such as summarizing an email or drafting a reply. But what if this content contains a malicious instruction? It turns out that chatbots are so eager to follow instructions that they often take their orders from such content, even though there was never an intention for it to act as a prompt. AI’s inherent tendency to see prompts everywhere has become the basis of the indirect prompt injection, perhaps the most basic building block in the young chatbot hacking canon. Bot developers have been playing whack-a-mole ever since. Last August, Rehberger demonstrated how a malicious email or shared document could cause Microsoft Copilot to search a target’s inbox for sensitive emails and send its secrets to an attacker. With few effective means for curbing the underlying gullibility of chatbots, developers have primarily resorted to mitigations. Microsoft never said how it mitigated the Copilot vulnerability and didn't answer questions asking for these details. While the specific attack Rehberger devised no longer worked, indirect prompt injection still did.
Share
Related Articles
Critical Vulnerability Discovered in Popular AI Development Framework

A critical vulnerability in DeepLearn AI framework could allow attackers to...

October 24, 2025 Read
3 takeaways from red teaming 100 generative AI products | Microsoft Security Blog

The growing sophistication of AI systems and Microsoft’s increasing...

April 11, 2025 Read
New Defense Against Adversarial Attacks Demonstrates 90% Effectiveness

A new defense against adversarial attacks on computer vision systems shows...

April 10, 2025 Read
Using ChatGPT to make fake social media posts backfires on bad actors

OpenAI claims cyber threats are easier to detect when attackers use ChatGPT.

April 09, 2025 Read
AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt

Attackers explain how an anti-spam defense became an AI weapon.

April 07, 2025 Read