Definition
An attack where an attacker replaces or modifies the core instructions (system prompt) that define an AI agent's behavior, goals, and constraints. Once the system prompt is overridden, the agent follows the attacker's instructions instead of the legitimate operator's.
Why it matters
The system prompt is the authority that tells an agent what it is, what it can do, and what rules it must follow. If an attacker can override it, they have complete control over the agent's behavior regardless of other defenses.