Definition
An attack that hijacks an AI agent's behavior or decision-making by injecting malicious instructions into data streams the agent consumes. For example, a fake error report sent to an AI coding agent could trick it into running attacker-supplied code, or a malicious Sentry notification could override the agent's intended workflow.
Why it matters
Agentjacking exploits the implicit trust agents place in data sources they consume. Unlike prompt injection (which attacks the LLM directly), agentjacking corrupts the agent's operational context, causing it to misbehave while believing it is following legitimate error signals or instructions.