Attack  ·  Glossary

Agentjacking

An attack that hijacks an AI agent's behavior or decision-making by injecting malicious instructions into data streams the agent consumes. For example, a fake error report sent to an AI coding agent could trick it into running attacker-supplied code, or a malicious Sentry notification could override the agent's intended workflow.
Agentjacking exploits the implicit trust agents place in data sources they consume. Unlike prompt injection (which attacks the LLM directly), agentjacking corrupts the agent's operational context, causing it to misbehave while believing it is following legitimate error signals or instructions.
References
MITRE ATLAS — Adversarial Threat Landscape for AI SystemsOWASP LLM Top 10
Track this in the live feed See how this plays out in real AI security and governance developments.
Open the feed →