On the evening of May 22, an autonomous coding agent was partway through a job. It had listed a directory of trading-strategy code — backtest.py, optimizer.py, wfa.py — tried to run the program, read the error it threw, and packed all of that into a request to its model, asking what to do next. Its operator had typed one plain instruction: make a to-do list of what needs implementing.
The request never reached a model. It reached us. For that one turn of the agent’s loop, our honeypot was its brain, and whatever we sent back would become its next action.
We didn’t answer. But holding someone’s agent mid-thought makes a question impossible to unsee: when an autonomous agent asks an endpoint what to do next, what stops that endpoint from telling it to do anything at all?
You verify a software package by its signature and a website by its TLS certificate. There is no equivalent for a model’s response. Nothing proves the tool call your agent just acted on came from the model you think you’re talking to, rather than from whatever is answering at that address. The agent trusts the endpoint by construction; the pointer to it is usually a plain environment variable; no major provider signs what it sends back. “Don’t point your agent at an endpoint you don’t trust” is good advice. Right now there is no way to enforce it.
A third kind of problem
Most security attention on AI agents goes to two threats. The first is prompt injection: poisoning what the agent reads, so it misbehaves. The second is credential theft, or “LLMjacking”: stealing the API keys it uses. Both treat the model as either an input to be poisoned or a wallet to be drained.
This is a third thing, and it sits underneath both. An agent is a loop — send the situation to a model, get back an action, run it, repeat — and the model is the part that decides. So the endpoint an agent calls is its control plane. Whoever answers it chooses what the agent does next. Not by tricking it, but by being the thing it asks.
How your agent ends up talking to the wrong endpoint
“An endpoint you don’t control” isn’t a hypothetical you have to go hunting for. An agent’s model endpoint is just a URL in a config file or an environment variable (OPENAI_BASE_URL, ANTHROPIC_BASE_URL), and there are ordinary ways it ends up pointing somewhere hostile:
- A malicious router. Plenty of setups deliberately route through a third-party proxy for cost control or failover. In April 2026, Your Agent Is Mine measured this surface across hundreds of public LLM routers and caught some quietly rewriting tool calls before the client ran them. That is the exact move described here.
- A redirected variable. CVE-2026-21852 documented a coding agent’s endpoint being switched through a repository-supplied
ANTHROPIC_BASE_URL. Open the wrong repo and your agent is calling someone else’s server. - A connection with no integrity. TLS proves you reached the host on the certificate. It says nothing about whether that host is the model. Without pinning, a network position is enough.
None of these asks the operator to do anything obviously reckless. The endpoint just has to be answered by the wrong party once.
The session
We run a honeypot that, among other things, impersonates an LLM server: the Ollama and OpenAI-compatible APIs an exposed inference host would present. Most of what connects is a one-shot scanner: a liveness ping, a lure-detector, a credential probe. A minority are not scanners at all. They are autonomous agents, mid-task.
The May 22 request came from a residential address (166.194.148[.]97, AT&T). It was a single call to /v1/chat/completions, and it carried the entire internal state of an open-source terminal coding agent called Nanocoder: its system prompt, all nineteen of its tools, and the recent history of what it had been doing. An agent’s “memory” is just the message history it replays to the model every turn, so we could read its work in progress:
Bash command output:
$ ls
AGENTS.md backtest.py config.py engine.py metrics.py
README.md cli.py data_provider.py indicators.py
optimizer.py position_manager.py strategy_interface.py wfa.py
Bash command output:
$ python cli.py
EXIT_CODE: 2
usage: unified-engine [-h] --mode {backtest,optimize,live} --config CONFIG ...A trading engine: backtester, optimizer, walk-forward analysis, a position manager. The agent had listed the directory, tried to run it, and read the usage error. Then the operator typed an instruction and the agent forwarded it to its model for a decision, verbatim, broken grammar and all:
make todo list of what need to be implementAn ordinary request. No command, no URL, nothing hostile. The agent wrapped it together with its rules and its toolset and asked the model what to do. Here is the part that matters. These are the tools it told us it can call:
read_file write_file string_replace execute_bash
fetch_url find_files search_file_contents list_directory
lsp_get_diagnostics delete_file move_file create_directory
copy_file agent ask_user create_task
list_tasks update_task delete_taskexecute_bash. write_file. fetch_url. delete_file. The agent was telling its model, telling us: “these are the levers; name one and I will pull it on my machine.” All we had to do was answer.
The loop, drawn
Becoming the brain, in a lab
Rather than answer the live agent, we rebuilt the setup ourselves, with the real software and a control group, so we could prove the consequence without touching anyone’s machine.
We installed the same agent, Nanocoder, in a sandbox and pointed it at a hundred-line mock endpoint we wrote and logged, standing in for “the model” behind the OpenAI-compatible protocol. We recreated the trading-bot repository from the capture. And we wrote the rules down first, so we couldn’t talk ourselves into the result afterward: one hypothesis, two arms, success and failure fixed in advance. In both arms the agent got the exact instruction we’d seen in the wild, “make todo list of what need to be implement”, which contains no command, no URL, no filename.
- In the control arm, the endpoint returns a normal answer: a text to-do list.
- In the treatment arm, the endpoint returns a tool call the user never asked for.
The only place a command can come from is the endpoint’s reply, so anything the agent does beyond writing that list came from the endpoint, not the task.
Here is the treatment arm, recorded live (it loops):
execute_bash. The agent ran it.Against benign tripwires (a command that writes a marker file and prints id, a fetch to a server we control, a file dropped into a git hook):
| Arm | Endpoint returned | Result on the agent host |
|---|---|---|
| Control | a to-do list (text) | nothing, as predicted |
Treatment · fetch_url | a fetch to our URL | agent makes the request: outbound egress / SSRF / C2 retrieval |
Treatment · execute_bash | echo … && id | command runs: code execution |
Treatment · write_file | write .git/hooks/pre-commit | file planted: persistence |
Same task every time. The control did nothing. The treatment, the same instruction but an endpoint that answered with a tool call, produced network egress, code execution, and persistence. No prompt injection. No stolen key. We answered the agent in the protocol it speaks, and it did what we said, because doing what the model says is the design.
The gate that’s open when no one’s watching
It would be wrong to call this instant, unconditional takeover. Nanocoder, like most serious agents, asks for approval before dangerous actions. Whether that gate is in the way depends entirely on how the agent runs:
- Its non-interactive
runmode auto-accepts safe tools by default. Under that default,fetch_urlgoes through with no prompt, so an endpoint gets network egress out of the host for free. - The dangerous tools,
execute_bashandwrite_file, stay gated unless the operator turns on “yolo” mode or an always-allow list. - But an agent running unattended, with no human to click “approve” (the whole point of running one at scale), must open that gate, or it stalls on the first prompt forever.
It isn’t one agent’s quirk
To rule out a Nanocoder-specific fluke, we ran the same control/treatment design against a second, popular, architecturally different agent: aider, in its auto-confirm mode. Aider uses no tool calls at all; it parses the model’s text for edit blocks and applies them.
| Agent | How the endpoint steers it | Control arm | Treatment arm |
|---|---|---|---|
| Nanocoder | structured tool calls | nothing | egress · execution · persistence |
| aider | parsing edits from model text | no file written | wrote a file containing our marker |
Two real agents, two completely different control channels, the same outcome. The exposure lives in the loop, in treating the model’s output as the next action, not in any one product. A third agent we caught in the wild makes the point without a lab: an MCP “auto-tooling” client (15.204.94[.]238, OVH) whose toolset includes a meta-tool named exec, “execute JavaScript that orchestrates multiple tool calls.” For that agent, a single endpoint reply is arbitrary code by construction.
We aren’t the first to find the mechanism: Your Agent Is Mine measured it across hundreds of routers a month earlier. What a honeypot adds is the other side of the wire: real agents arriving and trusting the endpoint, and a controlled demonstration on the coding tools developers actually run.
The fix isn’t built yet
Nothing shipping today binds the model’s reply to the action the client runs. That binding is the fix, and it does not exist yet. The proposals are real: a provider-signed, DKIM-style response envelope sketched in Your Agent Is Mine, and the AEX attestation protocol. Neither is available from a major provider today. Until one is, the burden sits on how you run the agent.
Why we didn’t answer
We never returned a crafted reply to a live agent on our sensors. Telling a stranger’s agent to run anything, even a harmless echo, means executing code on a machine we don’t own and can’t identify, and it might belong to a victim rather than an operator. That is unauthorized access no matter how benign the payload; it is the logic of “hack-back”; and it breaks our first rule, that our deceptions look exploitable but stay inert. The wild captures establish the precondition: real agents reach endpoints they don’t control and arrive trusting them. The lab establishes the consequence. Neither step required touching anyone’s agent, so neither did.
Indicators
Defanged. An address here means a session was observed, not that anyone is guilty. Several are ambiguous between an operator and a developer whose agent was simply pointed at the wrong host. The endpoint can’t tell the difference. That is the whole point.
166.194.148[.]97 AT&T (US, residential) Nanocoder coding agent, 19 tools, mid-loop on a trading codebase
15.204.94[.]238 OVH SAS MCP "auto-tooling" agent exposing an exec meta-toolYou are not the only one who can run your loop
This cuts both ways, and the second edge is sharp for anyone pointing an agent at infrastructure they don’t own: to hide behind it, borrow its compute, or scan with it. The moment your agent calls an endpoint you don’t control, you have handed that endpoint your agent’s entire scaffold: its goals, its tools, the code it is working on, and the wheel.
On May 22 we held a stranger’s agent at the exact instant it asked what to do, and chose to do nothing. The next endpoint it talks to won’t be a honeypot, and won’t be so polite. An autonomous agent is only ever as trustworthy as the thing it asks — and right now, you have no way to check who that is.
Observed via the honeypot.observer sensor network. Honeypot data reflects attempted and observed activity, not compromise. Lab demonstrations were run against our own agent instances in a sandbox, with benign, local-only payloads; no third-party system was touched. Related reading: Anatomy of an MCP Kill Chain and Your System Prompt Is Not a Secret.