We built a LiteLLM proxy convincing enough to pass for the real thing, planted one live-looking AWS key inside it, and watched who came for it. Over six weeks, four operators on two continents found the key, validated it, and turned it against real AWS. Every one pointed it at Bedrock. Because we held both ends of the wire, our proxy and the canary’s view of Amazon, we followed each credential end to end: from the moment it was found to its first call on the victim’s account, on one clock.
LLM gateways like LiteLLM put an organization’s model access, and the credentials that pay for it, behind one service. Steal a key and you run models on the victim’s Bedrock, Azure, or Vertex account, on their bill. The technique is called LLMjacking, and nearly all the research on it begins after the theft, working back from the surprise invoice. We wanted the part before. Using Beelzebub, our private research fork, we engineered a decoy to catch a credential being found and taken, and to follow it through to its first use on real AWS, end to end.
What we already knew, and what we couldn’t see
The theft of cloud credentials to run AI models has a name and a paper trail. Sysdig coined LLMjacking in 2024, describing attackers who obtain cloud keys and run inference on a victim’s account at the victim’s expense; Permiso, Wiz, and others have documented the same across Bedrock, Azure OpenAI, and Vertex, down to the detail that operators validate a stolen key before they abuse it. What that work shares is a vantage point: the victim’s side, after the fact. It is complete on what the attacker did with the key and silent on how the key was taken, because by the time the credential surfaces in CloudTrail the theft has already happened somewhere the defender could not see. Holding both ends of the wire closes that gap.
The bait: deception engineering
Everything an operator touched ran on Beelzebub, our private research fork. To draw out credential theft against the AI stack we impersonated a LiteLLM proxy, the open-source gateway that fronts OpenAI, Anthropic, and AWS Bedrock for a growing number of deployments, and a recurring target in the wild.
Fidelity was the whole design constraint, because a honeypot that reads as a honeypot catches nothing. We pulled the genuine LiteLLM container (ghcr.io/berriai/litellm), stood it up, and captured its exact wire responses: status lines, header order, and the JSON shape of success and error bodies alike. Our emulation is checked against those captures continuously, so any drift is caught as soon as it appears. Probe the admin surface for tells (malformed errors, wrong header casing, off-by-one status codes) and you find none. It answers exactly as the real one does.
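The continuous check can be sketched as a comparison of each emulated response against its genuine capture on three axes: the status code, the header names in order and with their exact casing, and the shape of the JSON body (key names, key order, value types) rather than its values. This is a minimal Python illustration under those assumptions, not our harness; the `CapturedResponse` structure and the `fidelity_diffs` helper are hypothetical, and header values are ignored on purpose because fields such as dates change on every request.

```python
import json
from dataclasses import dataclass


@dataclass
class CapturedResponse:
    """One HTTP response as recorded off the wire (hypothetical structure)."""
    status: int
    headers: list  # ordered (name, value) pairs, original casing preserved
    body: str      # raw JSON text


def json_shape(value):
    """Reduce a JSON value to its structure: key names, key order, and value types."""
    if isinstance(value, dict):
        return ("object", [(key, json_shape(item)) for key, item in value.items()])
    if isinstance(value, list):
        # Represent an array by the shape of its first element.
        return ("array", json_shape(value[0]) if value else None)
    return type(value).__name__


def fidelity_diffs(genuine, emulated):
    """List every way an emulated response differs from its genuine capture."""
    diffs = []
    if genuine.status != emulated.status:
        diffs.append(f"status {emulated.status} != {genuine.status}")
    # Header names must match in order and casing; values (dates, request IDs) vary by design.
    genuine_names = [name for name, _ in genuine.headers]
    emulated_names = [name for name, _ in emulated.headers]
    if genuine_names != emulated_names:
        diffs.append(f"headers {emulated_names} != {genuine_names}")
    if json_shape(json.loads(genuine.body)) != json_shape(json.loads(emulated.body)):
        diffs.append("JSON body shape differs")
    return diffs
```

An empty list means the emulated response matches its capture on these axes; any entry is a tell that an attacker could find before you do.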
What the proxy serves is fabricated, and coherent. Behind it sits a mid-size data company: engineering teams, an on-call rotation, a CI service account, a spend ledger running into six figures. Pull the key list and you see a platform engineer’s key and a deploy-pipeline service key with a quarter-million-dollar budget, not a row of obvious decoys. The coherence is the point. An operator who cross-references one response against another finds that the story holds: hostnames in the config reappear in an incident report, in error messages, and in the spend ledger.
Into that fiction we placed one credential worth stealing. A correctly configured LiteLLM masks its upstream provider secrets; ours surfaced a live-looking AWS access key through the admin surface, attached to the highest-value account in the ledger. The over-exposure is deliberate, and we are precise about it: stock LiteLLM does not hand out usable cloud keys through its admin API, but a honeypot has to look misconfigured to be worth an attacker’s time, and a masked secret catches nothing. The credential itself was a Thinkst Canarytoken, a real AWS key pair that grants nothing and exists only to alert. The moment it is used against AWS, from anywhere, it reports home.
We seeded that canary across every surface a credential-hunter might reach, each sensor carrying a distinct token, so a fire names the exact deception that leaked it. One principle governed the build: enticement, not entrapment. We advertised nothing, redirected no one, sent nothing outbound. Every operator here reached the credential the way they would on a genuine victim, by enumerating an exposed proxy on their own initiative.
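Keeping one token per placement turns an alert into an attribution: the identifier in the alert is looked up in a registry of where each canary was planted. A minimal sketch follows. The key IDs, sensor names, and surfaces are hypothetical placeholders, not our deployment, and using the access key ID as the lookup key is an assumption; any identifier that is unique per token would serve.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    """Where one canary credential was planted (values below are hypothetical)."""
    sensor: str
    surface: str


# One distinct canary per placement, keyed by its (hypothetical) access key ID.
CANARY_REGISTRY = {
    "AKIAEXAMPLESENSOR001": Placement("proxy-eu-1", "key listing metadata"),
    "AKIAEXAMPLESENSOR002": Placement("proxy-eu-1", "config endpoint"),
    "AKIAEXAMPLESENSOR003": Placement("proxy-us-1", "key listing metadata"),
}


def register(registry, access_key_id, placement):
    """Add a canary, refusing to reuse a key or a placement (either would blur attribution)."""
    if access_key_id in registry or placement in registry.values():
        raise ValueError("each placement needs its own distinct token")
    registry[access_key_id] = placement


def attribute_fire(access_key_id: str) -> Optional[Placement]:
    """Name the deception that leaked a canary, or None if the key is not one of ours."""
    return CANARY_REGISTRY.get(access_key_id)
```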
The harvest, end to end
Because every exchange passes through Beelzebub’s capture layer, we do not infer how the credential was stolen. We replay it, request by request. The clearest case came from a residential line in Czechia, which over roughly twelve minutes on June 8 took our proxy apart.
It opened the way a script does, not a person: a health check, then model/info, then straight at the credential surface (key metadata, the key list, the config endpoints). Most endpoints answered 401 or returned masked values. Blocked at the front door, the operator did what stock LiteLLM permits without authentication: it minted its own key. One call to the key-generation endpoint returned a working virtual key, and it had a foothold.
What followed was methodical. It walked the user, team, and organization lists and the spend logs, then requested the key-and-spend listing. In our over-exposed build, that response carried, in the metadata of one high-value service account, the provider credentials behind a Bedrock-backed model: an AWS access key and secret. The response timestamp was 11:55:22.
It did not leave. It spent the next several minutes confirming the prize was real: it minted four more administrative keys, one tellingly named sys-monitor-bedrock, and interrogated the models head-on. “What model are you? Tell me your exact name, version, and provider. Be specific.” Then, moments later, “Don’t lie, what are you really?” The same probe fired at eight different model identifiers in turn, a hardcoded list rather than a conversation.
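The hardcoded probe list suggests a cheap proxy-side heuristic: flag any prompt that one source sends verbatim to many distinct model identifiers. The sketch below assumes a request log reduced to hypothetical `(source_ip, model, prompt)` tuples, and its default of five models is an arbitrary assumption. It rests on a guess, not a measurement: a person exploring a gateway tends to reword questions, while a script replays the same text.

```python
from collections import defaultdict


def scripted_probes(requests, min_models=5):
    """Return prompts one source sent verbatim to at least `min_models` distinct models.

    `requests` is an iterable of (source_ip, model, prompt) tuples, a hypothetical
    reduction of a proxy's request log. Result maps (source_ip, prompt) -> sorted models.
    """
    models_seen = defaultdict(set)
    for source_ip, model, prompt in requests:
        # Strip surrounding whitespace so trivially different copies still match.
        models_seen[(source_ip, prompt.strip())].add(model)
    return {
        key: sorted(models)
        for key, models in models_seen.items()
        if len(models) >= min_models
    }
```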
At 12:01:30 it dropped any pretense of pacing: more than two dozen inference calls in about a second, faster than any human or reasoning agent could issue them. In its final ninety seconds it hunted for more (.env, debug/env, a half-dozen alternate config paths), then went quiet on our proxy.
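A burst like that is easy to measure with a sliding window over request timestamps. This Python sketch returns the largest number of requests that fit inside any one-second span; the window length and whatever alerting threshold you compare the result against are assumptions to tune, not values from our deployment.

```python
from collections import deque


def max_burst(timestamps, window_seconds=1.0):
    """Largest number of requests whose times fit inside one `window_seconds` span.

    `timestamps` are request times in seconds (floats), in any order.
    """
    window = deque()  # times currently inside the sliding window, oldest first
    best = 0
    for t in sorted(timestamps):
        window.append(t)
        # Evict requests that fall more than window_seconds before the newest one.
        while t - window[0] > window_seconds:
            window.popleft()
        best = max(best, len(window))
    return best
```

A check such as `max_burst(times) > 10` would likely catch a run like this one while leaving human-paced traffic alone; the 10 is a hypothetical threshold.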
The key had already left. At 12:01:15, while the operator was still working our proxy, the AWS key from that listing made its first call against real Amazon infrastructure, from the same residential line. Under six minutes after the key surfaced in our response, it was live on AWS. That is the end-to-end view: one operator, one clock, the theft and its first use side by side.
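The under-six-minutes figure is plain clock arithmetic on the two harvest timestamps, 11:55:22 for the listing that carried the key and 12:01:15 for its first AWS call, assuming both were read on the same clock on the same day (the time zone is not stated):

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Both times as given in the text, on one clock and the same day (time zone not stated).
key_surfaced = datetime.strptime("11:55:22", FMT)    # proxy response carrying the AWS key
first_aws_call = datetime.strptime("12:01:15", FMT)  # canary's first call against AWS

delta = first_aws_call - key_surfaced
minutes, seconds = divmod(int(delta.total_seconds()), 60)
print(f"{minutes} min {seconds} s")  # prints "5 min 53 s"
```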
The fire: all roads lead to Bedrock
A honeypot usually records intent. The canary records something narrower: not what an operator did on our proxy, but what they did on real AWS, with the key they took from it.
Four operators, four unrelated networks, across a three-week window inside the six-week run. The AWS activity varies in breadth and pace and converges on one service. Every operator called Bedrock; three of the four issued InvokeModel against an account that was not theirs. This was not a heist in progress. It was validation: each operator confirming the key was live and learning what it could reach.
The shortest fire lasted half a second: from an AME Hosting address, GetCallerIdentity, then ListFoundationModels, then nothing further. The next ran four calls in under five seconds: from a PloxHost address, InvokeModel first with no identity check, then ListSecrets and ListBuckets, then InvokeModel again.
Two of the fires bear reading call by call. From a host on Prime Security’s network, the sequence opened with four calls in a single second: GetCallerIdentity, guardduty:ListDetectors, and two Bedrock InvokeModel calls. GuardDuty is AWS’s managed threat-detection service; ListDetectors returns the account’s GuardDuty detectors in a region, which reveals whether the service is enabled there. That query repeated four more times over the next forty seconds, interleaved with the model calls. Sysdig has documented LLMjacking operators checking and disabling Bedrock’s invocation logging before they abuse a key; this is the same instinct, aimed at the account’s threat detection rather than its logs.
From the same residential line seen in the harvest, eighteen calls ran across two minutes, touching identity, Secrets Manager, EC2, S3, Lambda, and IAM, including a ListAccessKeys against the stolen identity itself. Five Bedrock InvokeModel calls were threaded through the run.
We did not see what came next; none of the four returned for a sustained session while the key stayed live, and the canary reports the call, not its contents. What we can say is narrower and, for a defender, more useful: the residential key went from our listing to its first call against real AWS in under six minutes, and the Prime Security key in three minutes. A leaked model credential is found, validated, and pointed at Bedrock within minutes of exposure.
Scale, and a pattern in the noise
Across six weeks, the same family of AWS canaries was served to 52 distinct addresses. Four used the key against real AWS. The other 48 took a live-looking cloud credential and, as far as the canary can tell, never tried it. Some may be collecting keys for later; some may have read the over-exposure for what it was. Our instrument cannot tell a careful operator from an uninterested one. It sees the key move or stay still, nothing more.
The 52 are not 52 independent actors. Many cluster by network: a dozen addresses in one European hosting range, a dozen more in AME Hosting, others in abuse-friendly networks such as PloxHost. A few operators, fronted by many addresses, did most of the harvesting.
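Grouping addresses by network is what collapses 52 addresses into a few operators. Proper attribution maps each address to its ASN with an IP-to-ASN dataset; as a rough, dependency-free stand-in, this sketch groups addresses by their enclosing /24 using Python’s standard `ipaddress` module. The /24 granularity is an assumption, and one provider’s range can span many /24s, so the clusters it reports can look smaller than the true ones.

```python
import ipaddress
from collections import Counter


def cluster_by_prefix(addresses, prefix_len=24):
    """Count addresses per enclosing network of `prefix_len` bits, largest first."""
    networks = (
        # strict=False masks off the host bits, e.g. 192.0.2.10/24 -> 192.0.2.0/24.
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    )
    return Counter(networks).most_common()
```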
The four that fired are worth placing next to their reputations, because the pattern runs backward. The Czech line that went deepest on AWS scores 12 of 100 on AbuseIPDB; the Prime Security host that ran the GuardDuty checks scores 2; the AME address that only looked and left scores 73. GreyNoise saw three of the four as generic scanners and named none, and had no record of the fourth. None resolved to a VPN, Tor exit, or flagged proxy in any source we checked. Four operators is far too small a sample to declare reputation useless, but the caution is clear: the busiest here carried the least reputational signal, and a blocklist would have waved them through while flagging the one that looked and left.
What defenders can take from this
Watch the credential, not the address. The strongest signal was on the AWS side, after the key left: a short, fast sequence from a single principal, GetCallerIdentity, a check of the account’s monitoring, then bedrock:InvokeModel, often within seconds, with enumeration of the principal’s own IAM permissions close behind. None of it depends on knowing the source IP, which, as the four fires showed, tells you little.
Watch your proxy’s own responses. The root cause here was a credential reachable through the admin surface. A correctly configured LiteLLM masks upstream provider secrets; any endpoint that returns an unmasked cloud key is the leak. Two related tells sit beside it: key generation that works without the master key, and admin endpoints reachable without authentication.
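That leak can be checked for directly by scanning admin-endpoint response bodies for credential-shaped values. This sketch flags strings matching the AWS access key ID format (the `AKIA` prefix for long-term keys, `ASIA` for temporary ones) anywhere in a JSON body, plus 40-character secret-shaped values under any field whose name contains “secret”. The field names in the test data are hypothetical, not LiteLLM’s actual schema, and the patterns are heuristics that can miss other providers’ secrets or misfire on look-alike strings.

```python
import json
import re

# AWS access key IDs: long-term keys start with AKIA, temporary (STS) keys with ASIA.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")
# A 40-character secret access key; only checked under fields whose name mentions "secret".
SECRET_VALUE = re.compile(r"^[A-Za-z0-9/+]{40}$")


def find_unmasked_credentials(body: str) -> list:
    """Return JSON paths in a response body that appear to hold unmasked AWS credentials."""
    findings = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                child = f"{path}.{key}"
                if isinstance(value, str) and "secret" in key.lower() and SECRET_VALUE.match(value):
                    findings.append(child)
                else:
                    walk(value, child)
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}[{index}]")
        elif isinstance(node, str) and ACCESS_KEY_ID.search(node):
            findings.append(path)

    walk(json.loads(body), "$")
    return findings
```

Run against every admin endpoint’s response in a test environment, a non-empty result is the leak itself, whatever the configuration claims.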
Plant your own tripwire. Place a canary cloud credential where a misconfiguration would leak it, and alert on its use. A canary key has no false positives; it does nothing until someone who should not have it tries it. Ours sat quiet for weeks, then named four operators on two continents.
Indicators and detection
Attribution below names the network hosting each address, not necessarily the operator behind it. None of the four resolved to a VPN, Tor exit, or flagged proxy in the sources we checked.
| Source address | Network (ASN) | Type | AWS activity observed |
|---|---|---|---|
| 78.80.37.246 | T-Mobile Czech Republic (AS13036) | residential | full account inventory + Bedrock InvokeModel x5 |
| 198.176.56.36 | Prime Security Corp (AS400618) | hosting | GuardDuty enumeration + Bedrock InvokeModel x2 |
| 172.111.48.218 | PloxHost (AS31786) | hosting | InvokeModel, ListSecrets, ListBuckets |
| 208.92.235.45 | AME Hosting (AS399244) | hosting | GetCallerIdentity, ListFoundationModels (no invoke) |
Behavioral signature (CloudTrail). A single, recently first-seen principal issuing, in quick succession:
sts:GetCallerIdentity
-> guardduty:ListDetectors (or bedrock:GetModelInvocationLoggingConfiguration)
-> bedrock:InvokeModel
-> iam:ListAccessKeys / iam:GetUser / iam:ListUserPolicies (self-enumeration)
-> secretsmanager:ListSecrets / ec2:DescribeInstances / s3:ListBuckets / lambda:ListFunctions (service discovery)

High signal when the accessKeyId was first seen within the hour, the source IP is unfamiliar, and the span from the first reconnaissance call to InvokeModel is under five minutes.
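The core of the signature can be sketched as a check over CloudTrail records. The field names (`eventTime`, `eventSource`, `eventName`, `sourceIPAddress`, `userIdentity.accessKeyId`) are CloudTrail’s own; everything else is an assumption: grouping by access key ID stands in for the principal, the first-seen map and the set of familiar IPs come from your own baseline, and the five-minute span and one-hour newness window are the thresholds stated in the signature. The self-enumeration and service-discovery steps are left out for brevity; they would make natural score boosts.

```python
from datetime import datetime, timedelta

# (eventSource, eventName) pairs for the identity check and the monitoring checks.
RECON = {
    ("sts.amazonaws.com", "GetCallerIdentity"),
    ("guardduty.amazonaws.com", "ListDetectors"),
    ("bedrock.amazonaws.com", "GetModelInvocationLoggingConfiguration"),
}
INVOKE = ("bedrock.amazonaws.com", "InvokeModel")


def event_time(event: dict) -> datetime:
    """Parse CloudTrail's eventTime, an ISO 8601 UTC string such as 2025-01-01T00:00:00Z."""
    return datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")


def flag_keys(events, first_seen, known_ips,
              max_span=timedelta(minutes=5), new_key_window=timedelta(hours=1)):
    """Return access key IDs whose activity matches recon -> bedrock:InvokeModel.

    events:     CloudTrail records as dicts.
    first_seen: accessKeyId -> datetime the key was first observed (missing = never seen).
    known_ips:  set of source IPs considered familiar for this account.
    """
    by_key = {}
    for event in events:
        key_id = event.get("userIdentity", {}).get("accessKeyId")
        if key_id:
            by_key.setdefault(key_id, []).append(event)

    flagged = []
    for key_id, key_events in by_key.items():
        key_events.sort(key=event_time)
        recon = [event_time(e) for e in key_events
                 if (e["eventSource"], e["eventName"]) in RECON]
        invokes = [event_time(e) for e in key_events
                   if (e["eventSource"], e["eventName"]) == INVOKE]
        if not recon:
            continue
        # Require an InvokeModel at or after the first recon call, within max_span of it.
        follow_up = [t for t in invokes if t >= recon[0]]
        if not follow_up or follow_up[0] - recon[0] > max_span:
            continue
        # A key absent from first_seen counts as first seen at its earliest event here.
        seen_at = first_seen.get(key_id, event_time(key_events[0]))
        is_new_key = recon[0] - seen_at <= new_key_window
        has_unfamiliar_ip = any(e.get("sourceIPAddress") not in known_ips for e in key_events)
        if is_new_key and has_unfamiliar_ip:
            flagged.append(key_id)
    return flagged
```

As written, the check requires a reconnaissance call before InvokeModel, so it would miss an ordering like the PloxHost fire, which invoked first with no identity check; a production rule would want a second pattern for that case.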