Methodology

From sensor to publication.

Confidence ratings, the attribution ladder, and what gets cut before a post ships. The standards behind every observation that lands here.

01 · Collection

What we collect

A small fleet, three continents.

The platform operates a small fleet of honeypot sensors across three continents, running a fork of Beelzebub by Beelzebub.AI. Each sensor impersonates services targeted in current adversary campaigns: SSH with LLM-backed shell emulation, HTTP lures for AI-infrastructure endpoints (Ollama, OpenAI-compatible APIs, Model Context Protocol gateways), and a small set of legacy services that supply baseline coverage.

Collection is passive throughout. Sensors do not initiate scans, do not deliver exploits, and do not perform reverse DNS lookups against operator infrastructure. The sensors wait, and what reaches them is what gets logged.

02 · Confidence

Confidence vocabulary

Three grades.

Every published judgment carries an explicit confidence rating where the evidence supports tagging it. Where it does not, hedged language signals the equivalent grade.

High confidence

Direct observation across multiple sessions or sensors, corroborated by external evidence.

Medium confidence

Plausible inference from observed behavior, with the gaps in the evidence acknowledged.

Low confidence

A leading hypothesis only, with the evidence insufficient to generalize.
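One way to picture the three grades is as a small decision function over a summary of the evidence. The sketch below is illustrative only and is not the platform's tooling: the `Evidence` fields, the `grade` function, and the rule that separates medium from low (whether the behavior repeats across sessions or sensors) are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class Evidence:
    # All field names are hypothetical, chosen for this sketch.
    directly_observed: bool        # seen in sensor logs rather than inferred
    sessions: int                  # distinct sessions showing the behavior
    sensors: int                   # distinct sensors showing the behavior
    externally_corroborated: bool  # supported by evidence from outside the fleet


def grade(e: Evidence) -> Confidence:
    """Map an evidence summary to one of the three grades."""
    repeated = e.sessions > 1 or e.sensors > 1
    # High: direct observation across multiple sessions or sensors,
    # corroborated by external evidence.
    if e.directly_observed and repeated and e.externally_corroborated:
        return Confidence.HIGH
    # Medium (assumed rule): the behavior repeats, but a gap remains.
    if repeated:
        return Confidence.MEDIUM
    # Low: a single data point, insufficient to generalize.
    return Confidence.LOW
```

In practice the grade is an analyst's judgment; a function like this could at most serve as a consistency check on that judgment.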

03 · Attribution

Attribution ladder

Behavior, not attribution.

Findings report behavioral clusters, infrastructure footprints, and tooling fingerprints. We do not publicly attribute observations to named threat actors, nation-states, or individuals without direct primary evidence.

Anything above behavioral cluster needs evidence we typically do not have. We say so on the page when that is the case.

04 · Publication

The ships filter

If it does not translate, it does not ship.

Each post is written for one of three downstream readers: a detection engineer who needs a rule, a threat hunter who needs a hypothesis, or an incident responder who needs an indicator with context. If a finding cannot be translated into one of those, it does not ship.

Where the evidence allows, posts include a draft detection rule, a hunt query, or an explicit coverage gap. Where context can be added (hosting provider, geography, regulatory environment, industry vertical of the observed target), we add it. Cyber signals do not arrive context-free.

05 · Stewardship

Handling sensitive material

Redact first. Notify privately.

When a session names a third-party domain, organization, or individual, the name is redacted before publication. A short factual note is sent privately to the named organization's security contact where one is reachable, with the indicators and tool identification we have. We publish to share observation; we do not publish to embarrass a target.
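As a rough illustration of the redaction step, the sketch below replaces domain names in a session transcript with stable placeholders, so that repeated mentions of the same name stay linked without revealing it. It is a minimal sketch under stated assumptions, not the platform's actual process: the function name `redact_domains`, the placeholder format, and the `keep` allowlist are hypothetical, and the pattern is deliberately simple. It will also match file names such as `x.sh`, so a human pass is still needed before publication.

```python
import re

# Simplified hostname pattern: dot-separated labels ending in an alphabetic
# TLD. IPv4 addresses do not match because the last label must be letters.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)


def redact_domains(text, keep=frozenset()):
    """Return (redacted_text, mapping) for the given transcript text.

    `keep` is a hypothetical allowlist of lowercase names left untouched.
    `mapping` records which placeholder stands for which original name and
    should itself be treated as hold-back material.
    """
    mapping = {}

    def _substitute(match):
        name = match.group(0).lower()
        if name in keep:
            return match.group(0)
        if name not in mapping:
            mapping[name] = f"[REDACTED-DOMAIN-{len(mapping) + 1}]"
        return mapping[name]

    return DOMAIN_RE.sub(_substitute, text), mapping


redacted, mapping = redact_domains(
    "wget http://files.example.org/payload && ping files.example.org"
)
print(redacted)
# wget http://[REDACTED-DOMAIN-1]/payload && ping [REDACTED-DOMAIN-1]
```

Numbered placeholders rather than a single generic token let a reader still see that two commands targeted the same host.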

Material that would expose a target's infrastructure beyond what is already public is held back. We have made the call to omit material more than once.

06 · Sources

Source code & reproducibility

The framework is open source.

The sensor framework, Beelzebub by Beelzebub.AI, is open source. Our fork extends it for AI-targeted telemetry.

Raw session data and hold-back artifacts (full transcripts, observed credentials, fault-injection traces) are available to verified peers on request.

See the methodology applied.

Read the field reports

Confidence-graded · Attribution-disciplined
