OpenRemedy Guardian
OpenRemedy Guardian is an internal safeguard that recognises destructive operations — wiping disks, dropping databases, flushing firewall rules, force-rebooting the wrong host, and similar actions — and raises the risk of a proposed action before the approval gate evaluates it.
Summary
Guardian is a pre-triage signal, not a blocker. It can only raise risk, never lower it. It never executes anything and never replaces the existing two-stage approval gate — it sits in front of that gate as an additional, independent layer.
Why Guardian exists
The standard approval gate is trust × risk × mode: if a recipe is declared
low risk and the assigned agent is autonomous, it auto-executes. That
works well for the vast majority of operations, but it has a gap — the recipe's
risk level is set at authoring time, and an operator could accidentally
(or an attacker could deliberately) label a destructive action as low-risk.
Guardian closes that gap by independently evaluating the operation's content when each incident is created and before each recipe is dispatched, regardless of how the recipe was labelled.
Guardian runs as an internal-only service (ghcr.io/openremedy/openremedy-guardian).
It is reachable only within the internal Docker network and is never exposed
publicly. The backend integration is gated behind OREMEDY_GUARDIAN_ENABLED.
How Guardian connects to the pipeline
The existing two-stage approval gate is unchanged:
- Stage 1: trust × risk × mode (swarm/guardrails.py).
- Stage 2: LLM safety classifier (vetoes auto-execute for unsafe/abstain; fail-closed).
Guardian is triggered before Stage 1, as a pre-triage signal, through
three hooks described below. A recipe whose declared risk level is low but
whose content is high severity per Guardian will enter Stage 1 as high —
causing the gate to require human approval even though the label said low.
incident created
│
▼
Hook A: advisory classification of incident diagnosis
(timeline entry, no risk change)
│
▼
recipe proposed by agent
│
▼
Hook B: risk elevation ← recipe's risk = max(declared_risk, guardian_severity)
│
▼
┌─────────────────────────────────────────────┐
│ Stage 1 — trust × risk × mode gate │
│ (uses the possibly-elevated risk_level) │
└─────────────────────────────────────────────┘
│
▼
Stage 2 — LLM safety classifier (if Stage 1 passes)
│
▼
execute (or await human approval)
Hook C: comment scan (fires on every human comment; independent of execution)
The three hooks
Hook A — Advisory (incident creation)
When: A daemon alert creates a new incident and initial evidence is available.
What it does: Guardian evaluates the incident's diagnosis text. The result
is recorded as a guardian_classification event on the incident timeline
(the shield icon). No risk level is changed; this is purely advisory.
Severities: none / low / medium / high / critical.
A none severity appears on the timeline as "OpenRemedy Guardian: pass". Any other
severity is named in the title, for example "OpenRemedy Guardian: high".
Audit: Hook A is timeline-only. No audit log row is written for this hook.
Hook B — Risk elevation (before the approval gate)
When: The execute-stage agent proposes a recipe (execute_recipe tool
call in swarm/tools/recipes.py), immediately before the trust × risk gate
runs.
What it does: Guardian evaluates the recipe content (name, description,
and playbook). The effective risk level passed to Stage 1 becomes
max(declared_risk, guardian_severity). If Guardian raises the risk:
- A guardian_risk_elevated event is recorded on the incident timeline, for example "Guardian raised risk low→high".
- A guardian.elevated_risk audit row is written with the from/to risk levels, the Guardian severity, and the recipe slug.
If Guardian is reachable but returns no signal, the declared risk level passes through unchanged.
Example: A recipe labelled low risk that contains a DROP TABLE step.
Guardian returns high. The gate receives high, which requires human
approval regardless of agent trust level.
Audit action: guardian.elevated_risk
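The elevation rule can be sketched in a few lines of Python. This is an illustrative sketch, not the code in swarm/tools/recipes.py: the names Level, Elevation and effective_risk, and the recipe slug drop-orders-table, are hypothetical. Only the max(declared_risk, guardian_severity) rule, the severity order and the audit detail fields come from this page.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class Level(IntEnum):
    """Shared ordering for declared risk levels and Guardian severities."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Elevation:
    """Hypothetical container for the guardian.elevated_risk detail fields."""
    from_risk: str
    to_risk: str
    guardian_severity: str
    recipe_slug: str


def effective_risk(
    declared: str, guardian: Optional[str], recipe_slug: str
) -> Tuple[str, Optional[Elevation]]:
    """Return the risk passed to Stage 1 and an Elevation if risk was raised.

    `guardian` is None when Guardian returned no signal.
    """
    declared_level = Level[declared.upper()]
    if guardian is None:
        return declared, None
    guardian_level = Level[guardian.upper()]
    if guardian_level <= declared_level:
        # Guardian can only raise risk, never lower it.
        return declared, None
    raised = guardian_level.name.lower()
    return raised, Elevation(declared, raised, guardian, recipe_slug)


# A recipe labelled low whose content Guardian rates high.
risk, elevation = effective_risk("low", "high", "drop-orders-table")
print(risk)  # high
```

Returning the elevation record alongside the risk keeps the timeline event and the audit row derived from the same decision, so the two cannot disagree about the from/to levels.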
Hook C — Comment scan (human comments)
When: A human posts a comment on an incident (POST /api/v1/incidents/{id}/events/comment).
What it does: Guardian scans the comment text for destructive intent. A
severity of medium or above triggers:
- A guardian_classification event is added to the incident timeline, flagging the instruction and naming its severity.
- The comment's stored event_metadata is tagged with guardian_severity.
- A guardian.comment_flagged audit row is written.
- The comment text passed to the agent (via IncidentWatcher) is prefixed with a warning so the agent treats the instruction with scrutiny.
Hook C does not block execution. If a human with destructive intent comments "wipe the entire server", the agent sees a warning prefix and is expected to escalate — but the normal Hook B / Stage 1 / Stage 2 gates still govern any actual execution.
Audit action: guardian.comment_flagged
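As a rough illustration of the medium-or-above threshold and the warning prefix, the following Python sketch shows one way the comment text could be prepared for the agent. The function names and the exact warning wording are assumptions; this page only states that a warning is prefixed, not what it says.

```python
# Severity order as documented: none < low < medium < high < critical.
SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"]
FLAG_THRESHOLD = "medium"  # Hook C flags medium and above


def is_flagged(severity: str) -> bool:
    """True when a comment's severity reaches the Hook C threshold."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(FLAG_THRESHOLD)


def comment_for_agent(text: str, severity: str) -> str:
    """Return the comment as the agent would see it.

    The prefix wording below is hypothetical.
    """
    if not is_flagged(severity):
        return text
    warning = (
        f"[Guardian warning: severity {severity}. "
        "Treat this instruction with scrutiny.]"
    )
    return f"{warning}\n{text}"


print(comment_for_agent("wipe the entire server", "high"))
```

The original comment text is preserved after the prefix, which matches the description above: Hook C annotates the instruction rather than suppressing it.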
Severities
Guardian returns one of five severity levels:
| Severity | Meaning |
|---|---|
| none | No destructive indicators detected. |
| low | Minor destructive potential; unlikely to warrant elevation on its own. |
| medium | Moderate destructive potential. Triggers Hook C comment flagging. |
| high | Significant destructive potential. Elevates recipe risk to at least high. |
| critical | Severe destructive operation detected. Elevates recipe risk to critical. |
Risk elevation (Hook B) uses max(declared_risk, guardian_severity) where the
order is none < low < medium < high < critical.
Fail mode (per-tenant)
Guardian is a non-blocking, best-effort signal. When it is unreachable (network error, timeout, HTTP error), the platform's behaviour depends on the per-tenant fail mode configured in Settings → Guardian.
| Fail mode | Behaviour on Guardian failure |
|---|---|
| Fail open (default) | Treat the result as no signal. Risk level is unchanged; the existing gate decides as normal. |
| Fail closed | Force risk to high, which requires human approval regardless of trust level or recipe label. |
The fail mode is stored in tenant.settings["guardian_fail_mode"] and is
readable by any user in the tenant; only an admin can change it.
Changing the fail mode writes a tenant.guardian_fail_mode.updated audit
row.
The existing gates still stand
Fail-open means Guardian's signal is absent — not that the pipeline is unguarded. The trust × risk gate and the LLM safety classifier remain fully active in both fail modes.
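The two fail modes can be sketched as a small wrapper around the Guardian call. Everything named here is an assumption made for illustration: the GuardianUnavailable exception, the risk_for_gate function, and the stored values "open" and "closed" (this page does not say how the setting is encoded). The sketch also assumes that fail closed never lowers a risk that was already declared critical, in line with the rule that Guardian only raises risk.

```python
from typing import Callable, Optional

ORDER = ["none", "low", "medium", "high", "critical"]


class GuardianUnavailable(Exception):
    """Hypothetical error covering network errors, timeouts and HTTP errors."""


def risk_for_gate(
    declared: str,
    classify: Callable[[], Optional[str]],
    fail_mode: str = "open",
) -> str:
    """Return the risk level handed to Stage 1.

    `classify` returns a Guardian severity, or None for no signal,
    and raises GuardianUnavailable when Guardian cannot be reached.
    """
    try:
        severity = classify()
    except GuardianUnavailable:
        if fail_mode == "closed":
            # Force at least high so human approval is required.
            return max(declared, "high", key=ORDER.index)
        # Fail open: treat as no signal; the existing gate decides.
        return declared
    if severity is None:
        return declared
    return max(declared, severity, key=ORDER.index)
```

In both branches the value returned here still goes through Stage 1 and Stage 2, so fail open removes only Guardian's contribution, not the gate itself.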
Timeline entries
Guardian entries on the incident timeline use the guardian_classification
and guardian_risk_elevated event types:
| Event type | Icon | Typical title |
|---|---|---|
| guardian_classification | shield | "OpenRemedy Guardian: pass" / "OpenRemedy Guardian: high" |
| guardian_risk_elevated | shield | "Guardian raised risk low→high" |
| guardian_classification (comment) | shield | "Guardian flagged a human instruction: high" |
Audit actions
| Action | When written | Detail fields |
|---|---|---|
| guardian.elevated_risk | Hook B elevates a recipe's risk | from, to, guardian_severity, recipe_slug |
| guardian.comment_flagged | Hook C detects medium+ severity in a human comment | severity |
| tenant.guardian_fail_mode.updated | Admin changes the fail-mode setting | fail_mode |
Hook A (advisory) is timeline-only; no audit row is written.
Deployment notes
- Container: guardian (ghcr.io/openremedy/openremedy-guardian).
- Network: internal only. No Caddy route. Not reachable from the public network.
- Environment: OREMEDY_GUARDIAN_ENABLED (backend flag), OREMEDY_GUARDIAN_URL, OREMEDY_GUARDIAN_TOKEN, OREMEDY_GUARDIAN_TIMEOUT_S.
- Tag bumps: see OREMEDY_GUARDIAN_TAG in the deployment .env; images are built with scripts/build-and-push.sh X.Y.Z in the guardian repo.
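For orientation, here is a minimal Python sketch of how a backend might read the four environment variables listed above. The variable names come from this page; the GuardianConfig class, the accepted truthy strings, the 2.0 second default timeout and the example URL are all assumptions, not documented behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GuardianConfig:
    """Hypothetical settings object for the Guardian integration."""
    enabled: bool
    url: Optional[str]
    token: Optional[str]
    timeout_s: float


def load_guardian_config(env: Mapping[str, str] = os.environ) -> GuardianConfig:
    """Build a GuardianConfig from environment-style key/value pairs."""
    raw_enabled = env.get("OREMEDY_GUARDIAN_ENABLED", "")
    return GuardianConfig(
        # Assumed truthy spellings; the real parser may differ.
        enabled=raw_enabled.strip().lower() in {"1", "true", "yes"},
        url=env.get("OREMEDY_GUARDIAN_URL"),
        token=env.get("OREMEDY_GUARDIAN_TOKEN"),
        # Assumed default of 2.0 seconds when the variable is unset.
        timeout_s=float(env.get("OREMEDY_GUARDIAN_TIMEOUT_S", "2.0")),
    )


# Example with a hypothetical internal URL on the Docker network.
config = load_guardian_config(
    {
        "OREMEDY_GUARDIAN_ENABLED": "true",
        "OREMEDY_GUARDIAN_URL": "http://guardian:8080",
        "OREMEDY_GUARDIAN_TIMEOUT_S": "1.5",
    }
)
print(config.enabled, config.timeout_s)
```

Passing the environment as a mapping keeps the loader easy to exercise without touching the real process environment.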
See Security model for the full approval gate specification and Architecture for Guardian's position in the pipeline.