OpenRemedy Guardian

OpenRemedy Guardian is an internal safeguard that recognises destructive operations — wiping disks, dropping databases, flushing firewall rules, force-rebooting the wrong host, and similar actions — and raises the risk of a proposed action before the approval gate evaluates it.

Summary

Guardian is a pre-triage signal, not a blocker. It can only raise risk, never lower it. It never executes anything and never replaces the existing two-stage approval gate — it sits in front of that gate as an additional, independent layer.


Why Guardian exists

The standard approval gate is trust × risk × mode: if a recipe is declared low risk and the assigned agent is autonomous, it auto-executes. That works well for the vast majority of operations, but it has a gap — the recipe's risk level is set at authoring time, and an operator could accidentally (or an attacker could deliberately) label a destructive action as low-risk.

Guardian closes that gap by independently evaluating the operation's content when each incident is created and before each recipe is dispatched, regardless of how the recipe was labelled.

Guardian runs as an internal-only service (ghcr.io/openremedy/openremedy-guardian). It is reachable only within the internal Docker network and is never exposed publicly. The backend integration is gated behind OREMEDY_GUARDIAN_ENABLED.


How Guardian connects to the pipeline

The existing two-stage approval gate is unchanged:

  1. Stage 1 — trust × risk × mode (swarm/guardrails.py).
  2. Stage 2 — LLM safety classifier (vetoes auto-execute for unsafe / abstain; fail-closed).

Guardian is triggered before Stage 1, as a pre-triage signal, through three hooks described below. A recipe whose declared risk level is low but whose content is high severity per Guardian will enter Stage 1 as high — causing the gate to require human approval even though the label said low.

incident created
        ↓
  Hook A: advisory classification of incident diagnosis
  (timeline entry, no risk change)
        ↓
  recipe proposed by agent
        ↓
  Hook B: risk elevation  ← recipe's risk = max(declared_risk, guardian_severity)
        ↓
  ┌─────────────────────────────────────────────┐
  │   Stage 1 — trust × risk × mode gate        │
  │   (uses the possibly-elevated risk_level)   │
  └─────────────────────────────────────────────┘
        ↓
  Stage 2 — LLM safety classifier (if Stage 1 passes)
        ↓
  execute (or await human approval)

  Hook C: comment scan (fires on every human comment; independent of execution)

The three hooks

Hook A — Advisory (incident creation)

When: A daemon alert creates a new incident and initial evidence is available.

What it does: Guardian evaluates the incident's diagnosis text. The result is recorded as a guardian_classification event on the incident timeline (the shield icon). No risk level is changed; this is purely advisory.

Severities: none / low / medium / high / critical.

A none severity appears on the timeline as "OpenRemedy Guardian: pass". Any other severity is named in the title, for example "OpenRemedy Guardian: high".

Audit: Hook A is timeline-only. No audit log row is written for this hook.


Hook B — Risk elevation (before the approval gate)

When: The execute-stage agent proposes a recipe (execute_recipe tool call in swarm/tools/recipes.py), immediately before the trust × risk gate runs.

What it does: Guardian evaluates the recipe content (name, description, and playbook). The effective risk level passed to Stage 1 becomes max(declared_risk, guardian_severity). If Guardian raises the risk:

  • A guardian_risk_elevated event is recorded on the incident timeline — for example "Guardian raised risk low→high".
  • A guardian.elevated_risk audit row is written with the from/to risk levels, the Guardian severity, and the recipe slug.

If Guardian is reachable but returns no signal, the declared risk level passes through unchanged.

Example: a recipe labelled low risk contains a DROP TABLE step. Guardian returns high, so the gate receives high and requires human approval regardless of agent trust level.

Audit action: guardian.elevated_risk
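
The elevation rule can be sketched in a few lines of Python. This is a minimal illustration, not the code in swarm/tools/recipes.py: the names elevate_risk and Elevation are hypothetical, and a Guardian result of None is assumed to stand for "reachable but no signal". The ordering and the audit detail fields (from, to, guardian_severity, recipe_slug) are taken from this page.

```python
from dataclasses import dataclass
from typing import Optional

# Ordering stated in the Severities section: none < low < medium < high < critical.
ORDER = ["none", "low", "medium", "high", "critical"]


@dataclass
class Elevation:
    effective_risk: str            # the risk level handed to Stage 1
    audit_detail: Optional[dict]   # None when Guardian did not raise the risk


def elevate_risk(declared_risk: str,
                 guardian_severity: Optional[str],
                 recipe_slug: str) -> Elevation:
    """Return max(declared_risk, guardian_severity); never lowers the risk."""
    if guardian_severity is None:
        # Assumption: None models "Guardian reachable but returned no signal".
        return Elevation(declared_risk, None)
    if ORDER.index(guardian_severity) <= ORDER.index(declared_risk):
        # Guardian's severity is not higher, so the declared risk stands.
        return Elevation(declared_risk, None)
    detail = {
        "from": declared_risk,
        "to": guardian_severity,
        "guardian_severity": guardian_severity,
        "recipe_slug": recipe_slug,
    }
    return Elevation(guardian_severity, detail)
```

With the DROP TABLE example above, elevate_risk("low", "high", "drop-users-table") yields an effective risk of high together with the detail a guardian.elevated_risk row would carry, while elevate_risk("high", "low", ...) leaves the risk at high and produces no audit detail.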


Hook C — Comment scan (human comments)

When: A human posts a comment on an incident (POST /api/v1/incidents/{id}/events/comment).

What it does: Guardian scans the comment text for destructive intent. A severity of medium or above triggers:

  • A guardian_classification event added to the incident timeline, flagging the instruction and naming its severity.
  • The comment's stored event_metadata is tagged with guardian_severity.
  • A guardian.comment_flagged audit row is written.
  • The comment text passed to the agent (via IncidentWatcher) is prefixed with a warning so the agent treats the instruction with scrutiny.

Hook C does not block execution. If a human with destructive intent comments "wipe the entire server", the agent sees a warning prefix and is expected to escalate — but the normal Hook B / Stage 1 / Stage 2 gates still govern any actual execution.

Audit action: guardian.comment_flagged
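
The four effects of a flagged comment can be pictured as one function that takes the comment text and Guardian's severity. This is a sketch under assumptions: process_comment and the returned dictionary shape are hypothetical, and the exact wording of the warning prefix is invented for illustration. The medium threshold, the guardian_severity metadata key, the timeline title, and the audit action name come from this page.

```python
ORDER = ["none", "low", "medium", "high", "critical"]
FLAG_THRESHOLD = "medium"  # Hook C acts on medium or above


def process_comment(text: str, severity: str) -> dict:
    """Describe what Hook C would record for a human comment."""
    flagged = ORDER.index(severity) >= ORDER.index(FLAG_THRESHOLD)
    if not flagged:
        # Below the threshold: the comment reaches the agent unchanged.
        return {
            "agent_text": text,
            "event_metadata": {},
            "timeline_title": None,
            "audit_action": None,
        }
    return {
        # Hypothetical prefix wording; the real prefix is not given on this page.
        "agent_text": f"[Guardian warning: {severity}] {text}",
        "event_metadata": {"guardian_severity": severity},
        "timeline_title": f"Guardian flagged a human instruction: {severity}",
        "audit_action": "guardian.comment_flagged",
    }
```

Note that nothing in this function blocks anything: it only annotates. Whether a recipe actually runs is still decided later by Hook B and the two approval stages.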


Severities

Guardian returns one of five severity levels:

Severity | Meaning
none     | No destructive indicators detected.
low      | Minor destructive potential; unlikely to warrant elevation on its own.
medium   | Moderate destructive potential. Triggers Hook C comment flagging.
high     | Significant destructive potential. Elevates recipe risk to at least high.
critical | Severe destructive operation detected. Elevates recipe risk to critical.

Risk elevation (Hook B) uses max(declared_risk, guardian_severity) where the order is none < low < medium < high < critical.


Fail mode (per-tenant)

Guardian is a non-blocking, best-effort signal. When it is unreachable (network error, timeout, HTTP error), the platform's behaviour depends on the per-tenant fail mode configured in Settings → Guardian.

Fail mode           | Behaviour on Guardian failure
Fail open (default) | Treat the result as no signal. Risk level is unchanged; the existing gate decides as normal.
Fail closed         | Force risk to high, which requires human approval regardless of trust level or recipe label.

The fail mode is stored in tenant.settings["guardian_fail_mode"] and is readable by any user in the tenant; only an admin can change it.

Changing the fail mode writes a tenant.guardian_fail_mode.updated audit row.
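
The two fail modes can be sketched as follows. Several details are assumptions made for illustration: the stored values "fail_open" and "fail_closed", the GuardianUnavailable exception (standing in for network errors, timeouts, and HTTP errors), and the effective_risk helper. The sketch also assumes that fail-closed never lowers a risk that was already declared critical, in line with the rule that Guardian can only raise risk.

```python
from typing import Callable, Optional

ORDER = ["none", "low", "medium", "high", "critical"]


class GuardianUnavailable(Exception):
    """Hypothetical error for network errors, timeouts, and HTTP errors."""


def higher(a: str, b: str) -> str:
    """Return whichever of two levels ranks higher in ORDER."""
    return a if ORDER.index(a) >= ORDER.index(b) else b


def effective_risk(declared_risk: str,
                   classify: Callable[[], Optional[str]],
                   fail_mode: str = "fail_open") -> str:
    """Risk level handed to Stage 1, honouring the tenant's fail mode."""
    try:
        severity = classify()
    except GuardianUnavailable:
        if fail_mode == "fail_closed":
            # Force at least high so the gate requires human approval.
            return higher(declared_risk, "high")
        # Fail open: treat the failure as no signal.
        return declared_risk
    if severity is None:
        return declared_risk
    return higher(declared_risk, severity)
```

Passing the classifier as a callable keeps the failure handling in one place: the caller decides how Guardian is reached, and this function decides only what a failure means for the risk level.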

The existing gates still stand

Fail-open means Guardian's signal is absent — not that the pipeline is unguarded. The trust × risk gate and the LLM safety classifier remain fully active in both fail modes.


Timeline entries

Guardian entries on the incident timeline use the guardian_classification and guardian_risk_elevated event types:

Event type                        | Icon   | Typical title
guardian_classification           | shield | "OpenRemedy Guardian: pass" / "OpenRemedy Guardian: high"
guardian_risk_elevated            | shield | "Guardian raised risk low→high"
guardian_classification (comment) | shield | "Guardian flagged a human instruction: high"

Audit actions

Action                            | When written                                          | Detail fields
guardian.elevated_risk            | Hook B elevates a recipe's risk                       | from, to, guardian_severity, recipe_slug
guardian.comment_flagged          | Hook C detects medium+ severity in a human comment    | severity
tenant.guardian_fail_mode.updated | Admin changes the fail-mode setting                   | fail_mode

Hook A (advisory) is timeline-only; no audit row is written.


Deployment notes

  • Container: guardian (ghcr.io/openremedy/openremedy-guardian).
  • Network: internal only. No Caddy route. Not reachable from the public network.
  • Environment: OREMEDY_GUARDIAN_ENABLED (backend flag), OREMEDY_GUARDIAN_URL, OREMEDY_GUARDIAN_TOKEN, OREMEDY_GUARDIAN_TIMEOUT_S.
  • Tag bumps: see OREMEDY_GUARDIAN_TAG in the deployment .env; images are built with scripts/build-and-push.sh X.Y.Z in the guardian repo.
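
As an illustration of how a backend might read the environment variables listed above, here is a minimal loader. It is a sketch, not the platform's configuration code: GuardianConfig and load_guardian_config are hypothetical names, and the accepted truthy values, the empty-string defaults, and the 2.0 second default timeout are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class GuardianConfig:
    enabled: bool
    url: str
    token: str
    timeout_s: float


def load_guardian_config(env: Mapping[str, str] = os.environ) -> GuardianConfig:
    """Build a config object from the OREMEDY_GUARDIAN_* variables."""
    raw_enabled = env.get("OREMEDY_GUARDIAN_ENABLED", "false")
    # Assumption: "1", "true", and "yes" (any case) switch the integration on.
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes"}
    return GuardianConfig(
        enabled=enabled,
        url=env.get("OREMEDY_GUARDIAN_URL", ""),
        token=env.get("OREMEDY_GUARDIAN_TOKEN", ""),
        # Assumption: a 2.0 second default when the variable is unset.
        timeout_s=float(env.get("OREMEDY_GUARDIAN_TIMEOUT_S", "2.0")),
    )
```

Accepting the environment as a mapping makes the loader easy to exercise without touching the real process environment, for example load_guardian_config({"OREMEDY_GUARDIAN_ENABLED": "true"}).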

See Security model for the full approval gate specification and Architecture for Guardian's position in the pipeline.