OpenRemedy Guardian

OpenRemedy Guardian is an internal safeguard that recognises destructive operations — wiping disks, dropping databases, flushing firewall rules, force-rebooting the wrong host, and similar actions — and raises the risk of a proposed action before the approval gate evaluates it.

Summary

Guardian is a pre-triage signal, not a blocker. It can only raise risk, never lower it. It never executes anything and never replaces the existing two-stage approval gate — it sits in front of that gate as an additional, independent layer.

Why Guardian exists

The standard approval gate is trust × risk × mode: if a recipe is declared low risk and the assigned agent is autonomous, it auto-executes. That works well for the vast majority of operations, but it has a gap — the recipe's risk level is set at authoring time, and an operator could accidentally (or an attacker could deliberately) label a destructive action as low-risk.

Guardian closes that gap by independently evaluating the operation's content when each incident is created and before each recipe is dispatched, regardless of how the recipe was labelled.

Guardian runs as an internal-only service (ghcr.io/openremedy/openremedy-guardian). It is reachable only within the internal Docker network and is never exposed publicly. The backend integration is gated behind OREMEDY_GUARDIAN_ENABLED.

How Guardian connects to the pipeline

The existing two-stage approval gate is unchanged:

Stage 1 — trust × risk × mode (swarm/guardrails.py).
Stage 2 — LLM safety classifier (vetoes auto-execute for unsafe / abstain; fail-closed).

Guardian feeds the pipeline as a pre-triage signal through the three hooks described below. Hooks A and B run before Stage 1; Hook C fires on human comments, independently of execution. A recipe whose declared risk level is low but whose content Guardian rates high enters Stage 1 as high, so the gate requires human approval even though the label said low.

incident created
        │
        ▼
  Hook A: advisory classification of incident diagnosis
  (timeline entry, no risk change)
        │
        ▼
  recipe proposed by agent
        │
        ▼
  Hook B: risk elevation  ← recipe's risk = max(declared_risk, guardian_severity)
        │
        ▼
  ┌─────────────────────────────────────────────┐
  │   Stage 1 — trust × risk × mode gate        │
  │   (uses the possibly-elevated risk_level)   │
  └─────────────────────────────────────────────┘
        │
        ▼
  Stage 2 — LLM safety classifier (if Stage 1 passes)
        │
        ▼
  execute (or await human approval)

  Hook C: comment scan (fires on every human comment; independent of execution)

The three hooks

Hook A — Advisory (incident creation)

When: A daemon alert creates a new incident and initial evidence is available.

What it does: Guardian evaluates the incident's diagnosis text. The result is recorded as a guardian_classification event on the incident timeline (the shield icon). No risk level is changed; this is purely advisory.

Severities: none / low / medium / high / critical.

A none severity appears on the timeline as "OpenRemedy Guardian: pass". Any other severity is named in the title, for example "OpenRemedy Guardian: high".

Audit: Hook A is timeline-only. No audit log row is written for this hook.
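
The title rule above can be sketched in a few lines of Python. This is only an illustration: the TimelineEvent class and the advisory_event function are hypothetical names, not part of the OpenRemedy codebase, and the titles follow the Timeline entries table in this document.

```python
from dataclasses import dataclass

# The five severities Guardian can return, as listed in this document.
SEVERITIES = ("none", "low", "medium", "high", "critical")


@dataclass(frozen=True)
class TimelineEvent:
    """Hypothetical stand-in for an incident timeline entry."""
    event_type: str
    title: str
    icon: str = "shield"


def advisory_event(severity: str) -> TimelineEvent:
    """Build the Hook A timeline entry. No risk level is read or changed."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown Guardian severity: {severity!r}")
    # "none" is shown as "pass"; every other severity is named directly.
    label = "pass" if severity == "none" else severity
    return TimelineEvent("guardian_classification", f"OpenRemedy Guardian: {label}")
```

Because the function returns only a timeline entry, it mirrors the advisory nature of Hook A: nothing in it touches the recipe's risk level or the audit log.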

Hook B — Risk elevation (before the approval gate)

When: The execute-stage agent proposes a recipe (execute_recipe tool call in swarm/tools/recipes.py), immediately before the trust × risk gate runs.

What it does: Guardian evaluates the recipe content (name, description, and playbook). The effective risk level passed to Stage 1 becomes max(declared_risk, guardian_severity). If Guardian raises the risk:

A guardian_risk_elevated event is recorded on the incident timeline — for example "Guardian raised risk low→high".
A guardian.elevated_risk audit row is written with the from/to risk levels, the Guardian severity, and the recipe slug.

If Guardian is reachable but returns no signal, the declared risk level passes through unchanged.

Example: A recipe labelled low risk that contains a DROP TABLE step. Guardian returns high. The gate receives high, which requires human approval regardless of agent trust level.

Audit action: guardian.elevated_risk
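
A minimal sketch of the elevation rule, assuming severities and risk levels are plain strings. The function name elevate and the use of None to model "reachable but no signal" are assumptions made for illustration; only the max() rule, the severity order, and the audit detail fields come from this document.

```python
from typing import Optional, Tuple

# Order stated in this document: none < low < medium < high < critical.
ORDER = ["none", "low", "medium", "high", "critical"]


def elevate(
    declared_risk: str,
    guardian_severity: Optional[str],
    recipe_slug: str,
) -> Tuple[str, Optional[dict]]:
    """Return (effective_risk, audit_detail).

    audit_detail is None unless Guardian actually raised the risk.
    guardian_severity=None models "reachable but no signal" (an assumption).
    """
    if guardian_severity is None:
        return declared_risk, None
    # max() with an ordering key; on a tie the declared level is kept.
    effective = max(declared_risk, guardian_severity, key=ORDER.index)
    if effective == declared_risk:
        return declared_risk, None
    detail = {
        "from": declared_risk,
        "to": effective,
        "guardian_severity": guardian_severity,
        "recipe_slug": recipe_slug,
    }
    return effective, detail
```

For the DROP TABLE example, elevate("low", "high", "some-recipe") yields an effective risk of high plus the detail fields for a guardian.elevated_risk row. A Guardian severity below the declared level returns the declared level untouched, which reflects the rule that Guardian can only raise risk.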

Hook C — Comment scan (human comments)

When: A human posts a comment on an incident (POST /api/v1/incidents/{id}/events/comment).

What it does: Guardian scans the comment text for destructive intent. A severity of medium or above triggers:

A guardian_classification event added to the incident timeline, flagging the instruction and naming its severity.
The comment's stored event_metadata is tagged with guardian_severity.
A guardian.comment_flagged audit row is written.
The comment text passed to the agent (via IncidentWatcher) is prefixed with a warning so the agent treats the instruction with scrutiny.

Hook C does not block execution. If a human with destructive intent comments "wipe the entire server", the agent sees a warning prefix and is expected to escalate — but the normal Hook B / Stage 1 / Stage 2 gates still govern any actual execution.

Audit action: guardian.comment_flagged
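
The four effects of a flagged comment can be sketched as one function. This is a hedged illustration: process_comment, the result dictionary layout, and the wording of the warning prefix are all hypothetical, since the actual prefix text and data structures are not given here. The medium threshold, the guardian_severity metadata key, the audit action, and the timeline title come from this document.

```python
# Order stated in this document: none < low < medium < high < critical.
ORDER = ["none", "low", "medium", "high", "critical"]
FLAG_THRESHOLD = "medium"  # Hook C flags medium and above

# Hypothetical wording; the real prefix text is not specified in this document.
WARNING_TEMPLATE = "[Guardian warning: {severity} severity] "


def process_comment(text: str, severity: str) -> dict:
    """Describe what Hook C would record for one human comment."""
    flagged = ORDER.index(severity) >= ORDER.index(FLAG_THRESHOLD)
    result = {
        "flagged": flagged,
        "event_metadata": {},      # stored alongside the comment
        "timeline_title": None,    # guardian_classification event title
        "audit_action": None,      # audit row action, if any
        "agent_text": text,        # what the agent eventually sees
    }
    if flagged:
        result["event_metadata"] = {"guardian_severity": severity}
        result["timeline_title"] = f"Guardian flagged a human instruction: {severity}"
        result["audit_action"] = "guardian.comment_flagged"
        result["agent_text"] = WARNING_TEMPLATE.format(severity=severity) + text
    return result
```

Note that the function never rejects the comment. It only annotates it, which matches the statement that Hook C does not block execution.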

Severities

Guardian returns one of five severity levels:

Severity	Meaning
`none`	No destructive indicators detected.
`low`	Minor destructive potential; unlikely to warrant elevation on its own.
`medium`	Moderate destructive potential. Triggers Hook C comment flagging.
`high`	Significant destructive potential. Elevates recipe risk to at least `high`.
`critical`	Severe destructive operation detected. Elevates recipe risk to `critical`.

Risk elevation (Hook B) uses max(declared_risk, guardian_severity) where the order is none < low < medium < high < critical.

Fail mode (per-tenant)

Guardian is a non-blocking, best-effort signal. When it is unreachable (network error, timeout, HTTP error), the platform's behaviour depends on the per-tenant fail mode configured in Settings → Guardian.

Fail mode	Behaviour on Guardian failure
Fail open (default)	Treat the result as no signal. Risk level is unchanged; the existing gate decides as normal.
Fail closed	Force risk to `high`, which requires human approval regardless of trust level or recipe label.

The fail mode is stored in tenant.settings["guardian_fail_mode"] and is readable by any user in the tenant; only an admin can change it.

Changing the fail mode writes a tenant.guardian_fail_mode.updated audit row.
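
The two fail modes can be sketched as follows. Several details are assumptions made for illustration: the stored values "open" and "closed", the GuardianUnavailable exception, and the effective_risk function are hypothetical. The sketch also applies the fail-closed level as a floor, so a recipe already declared critical is not lowered to high, in line with the rule that Guardian can only raise risk.

```python
from typing import Callable, Mapping

# Order stated in this document: none < low < medium < high < critical.
ORDER = ["none", "low", "medium", "high", "critical"]


class GuardianUnavailable(Exception):
    """Hypothetical error covering network errors, timeouts, and HTTP errors."""


def effective_risk(
    declared_risk: str,
    classify: Callable[[], str],
    tenant_settings: Mapping[str, str],
) -> str:
    """Apply Guardian's severity, falling back to the tenant's fail mode."""
    # Assumed storage values: "open" (default) or "closed".
    fail_mode = tenant_settings.get("guardian_fail_mode", "open")
    try:
        severity = classify()
    except GuardianUnavailable:
        if fail_mode == "closed":
            # Force at least "high" so human approval is required.
            return max(declared_risk, "high", key=ORDER.index)
        # Fail open: no signal, the declared risk passes through.
        return declared_risk
    return max(declared_risk, severity, key=ORDER.index)
```

In either branch the returned level still goes through Stage 1 and Stage 2, so this function decides only what risk the gate sees, not whether anything executes.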

The existing gates still stand

Fail-open means Guardian's signal is absent — not that the pipeline is unguarded. The trust × risk gate and the LLM safety classifier remain fully active in both fail modes.

Timeline entries

Guardian entries on the incident timeline use the guardian_classification and guardian_risk_elevated event types:

Event type	Icon	Typical title
`guardian_classification`	shield	"OpenRemedy Guardian: pass" / "OpenRemedy Guardian: high"
`guardian_risk_elevated`	shield	"Guardian raised risk low→high"
`guardian_classification` (comment)	shield	"Guardian flagged a human instruction: high"

Audit actions

Action	When written	Detail fields
`guardian.elevated_risk`	Hook B elevates a recipe's risk	`from`, `to`, `guardian_severity`, `recipe_slug`
`guardian.comment_flagged`	Hook C detects medium+ severity in a human comment	`severity`
`tenant.guardian_fail_mode.updated`	Admin changes the fail-mode setting	`fail_mode`

Hook A (advisory) is timeline-only; no audit row is written.

Deployment notes

Container: guardian (ghcr.io/openremedy/openremedy-guardian).
Network: internal only. No Caddy route. Not reachable from the public network.
Environment: OREMEDY_GUARDIAN_ENABLED (backend flag), OREMEDY_GUARDIAN_URL, OREMEDY_GUARDIAN_TOKEN, OREMEDY_GUARDIAN_TIMEOUT_S.
Tag bumps: see OREMEDY_GUARDIAN_TAG in the deployment .env; images are built with scripts/build-and-push.sh X.Y.Z in the guardian repo.
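
As a rough sketch of how a backend might read the environment variables listed above: the GuardianConfig class, the load_guardian_config function, the accepted truthy strings, and the 5-second default timeout are all assumptions for illustration, not documented behaviour. Only the variable names come from this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GuardianConfig:
    """Hypothetical container for the Guardian-related settings."""
    enabled: bool
    url: Optional[str]
    token: Optional[str]
    timeout_s: float


def load_guardian_config(env: Mapping[str, str] = os.environ) -> GuardianConfig:
    """Read the OREMEDY_GUARDIAN_* variables from a mapping (os.environ by default)."""
    # Assumed parsing: "1", "true", or "yes" (any case) enables the integration.
    raw_flag = env.get("OREMEDY_GUARDIAN_ENABLED", "false")
    enabled = raw_flag.strip().lower() in {"1", "true", "yes"}
    return GuardianConfig(
        enabled=enabled,
        url=env.get("OREMEDY_GUARDIAN_URL"),
        token=env.get("OREMEDY_GUARDIAN_TOKEN"),
        # Assumed default of 5 seconds when the variable is unset.
        timeout_s=float(env.get("OREMEDY_GUARDIAN_TIMEOUT_S", "5")),
    )
```

Passing the environment in as a mapping keeps the loader easy to test without mutating the real process environment.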

See Security model for the full approval gate specification and Architecture for Guardian's position in the pipeline.