Safety

The guardrails that constrain what can happen. Safety is the layer that wraps around your entire agentic system and makes sure certain things are simply not possible — regardless of what the model wants to do.

What This Flow Does

Safety prevents bad outcomes. It wraps around the entire system — especially around MCP actions — and enforces hard boundaries that no model output or agent decision can override.

Input Validation

Every input gets checked before it reaches the model. Malformed data, excessively long prompts, disallowed content types — all caught at the gate. If something shouldn't go in, it doesn't go in.

Output Filtering

Model outputs pass through filters before they reach users or downstream systems. This catches hallucinated actions, sensitive data leakage, and responses that violate your content policies.

Scope Constraints

Agents can only access the tools, data sources, and actions they're explicitly permitted to use. A research agent can't send emails. A summarisation agent can't modify databases. The boundaries are structural, not just instructional.

Prompt Injection Protection

External inputs — user messages, retrieved documents, API responses — get sanitised to prevent prompt injection attacks. This is especially critical when agents process untrusted content from the web or third-party systems.

Human-in-the-Loop Approval Gates

High-stakes actions require human approval before execution. Sending a client email, modifying production data, committing financial transactions — these pause for review rather than running autonomously.

MCP Action Wrapping

MCP tool calls are where your AI system touches the real world. Safety wraps around every MCP action with rate limits, permission checks, and rollback capabilities. This is where guardrails matter most.

How It Differs from Observability

These two flows are easy to confuse because they both deal with "making sure things go right." But they work in fundamentally different ways, and you need both.

Observability (Flow #4)

Watches and reports. It tells you what happened after the fact. Observability is forensic — it logs decisions, measures quality, tracks costs. When something goes wrong, observability helps you understand why and when.

Observability answers: "What did the system do?"

Safety (Flow #5)

Constrains and prevents. It stops bad things from happening in the first place. Safety is proactive — it blocks disallowed actions, validates inputs, enforces scope limits. When something would go wrong, safety makes sure it can't.

Safety answers: "What is the system not allowed to do?"

Think of it this way: observability is the security camera. Safety is the lock on the door. The camera records everything, but it doesn't stop anyone. The lock prevents entry, but it doesn't tell you who tried. You need both — and they reinforce each other. Observability logs every time a safety boundary is triggered, which helps you refine those boundaries over time.

What This Flow Doesn't Solve

Safety keeps your system within bounds. It makes sure nothing catastrophic happens, nothing leaks that shouldn't, and no action fires without proper authorisation. That's essential — but it's not the whole picture.

Here's what safety can't do: it can't capture the value of what your system actually produced. When your agent generates a useful analysis, drafts a solid email, or surfaces a critical insight — safety doesn't preserve any of that. The outputs, the conversations, the decisions, the artifacts — they all need somewhere to go.

Without storage, your system is a sieve. It does good work, safety makes sure it doesn't do bad work, but nothing accumulates. Every run starts from scratch. Every insight gets generated once and discarded. Every conversation disappears when the session ends.

That's why the next flow exists. Flow #6 (Storage) is where outputs, logs, conversations, and decisions get captured and preserved — so the system can build on what it's already done instead of starting over every time.

Go Deeper

This page covers safety as a flow — how guardrails wrap around your system at runtime. For the technical building block — tooling, implementation patterns, and what I help teams set up — see the dedicated page.

Safety Building Block

Input/output guardrails, permission models, rate limiting, prompt injection defences, and human-in-the-loop patterns for production AI systems.

Explore Safety Building Block →

Need guardrails that actually hold?

I help teams design and implement safety layers that constrain AI systems without crippling them — practical guardrails that protect without slowing everything down.

Start a Conversation See All Flows