The fundamental problem with AI agents today is that we can’t prove they’re safe—we can only test them and hope. An agent might work perfectly in development, pass all your tests, and then leak credentials in production because it reused an API key in an unexpected way. Or it might exhaust your cloud budget because nothing prevented it from spawning unlimited API calls.
What if we could make these failures impossible—not through runtime monitoring that can be bypassed, but through a type system that rejects unsafe code before it ever runs?
The Core Idea: Resources as Physical Objects
Linear logic, a branch of mathematical logic from the 1980s, has a simple but powerful principle: every value must be used exactly once. You can’t copy it, you can’t ignore it, you can’t use it twice. Think of it like physical objects in the real world—if you hand someone your car keys, you no longer have them.
Rust programmers know a relaxed version of this through ownership and the borrow checker; Rust’s types are affine, meaning a value may be used at most once and may be silently dropped, whereas linear types demand exactly once. But applying full linear types to AI agents unlocks something more profound: security properties you can prove mathematically, not just test empirically.
Here’s the key insight: if an API token is a linear type, an agent literally cannot use it twice. Not “shouldn’t,” not “will be blocked at runtime”—the code simply won’t compile or execute. The agent must consume the token, get a response, and that’s it. Want to make another API call? You need another token.
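Rust’s move semantics give a feel for this, even though Rust is only affine (a value may go unused) rather than strictly linear. A minimal sketch, with ApiToken and call_api as made-up names:

// A token that is neither Clone nor Copy: the only way to use it is to move it.
struct ApiToken {
    key: String,
}

// Calling the API consumes the token by taking it by value.
fn call_api(token: ApiToken, request: &str) -> String {
    // The token is dropped at the end of this function; the caller no longer owns it.
    format!("response to '{}' using key {}", request, token.key)
}

fn main() {
    let token = ApiToken { key: "k-123".into() };
    let first = call_api(token, "list files");
    println!("{first}");

    // A second call does not compile: `token` was moved above.
    // let second = call_api(token, "read secrets");
}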
Why This Matters for Agent Security
Traditional security relies on sandboxes: runtime checks that ask “does this agent have permission?” But sandboxes have escape hatches. They can be misconfigured. They add overhead. And they’re checked at runtime, after you’ve already deployed potentially dangerous code.
Linear types flip this around. Permissions become objects the agent must physically possess. No token, no file access. No budget, no API calls. The type system enforces this before the agent runs.
Consider a concrete example. You want an agent to analyze log files and generate a report. Traditionally, you’d give it filesystem access and hope it only reads what it should. With linear types:
analyze_logs : FileRead(./logs) ⊗ Budget(1000) → Report
This signature says: “This agent consumes read access to the logs directory and 1000 compute units. It produces a report.” The ⊗ (tensor) connective from linear logic means the agent holds both resources together, and each must be consumed exactly once.
The agent cannot write files (no FileWrite token), cannot access other directories (the token is scoped to ./logs), and cannot use more than 1000 compute units (the budget is exhausted after that). These aren’t runtime checks—they’re enforced by the type system. If the agent’s code tries to write a file, that code won’t type-check.
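A Rust-flavoured sketch of the same signature might look like the following, with FileRead, Budget, and Report as illustrative types that are passed by value so the call consumes them:

use std::path::PathBuf;

// Read-only capability scoped to one directory. Neither Clone nor Copy.
struct FileRead {
    root: PathBuf,
}

// A compute budget, consumed along with the call.
struct Budget {
    units: u64,
}

struct Report {
    summary: String,
}

// Consumes read access to the logs directory and the budget; yields a report.
fn analyze_logs(logs: FileRead, budget: Budget) -> Report {
    Report {
        summary: format!("analyzed {} within {} units", logs.root.display(), budget.units),
    }
}

fn main() {
    let logs = FileRead { root: PathBuf::from("./logs") };
    let budget = Budget { units: 1000 };
    let report = analyze_logs(logs, budget);
    println!("{}", report.summary);
    // `logs` and `budget` are gone; a second analysis needs fresh tokens.
}

Because both arguments move into the call, the compiler rejects any later code that tries to touch them.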
Four Tiers of Trust
The challenge is that LLMs generate messy, unpredictable code. Requiring perfect linear types everywhere would be impractical. The solution is a graduated trust model with four tiers:
Tier 1: The Kernel (Strict Linear)
This is the security-critical core that actually manages real resources—file handles, network sockets, API keys. It’s written by humans, formally verified, and uses strict linear types. This code is small, audited, and never changes unless absolutely necessary.
Think of it like an operating system kernel: small, trusted, and responsible for mediating all access to dangerous resources.
Tier 2: Trusted Agents (Linear with Escape Hatches)
These are agents for semi-trusted tasks, possibly written by senior developers or generated by LLMs with heavy human review. They use linear types but can explicitly duplicate resources when needed:
config = read_config() // linear
config_copy1, config_copy2 = duplicate(config) // explicit, logged
Every duplication is audited. The type system knows it happened. If something goes wrong, you can trace exactly where resources were copied.
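One hedged sketch of such an escape hatch in Rust: Config deliberately has no Clone implementation, so the audited duplicate function below (both names invented here) is the only way to copy it.

#[derive(Debug)]
struct Config {
    endpoint: String,
}

// The only way to duplicate a Config. Every call leaves an audit record.
fn duplicate(config: Config, reason: &str) -> (Config, Config) {
    // In a real system this would go to a tamper-evident audit log, not stderr.
    eprintln!("AUDIT: Config duplicated ({reason})");
    let copy = Config { endpoint: config.endpoint.clone() };
    (config, copy)
}

fn main() {
    let config = Config { endpoint: "https://api.example.com".into() };
    let (config_a, config_b) = duplicate(config, "fan out to two workers");
    println!("{config_a:?} / {config_b:?}");
}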
Tier 3: Untrusted Agents (Strict Linear, Limited API)
This is where LLM-generated code lives by default. These agents get strict linear types and a restricted API—no escape hatches, no unsafe operations. They can only use pre-approved functions that maintain linearity.
If the generated code doesn’t type-check, the framework either rejects it or asks the LLM to fix the type errors. The agent literally cannot perform operations it doesn’t have tokens for.
Tier 4: Fully Sandboxed (Runtime Enforcement)
For agents that can’t be statically type-checked, perhaps because they’re written in a dynamic language or their code is too complex to analyze, the framework falls back to runtime enforcement. It tracks token usage as the agent runs and immediately shuts down any agent that tries to reuse a consumed token.
This is your safety net. It’s not as strong as static guarantees, but it’s still far better than traditional sandboxes because it’s enforcing linear semantics, not just permission checks.
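A minimal sketch of dynamic enforcement, assuming every resource use is routed through one checkpoint; TokenRegistry and its consume method are invented for illustration:

use std::collections::HashSet;

// Tracks which token IDs have already been spent.
struct TokenRegistry {
    consumed: HashSet<u64>,
}

enum Verdict {
    Allowed,
    Violation, // the agent reused a consumed token and gets shut down
}

impl TokenRegistry {
    fn new() -> Self {
        TokenRegistry { consumed: HashSet::new() }
    }

    // Called on every resource use the agent attempts.
    fn consume(&mut self, token_id: u64) -> Verdict {
        if self.consumed.insert(token_id) {
            Verdict::Allowed
        } else {
            Verdict::Violation
        }
    }
}

fn main() {
    let mut registry = TokenRegistry::new();
    assert!(matches!(registry.consume(42), Verdict::Allowed));
    // Second use of the same token is flagged; the framework would terminate the agent here.
    assert!(matches!(registry.consume(42), Verdict::Violation));
}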
What This Enables
Provable Resource Limits
Deploy an agent with Budget(5000) and you know it cannot exceed 5000 compute units. Not “probably won’t,” not “unless there’s a bug”—mathematically cannot. The type system prevents code that would exceed the budget from existing.
This is crucial for cost control, but also for preventing denial-of-service attacks where a compromised agent tries to exhaust resources.
Unforgeable Credentials
API keys and secrets become linear types. An agent that processes a secret gets Secret(String), which can only be used with functions that preserve the secret type:
hash_password : Secret(String) → Secret(Hash)
store_encrypted : Secret(T) → EncryptedData
The agent cannot log the secret, send it over the network, or write it to disk unencrypted. Those operations don’t accept Secret(T)—they only work with public data. The only way to extract the raw value is through explicit declassification functions that require admin privileges and are fully audited.
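A hedged Rust sketch of the idea, with Secret, hash_password, and store_encrypted as illustrative names and the secret narrowed to String for brevity. The wrapper implements no Display, Debug, or Clone, so only functions written against Secret can touch the value:

// No Clone, no Debug, no Display: the wrapper cannot be copied, and the inner
// value cannot be printed with {} or {:?}.
struct Secret<T> {
    value: T,
}

struct Hash(u64);
struct EncryptedData(Vec<u8>);

// Stays inside the Secret boundary: a secret goes in, a secret comes out.
fn hash_password(password: Secret<String>) -> Secret<Hash> {
    let digest = password
        .value
        .bytes()
        .fold(0u64, |acc, b| acc.wrapping_mul(31).wrapping_add(b as u64));
    Secret { value: Hash(digest) }
}

// The only sanctioned exit: the secret leaves encrypted, never in the clear.
fn store_encrypted(secret: Secret<String>) -> EncryptedData {
    EncryptedData(secret.value.into_bytes()) // stand-in for real encryption
}

fn main() {
    let password = Secret { value: String::from("hunter2") };
    let _hashed = hash_password(password);
    // println!("{}", password.value); // does not compile: `password` was moved
    let api_key = Secret { value: String::from("sk-demo") };
    let _stored = store_encrypted(api_key);
}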
Complete Audit Trails
Every operation with a linear resource produces a proof-of-use that must be logged:
write_file : FileToken → Data → (Result, Proof)
The Proof is itself linear—it can’t be discarded or ignored. The agent must pass it to the audit logging system. This means every action is automatically traced. No way to perform an operation without leaving a record.
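Rust cannot make discarding a value a hard error, but #[must_use] plus an audit sink that takes the proof by value gets close; a stricter linear checker would reject an unlogged proof outright. A sketch with hypothetical names:

#[must_use = "every Proof must be passed to the audit log"]
struct Proof {
    action: String,
}

struct FileToken {
    path: String,
}

struct WriteResult {
    bytes_written: usize,
}

// Consumes the write capability and returns both the result and a proof of use.
fn write_file(token: FileToken, data: &[u8]) -> (WriteResult, Proof) {
    (
        WriteResult { bytes_written: data.len() },
        Proof { action: format!("wrote {} bytes to {}", data.len(), token.path) },
    )
}

// The only consumer of Proof values: recording the action closes the loop.
fn audit_log(proof: Proof) {
    eprintln!("AUDIT: {}", proof.action);
}

fn main() {
    let token = FileToken { path: "./report.txt".into() };
    let (result, proof) = write_file(token, b"summary");
    audit_log(proof); // leaving `proof` unused would at least draw a compiler lint
    println!("wrote {} bytes", result.bytes_written);
}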
Safe Composition
When you combine two agents, their capabilities don’t automatically merge. Agent A can’t use Agent B’s tokens. To share access, you must explicitly split tokens:
split_budget : Budget(n) → (Budget(n/2), Budget(n/2))
This makes privilege escalation attacks much harder. A compromised agent can only use the resources it was explicitly given.
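A sketch of explicit splitting, reusing the illustrative Budget type from earlier: the parent budget moves into split_budget, so the two halves are all that remain and the total can never grow.

// Neither Clone nor Copy: the only way to share a budget is to split it.
struct Budget {
    units: u64,
}

// Consumes the parent budget and returns two halves; units are conserved.
fn split_budget(budget: Budget) -> (Budget, Budget) {
    let half = budget.units / 2;
    (Budget { units: half }, Budget { units: budget.units - half })
}

fn main() {
    let budget = Budget { units: 1000 };
    let (for_agent_a, for_agent_b) = split_budget(budget);
    println!("A: {}, B: {}", for_agent_a.units, for_agent_b.units);
    // `budget` has been moved; neither agent can reach the original 1000 units.
}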
Practical Implementation
You don’t need to rewrite everything in an exotic functional language. A practical system could:
- Use a typed DSL for agent tasks. LLMs generate code in this restricted language, which has linear semantics built in. The DSL compiles to Python/JavaScript/whatever, but the type checker runs first.
- Wrap existing APIs with linear interfaces. Your file operations, database calls, and HTTP requests get wrapped in functions that consume tokens. The underlying implementation is normal code, but agents interact through the linear API (see the sketch after this list).
- Provide runtime enforcement as a fallback. If static type checking fails, the framework tracks token usage at runtime and enforces linearity dynamically. It’s not as strong, but it still catches violations.
- Make tokens unforgeable using cryptographic capabilities or unique runtime identifiers that can’t be guessed or constructed by agent code.
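As a sketch of the second and fourth points together, here is one way an ordinary HTTP call could be wrapped behind a linear interface. The gateway module, HttpToken, issue_token, and http_get are all invented names, and the body stands in for whatever client library you already use; in a full system only the Tier 1 kernel would be allowed to mint tokens.

mod gateway {
    // One-shot capability for a single outbound request. Neither Clone nor Copy,
    // and its field is private, so code outside this module cannot forge one.
    pub struct HttpToken {
        nonce: u128, // unguessable runtime identifier
    }

    // Token issuance; in practice this would be restricted to the kernel tier.
    pub fn issue_token(nonce: u128) -> HttpToken {
        HttpToken { nonce }
    }

    // The linear wrapper: consumes the token, performs the request with a
    // normal (non-linear) client underneath, and returns only the response.
    pub fn http_get(token: HttpToken, url: &str) -> String {
        format!("GET {} authorized by token {}", url, token.nonce)
    }
}

fn main() {
    let token = gateway::issue_token(0x5eed);
    let body = gateway::http_get(token, "https://example.com/logs");
    println!("{body}");
    // gateway::http_get(token, "https://example.com/admin"); // moved: won't compile
}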
The Path Forward
We’re at an inflection point with AI agents. They’re becoming powerful enough to be genuinely useful—and genuinely dangerous. The traditional approach of “deploy and monitor” doesn’t scale when agents can operate autonomously at machine speed.
Linear types offer a path to provable safety: security properties you can verify before deployment, not discover through incidents. An agent with the right type signature is safe by construction. No amount of prompt injection or adversarial input can make it violate its resource constraints.
This isn’t science fiction. The theory is decades old. Rust has proven that linear-ish types can work in production systems. The innovation is applying these ideas to AI agents—turning them from unpredictable black boxes into mathematical objects we can reason about.
The agents of the future won’t just be smarter. They’ll be provably trustworthy.