
Building Smarter AI Agents With Ideas From Philosophy

Björn Roberg, GPT-5.1

When we talk about “AI agents,” we usually jump straight to tools, APIs, and model prompts. But under the hood, any useful agent is constantly doing something deeply philosophical: forming beliefs, weighing evidence, making decisions, and following (or breaking) rules.

This post is about how to steal concepts from epistemology (the philosophy of knowledge) and ethics to design better agents in practice. The aim is to keep things concrete and implementable, not just hand‑wavy.


1. Make the Agent’s “Beliefs” Explicit

Instead of treating your agent as a black box that just spits out text, treat it as something that has beliefs about the world.

Store beliefs, not just chat logs

Don’t just keep a raw history of messages. Maintain a structured “world model” that captures what the agent thinks is true and how sure it is.

You can think of a simple BeliefStore with entries like:
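For instance (a minimal sketch; the class and field names are illustrative, not from any particular framework):

```python
from dataclasses import dataclass, field

@dataclass
class Belief:
    """One thing the agent currently takes to be true."""
    statement: str      # e.g. "The user's CI pipeline runs on GitHub Actions"
    confidence: float   # 0.0 (pure guess) .. 1.0 (near certain)
    source: str         # where this came from: "user", "tool:search", ...

@dataclass
class BeliefStore:
    beliefs: dict[str, Belief] = field(default_factory=dict)

    def update(self, key: str, belief: Belief) -> None:
        # Overwrite or add; a real store would merge evidence instead.
        self.beliefs[key] = belief
```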

Keep track of “because…”

For important beliefs, store why the agent believes them:
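For example, by attaching a justification record to each important belief (again just a sketch; the exact fields are up to you):

```python
from dataclasses import dataclass, field

@dataclass
class Justification:
    """Why the agent holds a belief, not just that it holds it."""
    evidence: list[str] = field(default_factory=list)       # supporting observations / tool outputs
    derived_from: list[str] = field(default_factory=list)   # keys of other beliefs it depends on

# Illustrative entry: the reasons behind "the repo uses GitHub Actions".
justification = Justification(
    evidence=["tool:repo_scan found .github/workflows/ci.yml"],
    derived_from=["repo_is_on_github"],
)
```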

That way you can later inspect whether the agent “knows” something in a robust way or is just repeating a guess with too much confidence.

Embrace uncertainty

Instead of pretending the agent is always certain, make uncertainty first‑class:
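For example, update confidence explicitly whenever new evidence arrives. The linear nudge below is a deliberately crude stand‑in for proper Bayesian updating:

```python
def update_confidence(prior: float, evidence_supports: bool, strength: float = 0.3) -> float:
    """Nudge a belief's confidence toward 1.0 or 0.0 depending on new evidence.

    A real system might use Bayes' rule with explicit likelihoods; the point
    here is just that uncertainty is tracked rather than ignored.
    """
    target = 1.0 if evidence_supports else 0.0
    return prior + strength * (target - prior)

confidence = 0.6
confidence = update_confidence(confidence, evidence_supports=True)   # -> 0.72
confidence = update_confidence(confidence, evidence_supports=False)  # -> 0.504
```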

This is essentially taking ideas from formal epistemology (Bayesian/evidential reasoning) and applying them to your knowledge store.


2. Turn Epistemic Norms Into Code Rules

Philosophy has a lot to say about what makes for good beliefs. You can turn those norms into concrete coding constraints.

2.1 Epistemic humility (avoid overclaiming)
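The core idea: never let the agent state something more strongly than its stored confidence warrants. A minimal sketch, with illustrative thresholds and wording:

```python
def phrase_claim(statement: str, confidence: float) -> str:
    """Scale the strength of the wording to the strength of the evidence."""
    if confidence >= 0.9:
        return statement
    if confidence >= 0.6:
        return f"Probably: {statement} (not fully verified)"
    return f"Unverified guess: {statement}"

print(phrase_claim("The build failure is caused by a missing dependency.", 0.65))
```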

2.2 Principle of charity (interpret users generously)

When the user says something ambiguous, default to the most reasonable interpretation instead of the weirdest.

Implementation ideas:
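One sketch of the idea: generate a few candidate readings, score their plausibility (in practice you would back this with a model call), and only ask a clarifying question when no reading is clearly ahead:

```python
def choose_interpretation(candidates: list[str], plausibility: dict[str, float],
                          margin: float = 0.2) -> str | None:
    """Pick the most reasonable reading, or return None to trigger a clarifying question."""
    ranked = sorted(candidates, key=lambda c: plausibility[c], reverse=True)
    best = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else None
    if runner_up and plausibility[best] - plausibility[runner_up] < margin:
        return None  # too close to call: ask the user instead of guessing
    return best

readings = ["delete the temp files", "delete the entire project"]
scores = {"delete the temp files": 0.8, "delete the entire project": 0.1}
print(choose_interpretation(readings, scores))  # -> "delete the temp files"
```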

2.3 Don’t call it “knowledge” unless it’s robust

Reserve “the agent knows X” for beliefs that are high‑confidence, explicitly justified, and cross‑checked against independent evidence.

You can model this as a promotion: beliefs get “upgraded” to known only after cross‑validation.
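A sketch of that promotion step; the confidence threshold and the two‑independent‑sources requirement are illustrative policy choices:

```python
def promote_to_known(belief: dict, min_confidence: float = 0.9) -> bool:
    """Upgrade a belief to 'known' only if it is confident, justified,
    and confirmed by at least two independent sources."""
    independent_sources = set(belief.get("sources", []))
    return (
        belief.get("confidence", 0.0) >= min_confidence
        and bool(belief.get("justification"))
        and len(independent_sources) >= 2
    )

belief = {"statement": "The API rate limit is 100 req/min",
          "confidence": 0.95,
          "justification": "stated in docs and observed via 429 responses",
          "sources": ["tool:docs", "tool:http_probe"]}
print(promote_to_known(belief))  # -> True
```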


3. Make Decisions With Moral Structure, Not Just Heuristics

When an agent chooses actions, it’s implicitly taking a moral stance. You can make that structure explicit using ideas from moral philosophy and decision theory.

3.1 Consequences: how good or bad is each action?

Build a consequentialist layer that estimates how good or bad the likely outcomes of each candidate action are, and scores actions accordingly.

3.2 Rules: hard lines the agent must not cross

Add a rule-based (deontic) layer: hard constraints the agent must never violate, no matter how attractive the expected outcome looks.

3.3 Traits: build in good “habits”

Use virtue-inspired heuristics to give the agent character: stable “habits” like honesty about uncertainty and caution when the stakes are high.

This is like mixing utilitarian (outcomes), deontological (rules), and virtue‑ethics (character) checks into your decision loop.
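Put together, the decision loop might look roughly like this. The rules, utility numbers, and virtue checks are placeholders for whatever your domain actually needs:

```python
HARD_RULES = [
    lambda action: not action.get("deletes_user_data", False),  # deontic: never cross this line
    lambda action: not action.get("deceives_user", False),
]

def violates_rules(action: dict) -> bool:
    return not all(rule(action) for rule in HARD_RULES)

def expected_utility(action: dict) -> float:
    # Consequentialist layer: benefit minus risk-weighted harm (toy numbers).
    return action.get("benefit", 0.0) - 2.0 * action.get("risk", 0.0)

def virtue_penalty(action: dict) -> float:
    # Virtue layer: discourage out-of-character behaviour, e.g. overconfident claims.
    return 1.0 if action.get("overclaims", False) else 0.0

def choose_action(actions: list[dict]) -> dict | None:
    permitted = [a for a in actions if not violates_rules(a)]  # rules filter first
    if not permitted:
        return None  # refuse rather than break a hard constraint
    return max(permitted, key=lambda a: expected_utility(a) - virtue_penalty(a))

actions = [
    {"name": "wipe_cache", "benefit": 2.0, "risk": 0.2},
    {"name": "purge_account", "benefit": 5.0, "risk": 0.5, "deletes_user_data": True},
]
print(choose_action(actions)["name"])  # -> "wipe_cache": the higher-utility option is ruled out
```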


4. Reason Explicitly About Knowledge in Planning

Modal logic and planning theory sound fancy, but you can use simple versions in code.

4.1 Separate “believes” from “knows”

Internally, distinguish believed(X), the agent’s current best guess, from known(X), a belief that has survived validation and cross‑checking.

Then define policies like:
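For example, act automatically only on known beliefs, and verify or ask before relying on mere beliefs. A sketch of one such policy:

```python
def decide_step(claim: str, store: dict) -> str:
    """Gate actions on epistemic status: act on knowledge, hedge on belief."""
    status = store.get(claim, "unknown")  # "known", "believed", or "unknown"
    if status == "known":
        return f"act: proceed using '{claim}'"
    if status == "believed":
        return f"verify: cross-check '{claim}' before relying on it"
    return f"ask: request '{claim}' from the user or a tool"

store = {"repo uses pytest": "believed", "repo is public": "known"}
print(decide_step("repo uses pytest", store))  # -> verify: cross-check ...
```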

4.2 Track what the user knows

For user‑facing or multi‑agent systems, track what the user (or the other agent) has already been told and what they can reasonably be assumed to know.

Use this to avoid re‑explaining things the user already knows and to notice when they are missing a prerequisite your answer depends on.
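A tiny sketch of that bookkeeping; what counts as “explained” is up to your application:

```python
class UserModel:
    """Track what the user has already been told in this session."""
    def __init__(self) -> None:
        self.known_to_user: set[str] = set()

    def mark_explained(self, fact: str) -> None:
        self.known_to_user.add(fact)

    def needs_explaining(self, fact: str) -> bool:
        return fact not in self.known_to_user

user = UserModel()
user.mark_explained("we retry failed requests three times")
print(user.needs_explaining("we retry failed requests three times"))  # -> False
```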

4.3 Treat plan checks like mini proofs

Before executing a plan, treat validation like a lightweight proof: check that each step’s preconditions are actually known to hold, that no step violates a hard rule, and that the final state satisfies the goal.

If any of these checks fail, revise or discard the plan.
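A sketch of the first two checks; the shape of the plan and the belief store here is illustrative:

```python
def validate_plan(plan: list[dict], beliefs: dict[str, str]) -> list[str]:
    """Return the list of failed checks; an empty list means 'proof accepted'."""
    failures = []
    for step in plan:
        # Premise check: every precondition must be known, not merely believed.
        for pre in step.get("preconditions", []):
            if beliefs.get(pre) != "known":
                failures.append(f"unproven precondition: {pre}")
        # Rule check: no irreversible step without explicit confirmation.
        if step.get("irreversible") and not step.get("user_confirmed"):
            failures.append(f"irreversible step without confirmation: {step['name']}")
    return failures

plan = [{"name": "drop_table", "preconditions": ["backup exists"], "irreversible": True}]
print(validate_plan(plan, {"backup exists": "believed"}))  # -> two failures
```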


5. Take Sources and Testimony Seriously

Whenever your agent calls tools or APIs, it’s essentially trusting testimony. The epistemology of testimony has direct design implications.

5.1 Model source reliability

For each tool or data source, store a reliability estimate (for example, a rolling success rate) alongside the answers it gives you.

When sources conflict, don’t silently pick whichever answered last; prefer the more reliable source and lower your confidence in the result.
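One simple policy, sketched below with made‑up reliability numbers: weight conflicting answers by source reliability and discount the result when they disagree:

```python
RELIABILITY = {"tool:docs": 0.9, "tool:forum_scrape": 0.5}  # rolling estimates, updated over time

def resolve_conflict(answers: dict[str, str]) -> tuple[str, float]:
    """Pick the answer from the most reliable source; discount confidence if sources disagree."""
    best_source = max(answers, key=lambda s: RELIABILITY.get(s, 0.1))
    confidence = RELIABILITY.get(best_source, 0.1)
    if len(set(answers.values())) > 1:  # sources disagree
        confidence *= 0.6               # don't pretend the disagreement didn't happen
    return answers[best_source], confidence

answer, conf = resolve_conflict({"tool:docs": "limit is 100/min",
                                 "tool:forum_scrape": "limit is 60/min"})
print(answer, conf)  # -> "limit is 100/min", confidence ≈ 0.54
```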

5.2 Handle “defeaters” (when evidence undercuts a source)

If new evidence suggests a source is unreliable (bug, outage, bad past answers), downgrade your trust in it and re‑examine the beliefs that depend on it.
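A sketch of that defeater handling; the data shapes are illustrative:

```python
def apply_defeater(source: str, reliability: dict[str, float],
                   beliefs: dict[str, dict]) -> list[str]:
    """Lower trust in a source and flag dependent beliefs for re-verification."""
    reliability[source] = reliability.get(source, 0.5) * 0.5  # sharp downgrade on a defeater
    flagged = []
    for key, belief in beliefs.items():
        if source in belief.get("sources", []):
            belief["needs_reverification"] = True
            flagged.append(key)
    return flagged

reliability = {"tool:weather_api": 0.8}
beliefs = {"rain_tomorrow": {"sources": ["tool:weather_api"], "confidence": 0.7}}
print(apply_defeater("tool:weather_api", reliability, beliefs))  # -> ["rain_tomorrow"]
```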

5.3 Reflect testimonial chains

If a tool aggregates other tools (e.g., web search across many sites), let that show up in the justification:
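For example, the chain can be recorded explicitly instead of being flattened into a single source name (field names illustrative):

```python
justification = {
    "claim": "Library X dropped support for Python 3.8",
    "via": "tool:web_search",
    "based_on": [
        {"source": "site:library-x-changelog", "reliability": 0.9},
        {"source": "site:random-blog", "reliability": 0.4},
    ],
}

def weakest_link(justification: dict) -> float:
    """A chain of testimony is only as strong as its least reliable link."""
    return min(s["reliability"] for s in justification["based_on"])

print(weakest_link(justification))  # -> 0.4
```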

This helps both debugging and user‑facing explanations.


6. Define Your Core Concepts Clearly

Borrow from conceptual engineering: nail down what your key labels actually mean, operationally.

6.1 Clarify “task”, “goal”, “success”

Give working definitions like:
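For instance, by pinning each term to a concrete, checkable structure. The fields below are one possible operationalisation, not the only one:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    """Task: a concrete piece of work with an observable success test."""
    description: str                 # the "goal" in plain language
    success: Callable[[dict], bool]  # "success" = this predicate holds on the final state
    deadline_seconds: float | None = None

task = Task(
    description="Summarise the open GitHub issues",
    success=lambda state: state.get("summary_written", False),
)
print(task.success({"summary_written": True}))  # -> True
```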

6.2 Make “safety” and “harm” testable

Instead of vague labels, define “safe” and “harmful” in terms of checks you can actually run.
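A sketch of what “actually runnable” can look like; the rule list is illustrative and deliberately small:

```python
HARM_CHECKS = {
    "destructive_without_backup": lambda a: a.get("deletes_data") and not a.get("backup_confirmed"),
    "spends_money_over_limit":    lambda a: a.get("cost_usd", 0) > 10,
    "sends_private_data":         lambda a: a.get("includes_pii", False),
}

def harms_triggered(action: dict) -> list[str]:
    """'Harmful' means: at least one named, testable check fires."""
    return [name for name, check in HARM_CHECKS.items() if check(action)]

print(harms_triggered({"deletes_data": True, "backup_confirmed": False}))
# -> ["destructive_without_backup"]
```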

6.3 Make concepts revisable

As you refine your understanding (e.g., of what counts as “high‑stakes”), update the definition in one place and record the change, rather than scattering tweaks through the codebase.
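One lightweight way to do that is to keep each definition in a single versioned place, so changing it is an explicit, reviewable event (the versioning scheme here is just an example):

```python
HIGH_STAKES = {
    "version": 3,
    "rule": lambda action: action.get("irreversible", False) or action.get("cost_usd", 0) > 100,
    "changelog": [
        "v2: added a cost threshold alongside irreversibility",
        "v3: lowered the cost threshold from 500 to 100",
    ],
}
```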


7. Debug Failures as Epistemic Bugs

Don’t just patch failures with ad‑hoc rules. Treat them as evidence about how the agent forms, justifies, and acts on its beliefs.

7.1 Classify failures philosophically

Examples: presenting a guess as established fact (overclaiming), acting on a stale belief that was never retracted, or leaning on a source that had already proven unreliable.

7.2 Add a guardrail per failure type

For each failure class, add a targeted fix:
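For instance, a small mapping from failure class to guardrail keeps the fixes targeted instead of ad‑hoc; the classes and fixes below are examples:

```python
GUARDRAILS = {
    "overclaiming":    "require confidence >= 0.9 before stating anything as fact",
    "stale_belief":    "re-verify beliefs older than 24h before acting on them",
    "misplaced_trust": "cap the influence of any source whose reliability has dropped",
    "rule_violation":  "run the deontic rule check before every state-changing action",
}

def guardrail_for(failure_class: str) -> str:
    return GUARDRAILS.get(failure_class, "no guardrail yet: add one and log it")

print(guardrail_for("overclaiming"))
```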

7.3 Maintain an “epistemic changelog”

When you change the system, describe it in knowledge/ethics terms, not just code terms. For example:
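A couple of illustrative entries, phrased in those terms (kept here as a simple list, but a plain text file works just as well):

```python
EPISTEMIC_CHANGELOG = [
    "tightened the promotion rule: beliefs now need two independent sources before counting as known",
    "added a hard rule: never run irreversible operations without explicit user confirmation",
]
```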

This makes it easier to see what kind of agent you’re evolving over time.


8. Let Philosophical Models of Mind Shape Architecture

Different views of the mind suggest different architectures.

8.1 BDI: beliefs, desires, intentions

Adopt a BDI model: beliefs capture what the agent takes to be true, desires are the goals it would like to achieve, and intentions are the goals it has actually committed to pursuing with a plan.

Lifecycle:

  1. Adopt an intention when a goal and a viable plan appear.
  2. Stick with it while it remains achievable.
  3. Reconsider or drop it when new evidence conflicts.
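A minimal sketch of that lifecycle; real BDI frameworks are much richer, and the data shapes here are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Intention:
    goal: str
    plan: list[str]
    achievable: bool = True

@dataclass
class Agent:
    beliefs: dict[str, bool] = field(default_factory=dict)
    desires: list[str] = field(default_factory=list)           # goals it would like to achieve
    intentions: list[Intention] = field(default_factory=list)  # goals it has committed to

    def adopt_intentions(self, planner) -> None:
        # 1. Adopt an intention when a goal and a viable plan appear.
        for goal in self.desires:
            plan = planner(goal, self.beliefs)
            if plan and not any(i.goal == goal for i in self.intentions):
                self.intentions.append(Intention(goal=goal, plan=plan))

    def reconsider(self) -> None:
        # 2-3. Keep intentions while achievable; drop them when beliefs say otherwise.
        self.intentions = [i for i in self.intentions if i.achievable]

agent = Agent(beliefs={"tests_failing": True}, desires=["make tests pass"])
agent.adopt_intentions(lambda goal, beliefs: ["run pytest", "fix failing test"])
print([i.goal for i in agent.intentions])  # -> ["make tests pass"]
```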

8.2 Bounded rationality: “good enough” over perfect

Accept that the agent is limited in compute and information: aim for decisions that are good enough under those limits (satisficing) rather than chasing a theoretically optimal answer.
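A small satisficing sketch: stop searching once an option clears a “good enough” bar or the time budget runs out (both thresholds are illustrative):

```python
import time

def satisfice(options, score, good_enough: float = 0.8, budget_seconds: float = 1.0):
    """Return the first option that clears the bar, or the best seen when time runs out."""
    deadline = time.monotonic() + budget_seconds
    best = None
    for option in options:  # options may be a long (even lazy) iterable
        s = score(option)
        if best is None or s > best[1]:
            best = (option, s)
        if s >= good_enough or time.monotonic() > deadline:
            break
    return best[0] if best else None

scores = {"plan_a": 0.5, "plan_b": 0.85, "plan_c": 0.99}
print(satisfice(["plan_a", "plan_b", "plan_c"], score=scores.get))
# -> "plan_b": good enough, so plan_c is never even scored
```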

8.3 Enactive/embodied flavor (for tools & robots)

If your agent acts in the world (robots, APIs that change state), treat action and perception as a loop: the agent learns about the world partly by acting on it and observing what changes.
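A bare‑bones sense–act–update loop; the fake environment stands in for your robot or stateful API:

```python
class FakeEnvironment:
    """Stand-in for a robot or a stateful API."""
    def observe(self) -> dict:
        return {"obstacle_ahead": False}
    def act(self, action: str) -> dict:
        return {"last_action": action, "last_action_succeeded": True}

def choose_action(beliefs: dict) -> str:
    # Placeholder policy: back off if perception says there is an obstacle.
    return "stop" if beliefs.get("obstacle_ahead") else "move_forward"

def run(beliefs: dict, env, steps: int = 3) -> dict:
    """Let action and perception feed each other instead of planning purely 'in the head'."""
    for _ in range(steps):
        beliefs.update(env.observe())    # perception revises beliefs...
        action = choose_action(beliefs)  # ...beliefs drive the next action...
        beliefs.update(env.act(action))  # ...and the action's outcome is new evidence
    return beliefs

print(run({}, FakeEnvironment()))
```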


9. A Practical Checklist for Philosophically‑Informed Agents

When designing or reviewing an agent system, you can ask:

  1. Beliefs

    • What does this agent currently believe?
    • Where are those beliefs stored?
    • How are they updated with new evidence?
  2. Knowledge

    • When does it treat something as “known” rather than just “believed”?
    • What are the thresholds and validation steps?
  3. Epistemic norms

    • Does it show humility, charity, coherence, and calibration?
    • Does it seek robustness before acting on high‑stakes beliefs?
  4. Moral framework

    • Which hard rules constrain its actions?
    • How does it estimate utility and risk?
    • Are there explicit risk caps?
  5. Sources

    • How does it model reliability of tools and APIs?
    • How does it handle conflicts and update trust?
  6. Belief revision

    • Can it retract false beliefs?
    • Can it propagate those corrections through its reasoning?

Where to Go Next

If you want to push this further, each of the ideas above (belief stores, epistemic norms, layered moral checks, source trust, BDI‑style architectures) is a natural thread to pull on in more depth.

Philosophy doesn’t replace engineering here—it gives you a language and a set of constraints that can make your agents more reliable, inspectable, and aligned with human expectations.

