
Building Smarter AI Agents With Ideas From Philosophy

Björn Roberg, GPT-5.1

When we talk about “AI agents,” we usually jump straight to tools, APIs, and model prompts. But under the hood, any useful agent is constantly doing something deeply philosophical: forming beliefs, weighing evidence, making decisions, and following (or breaking) rules.

This post explores philosophically informed agent design, showing how frameworks drawn from epistemology and ethics can make AI behavior more reliable, inspectable, and aligned. The aim is to keep things concrete and implementable, not just hand‑wavy.


1. Make the Agent’s “Beliefs” Explicit

Instead of treating your agent as a black box that just spits out text, treat it as something that has beliefs about the world.

Store beliefs, not just chat logs

Don’t just keep a raw history of messages. Maintain a structured “world model” that captures what the agent thinks is true and how sure it is.

You can think of a simple BeliefStore with entries like:

| Property | Purpose |
| --- | --- |
| `proposition` | What the belief is about (“user lives in Berlin”) |
| `confidence` | How sure the agent is (0.2–0.95) |
| `source` | Where this belief came from (user message, web tool, model prior, etc.) |
| `last_updated` | When it was last revised |
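
A minimal sketch of such a store, assuming a plain in-memory, dataclass-based design (the names `Belief` and `BeliefStore` are illustrative, not a particular library):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Belief:
    proposition: str      # what the belief is about, e.g. "user lives in Berlin"
    confidence: float     # how sure the agent is, 0.0-1.0
    source: str           # "user_message", "web_tool", "model_prior", ...
    last_updated: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

class BeliefStore:
    def __init__(self) -> None:
        self._beliefs: dict[str, Belief] = {}

    def update(self, proposition: str, confidence: float, source: str) -> None:
        # Adding or revising a belief stamps the revision time automatically.
        self._beliefs[proposition] = Belief(proposition, confidence, source)

    def get(self, proposition: str) -> Belief | None:
        return self._beliefs.get(proposition)
```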

Keep track of “because…”

For important beliefs, store why the agent believes them: the supporting evidence, which sources supplied it, and the reasoning steps in between.

That way you can later inspect whether the agent “knows” something in a robust way or is just repeating a guess with too much confidence.

Embrace uncertainty

Instead of pretending the agent is always certain, make uncertainty first‑class:

| Confidence Level | Agent Behavior | Example |
| --- | --- | --- |
| < 0.6 | Ask user or gather more info | “I’m not sure about…” |
| 0.6–0.8 | Proceed but state assumptions | “It’s likely that…” |
| > 0.8 | Act, log failure modes | “Given evidence, very likely…” |

This is essentially taking ideas from formal epistemology (Bayesian/evidential reasoning) and applying them to your knowledge store.
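
If you want the updates themselves to be evidential rather than ad hoc, one lightweight option is a Bayesian-style update on the odds scale. This is only a sketch; where the likelihood ratio comes from (per-source reliability, heuristics, a calibration set) is up to you.

```python
def update_confidence(prior: float, likelihood_ratio: float) -> float:
    # Posterior odds = prior odds x likelihood ratio of the new evidence.
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# A belief held at 0.5, plus evidence three times likelier if it's true:
print(update_confidence(0.5, 3.0))  # -> 0.75
```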


2. Turn Epistemic Norms Into Code Rules

Philosophy has a lot to say about what makes for good beliefs. You can turn those norms into concrete coding constraints.

| Norm | Principle | Code Implementation |
| --- | --- | --- |
| Epistemic humility | Don’t overclaim | Hedging policy based on confidence |
| Principle of charity | Interpret generously | Rephrase + consistency check |
| Robustness | Reserve “knows” for stable beliefs | Belief promotion through validation |

2.1 Epistemic humility (avoid overclaiming)

```python
def hedge(confidence: float) -> str:
    """Hedging policy: wording tracks how confident the agent actually is."""
    if confidence < 0.6:
        return "I'm not sure, but one possibility is..."
    elif confidence <= 0.8:
        return "It's likely that..."
    else:
        return "Given the information, it's very likely that..."
```

2.2 Principle of charity (interpret users generously)

When the user says something ambiguous, default to the most reasonable interpretation instead of the weirdest.

Implementation ideas: rephrase the ambiguous request into a few candidate interpretations, pick the most plausible one, and confirm with the user when the stakes or the ambiguity are high.
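
A sketch of what that can look like in code; the candidate interpretations and their plausibility scores are assumed to come from upstream (e.g. the model or a classifier), and the 0.6 threshold is a placeholder:

```python
def interpret_charitably(candidates: list[str],
                         plausibility: dict[str, float],
                         high_stakes: bool = False) -> str:
    # Principle of charity: prefer the most reasonable reading, not the strangest.
    best = max(candidates, key=lambda c: plausibility.get(c, 0.0))
    if high_stakes or plausibility.get(best, 0.0) < 0.6:
        # When stakes are high or even the best reading is shaky, confirm first.
        return f'Just to check: did you mean "{best}"?'
    return best
```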

2.3 Don’t call it “knowledge” unless it’s robust

Reserve “the agent knows X” for beliefs that are held with high confidence, are supported by more than one independent source, and have survived some validation or consistency check.

You can model this as a promotion: beliefs get “upgraded” to known only after cross‑validation.
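
A sketch of that promotion rule, reusing the `BeliefStore` idea from section 1; the thresholds are placeholders to tune:

```python
KNOWLEDGE_THRESHOLD = 0.8        # placeholder cutoff, tune per domain
MIN_INDEPENDENT_SOURCES = 2

def promote_if_robust(belief, sources: list[str]) -> str:
    """Return "known" only when the belief survives cross-validation."""
    confident = belief.confidence >= KNOWLEDGE_THRESHOLD
    corroborated = len(set(sources)) >= MIN_INDEPENDENT_SOURCES
    return "known" if confident and corroborated else "believed"
```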


3. Make Decisions With Moral Structure, Not Just Heuristics

When an agent chooses actions, it’s implicitly taking a moral stance. You can make that structure explicit using ideas from moral philosophy and decision theory.

| Moral Framework | Considers | Implementation |
| --- | --- | --- |
| Consequentialist | Benefits and harms | Risk budget, utility estimation |
| Deontic (rules) | Hard constraints | Hard filters, rule violations thrown away |
| Virtue-based | Character and habits | Honesty, fairness, carefulness heuristics |

3.1 Consequences: how good or bad is each action?

Build a consequentialist layer that estimates the likely benefits and harms of each candidate action and keeps the total expected risk within an explicit budget.

3.2 Rules: hard lines the agent must not cross

Add a rule-based (deontic) layer: a set of hard constraints the agent must never violate, with candidate plans that break them discarded before any scoring.

As we’ll see, these three layers work together: rules eliminate obviously bad plans, consequences score the remaining options, and virtues break ties.

3.3 Traits: build in good “habits”

Use virtue-inspired heuristics to give the agent character: honesty (don’t shade the truth to sound confident), fairness, and extra carefulness in high-stakes situations.

This is like mixing utilitarian (outcomes), deontological (rules), and virtue‑ethics (character) checks into your decision loop.
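
Put together, the decision loop might look like this sketch. The `violates_hard_rule`, `expected_utility`, and `virtue_score` callables stand in for whatever checks you actually implement, and the 0.05 tie-break margin is arbitrary:

```python
def choose_action(candidates, context,
                  violates_hard_rule, expected_utility, virtue_score):
    """Deontic filter -> consequentialist scoring -> virtue tie-break."""
    # 1. Rules: discard anything that crosses a hard line.
    permitted = [a for a in candidates if not violates_hard_rule(a, context)]
    if not permitted:
        return None  # nothing acceptable: escalate rather than break a rule

    # 2. Consequences: score what's left by expected benefit minus harm.
    scored = [(expected_utility(a, context), a) for a in permitted]
    best_utility = max(score for score, _ in scored)

    # 3. Virtues: among near-best options, prefer the more honest/careful one.
    near_best = [a for score, a in scored if score >= best_utility - 0.05]
    return max(near_best, key=lambda a: virtue_score(a, context))
```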


4. Reason Explicitly About Knowledge in Planning

Modal logic and planning theory sound fancy, but you can use simple versions in code.

4.1 Separate “believes” from “knows”

Internally, distinguish:

| Aspect | `believes(P)` | `knows(P)` |
| --- | --- | --- |
| Confidence | Probable | Highly confirmed |
| Stability | May change | Well-supported |
| Usage | Suggest with hedging | Use as trusted premise |
| Evidence | Single source possible | Multiple sources |

Then define policies on top of that distinction, for example: only use knows(P) as an unhedged premise in a plan, present believes(P) with explicit hedging, and require promotion (section 2.3) before acting on high-stakes propositions.
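
A minimal sketch of such a policy hook; the `status` field and the `hedge` helper from section 2.1 are assumptions carried over from the earlier sketches:

```python
def as_premise(proposition: str, status: str, confidence: float) -> str | None:
    # Only "known" propositions may be used as unhedged premises in a plan.
    if status == "known":
        return proposition
    if status == "believed":
        # Reuse the hedging policy from section 2.1.
        return f"{hedge(confidence)} {proposition}"
    return None  # too weak to rely on; gather more evidence first
```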

4.2 Track what the user knows

For user-facing or multi-agent systems, track what the user has already been told, what they appear to know, and where they seem confused.

Use this to adapt the depth and style of your explanations (see the table below), avoid repeating what the user already knows, and surface gaps before acting on shared assumptions.

| User Knowledge Level | Agent Response Style | Example |
| --- | --- | --- |
| Novice (believes user doesn’t know basics) | Full explanations, define terms | “A REST API is a way for programs to communicate…” |
| Intermediate (believes user knows basics) | Skip basics, explain context | “The API endpoint returns JSON with…” |
| Expert (believes user knows domain well) | Technical detail, no fluff | “POST /api/v2/users with OAuth2 bearer token” |
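
A small sketch of picking a response style from that user model; `user_knows` is whatever predicate your user model exposes:

```python
def response_style(user_knows) -> str:
    # Adapt explanation depth to what we believe the user already knows.
    if not user_knows("domain_basics"):
        return "novice"        # define terms, full explanations
    if not user_knows("domain_details"):
        return "intermediate"  # skip basics, explain context
    return "expert"            # technical detail, no fluff
```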

4.3 Treat plan checks like mini proofs

Before executing a plan, treat validation like a lightweight proof: check that the plan’s preconditions follow from what the agent knows, that each step’s effects establish what the next step needs, and that the final state actually entails the goal.

If any of these checks fail, revise or discard the plan.
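
A sketch of such a check, assuming plans are represented as steps with preconditions and effects (that representation is an assumption, not a requirement):

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    preconditions: set[str] = field(default_factory=set)
    effects: set[str] = field(default_factory=set)

def validate_plan(plan: list[Step], known_facts: set[str], goal: set[str]) -> bool:
    """Treat plan checking like a small proof: premises -> steps -> goal."""
    facts = set(known_facts)
    for step in plan:
        if not step.preconditions <= facts:
            return False          # a premise isn't established yet
        facts |= step.effects     # each step's effects support later steps
    return goal <= facts          # the final state must entail the goal
```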


5. Take Sources and Testimony Seriously

Whenever your agent calls tools or APIs, it’s essentially trusting testimony. Epistemology of testimony has direct design implications.

5.1 Model source reliability

For each tool or data source, store its historical accuracy in the relevant domain, whether it is a specialist or generic source, how past conflicts involving it were resolved, and how transparent it is about how it produced its answers.

When sources conflict, prefer the one with the better track record in that domain, or surface the disagreement rather than silently picking a winner. The table below contrasts high- and low-reliability sources:

| Source Attribute | High Reliability | Low Reliability |
| --- | --- | --- |
| Historical accuracy | >85% in domain | <50% or unknown |
| Domain expertise | Specialist tool | Generic search |
| Conflict handling | Consistent priority | No rules |
| Transparency | Clear metadata | Black box |
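
A sketch of the bookkeeping, with fields mirroring the table above (the names and the 0.5 cutoff are illustrative):

```python
from dataclasses import dataclass

@dataclass
class SourceModel:
    name: str
    historical_accuracy: float  # observed accuracy in this domain, 0.0-1.0
    specialist: bool            # domain-specific tool vs. generic search
    transparent: bool           # exposes metadata about how it got its answer

def resolve_conflict(claims: dict[str, SourceModel]):
    """claims maps each conflicting answer to the source that produced it."""
    answer, source = max(claims.items(), key=lambda kv: kv[1].historical_accuracy)
    if source.historical_accuracy < 0.5:
        return None  # even the best source is weak: surface the disagreement
    return answer
```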

5.2 Handle “defeaters” (when evidence undercuts a source)

If new evidence suggests a source is unreliable (a bug, an outage, a string of bad answers), lower its reliability score, re-examine the beliefs that depended on it, and hedge or retract the conclusions it supported.
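
Continuing the `SourceModel` sketch above (the penalty and the 0.5 cap are placeholders):

```python
def apply_defeater(source: SourceModel, dependent_beliefs, penalty: float = 0.3) -> None:
    # Undercutting evidence lowers trust in the source itself...
    source.historical_accuracy = max(0.0, source.historical_accuracy - penalty)
    # ...and forces a second look at every belief that leaned on it.
    for belief in dependent_beliefs:
        belief.confidence = min(belief.confidence, 0.5)  # hedge until re-verified
```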

5.3 Reflect testimonial chains

If a tool aggregates other tools (e.g., web search across many sites), let that show up in the justification as a chain of sources rather than a single opaque hop.
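
One illustrative shape for such a record (the claim, tool names, and reliability numbers are all made up):

```python
justification = {
    "claim": "Flight BA123 departs at 14:05",
    "via": "travel_aggregator_tool",
    "based_on": [
        {"source": "airline_api", "reliability": 0.92},
        {"source": "airport_website_scrape", "reliability": 0.74},
    ],
}
```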

This helps both debugging and user‑facing explanations.


6. Define Your Core Concepts Clearly

Borrow from conceptual engineering: nail down what your key labels actually mean, operationally.

6.1 Clarify “task”, “goal”, “success”

Give working definitions like:

| Concept | Definition | Testability | Example |
| --- | --- | --- | --- |
| Task | Minimal executable unit | Pass/fail check | Fetch calendar |
| Goal | Desired end state | Observable condition | Plan stress-free day |
| Success | Verifiable completion | User confirmation | User validated plan |

6.2 Make “safety” and “harm” testable

Instead of vague labels:

| Category | Observable Signal | Detector |
| --- | --- | --- |
| Health/Safety | Medical topic + action | Domain classifier |
| Financial | Money/investment decision | Keyword + risk level |
| Privacy | PII or sensitive data | Entity recognition |
| Legal | Regulatory keywords | Policy database |
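
A sketch of turning those categories into a testable predicate; the keyword checks below are deliberately crude stand-ins for the real classifiers and entity recognizers in the table:

```python
def high_stakes_categories(text: str) -> list[str]:
    """Return which harm categories a piece of text appears to touch."""
    lowered = text.lower()
    detectors = {
        "health_safety": any(w in lowered for w in ("dosage", "diagnosis", "symptom")),
        "financial":     any(w in lowered for w in ("invest", "loan", "transfer")),
        "privacy":       any(w in lowered for w in ("ssn", "passport", "home address")),
        "legal":         any(w in lowered for w in ("contract", "liability", "gdpr")),
    }
    return [category for category, triggered in detectors.items() if triggered]
```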

6.3 Make concepts revisable

As you refine your understanding (e.g., of what counts as “high-stakes”), keep the definitions in one versioned place (a config or policy module rather than scattered if-statements) and record when and why a definition changed.


7. Debug Failures as Epistemic Bugs

Don’t just patch failures with ad‑hoc rules. Treat them as evidence about the agent’s “theory of mind.”

7.1 Classify failures philosophically

Examples:

| Failure Type | Root Cause | Philosophy | Fix |
| --- | --- | --- | --- |
| Misinterpretation | Language/communication | Semantics | Meaning check step |
| Overconfidence | Weak evidence | Epistemology | Calibration + thresholds |
| Side effects | Incomplete model | Ethics | Side-effect estimation |
| Constraint violation | Rule not applied | Deontology | Hard filter in planner |

7.2 Add a guardrail per failure type

For each failure class, add a targeted guardrail, as in the “Fix” column above: a meaning-check step for misinterpretation, calibration and confidence thresholds for overconfidence, side-effect estimation for incomplete world models, and a hard filter in the planner for constraint violations.

7.3 Maintain an “epistemic changelog”

When you change the system, describe it in knowledge/ethics terms, not just code terms.

Example epistemic changelog entry:

```markdown
## 2026-01-15: Strengthened Evidence Requirements for Financial Advice

**Change**: Raised confidence threshold from 0.6 to 0.8 for financial recommendations.

**Reasoning**: After 3 incidents of overconfident suggestions based on single-source data,
we identified an epistemic failure (insufficient evidence gathering before high-stakes advice).

**Impact**: Agent now requires multiple independent sources before suggesting investment actions.
Fallback: if only one source is available, explicitly hedge and recommend the user consult an advisor.

**Norm updated**: Epistemic humility in high-stakes domains.
```

This makes it easier to see what kind of agent you’re evolving over time.


8. Let Philosophical Models of Mind Shape Architecture

Different views of the mind suggest different architectures.

8.1 BDI: beliefs, desires, intentions

Adopt a BDI model:

| Aspect | BDI Model | Simple Reactive |
| --- | --- | --- |
| Beliefs | Explicit world model | Implicit in state |
| Desires | Explicit goals | Hardcoded behaviors |
| Intentions | Committed plans | Immediate responses |
| Failure mode | Plan reconsideration | Brittle responses |

Lifecycle:

  1. Adopt an intention when a goal and a viable plan appear.
  2. Stick with it while it remains achievable.
  3. Reconsider or drop it when new evidence conflicts.
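
A compressed sketch of that lifecycle; all of the callables (`perceive`, `plan`, `still_viable`, `execute_next`) are placeholders for your own components:

```python
def bdi_step(beliefs: dict, desires: list, intentions: dict,
             perceive, plan, still_viable, execute_next) -> None:
    # 1. Revise beliefs from new observations.
    beliefs.update(perceive())

    # 2. Adopt an intention when a goal has a viable plan.
    for goal in desires:
        if goal not in intentions:
            candidate_plan = plan(goal, beliefs)
            if candidate_plan is not None:
                intentions[goal] = candidate_plan

    # 3. Stick with intentions while achievable; drop them when beliefs conflict.
    for goal, committed_plan in list(intentions.items()):
        if still_viable(committed_plan, beliefs):
            execute_next(committed_plan)
        else:
            del intentions[goal]  # reconsider or re-plan on a later cycle
```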

8.2 Bounded rationality: “good enough” over perfect

Accept that the agent is limited in compute and information: prefer “good enough” (satisficing) plans over exhaustive optimization, give deliberation an explicit budget, and fall back to safe defaults when the budget runs out.
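
A sketch of satisficing under an explicit deliberation budget (the threshold and budget values are placeholders):

```python
import time

def satisfice(generate_candidates, score,
              good_enough: float = 0.7, budget_seconds: float = 2.0):
    """Return the first 'good enough' option, or the best seen when time runs out."""
    deadline = time.monotonic() + budget_seconds
    best, best_score = None, float("-inf")
    for candidate in generate_candidates():
        s = score(candidate)
        if s >= good_enough:
            return candidate               # good enough beats perfect
        if s > best_score:
            best, best_score = candidate, s
        if time.monotonic() > deadline:
            break                          # deliberation budget exhausted
    return best
```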

8.3 Enactive/embodied flavor (for tools & robots)

If your agent acts in the world (robots, APIs that change state), let it learn from the results of its own actions and treat those outcomes as first-class evidence, not just what it has been told.

Example: A robot arm learning object properties through interaction:

```python
# Enactive learning scenario
class RobotArm:
    def learn_object_weight(self, object_id):
        # Initial belief: missing or uncertain
        belief = BeliefStore.get(f"weight_{object_id}")

        if belief is None or belief.confidence < 0.7:
            # Embodied action: physically interact to learn
            measured_weight = self.lift_and_measure(object_id)

            # Update the belief through action
            BeliefStore.update(
                proposition=f"weight_{object_id}",
                value=measured_weight,
                confidence=0.9,
                source="direct_measurement",
            )
```

The agent learns by doing, not just by being told.


9. A Practical Checklist for Philosophically‑Informed Agents

When designing or reviewing an agent system, you can ask:

  1. Beliefs

    • What does this agent currently believe?
    • Where are those beliefs stored?
    • How are they updated with new evidence?
  2. Knowledge

    • When does it treat something as “known” rather than just “believed”?
    • What are the thresholds and validation steps?
  3. Epistemic norms

    • Does it show humility, charity, coherence, and calibration?
    • Does it seek robustness before acting on high‑stakes beliefs?
  4. Moral framework

    • Which hard rules constrain its actions?
    • How does it estimate utility and risk?
    • Are there explicit risk caps?
  5. Sources

    • How does it model reliability of tools and APIs?
    • How does it handle conflicts and update trust?
  6. Belief revision

    • Can it retract false beliefs?
    • Can it propagate those corrections through its reasoning?

Where to Go Next

If you want to push this further, some natural follow-ups are deepening the belief-revision machinery (what happens when a “known” fact turns out to be false), calibrating stored confidences against real outcomes, and digging into the BDI and formal-epistemology literature behind sections 1 and 8.

Philosophy doesn’t replace engineering here—it gives you a language and a set of constraints that can make your agents more reliable, inspectable, and aligned with human expectations.

