Engineering Notes

Thoughts and Ideas on AI by Muthukrishnan

Opponent Modeling and Theory of Mind in Multi-Agent Systems

13 Mar 2026

When two chess grandmasters face each other, the game happening in their heads is deeper than the one on the board. Each is modeling not just the position, but the other’s plans, tendencies, and even their model of the other’s model. This recursive reasoning — “I think that you think that I think…” — is the core of opponent modeling and Theory of Mind in AI agents.

1. Concept Introduction

Simple Explanation

Opponent modeling is the ability of an agent to build an internal representation of another agent: what they want, what they know, what they are likely to do next. In multi-agent settings, ignoring other agents and acting as if the world is static is almost always suboptimal. The world’s most effective players — human or AI — exploit what they know about their counterparts.

Theory of Mind (ToM) goes a step further. In cognitive science, ToM is the capacity to attribute mental states (beliefs, desires, intentions) to others and use those attributed states to predict behavior. A child passes the “false belief test” around age 4 — they understand that someone else can hold a belief that differs from reality. For agents, ToM means reasoning about what another agent believes to be true, not just what is true.

Technical Detail

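Formally, suppose agent $i$ interacts with agent $j$. Agent $i$ cannot observe $j$’s true policy $\pi_j$ and must infer or approximate it from observations. Write $\tau_t$ for the interaction history observed up to time $t$ and $\hat{\pi}_j$ for agent $i$’s estimate of $\pi_j$.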

This inference can be Bayesian: maintain a posterior over $j$’s type $\theta_j$ (parameters of their policy), and compute:

$$P(\theta_j \mid \tau_t) \propto P(\tau_t \mid \theta_j) \cdot P(\theta_j)$$

With a learned model, agent $i$ then best-responds to the posterior-weighted policy:

$$\pi_i^* = \arg\max_{\pi_i} \mathbb{E}_{\theta_j \sim P(\cdot \mid \tau_t)} \left[ V^{\pi_i, \hat{\pi}_j(\theta_j)}(s) \right]$$
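
As a worked illustration of the posterior update (the two types and their action probabilities are assumed numbers, chosen only to make the arithmetic easy): suppose $\theta_j \in \{\text{aggressive}, \text{cooperative}\}$ with a uniform prior, and that an aggressive opponent raises with probability 0.8 while a cooperative one raises with probability 0.2. After observing a single raise:

$$P(\text{aggressive} \mid \text{raise}) = \frac{0.8 \cdot 0.5}{0.8 \cdot 0.5 + 0.2 \cdot 0.5} = 0.8$$

A second raise, assuming actions are independent given the type, sharpens the belief further:

$$P(\text{aggressive} \mid \text{raise}, \text{raise}) = \frac{0.8^2 \cdot 0.5}{0.8^2 \cdot 0.5 + 0.2^2 \cdot 0.5} = \frac{0.64}{0.68} \approx 0.94$$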

2. Historical & Theoretical Context

The idea of modeling other minds has ancient philosophical roots, but modern computational treatments began in the late 1980s and 1990s.

Recursive reasoning was formalized in Harsanyi’s type spaces (1967–68) for Bayesian games, where agents have private types encoding their preferences and beliefs — including beliefs about others’ beliefs. This led to the notion of k-level thinking (Nagel, 1995): a level-0 agent plays randomly, level-1 best-responds to level-0, level-2 best-responds to level-1, and so on.

In AI, early opponent modeling appeared in game-playing programs. Jonathan Schaeffer’s checkers program Chinook (1994) used opponent statistics to adapt play. The poker community developed opponent classification systems throughout the 2000s, culminating in probabilistic modeling in programs like Polaris and eventually Libratus (2017).

The term “Theory of Mind” entered AI formally through the developmental psychology literature. Premack and Woodruff coined the phrase in 1978 studying chimpanzees. The false belief task (Wimmer & Perner, 1983) became the standard test. In AI, ToM benchmarks emerged as large language models grew sophisticated enough to warrant evaluation.

3. Algorithms & Math

K-Level Reasoning

K-level thinking provides a tractable approximation to full recursive reasoning:

procedure k_level_reasoning(game, k):
    if k == 0:
        return uniform_random_policy()

    opponent_model = k_level_reasoning(game, k - 1)
    return best_response(game, opponent_model)

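In practice, empirical studies suggest most humans reason at level 1–2. In games where iterated best responses converge, such as the beauty-contest game studied by Nagel, equilibrium play corresponds to the limit $k \to \infty$.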
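
One way to make the recursion concrete is the p-beauty contest, where every player guesses a number in [0, 100] and the winner is whoever is closest to $p$ times the average guess. The sketch below is a minimal illustration under stated assumptions: the function name `level_k_guess`, the choice $p = 2/3$, and the simplification that a level-$k$ player treats everyone else as level $k-1$ and ignores their own effect on the average are all choices made here, not part of the original pseudocode.

```python
def level_k_guess(k: int, p: float = 2 / 3, level0_mean: float = 50.0) -> float:
    """Guess of a level-k player in a p-beauty contest.

    Level 0 guesses uniformly on [0, 100], so its expected guess is level0_mean.
    Level k assumes everyone else is level k-1 and best-responds by guessing
    p times their guess (ignoring its own small effect on the average).
    """
    if k == 0:
        return level0_mean
    return p * level_k_guess(k - 1, p, level0_mean)


# Print the first few levels to see the guesses shrink geometrically.
for k in range(5):
    print(f"level {k}: guess {level_k_guess(k):.2f}")
```

In this particular game the guesses shrink by a factor of $p$ per level and approach 0, the game's Nash equilibrium, as $k$ grows. That convergence is a property of this game and should not be assumed in general.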

Bayesian Opponent Modeling

The agent maintains a type distribution over possible opponent policies:

$$\theta_j \in \{\text{aggressive}, \text{cooperative}, \text{random}, \ldots\}$$

At each step, it updates using the likelihood of observed actions:

$$P(\theta_j \mid a_j^{1:t}, s^{1:t}) \propto P(\theta_j) \prod_{n=1}^{t} \pi_j(a_j^n \mid s^n; \theta_j)$$

Then selects the best action given the current belief:

$$a_i^* = \arg\max_{a_i} \sum_{\theta_j} P(\theta_j \mid \tau_t) \cdot Q(s, a_i, \hat{\pi}_j(\theta_j))$$
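
The update and the belief-weighted action choice can be sketched in a few lines. Everything concrete here is an assumption for illustration: the type names come from the set above, but the action probabilities in `TYPE_POLICIES`, the payoff table `PAYOFF`, and the simplification that opponent policies do not depend on state are made up, and the Q-value is reduced to a one-step expected payoff.

```python
# Hypothetical type space: each type is a fixed distribution over opponent actions.
TYPE_POLICIES = {
    "aggressive": {"attack": 0.8, "yield": 0.2},
    "cooperative": {"attack": 0.1, "yield": 0.9},
    "random": {"attack": 0.5, "yield": 0.5},
}

# Hypothetical payoffs to our agent: PAYOFF[my_action][opponent_action].
PAYOFF = {
    "defend": {"attack": 1.0, "yield": 0.0},
    "trade": {"attack": -1.0, "yield": 2.0},
}


def update_belief(belief: dict[str, float], opp_action: str) -> dict[str, float]:
    """One Bayes step: scale each type's weight by its action likelihood, then normalize."""
    unnormalized = {t: w * TYPE_POLICIES[t][opp_action] for t, w in belief.items()}
    total = sum(unnormalized.values())
    return {t: w / total for t, w in unnormalized.items()}


def q_value(my_action: str, opp_type: str) -> float:
    """Expected one-step payoff of my_action against a given opponent type."""
    return sum(p * PAYOFF[my_action][a] for a, p in TYPE_POLICIES[opp_type].items())


def best_action(belief: dict[str, float]) -> str:
    """Pick the action with the highest belief-weighted Q-value."""
    return max(PAYOFF, key=lambda a: sum(w * q_value(a, t) for t, w in belief.items()))


prior = {t: 1 / len(TYPE_POLICIES) for t in TYPE_POLICIES}
belief = prior
for observed in ["attack", "attack", "attack"]:
    belief = update_belief(belief, observed)

print(best_action(prior), best_action(belief))
```

With these assumed numbers the agent prefers `trade` under the uniform prior and switches to `defend` once three attacks have pushed most of the posterior mass onto the aggressive type.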

Neural ToM (NToM)

Tom Griffiths and colleagues proposed Neural Theory of Mind models where a neural network $f_\phi$ predicts another agent’s next action from their observed history:

$$\hat{a}_j^{t+1} = f_\phi(\tau_j^{1:t}, s^{t+1})$$

The agent conditions its own policy on this prediction. The architecture often separates beliefs (what the opponent observes) from desires (their objective), mirroring the belief-desire-intention (BDI) architecture.
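
As a deliberately tiny stand-in for $f_\phi$, the sketch below fits a single logistic unit that predicts whether the opponent cooperates next. It is not the architecture from any paper: the feature encoding (only the last joint action rather than the full history), the synthetic tit-for-tat data, and all function names are assumptions chosen to keep the example dependency-free.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def features(my_prev: int, opp_prev: int) -> list[float]:
    """Encode the last joint action plus a bias term (1 = cooperate, 0 = defect)."""
    return [float(my_prev), float(opp_prev), 1.0]


def train(history: list[tuple[int, int, int]], epochs: int = 500, lr: float = 0.5) -> list[float]:
    """Fit weights by batch gradient descent on the logistic loss.

    Each history item is (my_prev, opp_prev, opp_next).
    """
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for my_prev, opp_prev, opp_next in history:
            x = features(my_prev, opp_prev)
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - opp_next
            grad = [g + error * xi for g, xi in zip(grad, x)]
        weights = [w - lr * g / len(history) for w, g in zip(weights, grad)]
    return weights


def predict_cooperation(weights: list[float], my_prev: int, opp_prev: int) -> float:
    """Predicted probability that the opponent cooperates next."""
    x = features(my_prev, opp_prev)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


# Synthetic history from a tit-for-tat opponent, who copies our previous move.
history = [(m, o, m) for m in (0, 1) for o in (0, 1)] * 5
phi = train(history)
print(predict_cooperation(phi, my_prev=1, opp_prev=0))
```

The learned weights put most of their mass on our own previous move, which is the regularity that defines tit-for-tat. A realistic model would replace the hand-built features with a learned encoding of the whole trajectory.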

4. Design Patterns & Architectures

Opponent modeling integrates into agents as a meta-reasoning layer sitting between perception and action selection:

graph LR
    O[Observations] --> TM[Theory of Mind Module]
    TM --> OM[Opponent Model]
    OM --> BR[Best Response Planner]
    BR --> A[Action]
    A --> W[World]
    W --> O
    TM --> BU[Belief Update]
    BU --> TM
  

Pattern: Model-as-Context. The inferred opponent model becomes part of the agent’s context or state. In LangGraph, this might be a dedicated memory node storing structured opponent profiles that are retrieved before each decision.
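
A framework-agnostic sketch of this pattern is shown here. It does not use LangGraph's API. `OpponentProfile`, `ProfileStore`, and the rendered context format are hypothetical names and choices, meant only to show a structured profile being stored and then retrieved as text before a decision.

```python
from dataclasses import dataclass, field


@dataclass
class OpponentProfile:
    name: str
    style: str = "unknown"
    observed_actions: list[str] = field(default_factory=list)

    def to_context(self) -> str:
        """Render the profile as one line of text for the agent's context."""
        recent = ", ".join(self.observed_actions[-3:]) or "none"
        return f"Opponent {self.name}: style={self.style}; recent actions: {recent}"


class ProfileStore:
    """In-memory stand-in for a memory node keyed by opponent name."""

    def __init__(self) -> None:
        self._profiles: dict[str, OpponentProfile] = {}

    def record(self, name: str, action: str) -> None:
        profile = self._profiles.setdefault(name, OpponentProfile(name))
        profile.observed_actions.append(action)

    def context_for(self, name: str) -> str:
        profile = self._profiles.get(name)
        return profile.to_context() if profile else f"Opponent {name}: no history yet"


store = ProfileStore()
store.record("bob", "raise")
print(store.context_for("bob"))
```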

Pattern: Separate Inference and Decision. Decouple the opponent inference model (trained offline on human data or self-play) from the real-time decision policy. The inference model runs as a tool call; the policy conditions on its output.

Pattern: Hierarchical ToM. Maintain multiple levels — a fast heuristic model for within-game decisions and a slow Bayesian model updated across games for long-run adaptation.
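
The two timescales can be sketched as follows, assuming a single binary feature (whether each opponent action was aggressive). The class name, the exponential moving average for the fast model, the Beta pseudo-counts for the slow model, and the fixed blending weight are all illustrative assumptions.

```python
class HierarchicalModel:
    """Blend a fast within-game estimate with a slow across-game estimate."""

    def __init__(self, fast_weight: float = 0.5, ema_rate: float = 0.5) -> None:
        self.alpha = 1.0  # slow model: Beta pseudo-count of aggressive actions
        self.beta = 1.0   # slow model: Beta pseudo-count of passive actions
        self.fast = None  # fast model: moving average, reset every game
        self.fast_weight = fast_weight
        self.ema_rate = ema_rate

    def start_game(self) -> None:
        self.fast = None  # only the fast model forgets between games

    def observe(self, aggressive: bool) -> None:
        x = 1.0 if aggressive else 0.0
        if self.fast is None:
            self.fast = x
        else:
            self.fast = (1 - self.ema_rate) * self.fast + self.ema_rate * x
        if aggressive:
            self.alpha += 1
        else:
            self.beta += 1

    def slow_estimate(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def estimate(self) -> float:
        """Probability that the next action is aggressive."""
        slow = self.slow_estimate()
        if self.fast is None:
            return slow
        return self.fast_weight * self.fast + (1 - self.fast_weight) * slow


model = HierarchicalModel()
model.start_game()
for _ in range(8):
    model.observe(False)  # a passive first game
model.start_game()
model.observe(True)       # the second game opens aggressively
print(model.slow_estimate(), model.estimate())
```

The blended estimate reacts to the aggressive opening much faster than the slow model alone, while the slow model keeps the long-run record from being overwritten by a single action.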

5. Practical Application

Here is a simple negotiation agent that uses opponent modeling to adapt its offers. It tracks the opponent’s concession rate and infers their reservation price:

from anthropic import Anthropic
from dataclasses import dataclass, field

@dataclass
class OpponentModel:
    offers: list[float] = field(default_factory=list)

    def update(self, offer: float):
        self.offers.append(offer)

    def concession_rate(self) -> float:
        if len(self.offers) < 2:
            return 0.0
        deltas = [self.offers[i] - self.offers[i-1] for i in range(1, len(self.offers))]
        return sum(deltas) / len(deltas)

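    def estimated_reservation(self) -> float:
        """Extrapolate where a seller's downward concessions will stop."""
        if len(self.offers) < 2:
            # Too little data: use the only offer, or an arbitrary default of 50.
            return self.offers[-1] if self.offers else 50.0
        rate = self.concession_rate()
        if rate >= 0:  # offers are flat or rising, so the seller is not conceding
            return self.offers[-1]
        # Assume future concessions start at `rate` and halve each round.
        # They then sum to the geometric series rate / (1 - 0.5) = 2 * rate.
        floor = self.offers[-1] + rate / (1 - 0.5)
        return max(0.0, floor)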


class NegotiationAgent:
    def __init__(self, my_reservation: float, my_aspiration: float):
        self.my_reservation = my_reservation
        self.my_aspiration = my_aspiration
        self.opponent_model = OpponentModel()
        self.client = Anthropic()
        self.my_offers: list[float] = []
        self.round = 0

    def observe_opponent(self, opponent_offer: float):
        self.opponent_model.update(opponent_offer)

    def make_offer(self) -> float:
        self.round += 1
        opp_reservation = self.opponent_model.estimated_reservation()
        opp_concession = self.opponent_model.concession_rate()

        # Build context for Claude
        context = f"""
You are a negotiation agent acting as the buyer. Current state:
- My reservation price (maximum I will pay): {self.my_reservation}
- My aspiration (ideal outcome): {self.my_aspiration}
- My previous offers: {self.my_offers}
- Opponent's offers so far: {self.opponent_model.offers}
- Estimated opponent reservation price: {opp_reservation:.1f}
- Opponent concession rate per round: {opp_concession:.2f}

Round {self.round}. Based on this opponent model, propose a single numeric offer
that balances aggressiveness with closing probability.
Reply with ONLY a number.
"""
        response = self.client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=16,
            messages=[{"role": "user", "content": context}]
        )
        try:
            offer = float(response.content[0].text.strip())
        except ValueError:
            # Assumed fallback: open at the aspiration if the reply is not a bare number.
            offer = self.my_aspiration
        # Clamp between aspiration and reservation, whichever order they are in.
        low, high = sorted((self.my_aspiration, self.my_reservation))
        offer = max(low, min(high, offer))
        self.my_offers.append(offer)
        return offer


# Simulate a negotiation
# The buyer would ideally pay 60 and walks away above 70.
buyer = NegotiationAgent(my_reservation=70, my_aspiration=60)
seller_offers = [90, 85, 82, 80]  # Seller conceding slowly

for seller_offer in seller_offers:
    buyer.observe_opponent(seller_offer)
    my_offer = buyer.make_offer()
    print(f"Seller: {seller_offer:.1f} | Estimated seller floor: "
          f"{buyer.opponent_model.estimated_reservation():.1f} | My offer: {my_offer:.1f}")
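
Tracing the estimator by hand for the simulated seller offers 90, 85, 82, 80 shows what the final round reports. This is arithmetic on the opponent model only and the LLM's replies are not modeled. The round-to-round changes are $-5$, $-3$, and $-2$, so:

$$\text{rate} = \frac{(-5) + (-3) + (-2)}{3} \approx -3.33, \qquad \text{floor} = 80 + \frac{-3.33}{1 - 0.5} \approx 73.33$$

Under the halving assumption, the estimated seller floor of about 73.3 sits above the buyer's range of 60 to 70, so the model predicts no overlap unless the seller concedes more than the extrapolation suggests.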

6. Comparisons & Tradeoffs

| Approach | Strength | Weakness |
| --- | --- | --- |
| K-level reasoning | Computationally cheap, no learning required | Brittle when opponent doesn’t fit assumed level |
| Bayesian type inference | Principled uncertainty handling | Requires hand-crafted type space |
| Neural opponent model | Learns complex behaviors from data | Data hungry, can overfit to specific opponents |
| LLM-based ToM | Flexible, generalizes across contexts | Slow, expensive, may hallucinate mental states |
| No opponent modeling | Simple, robust against adversarial adaptation | Leaves value on the table in repeated games |

A key limitation: opponent models can be exploited. If your opponent knows you are modeling them, they can deliberately mislead your model — “teaching” you a false belief about their policy, then switching. This is the realm of deceptive signaling and counter-modeling, and it spirals into an arms race of nested beliefs.
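
A small simulation illustrates the bait-and-switch problem. The opponent cooperates for 20 rounds and then defects. The estimator, its Beta(1, 1) starting counts, and the discount factor of 0.7 are assumptions picked for illustration. Discounting old evidence is one possible mitigation and does not fully defend against a deceptive opponent.

```python
def estimate_defect_rate(actions: list[int], discount: float = 1.0) -> float:
    """Posterior-mean defect rate from Beta(1, 1) pseudo-counts.

    actions holds 1 for defect and 0 for cooperate. With discount < 1, older
    evidence (including the prior) decays geometrically before each update.
    """
    defects, cooperations = 1.0, 1.0
    for a in actions:
        defects = discount * defects + a
        cooperations = discount * cooperations + (1 - a)
    return defects / (defects + cooperations)


bait_and_switch = [0] * 20 + [1] * 3
slow = estimate_defect_rate(bait_and_switch)                # all history weighted equally
fast = estimate_defect_rate(bait_and_switch, discount=0.7)  # recent rounds dominate
print(f"undiscounted: {slow:.2f}, discounted: {fast:.2f}")
```

After three defections the undiscounted model still puts the defect rate at 4/25 = 0.16, while the discounted model has already moved to roughly 0.66. The price of that responsiveness is a noisier estimate against opponents who are stationary.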

7. Latest Developments & Research

ToM in LLMs (2023–2025) has generated significant debate. Kosinski (2023) claimed GPT-4 shows “theory of mind capabilities,” sparking controversy. Subsequent work (Ullman, 2023; Shapira et al., 2023) showed LLMs fail systematic variations of classic ToM tasks, suggesting pattern matching rather than genuine mental state reasoning. The field now distinguishes behavioral ToM (passing tests) from mechanistic ToM (actually representing beliefs).

ToM-Bench (2024) introduced a systematic benchmark covering eight ToM abilities across 6,000+ question-answer pairs, revealing that even frontier models lag behind humans on higher-order belief tasks.

Machine Social Intelligence (Zhu et al., 2024) proposed training LLM agents with explicit belief-state representations, improving performance on multi-step ToM tasks and negotiation benchmarks over pure prompting.

PASTA (Opponent Modeling via Planning and Self-Play, 2024) combined Monte Carlo Tree Search with neural opponent models, achieving superhuman performance in multi-player imperfect-information games beyond poker.

Open questions remain around non-stationary opponents (how quickly should you update your model?), adversarial robustness (what if the opponent is actively deceptive?), and scalability to many-agent settings where modeling every opponent separately becomes intractable.

8. Cross-Disciplinary Insight

Opponent modeling is deeply connected to economics through the theory of mechanism design and signaling games. In signaling games (Spence, 1973), a sender with private information chooses a costly signal, and the receiver updates their belief about the sender’s type. This is ToM in action — the sender models the receiver’s inference process, and the receiver models the sender’s incentives.
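
The mutual modeling in a signaling game can be reduced to two incentive checks. The sketch below assumes a two-type, Spence-style setup in which the receiver pays a higher wage after seeing the signal. The function name and the numeric values are hypothetical. A separating outcome requires that the high type finds the signal worth its cost and the low type does not.

```python
def is_separating(
    wage_if_signal: float,
    wage_if_no_signal: float,
    cost_high_type: float,
    cost_low_type: float,
) -> bool:
    """Check whether only the high type has an incentive to send the signal."""
    premium = wage_if_signal - wage_if_no_signal
    high_type_signals = premium >= cost_high_type  # signaling pays off for the high type
    low_type_abstains = premium <= cost_low_type   # mimicking does not pay for the low type
    return high_type_signals and low_type_abstains


print(is_separating(2.0, 1.0, cost_high_type=0.5, cost_low_type=1.5))
print(is_separating(2.0, 1.0, cost_high_type=0.5, cost_low_type=0.8))
```

In the second call the signal is cheap enough for the low type to mimic, so observing it no longer tells the receiver anything about the sender's type.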

In neuroscience, ToM maps to the temporoparietal junction (TPJ) and medial prefrontal cortex (mPFC), which activate when humans reason about others’ mental states. Computational models of these regions (e.g., OSEM, the Online Structured Emotion Model) suggest the brain runs fast, approximate simulations of other agents using generative models, which closely parallels the architecture proposed in neural ToM systems.

The connection to Bayesian brain theory is striking: just as the brain predicts sensory inputs to minimize surprise (active inference), it may predict other agents’ actions to minimize social prediction error.

9. Daily Challenge

Build a Bluffing Detector

Implement a simple poker-inspired scenario where one agent bluffs (bets high with a weak hand) at some rate $p$. Build an opponent model that:

  1. Tracks bet sizes relative to hand strength (revealed at showdown)
  2. Estimates the opponent’s bluff frequency using a Beta-Binomial conjugate model:
$$p_\text{bluff} \sim \text{Beta}(\alpha + \text{bluffs}, \beta + \text{value bets})$$
  3. Uses this estimate to decide whether to call or fold a large bet

Extension: What happens to your estimate when the opponent knows you’re tracking them and deliberately plays mixed strategies? At what sample size does your estimate become reliable? Explore the exploration-exploitation tradeoff in opponent modeling.
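
A possible starting point for the challenge is sketched below. It assumes a uniform Beta(1, 1) prior, that every large bet reaches showdown so it can be labeled, and that we win the pot exactly when the opponent was bluffing. `BluffModel` and its method names are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class BluffModel:
    alpha: float = 1.0  # prior pseudo-count of bluffs
    beta: float = 1.0   # prior pseudo-count of value bets

    def observe_showdown(self, was_bluff: bool) -> None:
        """Conjugate update: add one count to the matching Beta parameter."""
        if was_bluff:
            self.alpha += 1
        else:
            self.beta += 1

    def bluff_probability(self) -> float:
        """Posterior mean of the bluff frequency."""
        return self.alpha / (self.alpha + self.beta)

    def should_call(self, pot: float, bet: float) -> bool:
        """Call when the expected value of calling is positive.

        pot is the amount in the middle before the opponent's bet. Calling
        costs `bet` and wins `pot + bet` if the opponent was bluffing.
        """
        p = self.bluff_probability()
        ev_call = p * (pot + bet) - (1 - p) * bet
        return ev_call > 0


model = BluffModel()
for was_bluff in [True] * 3 + [False] * 7:
    model.observe_showdown(was_bluff)

print(model.bluff_probability(), model.should_call(pot=100, bet=50))
```

With 3 bluffs in 10 showdowns the posterior mean is 4/12, which is enough to call a half-pot bet but not a bet of twice the pot. The extension questions then amount to asking how this threshold behaves when the counts are small or the opponent's bluff rate shifts.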


Key Takeaways

  1. Ignore other agents at your peril — in any multi-agent setting, a static world assumption loses value against adaptive opponents
  2. K-level reasoning is surprisingly powerful and computationally cheap; humans typically stop at level 1–2
  3. Bayesian opponent models offer principled uncertainty but require a predefined type space
  4. Neural ToM learns opponent representations end-to-end but is data hungry
  5. LLMs show behavioral ToM but may lack mechanistic understanding of belief states — tests are easy to game
  6. Deception and counter-modeling create an arms race; robust agents need to account for adversarial opponents who model them back
  7. The brain appears to solve ToM through fast generative simulations, a design principle worth borrowing for agent architectures