Engineering Notes

Thoughts and Ideas on AI by Muthukrishnan

Beam Search and Parallel Decoding for Multiple Solution Paths in AI Agents

02 Dec 2025

Beam search is a heuristic search algorithm that explores a graph by expanding the most promising nodes in a limited set. Unlike breadth-first search (which explores all nodes at each level) or greedy best-first search (which follows only the single best path), beam search maintains a fixed-size set of the k best partial solutions at each step. It is an elegant compromise between exhaustive exploration and greedy decision-making.

Concept Introduction

Key parameters:

  1. Beam width (k): how many partial solutions survive each step
  2. Maximum depth: how many expansion steps to run
  3. Scoring function: how partial solutions are ranked (higher is better)

The algorithm trades completeness for efficiency: there is no guarantee of finding the optimal solution, but the cost stays bounded. With beam width k, branching factor b, and depth d, beam search examines roughly O(k · b · d) states, versus O(b^d) for exhaustive search.
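To make the trade-off concrete, here is a minimal toy (with hypothetical, hand-picked probabilities) in which greedy decoding commits to the locally best first token and misses the globally best sequence, while a beam of width 2 recovers it:

```python
# A toy two-step "language model" with hand-picked probabilities
# (hypothetical numbers, chosen so that greedy decoding is suboptimal).
STEP1 = {"A": 0.6, "B": 0.4}
STEP2 = {"A": {"C": 0.3, "D": 0.3}, "B": {"E": 0.9, "F": 0.1}}

def greedy_decode():
    """Pick the locally best token at each step."""
    t1 = max(STEP1, key=STEP1.get)
    t2 = max(STEP2[t1], key=STEP2[t1].get)
    return (t1, t2), STEP1[t1] * STEP2[t1][t2]

def beam_decode(k=2):
    """Keep the k best first tokens, then pick the best full sequence."""
    survivors = sorted(STEP1.items(), key=lambda kv: kv[1], reverse=True)[:k]
    complete = [((t1, t2), p1 * p2)
                for t1, p1 in survivors
                for t2, p2 in STEP2[t1].items()]
    return max(complete, key=lambda c: c[1])

greedy_seq, greedy_p = greedy_decode()   # ("A", "C"), p ≈ 0.18
beam_seq, beam_p = beam_decode(k=2)      # ("B", "E"), p ≈ 0.36
```

Greedy locks onto "A" (probability 0.6) and can do no better than 0.18 overall, while the beam also keeps "B" alive long enough to discover the 0.36 sequence.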

Historical & Theoretical Context

Beam search emerged in the 1970s from the speech recognition community, where researchers needed efficient methods to decode acoustic signals into text. The landmark paper by Lowerre (1976) introduced beam search for the HARPY speech understanding system at Carnegie Mellon.

The name comes from the analogy to a flashlight beam: you illuminate (keep in memory) only a focused subset of all possible paths, rather than the entire search space.

Evolution:

  1. 1970s: speech recognition decoding (HARPY at CMU)
  2. 1990s-2000s: statistical machine translation
  3. 2014 onward: neural sequence-to-sequence decoding (Sutskever et al., 2014)
  4. Today: LLM decoding and multi-path planning in AI agents

Theoretical foundation: Beam search is a form of bounded rationality: making the best decision possible within computational constraints. It connects to Herbert Simon's notion of satisficing, to best-first search algorithms such as A* (beam search is essentially best-first search with a capped frontier), and to anytime algorithms that trade solution quality for compute.

Algorithms & Math

Basic Beam Search Algorithm

def beam_search(initial_state, beam_width, max_steps, score_fn, expand_fn, is_goal):
    """
    Args:
        initial_state: Starting state
        beam_width: Number of hypotheses to maintain (k)
        max_steps: Maximum search depth
        score_fn: Function that scores a state (higher is better)
        expand_fn: Function that generates successor states
        is_goal: Function to check if state is a goal

    Returns:
        Best complete path found
    """
    # Initialize beam with starting state
    beam = [(score_fn(initial_state), [initial_state])]

    for step in range(max_steps):
        candidates = []

        # Expand each hypothesis in current beam
        for score, path in beam:
            current_state = path[-1]

            # Check if we've reached a goal
            if is_goal(current_state):
                return path

            # Generate successors
            for next_state in expand_fn(current_state):
                new_path = path + [next_state]
                new_score = score_fn(next_state)
                candidates.append((new_score, new_path))

        # Keep only top-k candidates (the beam)
        beam = sorted(candidates, key=lambda x: x[0], reverse=True)[:beam_width]

        if not beam:
            break

    # Return best path found
    return max(beam, key=lambda x: x[0])[1] if beam else None

Scoring Functions

The quality of beam search depends critically on the scoring function. Common approaches:

1. Log-probability (for sequence generation):

score(sequence) = Σ log P(token_i | token_1...token_{i-1})

2. Length-normalized score (prevents bias toward short sequences):

score(sequence) = (1/|sequence|^α) × Σ log P(token_i | context)

where α ∈ [0.6, 0.8] typically.

3. Heuristic-guided (for planning):

score(state) = g(state) + h(state)

where g = cost so far, h = heuristic estimate to goal (like A*)
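The length-normalization formula above can be sketched in a few lines. The example sequences and probabilities below are illustrative; the point is that raw log-probability sums favor short outputs, and dividing by |sequence|^α corrects for it:

```python
import math

def length_normalized_score(token_logprobs, alpha=0.7):
    """Sum of token log-probs divided by |sequence|^alpha.

    alpha=0 gives the raw sum (biased toward short sequences);
    alpha=1 gives the per-token average. Values around 0.6-0.8
    are typical in sequence decoding.
    """
    total = sum(token_logprobs)
    return total / (len(token_logprobs) ** alpha)

# Two hypothetical candidates: a short one and a longer, higher-quality one.
short = [math.log(0.5)] * 2                         # 2 tokens, each p = 0.5
long_ = [math.log(0.5)] * 2 + [math.log(0.9)] * 4   # 6 tokens, mostly p = 0.9

raw_short, raw_long = sum(short), sum(long_)
norm_short = length_normalized_score(short)
norm_long = length_normalized_score(long_)
# Raw sums prefer the short sequence; normalized scores prefer the long one.
```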

Design Patterns & Architectures

Beam search appears in several agent architecture patterns. The Generator-Ranker Pipeline expands candidate outputs and prunes them in order:

[LLM Generator] → [Beam Expansion] → [Scoring/Ranking] → [Pruning] → [Selection]

Parallel Reasoning Chains distribute execution across independent paths:

Input → [Split into k reasoning paths] → [Execute in parallel] → [Aggregate/Vote] → Output
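The split-execute-aggregate flow above can be sketched with a thread pool and majority vote. The `reason` function here is a stand-in for a real LLM call (e.g. with a distinct prompt or temperature per path); its canned answers are hypothetical:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def reason(path_id: int, question: str) -> str:
    # Placeholder for an independent reasoning path; in a real agent this
    # would be an LLM call. Canned answers simulate diverse outputs.
    answers = {0: "42", 1: "42", 2: "41"}
    return answers[path_id]

def parallel_reasoning(question: str, k: int = 3) -> str:
    # Fan out k independent paths in parallel, then aggregate by majority vote.
    with ThreadPoolExecutor(max_workers=k) as pool:
        results = list(pool.map(lambda i: reason(i, question), range(k)))
    return Counter(results).most_common(1)[0][0]
```

Majority voting is the simplest aggregator; scoring each path and taking the argmax (as in the beam patterns above) is the natural alternative.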

Iterative Refinement with Beams feeds scored outcomes back into the loop:

graph TD
    A[Initial Query] --> B[Generate k Plans]
    B --> C[Execute Top k]
    C --> D[Score Outcomes]
    D --> E{Good Enough?}
    E -->|No| F[Refine Top k Plans]
    F --> C
    E -->|Yes| G[Return Best]
  

This pattern connects with event-driven architecture (each beam expansion is an event), the planner-executor loop (each beam represents a different plan), and ensemble methods (beams as diverse hypotheses).

Practical Application

import anthropic
from typing import List

class BeamSearchAgent:
    def __init__(self, client: anthropic.Anthropic, beam_width: int = 3):
        self.client = client
        self.beam_width = beam_width

    def score_plan(self, plan: str, task: str) -> float:
        """Score a plan using LLM self-evaluation"""
        prompt = f"""Task: {task}

Proposed plan:
{plan}

Rate this plan's quality on a scale of 0-100 considering:
- Feasibility
- Completeness
- Efficiency

Return only a number."""

        response = self.client.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=10,
            messages=[{"role": "user", "content": prompt}]
        )

        try:
            return float(response.content[0].text.strip())
        except (ValueError, IndexError):
            return 0.0

    def expand_plan(self, partial_plan: str, task: str, step: int) -> List[str]:
        """Generate next step alternatives for a partial plan"""
        prompt = f"""Task: {task}

Current plan:
{partial_plan}

Generate 3 different possible next steps to continue this plan.
Return them as a numbered list."""

        response = self.client.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )

        # Parse the response to extract alternatives
        text = response.content[0].text
        alternatives = []
        for line in text.split('\n'):
            if line.strip().startswith(('1.', '2.', '3.')):
                alternatives.append(line.split('.', 1)[1].strip())

        return [f"{partial_plan}\nStep {step}: {alt}" for alt in alternatives]

    def beam_search_plan(self, task: str, max_steps: int = 5) -> str:
        """Use beam search to find best plan"""
        # Initialize beam
        initial_prompt = f"Task: {task}\n\nPlan:\n"
        beam = [(0.0, initial_prompt)]

        for step in range(1, max_steps + 1):
            candidates = []

            for score, partial_plan in beam:
                # Expand each hypothesis
                expansions = self.expand_plan(partial_plan, task, step)

                for expansion in expansions:
                    new_score = self.score_plan(expansion, task)
                    candidates.append((new_score, expansion))

            # Keep top-k
            beam = sorted(candidates, key=lambda x: x[0], reverse=True)[:self.beam_width]

            print(f"\n=== Step {step} - Top {len(beam)} Plans ===")
            for i, (score, plan) in enumerate(beam, 1):
                print(f"\nPlan {i} (score: {score:.1f}):\n{plan}\n")

        # Return best plan
        return max(beam, key=lambda x: x[0])[1]


# Usage example
if __name__ == "__main__":
    client = anthropic.Anthropic()
    agent = BeamSearchAgent(client, beam_width=3)

    task = "Build a web scraper that monitors prices across 5 e-commerce sites"
    best_plan = agent.beam_search_plan(task, max_steps=4)

    print("\n=== FINAL BEST PLAN ===")
    print(best_plan)

Example 2: Integrating with LangGraph

from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Tuple, Annotated
import operator

class BeamState(TypedDict):
    input: str
    beams: Annotated[List[Tuple[float, str]], operator.add]
    current_step: int
    max_steps: int
    beam_width: int

def expand_beams(state: BeamState) -> BeamState:
    """Expand current beams with new candidates"""
    new_candidates = []

    for score, path in state["beams"][-state["beam_width"]:]:
        # Generate alternatives; generate_next_actions and score_path are
        # placeholders for your LLM-backed expansion and scoring functions
        alternatives = generate_next_actions(path, state["input"])

        for alt in alternatives:
            new_path = f"{path}{alt}"
            new_score = score_path(new_path, state["input"])
            new_candidates.append((new_score, new_path))

    # Prune to top-k
    top_k = sorted(new_candidates, key=lambda x: x[0], reverse=True)[:state["beam_width"]]

    return {
        **state,
        "beams": top_k,
        "current_step": state["current_step"] + 1
    }

def should_continue(state: BeamState) -> str:
    """Determine if we should continue expanding"""
    if state["current_step"] >= state["max_steps"]:
        return "end"
    return "expand"

# Build graph
workflow = StateGraph(BeamState)
workflow.add_node("expand", expand_beams)
workflow.set_entry_point("expand")
workflow.add_conditional_edges("expand", should_continue, {
    "expand": "expand",
    "end": END
})

beam_agent = workflow.compile()

Latest Developments & Research

Recent Breakthroughs (2023-2025)

1. Self-Evaluation Guided Beam Search (2023)

2. Diverse Beam Search (2024)

3. Contrastive Decoding with Beams (2024)

4. Beam Search for Chain-of-Thought (2024-2025)

Open Problems:

Notable Benchmarks

Cross-Disciplinary Insight

Connection to Neuroscience: Predictive Processing

The brain doesn’t commit to a single interpretation of sensory input; it maintains multiple hypotheses simultaneously. The predictive processing framework suggests the brain runs a form of “mental beam search”: generating candidate interpretations, scoring them against incoming evidence, and pruning the ones that fit poorly.

This parallels beam search’s maintain-score-prune cycle. The beam width might correspond to working memory capacity (Miller’s 7±2 items).

Connection to Economics: Portfolio Theory

Beam search is analogous to portfolio diversification: each hypothesis in the beam is an asset, its score is the expected return, and pruning is rebalancing away from underperformers.

Just as investors maintain multiple assets to hedge risk, beam search maintains multiple hypotheses to hedge against early mistakes.

Connection to Evolution: Parallel Descent

Evolution explores multiple “solutions” (species) in parallel through populations. Natural selection is the “scoring function,” and extinction is pruning. Beam search formalizes this: a population of candidate solutions (the beam), a fitness function (the score), and selection pressure (pruning to the top k).

Daily Challenge: Build a Beam Search Code Generator

Task: Implement a coding agent that uses beam search to generate Python functions, testing each beam against unit tests.

Requirements:

  1. Start with a function signature and docstring
  2. Generate k=3 different implementations at each step
  3. Score each based on passing test cases
  4. Continue for 3 expansion steps
  5. Return the best working implementation

Starter code:

from typing import Any, List, Tuple

def code_beam_search(
    function_spec: str,
    test_cases: List[Tuple[List, Any]],
    beam_width: int = 3
) -> str:
    """
    Use beam search to generate working code.

    Args:
        function_spec: Function signature + docstring
        test_cases: List of (input_args, expected_output)
        beam_width: Number of implementations to explore

    Returns:
        Best implementation found
    """
    # TODO: Implement this!
    pass

# Example usage:
spec = '''
def fibonacci(n: int) -> int:
    """Return the nth Fibonacci number."""
    pass
'''

tests = [
    ([0], 0),
    ([1], 1),
    ([5], 5),
    ([10], 55)
]

best_code = code_beam_search(spec, tests)
print(best_code)
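One way to implement step 3 (scoring by passing tests) is a helper along these lines. The name `score_implementation` and its signature are illustrative, not part of the starter code, and `exec()` on generated code should be sandboxed in anything beyond an exercise:

```python
from typing import Any, List, Tuple

def score_implementation(code: str, func_name: str,
                         test_cases: List[Tuple[List, Any]]) -> float:
    """Fraction of test cases a candidate implementation passes.

    Executes the candidate in an isolated namespace; any exception
    (syntax error, runtime error, wrong answer) counts as a failure.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)  # WARNING: runs untrusted code; sandbox in production
        func = namespace[func_name]
    except Exception:
        return 0.0

    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(test_cases) if test_cases else 0.0

candidate = '''
def fibonacci(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
'''
tests = [([0], 0), ([1], 1), ([5], 5), ([10], 55)]
print(score_implementation(candidate, "fibonacci", tests))  # 1.0
```

Using the pass fraction as the beam score gives a dense signal: a candidate that passes 3 of 4 tests outranks one that crashes, even though neither is fully correct yet.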

Extension challenges:

Time estimate: 20-30 minutes

References & Further Reading

Foundational Papers

  1. Lowerre, B. (1976). “The HARPY Speech Recognition System.” CMU Computer Science Department.
  2. Sutskever et al. (2014). “Sequence to Sequence Learning with Neural Networks.” NIPS.
  3. Vijayakumar et al. (2018). “Diverse Beam Search for Improved Description of Complex Scenes.” AAAI.

Recent Research

  1. Madaan et al. (2023). “Self-Refine: Iterative Refinement with Self-Feedback.” NeurIPS.
  2. Li et al. (2024). “Contrastive Decoding Improves Reasoning in Large Language Models.” arXiv.
  3. Xie et al. (2024). “Self-Evaluation Guided Beam Search for Reasoning.” arXiv.

Practical Resources

  1. Hugging Face Transformers Beam Search: https://huggingface.co/blog/how-to-generate
  2. LangChain Beam Search Example: https://github.com/langchain-ai/langchain/discussions/beam-search
  3. Anthropic Claude with Beam Decoding: https://docs.anthropic.com/claude/docs (see parallel sampling)

Blog Posts & Tutorials

  1. “Beam Search Demystified” - Jay Alammar: https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
  2. “Modern Beam Search for LLMs” - Lilian Weng: https://lilianweng.github.io/posts/2023-01-10-decoding/

GitHub Implementations

  1. OpenNMT Beam Search: https://github.com/OpenNMT/OpenNMT-py
  2. AllenNLP Beam Search: https://github.com/allenai/allennlp
  3. Minimal Beam Search (educational): https://github.com/google-research/language/tree/master/language/search
