Engineering Notes

Thoughts and Ideas on AI by Muthukrishnan

Building an Agentic RAG Pipeline with LangGraph

24 Dec 2025

Traditional RAG (Retrieval-Augmented Generation) follows a simple pattern: embed the query, find similar documents, generate an answer. But this “naive RAG” approach breaks down when queries are complex, ambiguous, or require information from multiple sources.

Agentic RAG solves this by letting the LLM make decisions about retrieval. Instead of a fixed pipeline, the LLM can:

- Analyze the question and break it into one or more targeted search queries
- Judge whether the retrieved documents are actually relevant
- Rewrite the query and retry retrieval when the results fall short
- Decide when enough context has been gathered to generate an answer

In this article, we’ll build an Agentic RAG system using LangGraph.

The Problem with Naive RAG

graph LR
    A[Query] --> B[Embed]
    B --> C[Retrieve]
    C --> D[Generate]
    D --> E[Answer]
  

Naive RAG has a fixed flow. If retrieval returns irrelevant documents, the generator hallucinates. If the query is ambiguous, retrieval fails silently. There’s no feedback loop.
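
To make the contrast concrete, here is a minimal sketch of that fixed flow. It reuses the llm, embedder, and collection objects set up in the steps below, so treat it as an illustration rather than a drop-in implementation; nothing in it checks the results or retries the query.

def naive_rag(query: str) -> str:
    """Fixed pipeline: embed -> retrieve -> generate, with no feedback loop."""
    # Embed the query and fetch the top matches
    embedding = embedder.encode(query).tolist()
    results = collection.query(query_embeddings=[embedding], n_results=3)
    context = "\n\n".join(results["documents"][0])

    # Generate directly from whatever came back, relevant or not
    messages = [
        {"role": "user", "content": f"Answer using this context:\n{context}\n\nQuestion: {query}"}
    ]
    return llm.invoke(messages).content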

The Agentic RAG Solution

graph TD
    A[Query] --> B[Analyze Query]
    B --> C[Retrieve]
    C --> D{Relevant?}
    D -->|No| E[Rewrite Query]
    E --> C
    D -->|Yes| F[Generate]
    F --> G[Answer]
  

Agentic RAG adds decision points. The LLM evaluates retrieval quality and can loop back to try different queries.

Building the Pipeline

Step 1: Define the State

LangGraph requires an explicit state schema. For Agentic RAG, we track the query, search attempts, retrieved documents, and the final answer.

from typing import TypedDict, Annotated, Literal
from langgraph.graph import StateGraph, END
import operator

class RAGState(TypedDict):
    query: str                                          # Original user query
    search_queries: Annotated[list[str], operator.add]  # Generated search queries
    documents: Annotated[list[dict], operator.add]      # Retrieved documents
    answer: str                                         # Final generated answer
    needs_more_info: bool                               # Flag for retrieval loop
    iteration: int                                      # Prevent infinite loops

The Annotated type with operator.add tells LangGraph to accumulate values instead of replacing them.
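
As a quick standalone illustration (outside of LangGraph), this is what that reducer does when two nodes each return a partial update for the documents key: the lists are concatenated rather than overwritten, while a plain field like answer is simply replaced.

import operator

existing = [{"content": "doc from the first retrieval pass"}]
update = [{"content": "doc from the second retrieval pass"}]
merged = operator.add(existing, update)  # both documents survive the merge
print(len(merged))  # 2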

Step 2: Query Analyzer

The first agent analyzes the query and decides how to search. For complex queries, it might generate multiple sub-queries.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

def query_analyzer(state: RAGState) -> dict:
    """Analyze and decompose the user query."""
    query = state["query"]

    messages = [
        {"role": "user", "content": f"""Analyze this query and generate 1-3 search queries that would help answer it. Return only the queries, one per line.

Query: {query}"""}
    ]
    response = llm.invoke(messages)

    search_queries = [q.strip() for q in response.content.split("\n") if q.strip()]
    return {"search_queries": search_queries}

Step 3: Retriever

The retriever searches your vector store. Here’s an example using ChromaDB:

import chromadb
from sentence_transformers import SentenceTransformer

# Initialize (in production, load existing collection)
client = chromadb.Client()
collection = client.get_or_create_collection("documents")
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def retriever(state: RAGState) -> dict:
    """Retrieve documents for each search query."""
    search_queries = state["search_queries"]
    all_docs = []

    for query in search_queries:
        # Embed and search
        embedding = embedder.encode(query).tolist()
        results = collection.query(
            query_embeddings=[embedding],
            n_results=3
        )

        for i, doc in enumerate(results["documents"][0]):
            all_docs.append({
                "content": doc,
                "metadata": results["metadatas"][0][i],
                "query": query
            })

    # Deduplicate by content
    seen = set()
    unique_docs = []
    for doc in all_docs:
        if doc["content"] not in seen:
            seen.add(doc["content"])
            unique_docs.append(doc)

    return {"documents": unique_docs}

Step 4: Relevance Checker

This is the key agentic component. The LLM evaluates whether retrieved documents can answer the query.

def relevance_checker(state: RAGState) -> dict:
    """Check if retrieved documents are sufficient."""
    docs = state["documents"]
    query = state["query"]

    if not docs:
        return {"needs_more_info": True, "iteration": state["iteration"] + 1}

    # Ask LLM to evaluate relevance
    doc_summaries = "\n".join([d["content"][:200] for d in docs[:5]])

    messages = [
        {"role": "user", "content": f"""Given the query and retrieved documents, can you answer the query?

Query: {query}

Documents:
{doc_summaries}

Reply with only YES or NO."""}
    ]
    response = llm.invoke(messages)

    needs_more = "NO" in response.content.upper()

    # Prevent infinite loops
    if state["iteration"] >= 2:
        needs_more = False

    return {"needs_more_info": needs_more, "iteration": state["iteration"] + 1}

Step 5: Query Rewriter

When retrieval fails, the query rewriter generates alternative queries.

def query_rewriter(state: RAGState) -> dict:
    """Rewrite the query for better retrieval."""
    query = state["query"]
    previous_queries = state["search_queries"]

    messages = [
        {"role": "user", "content": f"""The following search queries didn't find relevant results:
{previous_queries}

Original question: {query}

Generate 2 alternative search queries that might work better.
Use different keywords, synonyms, or rephrasings.
Return only the queries, one per line."""}
    ]
    response = llm.invoke(messages)

    new_queries = [q.strip() for q in response.content.split("\n") if q.strip()]
    return {"search_queries": new_queries}

Step 6: Generator

Finally, the generator synthesizes an answer from retrieved documents.

def generator(state: RAGState) -> dict:
    """Generate answer from retrieved documents."""
    docs = state["documents"]
    query = state["query"]

    if not docs:
        return {"answer": "I couldn't find relevant information to answer your question."}

    context = "\n\n".join([d["content"] for d in docs[:5]])

    messages = [
        {"role": "user", "content": f"""Answer the question based on the provided context.
If the context doesn't contain the answer, say so.

Context:
{context}

Question: {query}"""}
    ]
    response = llm.invoke(messages)

    return {"answer": response.content}

Step 7: Build the Graph

Now wire everything together with conditional edges:

def should_continue(state: RAGState) -> Literal["rewrite", "generate"]:
    """Route based on relevance check."""
    if state["needs_more_info"]:
        return "rewrite"
    return "generate"


def create_agentic_rag():
    """Build the Agentic RAG graph."""
    workflow = StateGraph(RAGState)

    # Add all nodes
    workflow.add_node("analyze", query_analyzer)
    workflow.add_node("retrieve", retriever)
    workflow.add_node("check", relevance_checker)
    workflow.add_node("rewrite", query_rewriter)
    workflow.add_node("generate", generator)

    # Define the flow
    workflow.set_entry_point("analyze")
    workflow.add_edge("analyze", "retrieve")
    workflow.add_edge("retrieve", "check")

    # Conditional branching
    workflow.add_conditional_edges(
        "check",
        should_continue,
        {"rewrite": "rewrite", "generate": "generate"}
    )

    # Loop back from rewrite to retrieve
    workflow.add_edge("rewrite", "retrieve")
    workflow.add_edge("generate", END)

    return workflow.compile()

Running the Pipeline

graph = create_agentic_rag()

result = graph.invoke({
    "query": "How does transformer attention work?",
    "search_queries": [],
    "documents": [],
    "answer": "",
    "needs_more_info": False,
    "iteration": 0
})

print(f"Iterations: {result['iteration']}")
print(f"Documents used: {len(result['documents'])}")
print(f"Answer: {result['answer']}")

Key Design Decisions

1. Iteration Limits

Always cap the number of retrieval loops to prevent runaway costs:

if state["iteration"] >= 2:
    needs_more = False  # Force generation after 2 attempts

2. Document Deduplication

Multiple search queries may return overlapping results. Deduplicate before generation to avoid redundancy.
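
The retriever above already does this with a set of full document strings. For long documents, a hash-based variant keeps the seen-set small; the dedupe helper below is a sketch, not part of the pipeline:

import hashlib

def dedupe(docs: list[dict]) -> list[dict]:
    """Drop documents whose content hash has already been seen."""
    seen: set[str] = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(doc["content"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique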

3. Graceful Degradation

If retrieval completely fails, generate an honest “I don’t know” rather than hallucinating.

Comparison: Naive vs Agentic RAG

Aspect             | Naive RAG        | Agentic RAG
-------------------|------------------|--------------------------
Query handling     | Single embedding | Multi-query decomposition
Retrieval failures | Silent failure   | Retry with reformulation
Complex queries    | Poor performance | Handles well
Cost               | Lower            | Higher (more LLM calls)
Latency            | Faster           | Slower (multiple rounds)

When to Use Agentic RAG

Use Agentic RAG when:

- Queries are complex, multi-part, or ambiguous
- A single retrieval pass often misses the relevant documents
- Answer accuracy matters more than latency and cost

Stick with naive RAG when:

- Queries are simple and map directly onto your documents
- Latency and cost are the primary constraints
- One retrieval pass is usually enough to answer the question

What’s Next

This basic Agentic RAG pipeline can be extended further, for example by grading the generated answer before returning it, reranking retrieved documents, or falling back to web search when the vector store has nothing relevant.

Agentic RAG represents the evolution from simple retrieval to intelligent information synthesis. The LLM becomes an active participant in the search process, not just a passive consumer of retrieved documents.


Try It Yourself

Copy this prompt into your AI coding agent to build this project:

Build an Agentic RAG pipeline using LangGraph with these components:
1. Query analyzer that decomposes complex queries into sub-queries
2. Retriever that searches a ChromaDB vector store
3. Relevance checker that evaluates if retrieved docs can answer the query
4. Query rewriter that reformulates queries when retrieval fails
5. Generator that synthesizes the final answer

Use StateGraph with conditional edges for the retry loop. Include an iteration
limit to prevent infinite loops. Test with sample documents about a technical
topic and show how the agent handles queries that need multiple retrieval attempts.