What is RAG (Retrieval-Augmented Generation)?
Let’s start with normal RAG.
Imagine you ask a question to ChatGPT — like:
“What are the symptoms of diabetes?”
On its own, ChatGPT may not know the latest medical info. In a RAG setup, the system first searches a database or the internet, finds relevant text, and feeds that text to the model so it can answer your question.
That’s RAG — it means:
“Retrieve some information first, then use it to Generate an answer.”
So basically:
RAG = Search + Read + Answer
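That three-step pipeline can be sketched in a few lines of Python. Everything here is a toy stand-in: the corpus, the word-overlap "search", and the template "generation" only illustrate the shape of the pipeline, not a real retriever or LLM.

```python
import re

# Toy knowledge base (stand-in for a real document store)
CORPUS = [
    "Diabetes symptoms include increased thirst, frequent urination, and fatigue.",
    "Asthma is a chronic condition that inflames the airways.",
    "Hypertension raises blood pressure and strains arteries over time.",
]

def search(question, k=1):
    """Retrieve: rank documents by word overlap with the question (stand-in for vector search)."""
    q_words = set(re.findall(r"\w+", question.lower()))
    scored = sorted(CORPUS,
                    key=lambda d: len(q_words & set(re.findall(r"\w+", d.lower()))),
                    reverse=True)
    return scored[:k]

def generate(question, context):
    """Generate: stand-in for an LLM call that answers using the retrieved text."""
    return f"Based on the retrieved text: {context[0]}"

def rag(question):
    # Search -> Read -> Answer, in one pass
    return generate(question, search(question))

print(rag("What are the symptoms of diabetes?"))
```

Note the key property: the model answers from retrieved text, not from memory alone, and it only looks once.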
Step 2: What’s wrong with normal RAG?
Normal RAG is like a lazy student:
- You ask one big question.
- They look up something once.
- They write an answer — even if it’s incomplete or slightly wrong.
It doesn’t check its work, plan ahead, or look again if something’s missing.
Step 3: What is Agentic RAG?
Now imagine the student becomes smart and active — like a good researcher or detective.
This new student (Agentic RAG) doesn’t stop after one search.
They act intelligently — like an “agent” that can take actions.
Here’s how this new version behaves:
- Plans first: “Hmm, the question has 3 parts — I’ll look up each separately.”
- Searches step-by-step: “Let me find info about cause, symptoms, and treatment one by one.”
- Thinks critically: “Wait, one source says X and another says Y — I should double-check.”
- Takes actions: if something’s missing, it searches again, uses a calculator, or checks another source.
- Reviews its work: “Have I covered all parts? Are my sources reliable?”
- Then answers confidently.
So, Agentic RAG = Smart RAG that can plan, search again, check, and think before answering.
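The behaviors above boil down to a small loop: plan sub-goals, retrieve for each one, review coverage, retry anything missing, then synthesize. The note store, planner, and checker below are hypothetical stand-ins for a real retriever and LLM, just to make the loop concrete.

```python
# Toy notes keyed by sub-goal (stand-in for a vector database)
NOTES = {
    "cause": "Type 2 diabetes is linked to insulin resistance and lifestyle factors.",
    "symptoms": "Common symptoms are thirst, frequent urination, and fatigue.",
    "treatment": "Treatment combines diet, exercise, and medication such as metformin.",
}

def plan(question):
    """Plan first: break the big question into sub-goals."""
    return ["cause", "symptoms", "treatment"]

def retrieve(goal):
    """Search step-by-step: one targeted lookup per sub-goal."""
    return NOTES.get(goal)

def review(findings, goals):
    """Review the work: every sub-goal must have supporting evidence."""
    return [g for g in goals if not findings.get(g)]

def agentic_rag(question):
    goals = plan(question)
    findings = {g: retrieve(g) for g in goals}       # search each part
    for g in review(findings, goals):                # take action: search again
        findings[g] = retrieve(g) or "no evidence found"
    # Then answer confidently, section by section
    return "\n".join(f"{g.title()}: {findings[g]}" for g in goals)

print(agentic_rag("Explain the cause, symptoms, and treatment of diabetes."))
```

Compare this with the single-pass sketch earlier: the extra plan/review steps are exactly what makes the RAG "agentic".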
Step 4: Simple analogy
Let’s use a school example:
| Situation | Student’s Behavior | Type |
|---|---|---|
| You ask: “Explain the causes of World War II.” | Student opens one textbook page, copies a paragraph, and reads it to you. | Normal RAG |
| You ask: “Explain the causes of World War II.” | Student breaks it down: “Political, Economic, Military causes.” They search for each one, compare different books, fix mistakes, and then explain clearly. | Agentic RAG |
So “Agentic” means:
The AI behaves like an independent student or researcher who plans and takes multiple steps, instead of just answering immediately.
Why do we need Agentic RAG?
Because:
- It gives more accurate answers
- It can handle complex or multi-part questions
- It can use tools (like search, math, or code)
- It can double-check its own answers
Basically, it’s much closer to how a human expert works.
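The “can use tools” point deserves a concrete picture: the agent routes each step to whichever tool fits it. The tools and the routing rule below are illustrative stand-ins, not a real agent framework.

```python
def search_tool(query):
    """Stand-in for a web or database search."""
    return f"[search results for: {query}]"

def math_tool(expression):
    """Evaluate simple arithmetic; input is restricted to digits and operators."""
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        raise ValueError("unsupported expression")
    return eval(expression)  # safe here: only arithmetic characters allowed through

def pick_tool(step):
    """Route a step: pure arithmetic goes to the math tool, everything else to search."""
    return math_tool if set(step) <= set("0123456789+-*/(). ") else search_tool

steps = ["PM2.5 levels Delhi 2020", "(110 + 90) / 2"]
results = [pick_tool(s)(s) for s in steps]
print(results)
```

A real agent would let the LLM choose the tool; here a character check does the routing, which is enough to show the idea.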
One simple mental picture
Think of normal RAG as a “question–answer machine.”
Think of Agentic RAG as a “research assistant.”
The research assistant:
- Understands your goal
- Plans what to do
- Finds info step by step
- Thinks and checks
- Gives you a reliable final answer
Real-world example
Let’s say you ask:
“What are the health effects of air pollution in Delhi over the last five years?”
- Normal RAG: Searches once, gives a general answer like “Air pollution causes asthma, cough, etc.”
- Agentic RAG:
- Checks which years and reports to look at.
- Searches for Delhi government or WHO data.
- Finds health studies.
- Uses math to compare PM2.5 levels.
- Summarizes and cites all sources clearly.
The second answer is smarter, fresher, and more reliable.
In short:
| Term | Meaning |
|---|---|
| RAG | The AI searches for information and uses it to answer your question. |
| Agentic RAG | The AI acts like a human researcher: plans, searches multiple times, checks for mistakes, and gives a better final answer. |
Agentic RAG vs Normal RAG — World War II example (Python)
```python
"""
Agentic RAG + Vector DB (Chroma) — Tiny Blog Snippet
Shows how to INSERT and FETCH from a vector database, then run Normal RAG vs Agentic RAG.

Deps:
    pip install chromadb sentence-transformers
    (Optional LLM) pip install openai  # set OPENAI_API_KEY

Run:
    python agentic_rag_chroma_tiny.py
"""
from __future__ import annotations

import os
import textwrap
from dataclasses import dataclass
from typing import List

# --- Vector DB (Chroma) setup ---
import chromadb
from chromadb.utils import embedding_functions

EMB_MODEL = "all-MiniLM-L6-v2"
client = chromadb.Client()
embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=EMB_MODEL)
col = client.create_collection(name="kb", embedding_function=embed_fn)

# --- INSERT documents (ids + docs + metadata) ---
DOCS: List[str] = [
    "The Treaty of Versailles imposed harsh reparations on Germany after WWI, fueling political resentment.",
    "The Great Depression created mass unemployment, empowering extremist parties across Europe.",
    "Appeasement let Germany annex the Sudetenland in 1938 without immediate consequences.",
    "Rearmament and militarization in the 1930s raised tensions among European powers.",
    "The Molotov–Ribbentrop Pact of 1939 allowed Germany and the USSR to split Poland.",
    "World War II began on September 1, 1939, when Germany invaded Poland; Britain and France declared war soon after.",
]
IDS = [f"d{i}" for i in range(len(DOCS))]
METAS = [{"source": "demo", "topic": "WWII"} for _ in DOCS]
col.add(documents=DOCS, ids=IDS, metadatas=METAS)  # INSERT into the vector DB

# --- FETCH (semantic query) helper ---
def vdb_search(query: str, k: int = 3):
    q = col.query(query_texts=[query], n_results=k)
    # `query` returns a dict with 'documents', 'ids', 'metadatas', 'distances'
    docs = q.get("documents", [[]])[0]
    ids = q.get("ids", [[]])[0]
    dists = q.get("distances", [[]])[0]
    return list(zip(ids, docs, dists))

# --- Optional LLM wrapper (OpenAI if a key is set; else a local stub) ---
class LLM:
    def __init__(self, model="gpt-4o-mini"):
        self.model = model
        self.use_openai = bool(os.getenv("OPENAI_API_KEY"))
        if self.use_openai:
            try:
                from openai import OpenAI
                self.client = OpenAI()
            except Exception:
                self.use_openai = False

    def chat(self, system: str, user: str) -> str:
        if self.use_openai:
            r = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "system", "content": system},
                          {"role": "user", "content": user}],
                temperature=0.2,
            )
            return r.choices[0].message.content
        # Stub fallback: echo a shortened prompt so the demo runs without a key
        return textwrap.shorten(user, width=900, placeholder="…")

# ---- Normal RAG ----
class NormalRAG:
    def __init__(self, L: LLM):
        self.L = L

    def answer(self, q: str) -> str:
        # One search, one answer — no planning or review
        hits = vdb_search(q, k=4)
        notes = "\n".join(f"- {t}" for _, t, _ in hits)
        prompt = f"Use ONLY the notes below to answer.\nNotes:\n{notes}\n\nQ: {q}\nA:"
        return self.L.chat("You are a careful historian.", prompt)

# ---- Agentic RAG ----
@dataclass
class Plan:
    goals: List[str]

class Planner:
    def make_plan(self, q: str) -> Plan:
        return Plan(["political causes", "economic causes",
                     "military/strategic causes", "immediate trigger"])

class AgenticRAG:
    def __init__(self, L: LLM):
        self.L, self.planner = L, Planner()

    def answer(self, q: str) -> str:
        # Plan sub-goals, search each one separately, then synthesize
        sections = []
        for goal in self.planner.make_plan(q).goals:
            hits = vdb_search(f"{goal} of World War II", k=2)
            notes = "\n".join(f"- {t}" for _, t, _ in hits)
            bullets = self.L.chat("Turn notes into 2 crisp bullets.", f"Notes:\n{notes}\nBullets:")
            sections.append(f"### {goal.capitalize()}\n{bullets}")
        synth = ("Write a 120–160 word answer using only the sections below. Be clear and factual.\n\n"
                 + "\n\n".join(sections) + f"\n\nQuestion: {q}\nAnswer:")
        return self.L.chat("You write concise, grounded summaries.", synth)

# --- Demo: INSERT one more doc, then FETCH & answer ---
if __name__ == "__main__":
    # Example of inserting a NEW value later
    col.add(documents=["Economic recovery efforts included rearmament to reduce unemployment."],
            ids=["extra1"], metadatas=[{"source": "note"}])
    q = "Explain the causes of World War II."
    L = LLM()
    print("=== FETCH top-3 for a sample query ===")
    for i, (doc_id, doc, dist) in enumerate(vdb_search("economic causes of WWII", k=3), 1):
        print(f"{i}. {doc_id} (distance={dist:.4f}) -> {doc[:80]}…")
    print("\n=== Normal RAG ===\n", NormalRAG(L).answer(q))
    print("\n=== Agentic RAG ===\n", AgenticRAG(L).answer(q))
```