Modern AI applications are no longer powered by simple keyword matching. Production-grade Retrieval-Augmented Generation (RAG) systems use a combination of:

Semantic Vector Search
+
Keyword-Based Search

This approach is called:

Hybrid Search

Hybrid Search dramatically improves retrieval quality in AI applications like:

AI Chatbots
Enterprise Knowledge Search
Healthcare AI Systems
Resume Search Engines
Legal Document Assistants
Customer Support AI
Internal Company Search

In this article, we will build a production-style Hybrid Search system using:

FastAPI
Qdrant
Sentence Transformers
BM25
Python

Why Pure Vector Search Is Not Enough

Vector databases like Qdrant are excellent at understanding semantic meaning.

Example:

Query:

What is Pankaj's contact information?

Vector search can correctly retrieve:

Mobile: (+91) 1234567890

because it understands contextual meaning.

However, vector search tends to struggle with exact-match tokens such as:

Email addresses
Phone numbers
Invoice IDs
Employee IDs
Exact names
Codes
Technical keywords

Example:

What is Pankaj email id?

Pure semantic search may retrieve:

Professional summary

instead of:

Pankajthapa4@gmail.com

This is where Hybrid Search becomes essential.

What Is Hybrid Search?

Hybrid Search combines:

Search Type	Purpose
Semantic Vector Search	Understand meaning/context
BM25 Keyword Search	Exact keyword relevance

Production AI systems combine both scores to improve retrieval quality. In this article, BM25 is applied as a re-scoring step over the chunks that vector search returns.

Architecture:

User Query
   ↓
Generate Embedding
   ↓
Vector Search (Qdrant)
   ↓
BM25 Keyword Ranking
   ↓
Combine Scores
   ↓
Return Best Chunks
   ↓
Send Context to LLM

Hybrid retrieval of this kind is used or offered by products such as:

Perplexity AI
Azure AI Search
Elasticsearch AI Search
Pinecone Hybrid Search
Weaviate Hybrid Search
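
Some of these platforms (Azure AI Search and Elasticsearch, for example) document a rank-based way of combining results called Reciprocal Rank Fusion (RRF), which sidesteps the problem of mixing scores on different scales. It is not the method used in the rest of this article; the sketch below is only meant to show the idea. The function name, the document IDs, and the constant `k = 60` (a commonly quoted default) are illustrative assumptions.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]],
    k: int = 60
) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = {}

    for ranking in rankings:
        # Ranks start at 1; a document earns 1 / (k + rank) per list it appears in
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)

    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from a vector search and a keyword search
vector_ranking = ["summary", "phone", "skills"]
keyword_ranking = ["phone", "email", "summary"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
```

A document that ranks well in both lists rises to the top even though no raw scores are compared.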

Step 1 — Install Dependencies

Install required libraries:

pip install qdrant-client sentence-transformers rank-bm25 fastapi uvicorn

Step 2 — Initialize Embedding Model

We will use:

all-MiniLM-L6-v2

This model converts text into 384-dimensional vector embeddings.

from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer(
    "all-MiniLM-L6-v2"
)
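
To make "vector embeddings" more concrete: the similarity between a query and a chunk is typically measured as the cosine of the angle between their vectors. The sketch below uses tiny made-up 3-dimensional vectors in place of real model output, and assumes cosine is the distance configured for the collection.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings
query_vec = [0.9, 0.1, 0.0]
contact_vec = [0.8, 0.2, 0.1]   # points in nearly the same direction as the query
summary_vec = [0.1, 0.9, 0.3]   # points in a different direction

contact_similarity = cosine_similarity(query_vec, contact_vec)
summary_similarity = cosine_similarity(query_vec, summary_vec)
```

The chunk whose vector points in nearly the same direction as the query gets the higher similarity, which is what the vector score in the later steps reflects.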

Step 3 — Initialize Qdrant

from qdrant_client import QdrantClient

qdrant_client = QdrantClient(
    host="localhost",
    port=6333
)

COLLECTION_NAME = "ai_documents"
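
The search code in Step 5 assumes that the collection already exists and that each point stores its chunk under a `text` payload key. The article does not show that indexing step, so here is one possible sketch. The helper names (`build_points`, `index_chunks`), the sequential integer IDs, and the cosine distance are assumptions; the vector size of 384 matches all-MiniLM-L6-v2.

```python
from typing import Callable, Sequence

VECTOR_SIZE = 384  # output dimension of all-MiniLM-L6-v2


def build_points(
    chunks: Sequence[str],
    embed: Callable[[str], Sequence[float]]
) -> list[dict]:
    """Pair each chunk with its embedding and a 'text' payload."""
    return [
        {"id": idx, "vector": list(embed(chunk)), "payload": {"text": chunk}}
        for idx, chunk in enumerate(chunks, start=1)
    ]


def index_chunks(client, collection_name: str, chunks, embed) -> None:
    """Create the collection if it is missing, then upsert all chunks."""
    # Imported here so build_points can be used without qdrant-client installed
    from qdrant_client import models

    if not client.collection_exists(collection_name):
        client.create_collection(
            collection_name=collection_name,
            vectors_config=models.VectorParams(
                size=VECTOR_SIZE,
                distance=models.Distance.COSINE,
            ),
        )

    client.upsert(
        collection_name=collection_name,
        points=[
            models.PointStruct(**point)
            for point in build_points(chunks, embed)
        ],
    )
```

With the objects defined in this article, a call could look like `index_chunks(qdrant_client, COLLECTION_NAME, chunks, lambda text: embedding_model.encode(text).tolist())`, where `chunks` is your own list of text chunks.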

Step 4 — Create BM25 Search Function

The rank-bm25 library was already installed in Step 1.

Now create the BM25 scoring logic:

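from rank_bm25 import BM25Okapi


def calculate_bm25_score(
    query: str,
    documents: list[str]
) -> list[float]:

    # BM25Okapi cannot be built from an empty corpus
    if not documents:
        return []

    # Lowercase and split on whitespace; punctuation stays attached to tokens
    tokenized_docs = [
        doc.lower().split()
        for doc in documents
    ]

    bm25 = BM25Okapi(tokenized_docs)

    tokenized_query = query.lower().split()

    # One score per document, in the same order as `documents`
    scores = bm25.get_scores(
        tokenized_query
    )

    return scores.tolist()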

BM25 ranks documents based on keyword relevance.
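
To show roughly what BM25 computes, here is a minimal from-scratch sketch. It is not rank-bm25's implementation: it uses the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))` with the commonly cited defaults `k1 = 1.5` and `b = 0.75`, so absolute scores will differ from the library's. The `tokenize` helper is also an assumption; it strips punctuation while keeping an email address in one piece, which a plain `split()` does not do for a token like `number?`.

```python
import math
import re
from collections import Counter

# Alphanumeric runs, optionally joined by @ . _ + - (keeps emails together)
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:[@._+-][a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    return TOKEN_PATTERN.findall(text.lower())


def bm25_scores(
    query: str,
    documents: list[str],
    k1: float = 1.5,
    b: float = 0.75
) -> list[float]:
    docs = [tokenize(doc) for doc in documents]
    if not docs:
        return []

    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Number of documents that contain each term
    doc_freq = Counter(term for doc in docs for term in set(doc))
    query_terms = tokenize(query)

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            # Rare terms get a higher weight
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalised
            length_norm = 1 - b + b * len(doc) / avg_len
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


example_scores = bm25_scores(
    "phone number?",
    ["Professional summary of experience", "Phone number: 1234567890"],
)
```

Only the second document contains the query terms, so it is the only one with a non-zero score.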

Step 5 — Implement Hybrid Search

Now let’s combine:

Vector Score
+
BM25 Score

def search_qdrant(
    query: str,
    top_k: int = 3
):

    # Embed the query with the same model that was used to index the chunks
    query_embedding = embedding_model.encode(query)

    results = qdrant_client.query_points(
        collection_name=COLLECTION_NAME,
        query=query_embedding.tolist(),
        limit=top_k
    )

    # Nothing retrieved, so there is nothing to re-rank
    if not results.points:
        return []

    # payload may be None if a point was stored without one
    texts = [
        (point.payload or {}).get("text", "")
        for point in results.points
    ]

    # BM25 is computed only over the chunks that vector search returned
    bm25_scores = calculate_bm25_score(
        query=query,
        documents=texts
    )

    documents = []

    for index, point in enumerate(results.points):

        text = texts[index]

        vector_score = point.score

        keyword_score = bm25_scores[index]

        # Weighted sum of raw scores: BM25 is unbounded, so it can
        # outweigh the similarity score despite its smaller weight
        hybrid_score = (
            vector_score * 0.7
        ) + (
            keyword_score * 0.3
        )

        documents.append({
            "text": text,
            "vector_score": round(vector_score, 4),
            "bm25_score": round(float(keyword_score), 4),
            "hybrid_score": round(float(hybrid_score), 4)
        })

    # Highest hybrid score first
    documents = sorted(
        documents,
        key=lambda x: x["hybrid_score"],
        reverse=True
    )

    return documents
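
One limitation of the function above is that BM25 only sees the `top_k` chunks that vector search already returned, so a chunk that the embedding misses can never be rescued by its keywords. A common mitigation is to retrieve a larger candidate pool and cut down to `top_k` only after re-ranking. The sketch below shows just that re-ranking step on plain Python data. `rerank_candidates`, the `keyword_scorer` callable, `overlap_scorer`, and the example scores are all illustrative assumptions; in the article's code the candidates would come from `query_points` with a larger `limit`.

```python
from typing import Callable


def rerank_candidates(
    query: str,
    candidates: list[tuple[str, float]],
    keyword_scorer: Callable[[str, list[str]], list[float]],
    top_k: int = 3,
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3
) -> list[dict]:
    """Re-rank (text, vector_score) pairs by a weighted hybrid score."""
    texts = [text for text, _ in candidates]
    keyword_scores = keyword_scorer(query, texts) if texts else []

    ranked = [
        {
            "text": text,
            "hybrid_score": vector_weight * vector_score
            + keyword_weight * float(keyword_score),
        }
        for (text, vector_score), keyword_score in zip(candidates, keyword_scores)
    ]
    ranked.sort(key=lambda item: item["hybrid_score"], reverse=True)

    # Truncate only after the keyword signal has been taken into account
    return ranked[:top_k]


def overlap_scorer(query: str, texts: list[str]) -> list[float]:
    """Toy stand-in for BM25: counts query words present in each text."""
    terms = set(query.lower().split())
    return [float(len(terms & set(text.lower().split()))) for text in texts]


# Hypothetical pool of four candidates; the relevant chunk has the lowest vector score
candidates = [
    ("Professional summary", 0.81),
    ("Skills: Python, FastAPI", 0.78),
    ("Projects overview", 0.77),
    ("Mobile phone number 1234567890", 0.60),
]

top = rerank_candidates("phone number", candidates, overlap_scorer, top_k=2)
```

Here the phone number chunk would have been dropped by a plain top-2 vector search, but it ends up first once the wider pool is re-ranked. Because `calculate_bm25_score` takes `(query, documents)`, it could be passed as `keyword_scorer` in place of the toy scorer.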

How Hybrid Ranking Works

Suppose the query is:

What is pankaj phone number?

Retrieved chunks:

Chunk	Vector Score	BM25 Score
Resume Summary	0.81	0.4
Phone Number Chunk	0.76	4.8

Hybrid score formula:

hybrid_score = (
    vector_score * 0.7
) + (
    bm25_score * 0.3
)

Final ranking:

Chunk	Hybrid Score
Phone Number Chunk	1.97 ✅
Resume Summary	0.69

Now the correct chunk ranks highest.
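
The arithmetic above can be reproduced directly, which also makes a caveat visible: BM25 scores are unbounded while the vector scores here stay below 1, so the keyword term can dominate regardless of the 0.7/0.3 weights. One possible mitigation, shown as an assumption rather than as this article's method, is to divide the BM25 scores by the largest score in the candidate set before weighting. The helper names are hypothetical.

```python
def hybrid_score(
    vector_score: float,
    bm25_score: float,
    vector_weight: float = 0.7,
    bm25_weight: float = 0.3
) -> float:
    return vector_score * vector_weight + bm25_score * bm25_weight


def max_scale(scores: list[float]) -> list[float]:
    """Scale scores so the largest becomes 1.0 (all zeros if none is positive)."""
    top = max(scores, default=0.0)
    if top <= 0:
        return [0.0 for _ in scores]
    return [score / top for score in scores]


# Values from the table above: Resume Summary, Phone Number Chunk
vector_scores = [0.81, 0.76]
bm25_raw = [0.4, 4.8]

# Formula exactly as used in the article
raw = [hybrid_score(v, k) for v, k in zip(vector_scores, bm25_raw)]

# Same formula after bringing BM25 into the 0..1 range
scaled = [
    hybrid_score(v, k)
    for v, k in zip(vector_scores, max_scale(bm25_raw))
]
```

With the table's inputs, the raw scores come out at roughly 0.69 and 1.97, and the scaled ones at roughly 0.59 and 0.83. The phone number chunk still ranks first, but both scores now sit in a comparable range, so the weights behave more like the proportions they appear to be.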

Step 6 — Create FastAPI Endpoint

from fastapi import FastAPI

app = FastAPI()


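# Plain def: FastAPI runs it in a threadpool, so the blocking
# embedding and Qdrant calls do not stall the event loop
@app.get("/search-docs")
def search_docs(
    query: str
):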

    results = search_qdrant(query)

    return {
        "query": query,
        "results": results
    }

Run the server (this command assumes the code is saved as app/main.py):

uvicorn app.main:app --reload

Step 7 — Test Hybrid Search

Test:

http://127.0.0.1:8000/search-docs?query=What is Himanshu phone number
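
A browser will encode the spaces in that URL automatically, but a script or test needs to do it explicitly. Here is a small sketch using only the standard library; `build_search_url` is a hypothetical helper name, and the host and port are uvicorn's defaults.

```python
from urllib.parse import urlencode


def build_search_url(
    query: str,
    base_url: str = "http://127.0.0.1:8000"
) -> str:
    # urlencode escapes spaces and special characters in the query string
    return f"{base_url}/search-docs?{urlencode({'query': query})}"


url = build_search_url("What is Himanshu phone number")
```

Once the server is running, the resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen(url)`.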

Expected result (first entry in the results list):

{
  "text": "Mobile: (+91) 7579414837",
  "vector_score": 0.3015,
  "bm25_score": 4.921,
  "hybrid_score": 1.687
}

Why Hybrid Search Matters in Production

Pure semantic search is often insufficient for enterprise systems, where many queries target exact identifiers.

Hybrid Search improves:

Exact information retrieval
Email search
Contact search
Invoice lookup
Healthcare IDs
Employee records
Legal document search

This becomes critical in:

Healthcare Systems
Banking Systems
Enterprise Knowledge Platforms
SaaS AI Applications

Production-Grade Hybrid Search Architecture

A typical stack for an enterprise RAG system might include:

FastAPI
Qdrant
BM25
Redis Cache
OpenAI/Groq
Rerankers

Advanced systems further add:

Cross-encoder reranking
Metadata filtering
Query rewriting
Multi-stage retrieval
Access control
Tenant isolation
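
Metadata filtering and tenant isolation usually mean restricting the vector search to points whose payload matches the caller. As a rough illustration, the function below builds a filter in the JSON shape used by Qdrant's REST API. The `tenant_id` and `doc_type` payload fields are hypothetical and would have to be stored on each point at indexing time.

```python
from typing import Optional


def build_tenant_filter(
    tenant_id: str,
    doc_type: Optional[str] = None
) -> dict:
    """Build a Qdrant-style filter: every condition under 'must' has to match."""
    conditions = [
        {"key": "tenant_id", "match": {"value": tenant_id}}
    ]

    if doc_type is not None:
        conditions.append(
            {"key": "doc_type", "match": {"value": doc_type}}
        )

    return {"must": conditions}


tenant_filter = build_tenant_filter("acme", doc_type="resume")
```

With the Python client, the same structure can be expressed with `models.Filter`, `models.FieldCondition`, and `models.MatchValue`, and passed to `query_points` through its `query_filter` argument so that filtering happens inside Qdrant rather than after retrieval.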

Final Thoughts

Hybrid Search is one of the most important upgrades you can make to a RAG system.

By combining:

Semantic Understanding
+
Keyword Precision

you create retrieval systems that are:

More accurate
More production-ready
More scalable
More enterprise-friendly

If you are building modern AI applications, Hybrid Search should be considered a foundational architecture pattern rather than an optional enhancement.

Building Production-Ready Hybrid Search in RAG: Combining Semantic Search with BM25 Keyword Search