Week 6: GraphRAG
Learning Objectives
This week explores GraphRAG (Graph-based Retrieval-Augmented Generation), combining knowledge graphs with LLMs for enhanced question answering. You will learn hybrid retrieval strategies and sub-graph extraction techniques.
1. Introduction to GraphRAG
The "Library Search" Analogy: Vector vs. Graph
Imagine you are in a library looking for a specific answer.
- Vector Search (Traditional RAG) is like looking for books with similar covers. You might find a book about "Apples" (fruit) when you wanted "Apple Inc." because they look similar. It's fuzzy.
- GraphRAG is like checking the Citation Index. You find a book, and then follow the exact references to find related works. You follow the "Co-founder" link from Jobs to Wozniak. It's precise.
GraphRAG combines both: Vectors to find the entry point, Graph to navigate the context.
GraphRAG enhances traditional RAG by incorporating knowledge graph structure.
Traditional RAG vs GraphRAG
| Aspect | Traditional RAG | GraphRAG |
|---|---|---|
| Data Source | Document chunks | Knowledge graph + documents |
| Retrieval | Vector similarity | Graph traversal + semantic |
| Context | Isolated chunks | Connected knowledge |
| Reasoning | Limited to LLM | Explicit relationships |
Why GraphRAG?
- Multi-hop reasoning: Connect information across multiple entities
- Structured context: Provide relationship-aware context to LLMs
- Reduced hallucination: Grounded in explicit knowledge
- Explainability: Traceable reasoning paths
2. Architecture Overview
GraphRAG Pipeline
User Query
↓
┌─────────────────────────────────────┐
│ Hybrid Retrieval │
│ ┌─────────────┬─────────────┐ │
│ │ Vector │ Graph │ │
│ │ Search │ Traversal │ │
│ └─────────────┴─────────────┘ │
└─────────────────────────────────────┘
↓
Sub-graph Extraction
↓
Context Construction
↓
LLM Generation
↓
Answer
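Read top to bottom, the pipeline maps directly onto the components built in the rest of this week. A minimal orchestration sketch, assuming the helpers `hybrid_retrieve`, `extract_subgraph`, and `subgraph_to_text` defined in the sections below:

```python
def graphrag_pipeline(query, retriever, llm):
    """End-to-end GraphRAG flow: hybrid retrieval -> sub-graph -> context -> generation."""
    # 1. Hybrid retrieval: vector search finds seed entities, graph traversal expands them
    entities = retriever.hybrid_retrieve(query)
    # 2. Sub-graph extraction around the retrieved entities
    subgraph = extract_subgraph(retriever.graph, [e["entity"] for e in entities])
    # 3. Context construction: serialize the sub-graph into natural-language statements
    context = subgraph_to_text(subgraph)
    # 4. LLM generation grounded in the graph context
    response = llm.invoke(f"Answer using this context:\n{context}\n\nQuestion: {query}")
    return response.content
```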
3. Building the Knowledge Graph
Entity and Relationship Extraction
```python
import re

from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate

llm = ChatOpenAI(model="gpt-4")

extraction_prompt = PromptTemplate.from_template("""
Extract entities and relationships from the following text.
Return in format: (entity1, relationship, entity2)
Text: {text}
Entities and relationships:
""")

def parse_triplets(text):
    # Simple regex parser for lines like "(subject, predicate, object)";
    # adapt it to whatever format your extraction prompt actually produces.
    pattern = r"\(([^,]+),\s*([^,]+),\s*([^)]+)\)"
    return [(s.strip(), p.strip(), o.strip()) for s, p, o in re.findall(pattern, text)]

def extract_triplets(text):
    # Ask the LLM for triplets, then parse them out of its text reply
    response = llm.invoke(extraction_prompt.format(text=text))
    triplets = parse_triplets(response.content)
    return triplets
```
```python
# Store in Neo4j (assumes a py2neo-style Graph object exposing .run())
def store_triplets(triplets, graph):
    for subj, pred, obj in triplets:
        query = """
        MERGE (s:Entity {name: $subj})
        MERGE (o:Entity {name: $obj})
        MERGE (s)-[r:RELATES {type: $pred}]->(o)
        """
        graph.run(query, subj=subj, pred=pred, obj=obj)
```
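A quick end-to-end check of the two functions above. The sentence and the connection details are illustrative, and the snippet assumes a py2neo `Graph` connection:

```python
from py2neo import Graph

# Hypothetical local Neo4j instance
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

text = "Steve Jobs co-founded Apple with Steve Wozniak."
triplets = extract_triplets(text)   # e.g. [("Steve Jobs", "co-founded", "Apple"), ...]
store_triplets(triplets, graph)
```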
Adding Vector Embeddings

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def add_embeddings(graph):
    # Get all entities
    entities = graph.run("MATCH (e:Entity) RETURN e.name as name").data()
    for entity in entities:
        embedding = model.encode(entity['name']).tolist()
        graph.run("""
            MATCH (e:Entity {name: $name})
            SET e.embedding = $embedding
        """, name=entity['name'], embedding=embedding)
```
4. Hybrid Retrieval
Vector Search + Graph Traversal
```python
class HybridRetriever:
    """Hybrid retrieval: vector similarity finds seed entities, graph traversal expands them.

    `graph` is assumed to be a py2neo-style Graph exposing .run()."""

    def __init__(self, graph, embedding_model):
        self.graph = graph
        self.model = embedding_model

    def vector_search(self, query, top_k=5):
        """Find similar entities using vector similarity (requires the GDS plugin)."""
        query_embedding = self.model.encode(query).tolist()
        result = self.graph.run("""
            MATCH (e:Entity)
            WHERE e.embedding IS NOT NULL
            WITH e, gds.similarity.cosine(e.embedding, $embedding) AS score
            ORDER BY score DESC
            LIMIT $k
            RETURN e.name as entity, score
        """, embedding=query_embedding, k=top_k)
        return result.data()

    def graph_expand(self, entities, hops=2):
        """Expand from seed entities via graph traversal."""
        entity_names = [e['entity'] for e in entities]
        result = self.graph.run("""
            MATCH (start:Entity)
            WHERE start.name IN $entities
            MATCH path = (start)-[*1..""" + str(hops) + """]->(related:Entity)
            RETURN DISTINCT related.name as entity,
                   length(path) as distance,
                   [r in relationships(path) | r.type] as path_types
        """, entities=entity_names)
        return result.data()

    def hybrid_retrieve(self, query, top_k=5, hops=2):
        """Combine vector search with graph expansion."""
        # Step 1: Vector search for seed entities
        seed_entities = self.vector_search(query, top_k)
        # Step 2: Expand via graph traversal
        expanded = self.graph_expand(seed_entities, hops)
        # Step 3: Combine and rank
        all_entities = seed_entities + expanded
        return self._rank_entities(all_entities, query)

    def _rank_entities(self, entities, query):
        """Deduplicate by entity name; seeds (which carry a similarity score) come first,
        expanded neighbours follow, ordered by graph distance."""
        seen, ranked = set(), []
        ordered = sorted(entities,
                         key=lambda e: (-e.get('score', 0), e.get('distance', 0)))
        for e in ordered:
            if e['entity'] not in seen:
                seen.add(e['entity'])
                ranked.append(e)
        return ranked
```
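A quick usage sketch, reusing the py2neo `graph` connection and the sentence-transformer `model` from earlier (the query string is illustrative):

```python
retriever = HybridRetriever(graph, model)

results = retriever.hybrid_retrieve("Who co-founded Apple?", top_k=5, hops=2)
for r in results[:5]:
    # Seed entities carry a 'score'; expanded neighbours carry a 'distance' and 'path_types'
    print(r['entity'], r.get('score'), r.get('distance'))
```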
5. Sub-graph Extraction
Extracting Relevant Sub-graphs
```python
import networkx as nx

def extract_subgraph(graph, seed_entities, max_hops=2):
    """Extract a relevant sub-graph around seed entities."""
    # rel.type reads the predicate stored by store_triplets
    # (all relationships share the RELATES type, so type(rel) would be uninformative)
    query = """
    MATCH (start:Entity)
    WHERE start.name IN $seeds
    CALL {
        WITH start
        MATCH path = (start)-[*1..""" + str(max_hops) + """]-(connected)
        RETURN path
    }
    WITH DISTINCT path
    UNWIND relationships(path) as rel
    WITH DISTINCT
        startNode(rel).name as source,
        rel.type as relationship,
        endNode(rel).name as target
    RETURN source, relationship, target
    """
    result = graph.run(query, seeds=seed_entities)
    # Convert to networkx for analysis
    G = nx.DiGraph()
    for record in result:
        G.add_edge(
            record['source'],
            record['target'],
            relationship=record['relationship']
        )
    return G

def subgraph_to_text(G):
    """Convert sub-graph to natural language context."""
    statements = []
    for source, target, data in G.edges(data=True):
        rel = data.get('relationship', 'related to')
        statements.append(f"{source} {rel} {target}")
    return ". ".join(statements)
```
Path-based Context

```python
def get_reasoning_paths(graph, entity1, entity2, max_length=4):
    """Find paths between two entities for reasoning."""
    query = """
    MATCH path = shortestPath(
        (a:Entity {name: $e1})-[*1..""" + str(max_length) + """]-(b:Entity {name: $e2})
    )
    RETURN [n in nodes(path) | n.name] as nodes,
           [r in relationships(path) | r.type] as relations
    """
    result = graph.run(query, e1=entity1, e2=entity2)
    paths = []
    for record in result:
        nodes = record['nodes']
        relations = record['relations']
        path_str = nodes[0]
        for i, rel in enumerate(relations):
            path_str += f" --[{rel}]--> {nodes[i+1]}"
        paths.append(path_str)
    return paths
```
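A usage sketch with the two entities from the opening analogy (the printed path is an assumption about what was extracted):

```python
paths = get_reasoning_paths(graph, "Steve Jobs", "Steve Wozniak")
for p in paths:
    print(p)
# e.g. "Steve Jobs --[co-founded]--> Apple --[co-founded]--> Steve Wozniak"
```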
6. Context-Augmented Generation
Building the Prompt
```python
from langchain.prompts import ChatPromptTemplate

graphrag_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a knowledgeable assistant. Answer questions using the
provided knowledge graph context. If the answer cannot be found in the context,
say so clearly.

Knowledge Graph Context:
{kg_context}

Relevant Paths:
{paths}"""),
    ("human", "{question}")
])

def generate_answer(query, retriever, llm):
    # Retrieve relevant context
    entities = retriever.hybrid_retrieve(query)
    entity_names = [e['entity'] for e in entities[:10]]

    # Extract sub-graph
    subgraph = extract_subgraph(retriever.graph, entity_names)
    kg_context = subgraph_to_text(subgraph)

    # Get reasoning paths if the query mentions at least two known entities
    paths = []
    detected_entities = extract_entities_from_query(query, retriever.graph)  # see the sketch below
    if len(detected_entities) >= 2:
        paths = get_reasoning_paths(
            retriever.graph,
            detected_entities[0],
            detected_entities[1]
        )

    # Generate answer (format_messages builds the chat messages for the LLM)
    messages = graphrag_prompt.format_messages(
        kg_context=kg_context,
        paths="\n".join(paths) if paths else "No specific paths found",
        question=query
    )
    response = llm.invoke(messages)
    return response.content
```
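`extract_entities_from_query` is not defined in the snippets above. A minimal sketch that simply matches the query against entity names already in the graph; an NER- or LLM-based extractor would be a natural upgrade:

```python
def extract_entities_from_query(query, graph):
    """Naive entity linking: return graph entity names that appear verbatim in the query."""
    names = [r['name'] for r in graph.run("MATCH (e:Entity) RETURN e.name as name").data()]
    return [name for name in names if name.lower() in query.lower()]
```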
Complete GraphRAG System

```python
class GraphRAGSystem:
    def __init__(self, neo4j_uri, neo4j_auth, openai_key):
        # py2neo's Graph exposes the .run() API used throughout this week's snippets
        from py2neo import Graph
        from langchain_openai import ChatOpenAI
        from sentence_transformers import SentenceTransformer

        self.graph = Graph(neo4j_uri, auth=neo4j_auth)
        self.llm = ChatOpenAI(model="gpt-4", api_key=openai_key)
        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
        self.retriever = HybridRetriever(self.graph, self.embedding_model)

    def query(self, question):
        return generate_answer(question, self.retriever, self.llm)

    def add_document(self, text):
        # Extract triplets
        triplets = extract_triplets(text)
        # Store in graph
        store_triplets(triplets, self.graph)
        # Refresh embeddings so newly added entities become searchable
        add_embeddings(self.graph)
```
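A usage sketch (the connection details and the API key are placeholders):

```python
system = GraphRAGSystem(
    neo4j_uri="bolt://localhost:7687",
    neo4j_auth=("neo4j", "password"),
    openai_key="sk-...",  # placeholder
)

system.add_document("Steve Jobs co-founded Apple with Steve Wozniak.")
print(system.query("Who co-founded Apple?"))
```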
Project: Movie Recommendation Knowledge Graph
Progress
| Week | Topic | Project Milestone |
|---|---|---|
| 1 | Ontology Introduction | ✅ Movie domain design completed |
| 2 | RDF & RDFS | ✅ 10 movies converted to RDF |
| 3 | OWL & Reasoning | ✅ Inference rules applied |
| 4 | Knowledge Extraction | ✅ 100 movies auto-collected |
| 5 | Neo4j | ✅ Graph DB constructed |
| 6 | GraphRAG | Natural language query system |
| 7 | Ontology Agents | New movie auto-update |
| 8 | Domain Expansion | Medical/Legal/Finance cases |
| 9 | Service Deployment | API + Dashboard |
Week 6 Milestone: GraphRAG for Natural Language Movie Recommendations
Build a system that can answer natural language questions like "Recommend me a sci-fi movie in Nolan's style."
GraphRAG Architecture:
User Question: "Recommend me a sci-fi movie in Nolan's style"
↓ LLM (Question Analysis)
Intent: Recommendation, Conditions: Director=Nolan style, Genre=Sci-Fi
↓ Cypher Generation
MATCH (d:Person {name: "Christopher Nolan"})-[:DIRECTED]->(m:Movie)
-[:HAS_GENRE]->(:Genre {name: "Sci-Fi"})
RETURN m.title
↓ Neo4j Execution
["Inception", "Interstellar", "Tenet"]
↓ LLM (Response Generation)
"Nolan's sci-fi films include Inception, Interstellar, and Tenet."Hybrid Search:
Hybrid Search:
- Vector Index: Movie plot embeddings
- Graph Index: Relationship-based traversal
- Combination: Semantic similarity + Structural connections
In the project notebook, you'll build a chatbot that recommends movies via natural language. You will implement:
- LangChain + Neo4j integration setup
- Natural language → Cypher query auto-conversion
- Handle "Recommend Nolan-style sci-fi movies" queries
- Vector + Graph hybrid search
What you'll build by Week 9: An AI agent that answers "Recommend sci-fi movies like Nolan's style" by reasoning over director-genre-rating relationships in the knowledge graph
Practice Notebook
For deeper exploration of the theory, the practice notebook covers additional topics:
- Microsoft GraphRAG vs LangChain comparison
- Community detection and summarization
- Custom Retriever implementation
- Conversation context management
Interview Questions
What are the advantages of GraphRAG over traditional RAG?
Key Advantages:
- Multi-hop reasoning: Can connect facts across multiple documents
- Relationship awareness: Understands how entities relate
- Reduced hallucination: Grounded in explicit knowledge structure
- Better for complex queries: Handles "who", "what", "why" chains
- Explainability: Can show reasoning paths in the graph
- Structured knowledge: Complements unstructured document retrieval
Premium Content
Want complete solutions with detailed explanations and production-ready code?
Check out the Ontology & Knowledge Graph Cookbook Premium for:
- Complete notebook solutions with step-by-step explanations
- Real-world case studies and best practices
- Interview preparation materials
- Production deployment guides
Next Steps
In Week 7: Ontology-based Agents, you will learn how to build AI agents that leverage ontologies for planning and reasoning.