Week 6: GraphRAG

Learning Objectives

This week explores GraphRAG (Graph-based Retrieval-Augmented Generation), combining knowledge graphs with LLMs for enhanced question answering. You will learn hybrid retrieval strategies and sub-graph extraction techniques.


1. Introduction to GraphRAG

The "Library Search" Analogy: Vector vs. Graph

Imagine you are in a library looking for a specific answer.

  • Vector Search (Traditional RAG) is like looking for books with similar covers. You might find a book about "Apples" (fruit) when you wanted "Apple Inc." because they look similar. It's fuzzy.
  • GraphRAG is like checking the Citation Index. You find a book, and then follow the exact references to find related works. You follow the "Co-founder" link from Jobs to Wozniak. It's precise.

GraphRAG combines both: Vectors to find the entry point, Graph to navigate the context.

GraphRAG enhances traditional RAG by incorporating knowledge graph structure.

Traditional RAG vs GraphRAG

| Aspect      | Traditional RAG   | GraphRAG                    |
|-------------|-------------------|-----------------------------|
| Data Source | Document chunks   | Knowledge graph + documents |
| Retrieval   | Vector similarity | Graph traversal + semantic  |
| Context     | Isolated chunks   | Connected knowledge         |
| Reasoning   | Limited to LLM    | Explicit relationships      |

Why GraphRAG?

  • Multi-hop reasoning: Connect information across multiple entities
  • Structured context: Provide relationship-aware context to LLMs
  • Reduced hallucination: Grounded in explicit knowledge
  • Explainability: Traceable reasoning paths

2. Architecture Overview

GraphRAG Pipeline

User Query
    ↓
┌─────────────────────────────────────┐
│          Hybrid Retrieval           │
│  ┌─────────────┬─────────────┐      │
│  │ Vector      │ Graph       │      │
│  │ Search      │ Traversal   │      │
│  └─────────────┴─────────────┘      │
└─────────────────────────────────────┘
    ↓
Sub-graph Extraction
    ↓
Context Construction
    ↓
LLM Generation
    ↓
Answer
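
Each stage in this diagram corresponds to a component built in the sections below. As a rough preview, the top-level flow can be sketched like this (the function and class names mirror the ones implemented later in this tutorial):

def graphrag_answer(query, retriever, llm):
    """Minimal sketch of the GraphRAG pipeline (components defined in Sections 3 to 6)."""
    # Hybrid retrieval: vector search for seed entities, then graph traversal
    entities = retriever.hybrid_retrieve(query)

    # Sub-graph extraction around the retrieved entities
    subgraph = extract_subgraph(retriever.graph, [e["entity"] for e in entities])

    # Context construction: serialize the sub-graph into text
    context = subgraph_to_text(subgraph)

    # LLM generation grounded in the graph context
    response = llm.invoke(f"Context:\n{context}\n\nQuestion: {query}")
    return response.content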

3. Building the Knowledge Graph

Entity and Relationship Extraction

from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
import re

llm = ChatOpenAI(model="gpt-4")

extraction_prompt = PromptTemplate.from_template("""
Extract entities and relationships from the following text.
Return in format: (entity1, relationship, entity2)

Text: {text}

Entities and relationships:
""")

def parse_triplets(text):
    """Parse LLM output of the form (subject, relationship, object) into tuples."""
    triplets = []
    for match in re.finditer(r"\(([^,]+),\s*([^,]+),\s*([^)]+)\)", text):
        subj, pred, obj = (part.strip() for part in match.groups())
        triplets.append((subj, pred, obj))
    return triplets

def extract_triplets(text):
    response = llm.invoke(extraction_prompt.format(text=text))
    return parse_triplets(response.content)

# Store in Neo4j: `graph` is any object exposing run(cypher, **params),
# e.g. a neo4j driver session.
def store_triplets(triplets, graph):
    for subj, pred, obj in triplets:
        # Relationship types cannot be parameterized in Cypher, so the
        # predicate is stored as a property on a generic RELATES edge.
        query = """
        MERGE (s:Entity {name: $subj})
        MERGE (o:Entity {name: $obj})
        MERGE (s)-[r:RELATES {type: $pred}]->(o)
        """
        graph.run(query, subj=subj, pred=pred, obj=obj)
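
A quick end-to-end usage sketch (the connection details and sample sentence are illustrative; the session's run(cypher, **params) call matches the graph.run(...) usage above):

from neo4j import GraphDatabase

# Hypothetical local Neo4j instance
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
graph = driver.session()

triplets = extract_triplets("Steve Jobs co-founded Apple with Steve Wozniak.")
# e.g. [("Steve Jobs", "co-founded", "Apple"), ("Steve Wozniak", "co-founded", "Apple")]
store_triplets(triplets, graph)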

Adding Vector Embeddings

from sentence_transformers import SentenceTransformer
 
model = SentenceTransformer('all-MiniLM-L6-v2')
 
def add_embeddings(graph):
    # Get all entities
    entities = graph.run("MATCH (e:Entity) RETURN e.name as name").data()
 
    for entity in entities:
        embedding = model.encode(entity['name']).tolist()
        graph.run("""
            MATCH (e:Entity {name: $name})
            SET e.embedding = $embedding
        """, name=entity['name'], embedding=embedding)

4. Hybrid Retrieval

Vector Search + Graph Traversal

class HybridRetriever:
    def __init__(self, graph, embedding_model):
        # graph: any object exposing run(cypher, **params), e.g. a neo4j driver session
        self.graph = graph
        self.model = embedding_model

    def vector_search(self, query, top_k=5):
        """Find similar entities using vector similarity (requires the Neo4j GDS plugin)."""
        query_embedding = self.model.encode(query).tolist()

        result = self.graph.run("""
            MATCH (e:Entity)
            WHERE e.embedding IS NOT NULL
            WITH e, gds.similarity.cosine(e.embedding, $embedding) AS score
            ORDER BY score DESC
            LIMIT $k
            RETURN e.name as entity, score
        """, embedding=query_embedding, k=top_k)

        return result.data()

    def graph_expand(self, entities, hops=2):
        """Expand from seed entities via graph traversal."""
        entity_names = [e['entity'] for e in entities]

        # Variable-length patterns cannot be parameterized,
        # so the hop count is spliced into the query string.
        result = self.graph.run("""
            MATCH (start:Entity)
            WHERE start.name IN $entities
            MATCH path = (start)-[*1..""" + str(hops) + """]->(related:Entity)
            RETURN DISTINCT related.name as entity,
                   length(path) as distance,
                   [r in relationships(path) | r.type] as path_types
        """, entities=entity_names)

        return result.data()

    def _rank_entities(self, entities, query):
        """Deduplicate and rank: vector hits by similarity score,
        expanded hits by how close they are to a seed entity (simple heuristic)."""
        best = {}
        for e in entities:
            score = e.get('score', 1.0 / (1 + e.get('distance', 0)))
            if e['entity'] not in best or score > best[e['entity']][0]:
                best[e['entity']] = (score, e)
        return [e for _, e in sorted(best.values(), key=lambda x: x[0], reverse=True)]

    def hybrid_retrieve(self, query, top_k=5, hops=2):
        """Combine vector search with graph expansion."""
        # Step 1: Vector search for seed entities
        seed_entities = self.vector_search(query, top_k)

        # Step 2: Expand via graph traversal
        expanded = self.graph_expand(seed_entities, hops)

        # Step 3: Combine, deduplicate, and rank
        all_entities = seed_entities + expanded
        return self._rank_entities(all_entities, query)
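
A brief usage sketch, reusing the graph session and embedding model from Section 3 (the query is illustrative):

retriever = HybridRetriever(graph, model)

results = retriever.hybrid_retrieve("Who co-founded Apple?", top_k=5, hops=2)
for r in results[:5]:
    print(r['entity'], r.get('score'), r.get('distance'))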

5. Sub-graph Extraction

Extracting Relevant Sub-graphs

import networkx as nx

def extract_subgraph(graph, seed_entities, max_hops=2):
    """Extract a relevant sub-graph around seed entities."""

    # coalesce(rel.type, type(rel)) prefers the predicate stored as a property
    # by store_triplets, falling back to the Cypher relationship type.
    query = """
    MATCH (start:Entity)
    WHERE start.name IN $seeds
    CALL {
        WITH start
        MATCH path = (start)-[*1..""" + str(max_hops) + """]-(connected)
        RETURN path
    }
    WITH DISTINCT path
    UNWIND relationships(path) as rel
    WITH DISTINCT
        startNode(rel).name as source,
        coalesce(rel.type, type(rel)) as relationship,
        endNode(rel).name as target
    RETURN source, relationship, target
    """

    result = graph.run(query, seeds=seed_entities)

    # Convert to networkx for analysis
    G = nx.DiGraph()

    for record in result:
        G.add_edge(
            record['source'],
            record['target'],
            relationship=record['relationship']
        )

    return G

def subgraph_to_text(G):
    """Convert sub-graph to natural language context."""
    statements = []
    for source, target, data in G.edges(data=True):
        rel = data.get('relationship', 'related to')
        statements.append(f"{source} {rel} {target}")

    return ". ".join(statements)

Path-based Context

def get_reasoning_paths(graph, entity1, entity2, max_length=4):
    """Find paths between two entities for reasoning."""
 
    query = """
    MATCH path = shortestPath(
        (a:Entity {name: $e1})-[*1..""" + str(max_length) + """]-(b:Entity {name: $e2})
    )
    RETURN [n in nodes(path) | n.name] as nodes,
           [r in relationships(path) | r.type] as relations
    """
 
    result = graph.run(query, e1=entity1, e2=entity2)
    paths = []
 
    for record in result:
        nodes = record['nodes']
        relations = record['relations']
        path_str = nodes[0]
        for i, rel in enumerate(relations):
            path_str += f" --[{rel}]--> {nodes[i+1]}"
        paths.append(path_str)
 
    return paths
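
Usage is straightforward; the entity names below are illustrative and assume both exist in the graph:

paths = get_reasoning_paths(graph, "Steve Jobs", "Steve Wozniak")
for p in paths:
    print(p)
# e.g. Steve Jobs --[co-founded]--> Apple --[co-founded]--> Steve Wozniak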

6. Context-Augmented Generation

Building the Prompt

from langchain.prompts import ChatPromptTemplate
 
graphrag_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a knowledgeable assistant. Answer questions using the
provided knowledge graph context. If the answer cannot be found in the context,
say so clearly.
 
Knowledge Graph Context:
{kg_context}
 
Relevant Paths:
{paths}"""),
    ("human", "{question}")
])
 
def extract_entities_from_query(query, graph):
    """Naive entity detection: return graph entity names that appear verbatim in the query."""
    names = [r['name'] for r in graph.run("MATCH (e:Entity) RETURN e.name as name").data()]
    return [name for name in names if name.lower() in query.lower()]

def generate_answer(query, retriever, llm):
    # Retrieve relevant context
    entities = retriever.hybrid_retrieve(query)
    entity_names = [e['entity'] for e in entities[:10]]

    # Extract sub-graph
    subgraph = extract_subgraph(retriever.graph, entity_names)
    kg_context = subgraph_to_text(subgraph)

    # Get reasoning paths if the query mentions at least two known entities
    paths = []
    detected_entities = extract_entities_from_query(query, retriever.graph)
    if len(detected_entities) >= 2:
        paths = get_reasoning_paths(
            retriever.graph,
            detected_entities[0],
            detected_entities[1]
        )

    # Generate answer
    prompt = graphrag_prompt.format_messages(
        kg_context=kg_context,
        paths="\n".join(paths) if paths else "No specific paths found",
        question=query
    )

    response = llm.invoke(prompt)
    return response.content

Complete GraphRAG System

class GraphRAGSystem:
    def __init__(self, neo4j_uri, neo4j_auth, openai_key):
        from neo4j import GraphDatabase
        from langchain_openai import ChatOpenAI
        from sentence_transformers import SentenceTransformer

        self.driver = GraphDatabase.driver(neo4j_uri, auth=neo4j_auth)
        # A long-lived session whose run(cypher, **params) matches the
        # graph.run(...) calls used throughout this tutorial.
        self.graph = self.driver.session()
        self.llm = ChatOpenAI(model="gpt-4", api_key=openai_key)
        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
        self.retriever = HybridRetriever(self.graph, self.embedding_model)

    def query(self, question):
        return generate_answer(question, self.retriever, self.llm)

    def add_document(self, text):
        # Extract triplets
        triplets = extract_triplets(text)

        # Store in graph
        store_triplets(triplets, self.graph)

        # Add / refresh embeddings for newly created entities
        add_embeddings(self.graph)

    def close(self):
        self.graph.close()
        self.driver.close()
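
A usage sketch (URI, credentials, and the sample document are illustrative placeholders):

rag = GraphRAGSystem(
    neo4j_uri="bolt://localhost:7687",      # hypothetical local instance
    neo4j_auth=("neo4j", "password"),
    openai_key="sk-..."                     # your OpenAI API key
)

rag.add_document("Christopher Nolan directed Inception, a science fiction film released in 2010.")
print(rag.query("Which science fiction films did Christopher Nolan direct?"))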

Project: Movie Recommendation Knowledge Graph

Progress

| Week | Topic                  | Project Milestone                |
|------|------------------------|----------------------------------|
| 1    | Ontology Introduction  | ✅ Movie domain design completed |
| 2    | RDF & RDFS             | ✅ 10 movies converted to RDF    |
| 3    | OWL & Reasoning        | ✅ Inference rules applied       |
| 4    | Knowledge Extraction   | ✅ 100 movies auto-collected     |
| 5    | Neo4j                  | ✅ Graph DB constructed          |
| 6    | GraphRAG               | Natural language query system    |
| 7    | Ontology Agents        | New movie auto-update            |
| 8    | Domain Expansion       | Medical/Legal/Finance cases      |
| 9    | Service Deployment     | API + Dashboard                  |

Week 6 Milestone: GraphRAG for Natural Language Movie Recommendations

Build a system that can answer natural language questions like "Recommend me a sci-fi movie in Nolan's style."

GraphRAG Architecture:

User Question: "Recommend me a sci-fi movie in Nolan's style"
    ↓ LLM (Question Analysis)
Intent: Recommendation, Conditions: Director=Nolan style, Genre=Sci-Fi
    ↓ Cypher Generation
MATCH (d:Person {name: "Christopher Nolan"})-[:DIRECTED]->(m:Movie)
      -[:HAS_GENRE]->(:Genre {name: "Sci-Fi"})
RETURN m.title
    ↓ Neo4j Execution
["Inception", "Interstellar", "Tenet"]
    ↓ LLM (Response Generation)
"Nolan's sci-fi films include Inception, Interstellar, and Tenet."

Hybrid Search:

  • Vector Index: Movie plot embeddings (see the sketch after this list)
  • Graph Index: Relationship-based traversal
  • Combination: Semantic similarity + Structural connections
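
The vector side of this hybrid setup can be wired with LangChain's Neo4jVector wrapper over movie plot embeddings. The snippet below is a minimal sketch; connection details, index and property names are assumptions, and parameter names may vary across LangChain versions:

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Neo4jVector

# Build (or reuse) a vector index over Movie.plot, storing vectors in m.plotEmbedding
plot_index = Neo4jVector.from_existing_graph(
    embedding=OpenAIEmbeddings(),
    url="bolt://localhost:7687",            # hypothetical local instance
    username="neo4j",
    password="password",
    index_name="movie_plots",
    node_label="Movie",
    text_node_properties=["plot"],
    embedding_node_property="plotEmbedding",
)

# Semantic entry points; graph traversal (e.g. DIRECTED, HAS_GENRE) then expands the context
docs = plot_index.similarity_search("a heist that takes place inside dreams", k=3)
for doc in docs:
    print(doc.page_content)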

In the project notebook, you will build a chatbot that recommends movies via natural language. You will implement:

  • LangChain + Neo4j integration setup
  • Natural language → Cypher query auto-conversion
  • Handle "Recommend Nolan-style sci-fi movies" queries
  • Vector + Graph hybrid search

What you'll build by Week 9: An AI agent that answers "Recommend sci-fi movies like Nolan's style" by reasoning over director-genre-rating relationships in the knowledge graph


Practice Notebook

For deeper exploration of the theory, the practice notebook covers additional topics:

  • Microsoft GraphRAG vs LangChain comparison
  • Community detection and summarization
  • Custom Retriever implementation
  • Conversation context management

Interview Questions

What are the advantages of GraphRAG over traditional RAG?

Key Advantages:

  • Multi-hop reasoning: Can connect facts across multiple documents
  • Relationship awareness: Understands how entities relate
  • Reduced hallucination: Grounded in explicit knowledge structure
  • Better for complex queries: Handles "who", "what", "why" chains
  • Explainability: Can show reasoning paths in the graph
  • Structured knowledge: Complements unstructured document retrieval

Premium Content

Want complete solutions with detailed explanations and production-ready code?

Check out the Ontology & Knowledge Graph Cookbook Premium for:

  • Complete notebook solutions with step-by-step explanations
  • Real-world case studies and best practices
  • Interview preparation materials
  • Production deployment guides

Next Steps

In Week 7: Ontology-based Agents, you will learn how to build AI agents that leverage ontologies for planning and reasoning.