Week 6: GraphRAG
Learning Objectives
This week explores GraphRAG (Graph-based Retrieval-Augmented Generation), combining knowledge graphs with LLMs for enhanced question answering. You will learn hybrid retrieval strategies and sub-graph extraction techniques.
1. Introduction to GraphRAG
The "Library Search" Analogy: Vector vs. Graph
Imagine you are in a library looking for a specific answer.
- Vector Search (Traditional RAG) is like looking for books with similar covers. You might find a book about "Apples" (fruit) when you wanted "Apple Inc." because they look similar. It's fuzzy.
- GraphRAG is like checking the Citation Index. You find a book, and then follow the exact references to find related works. You follow the "Co-founder" link from Jobs to Wozniak. It's precise.
GraphRAG combines both: Vectors to find the entry point, Graph to navigate the context.
GraphRAG enhances traditional RAG by incorporating knowledge graph structure.
Traditional RAG vs GraphRAG
| Aspect | Traditional RAG | GraphRAG |
|---|---|---|
| Data Source | Document chunks | Knowledge graph + documents |
| Retrieval | Vector similarity | Graph traversal + semantic |
| Context | Isolated chunks | Connected knowledge |
| Reasoning | Limited to LLM | Explicit relationships |
Why GraphRAG?
- Multi-hop reasoning: Connect information across multiple entities
- Structured context: Provide relationship-aware context to LLMs
- Reduced hallucination: Grounded in explicit knowledge
- Explainability: Traceable reasoning paths
2. Architecture Overview
GraphRAG Pipeline
User Query
↓
┌─────────────────────────────────────┐
│ Hybrid Retrieval │
│ ┌─────────────┬─────────────┐ │
│ │ Vector │ Graph │ │
│ │ Search │ Traversal │ │
│ └─────────────┴─────────────┘ │
└─────────────────────────────────────┘
↓
Sub-graph Extraction
↓
Context Construction
↓
LLM Generation
↓
Answer
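Read top to bottom, the pipeline maps directly onto the components built in the rest of this week. A minimal orchestration sketch, assuming the helpers `hybrid_retrieve`, `extract_subgraph`, and `subgraph_to_text` defined in the sections below:

```python
def graphrag_pipeline(query, retriever, llm):
    """End-to-end GraphRAG flow: hybrid retrieval -> sub-graph -> context -> generation."""
    # 1. Hybrid retrieval: vector search finds seed entities, graph traversal expands them
    entities = retriever.hybrid_retrieve(query)
    # 2. Sub-graph extraction around the retrieved entities
    subgraph = extract_subgraph(retriever.graph, [e["entity"] for e in entities])
    # 3. Context construction: serialize the sub-graph into natural-language statements
    context = subgraph_to_text(subgraph)
    # 4. LLM generation grounded in the graph context
    response = llm.invoke(f"Answer using this context:\n{context}\n\nQuestion: {query}")
    return response.content
```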
3. Building the Knowledge Graph
Entity and Relationship Extraction
```python
import re

from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate

llm = ChatOpenAI(model="gpt-4")

extraction_prompt = PromptTemplate.from_template("""
Extract entities and relationships from the following text.
Return in format: (entity1, relationship, entity2)
Text: {text}
Entities and relationships:
""")

def parse_triplets(text):
    # Simple regex parser for lines like "(subject, predicate, object)";
    # adapt it to whatever format your extraction prompt actually produces.
    pattern = r"\(([^,]+),\s*([^,]+),\s*([^)]+)\)"
    return [(s.strip(), p.strip(), o.strip()) for s, p, o in re.findall(pattern, text)]

def extract_triplets(text):
    # Ask the LLM for triplets, then parse them out of its text reply
    response = llm.invoke(extraction_prompt.format(text=text))
    triplets = parse_triplets(response.content)
    return triplets
```
```python
# Store in Neo4j (assumes a py2neo-style Graph object exposing .run())
def store_triplets(triplets, graph):
    for subj, pred, obj in triplets:
        query = """
        MERGE (s:Entity {name: $subj})
        MERGE (o:Entity {name: $obj})
        MERGE (s)-[r:RELATES {type: $pred}]->(o)
        """
        graph.run(query, subj=subj, pred=pred, obj=obj)
```
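A quick end-to-end check of the two functions above. The sentence and the connection details are illustrative, and the snippet assumes a py2neo `Graph` connection:

```python
from py2neo import Graph

# Hypothetical local Neo4j instance
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

text = "Steve Jobs co-founded Apple with Steve Wozniak."
triplets = extract_triplets(text)   # e.g. [("Steve Jobs", "co-founded", "Apple"), ...]
store_triplets(triplets, graph)
```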
Adding Vector Embeddings

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def add_embeddings(graph):
    # Get all entities
    entities = graph.run("MATCH (e:Entity) RETURN e.name as name").data()
    for entity in entities:
        embedding = model.encode(entity['name']).tolist()
        graph.run("""
            MATCH (e:Entity {name: $name})
            SET e.embedding = $embedding
        """, name=entity['name'], embedding=embedding)
```
4. Hybrid Retrieval
Vector Search + Graph Traversal
```python
class HybridRetriever:
    """Hybrid retrieval: vector similarity finds seed entities, graph traversal expands them.

    `graph` is assumed to be a py2neo-style Graph exposing .run()."""

    def __init__(self, graph, embedding_model):
        self.graph = graph
        self.model = embedding_model

    def vector_search(self, query, top_k=5):
        """Find similar entities using vector similarity (requires the GDS plugin)."""
        query_embedding = self.model.encode(query).tolist()
        result = self.graph.run("""
            MATCH (e:Entity)
            WHERE e.embedding IS NOT NULL
            WITH e, gds.similarity.cosine(e.embedding, $embedding) AS score
            ORDER BY score DESC
            LIMIT $k
            RETURN e.name as entity, score
        """, embedding=query_embedding, k=top_k)
        return result.data()

    def graph_expand(self, entities, hops=2):
        """Expand from seed entities via graph traversal."""
        entity_names = [e['entity'] for e in entities]
        result = self.graph.run("""
            MATCH (start:Entity)
            WHERE start.name IN $entities
            MATCH path = (start)-[*1..""" + str(hops) + """]->(related:Entity)
            RETURN DISTINCT related.name as entity,
                   length(path) as distance,
                   [r in relationships(path) | r.type] as path_types
        """, entities=entity_names)
        return result.data()

    def hybrid_retrieve(self, query, top_k=5, hops=2):
        """Combine vector search with graph expansion."""
        # Step 1: Vector search for seed entities
        seed_entities = self.vector_search(query, top_k)
        # Step 2: Expand via graph traversal
        expanded = self.graph_expand(seed_entities, hops)
        # Step 3: Combine and rank
        all_entities = seed_entities + expanded
        return self._rank_entities(all_entities, query)

    def _rank_entities(self, entities, query):
        """Deduplicate by entity name; seeds (which carry a similarity score) come first,
        expanded neighbours follow, ordered by graph distance."""
        seen, ranked = set(), []
        ordered = sorted(entities,
                         key=lambda e: (-e.get('score', 0), e.get('distance', 0)))
        for e in ordered:
            if e['entity'] not in seen:
                seen.add(e['entity'])
                ranked.append(e)
        return ranked
```
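A quick usage sketch, reusing the py2neo `graph` connection and the sentence-transformer `model` from earlier (the query string is illustrative):

```python
retriever = HybridRetriever(graph, model)

results = retriever.hybrid_retrieve("Who co-founded Apple?", top_k=5, hops=2)
for r in results[:5]:
    # Seed entities carry a 'score'; expanded neighbours carry a 'distance' and 'path_types'
    print(r['entity'], r.get('score'), r.get('distance'))
```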
5. Sub-graph Extraction
Extracting Relevant Sub-graphs
```python
import networkx as nx

def extract_subgraph(graph, seed_entities, max_hops=2):
    """Extract a relevant sub-graph around seed entities."""
    # rel.type reads the predicate stored by store_triplets
    # (all relationships share the RELATES type, so type(rel) would be uninformative)
    query = """
    MATCH (start:Entity)
    WHERE start.name IN $seeds
    CALL {
        WITH start
        MATCH path = (start)-[*1..""" + str(max_hops) + """]-(connected)
        RETURN path
    }
    WITH DISTINCT path
    UNWIND relationships(path) as rel
    WITH DISTINCT
        startNode(rel).name as source,
        rel.type as relationship,
        endNode(rel).name as target
    RETURN source, relationship, target
    """
    result = graph.run(query, seeds=seed_entities)
    # Convert to networkx for analysis
    G = nx.DiGraph()
    for record in result:
        G.add_edge(
            record['source'],
            record['target'],
            relationship=record['relationship']
        )
    return G

def subgraph_to_text(G):
    """Convert sub-graph to natural language context."""
    statements = []
    for source, target, data in G.edges(data=True):
        rel = data.get('relationship', 'related to')
        statements.append(f"{source} {rel} {target}")
    return ". ".join(statements)
```
Path-based Context

```python
def get_reasoning_paths(graph, entity1, entity2, max_length=4):
    """Find paths between two entities for reasoning."""
    query = """
    MATCH path = shortestPath(
        (a:Entity {name: $e1})-[*1..""" + str(max_length) + """]-(b:Entity {name: $e2})
    )
    RETURN [n in nodes(path) | n.name] as nodes,
           [r in relationships(path) | r.type] as relations
    """
    result = graph.run(query, e1=entity1, e2=entity2)
    paths = []
    for record in result:
        nodes = record['nodes']
        relations = record['relations']
        path_str = nodes[0]
        for i, rel in enumerate(relations):
            path_str += f" --[{rel}]--> {nodes[i+1]}"
        paths.append(path_str)
    return paths
```
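A usage sketch with the two entities from the opening analogy (the printed path is an assumption about what was extracted):

```python
paths = get_reasoning_paths(graph, "Steve Jobs", "Steve Wozniak")
for p in paths:
    print(p)
# e.g. "Steve Jobs --[co-founded]--> Apple --[co-founded]--> Steve Wozniak"
```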
6. Context-Augmented Generation
Building the Prompt
```python
from langchain.prompts import ChatPromptTemplate

graphrag_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a knowledgeable assistant. Answer questions using the
provided knowledge graph context. If the answer cannot be found in the context,
say so clearly.

Knowledge Graph Context:
{kg_context}

Relevant Paths:
{paths}"""),
    ("human", "{question}")
])

def generate_answer(query, retriever, llm):
    # Retrieve relevant context
    entities = retriever.hybrid_retrieve(query)
    entity_names = [e['entity'] for e in entities[:10]]

    # Extract sub-graph
    subgraph = extract_subgraph(retriever.graph, entity_names)
    kg_context = subgraph_to_text(subgraph)

    # Get reasoning paths if the query mentions at least two known entities
    paths = []
    detected_entities = extract_entities_from_query(query, retriever.graph)  # see the sketch below
    if len(detected_entities) >= 2:
        paths = get_reasoning_paths(
            retriever.graph,
            detected_entities[0],
            detected_entities[1]
        )

    # Generate answer (format_messages builds the chat messages for the LLM)
    messages = graphrag_prompt.format_messages(
        kg_context=kg_context,
        paths="\n".join(paths) if paths else "No specific paths found",
        question=query
    )
    response = llm.invoke(messages)
    return response.content
```
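`extract_entities_from_query` is not defined in the snippets above. A minimal sketch that simply matches the query against entity names already in the graph; an NER- or LLM-based extractor would be a natural upgrade:

```python
def extract_entities_from_query(query, graph):
    """Naive entity linking: return graph entity names that appear verbatim in the query."""
    names = [r['name'] for r in graph.run("MATCH (e:Entity) RETURN e.name as name").data()]
    return [name for name in names if name.lower() in query.lower()]
```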
Complete GraphRAG System

```python
class GraphRAGSystem:
    def __init__(self, neo4j_uri, neo4j_auth, openai_key):
        # py2neo's Graph exposes the .run() API used throughout this week's snippets
        from py2neo import Graph
        from langchain_openai import ChatOpenAI
        from sentence_transformers import SentenceTransformer

        self.graph = Graph(neo4j_uri, auth=neo4j_auth)
        self.llm = ChatOpenAI(model="gpt-4", api_key=openai_key)
        self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
        self.retriever = HybridRetriever(self.graph, self.embedding_model)

    def query(self, question):
        return generate_answer(question, self.retriever, self.llm)

    def add_document(self, text):
        # Extract triplets
        triplets = extract_triplets(text)
        # Store in graph
        store_triplets(triplets, self.graph)
        # Refresh embeddings so newly added entities become searchable
        add_embeddings(self.graph)
```
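A usage sketch (the connection details and the API key are placeholders):

```python
system = GraphRAGSystem(
    neo4j_uri="bolt://localhost:7687",
    neo4j_auth=("neo4j", "password"),
    openai_key="sk-...",  # placeholder
)

system.add_document("Steve Jobs co-founded Apple with Steve Wozniak.")
print(system.query("Who co-founded Apple?"))
```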
Project: Movie Recommendation Knowledge Graph
Progress
| Week | Topic | Project Milestone |
|---|---|---|
| 1 | Ontology Introduction | ✅ Movie domain design completed |
| 2 | RDF & RDFS | ✅ 10 movies converted to RDF |
| 3 | OWL & Reasoning | ✅ Inference rules applied |
| 4 | Knowledge Extraction | ✅ 100 movies auto-collected |
| 5 | Neo4j | ✅ Graph DB constructed |
| 6 | GraphRAG | Natural language query system |
| 7 | Ontology Agents | New movie auto-update |
| 8 | Domain Expansion | Medical/Legal/Finance cases |
| 9 | Service Deployment | API + Dashboard |
Week 6 Milestone: GraphRAG for Natural Language Movie Recommendations
Build a system that can answer natural language questions like "Recommend me a sci-fi movie in Nolan's style."
GraphRAG Architecture:
User Question: "Recommend me a sci-fi movie in Nolan's style"
↓ LLM (Question Analysis)
Intent: Recommendation, Conditions: Director=Nolan style, Genre=Sci-Fi
↓ Cypher Generation
MATCH (d:Person {name: "Christopher Nolan"})-[:DIRECTED]->(m:Movie)
-[:HAS_GENRE]->(:Genre {name: "Sci-Fi"})
RETURN m.title
↓ Neo4j Execution
["Inception", "Interstellar", "Tenet"]
↓ LLM (Response Generation)
"Nolan's sci-fi films include Inception, Interstellar, and Tenet."Hybrid Search:
Hybrid Search:
- Vector Index: Movie plot embeddings
- Graph Index: Relationship-based traversal
- Combination: Semantic similarity + Structural connections
In the project notebook, you'll build a chatbot that recommends movies via natural language. You will implement:
- LangChain + Neo4j integration setup
- Natural language → Cypher query auto-conversion
- Handle "Recommend Nolan-style sci-fi movies" queries
- Vector + Graph hybrid search
What you'll build by Week 9: An AI agent that answers "Recommend sci-fi movies like Nolan's style" by reasoning over director-genre-rating relationships in the knowledge graph
Practice Notebook
For deeper exploration of the theory, the practice notebook covers additional topics:
- Microsoft GraphRAG vs LangChain comparison
- Community detection and summarization
- Custom Retriever implementation
- Conversation context management
Interview Questions
What are the advantages of GraphRAG over traditional RAG?
Key Advantages:
- Multi-hop reasoning: Can connect facts across multiple documents
- Relationship awareness: Understands how entities relate
- Reduced hallucination: Grounded in explicit knowledge structure
- Better for complex queries: Handles "who", "what", "why" chains
- Explainability: Can show reasoning paths in the graph
- Structured knowledge: Complements unstructured document retrieval
Premium Content
Want complete solutions with detailed explanations and production-ready code?
Check out the Ontology & Knowledge Graph Cookbook Premium for:
- Complete notebook solutions with step-by-step explanations
- Real-world case studies and best practices
- Interview preparation materials
- Production deployment guides
Next Steps
In Week 7: Ontology-based Agents, you will learn how to build AI agents that leverage ontologies for planning and reasoning.