Week 5: Neo4j
Learning Objectives
This week introduces Neo4j, a widely used native graph database. You will learn about the Labeled Property Graph model, write Cypher queries, and integrate Neo4j with Python applications.
1. Introduction to Neo4j
Theory vs. Practice: "Esperanto" vs. "Native Speed"
In Weeks 1-4, we worked with RDF and OWL, W3C standards designed for exchanging data between systems.
Think of RDF as Esperanto. It's the universal language for sharing data between organizations. It's precise but can be verbose.
Neo4j is like speaking your native language. It's optimized for speed and application building. When building a real-time recommendation engine, you prioritize performance over universal exchange standards.
Neo4j is a native graph database that stores and queries data as a graph structure.
Why Neo4j?
| Feature | Benefit |
|---|---|
| Native Graph Storage | Optimized for connected data |
| Index-Free Adjacency | Fast traversals without index lookups |
| ACID Compliant | Reliable transactions |
| Cypher Query Language | Intuitive pattern matching |
| Scalability | Enterprise clustering support |
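Index-free adjacency means each stored node points directly at its relationships, so following an edge is a pointer hop rather than an index lookup. The plain-Python sketch below is only an analogy, not Neo4j's storage engine: the hypothetical NodeRecord objects keep direct references to their neighbours, while the relational-style function has to search a KNOWS join table on every hop.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, as database nodes are
class NodeRecord:
    """Hypothetical node record holding direct references to its neighbours."""
    name: str
    out: list["NodeRecord"] = field(default_factory=list)


def neighbours_by_pointer(node: NodeRecord) -> list[str]:
    # One hop = read this node's own adjacency list; cost depends only on its degree.
    return [n.name for n in node.out]


def neighbours_by_join(name: str, knows_rows: list[tuple[str, str]]) -> list[str]:
    # Relational analogue: search a KNOWS join table on every hop.
    return [dst for src, dst in knows_rows if src == name]


alice, bob, carol = NodeRecord("Alice"), NodeRecord("Bob"), NodeRecord("Carol")
alice.out += [bob, carol]
bob.out.append(carol)

knows_rows = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol")]

print(neighbours_by_pointer(alice))             # ['Bob', 'Carol']
print(neighbours_by_join("Alice", knows_rows))  # ['Bob', 'Carol']
```

Both functions return the same neighbours. The difference is that the pointer version only touches Alice's own list, while the join version walks the whole table (a real RDBMS would use an index probe per hop instead, which still grows with table size).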
RDF vs Labeled Property Graph
| Aspect | RDF | LPG (Neo4j) |
|---|---|---|
| Model | Subject-Predicate-Object | Nodes-Relationships-Properties |
| Schema | Flexible, ontology-based | Optional schema constraints |
| Properties on relationships | Need reification | Native on nodes and relationships |
| Query Language | SPARQL | Cypher |
| Use Case | Semantic web, linked data | Application databases |
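The properties row is the most practical difference. To say that Alice has worked for Acme since 2020, plain RDF needs extra triples that describe the statement itself (standard reification with rdf:Statement, rdf:subject, rdf:predicate and rdf:object), whereas an LPG stores since directly on the relationship. The sketch below writes both forms as plain Python tuples and dicts; the ex: prefix and the _:stmt1 blank-node name are illustrative assumptions.

```python
# RDF: one triple states the fact...
rdf_fact = [("ex:Alice", "ex:worksFor", "ex:Acme")]

# ...but attaching "since 2020" to that fact needs standard reification:
# a blank node (_:stmt1) that describes the statement, plus the extra property.
rdf_reified = rdf_fact + [
    ("_:stmt1", "rdf:type", "rdf:Statement"),
    ("_:stmt1", "rdf:subject", "ex:Alice"),
    ("_:stmt1", "rdf:predicate", "ex:worksFor"),
    ("_:stmt1", "rdf:object", "ex:Acme"),
    ("_:stmt1", "ex:since", 2020),
]

# LPG: the relationship itself carries the property.
lpg_relationship = {
    "start": "Alice",
    "type": "WORKS_FOR",
    "end": "Acme",
    "properties": {"since": 2020},
}

print(len(rdf_reified))                         # 6
print(lpg_relationship["properties"]["since"])  # 2020
```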
2. The Labeled Property Graph Model
Core Concepts
Node: An entity with labels and properties
- Labels categorize nodes (e.g., :Person, :Company)
- Properties store data (e.g., name, age)
Relationship: A connection between nodes
- Has a type (e.g., WORKS_FOR, KNOWS)
- Has direction (start → end)
- Can have properties (e.g., since, role)
Visual Representation
```cypher
(:Person {name: "Alice", age: 30})-[:WORKS_FOR {since: 2020}]->(:Company {name: "Acme"})
```
3. Cypher Query Language
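Cypher patterns describe shapes in the graph, and MATCH returns every place that shape occurs. As a mental model only (Neo4j's query planner works very differently), the sketch below stores the Alice/Acme graph from the visual representation as Python dataclasses and matches (p:Person)-[:WORKS_FOR]->(c:Company) by brute force. The Node, Rel and match_works_for names, and the extra Bob node, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set[str]
    props: dict = field(default_factory=dict)


@dataclass
class Rel:
    rel_type: str
    start: Node
    end: Node
    props: dict = field(default_factory=dict)


alice = Node({"Person"}, {"name": "Alice", "age": 30})
bob = Node({"Person"}, {"name": "Bob", "age": 25})
acme = Node({"Company"}, {"name": "Acme"})
rels = [
    Rel("WORKS_FOR", alice, acme, {"since": 2020}),
    Rel("KNOWS", alice, bob),
]


def match_works_for(rels: list[Rel]) -> list[tuple[str, str]]:
    """Rough equivalent of:
    MATCH (p:Person)-[:WORKS_FOR]->(c:Company) RETURN p.name, c.name
    """
    return [
        (r.start.props["name"], r.end.props["name"])
        for r in rels
        if r.rel_type == "WORKS_FOR"        # relationship type
        and "Person" in r.start.labels      # label on the start node
        and "Company" in r.end.labels       # label on the end node
    ]


print(match_works_for(rels))  # [('Alice', 'Acme')]
```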
Basic Patterns
```cypher
// Create nodes
CREATE (p:Person {name: "Alice", age: 30})
CREATE (c:Company {name: "Acme Corp"});
// Create relationships (a separate statement, so p and c can be re-bound)
MATCH (p:Person {name: "Alice"})
MATCH (c:Company {name: "Acme Corp"})
CREATE (p)-[:WORKS_FOR {since: 2020}]->(c);
// Query patterns
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN p.name, c.name;
```
Common Query Patterns
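Two of the queries in this subsection are traversals: [:KNOWS*2] follows exactly two KNOWS hops, and shortestPath over an undirected [:KNOWS*] pattern behaves like a breadth-first search on an unweighted graph. The plain-Python sketch below reproduces that logic on a small hypothetical KNOWS adjacency map. It mirrors what the queries return, not how Neo4j executes them.

```python
from collections import deque
from typing import Optional

# Hypothetical directed KNOWS edges.
knows = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave", "Alice"],
    "Carol": ["Dave", "Erin"],
    "Dave": [],
    "Erin": ["Bob"],
}


def friends_of_friends(me: str) -> set[str]:
    # MATCH (me)-[:KNOWS*2]->(fof) WHERE me <> fof RETURN DISTINCT fof.name
    return {fof for f in knows[me] for fof in knows[f] if fof != me}


def shortest_path(a: str, b: str) -> Optional[list[str]]:
    # Ignore direction, matching the direction-less pattern -[:KNOWS*]-.
    undirected: dict[str, set[str]] = {n: set() for n in knows}
    for src, dsts in knows.items():
        for dst in dsts:
            undirected[src].add(dst)
            undirected[dst].add(src)

    # Breadth-first search: the first time we reach b is via a fewest-hops path.
    parent = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(undirected[node]):  # sorted for deterministic output
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no path


print(sorted(friends_of_friends("Alice")))  # ['Dave', 'Erin']
print(shortest_path("Alice", "Dave"))        # ['Alice', 'Bob', 'Dave']
```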
```cypher
// Find all employees of a company
MATCH (p:Person)-[:WORKS_FOR]->(c:Company {name: "Acme Corp"})
RETURN p.name;
// Find friends of friends (exactly two KNOWS hops away)
MATCH (me:Person {name: "Alice"})-[:KNOWS*2]->(fof:Person)
WHERE me <> fof
RETURN DISTINCT fof.name;
// Shortest path (any direction, any length)
MATCH path = shortestPath(
  (a:Person {name: "Alice"})-[:KNOWS*]-(b:Person {name: "Bob"})
)
RETURN path;
// Aggregate data
MATCH (c:Company)<-[:WORKS_FOR]-(p:Person)
RETURN c.name, count(p) AS employees
ORDER BY employees DESC;
```
Filtering and Conditions
```cypher
// WHERE clause
MATCH (p:Person)
WHERE p.age > 25 AND p.name STARTS WITH "A"
RETURN p;
// Pattern predicates (existential subquery)
MATCH (p:Person)
WHERE EXISTS { (p)-[:WORKS_FOR]->(:Company) }
RETURN p.name;
// List operations
MATCH (p:Person)
WHERE p.skills IS NOT NULL AND "Python" IN p.skills
RETURN p.name;
```
4. Data Modeling in Neo4j
Best Practices
- Model entities as nodes with descriptive labels
- Model connections as relationships with verb-phrase types
- Store entity attributes as node properties
- Store relationship metadata as relationship properties
- Avoid over-connecting: Not every association needs a relationship
Example: Movie Database
```cypher
// Create movie data model (one statement)
CREATE (m:Movie {title: "The Matrix", released: 1999})
CREATE (k:Person {name: "Keanu Reeves", born: 1964})
CREATE (l:Person {name: "Lana Wachowski", born: 1965})
CREATE (k)-[:ACTED_IN {role: "Neo"}]->(m)
CREATE (l)-[:DIRECTED]->(m);
// Query actors and their movies
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, r.role, m.title;
```
Indexes and Constraints
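Conceptually, an index lets the database jump from a property value to the matching nodes instead of scanning every node with the label, and a uniqueness constraint rejects any write that would duplicate a value. The toy PersonStore below illustrates both ideas with a Python dict and set. It is a hypothetical teaching aid, not a model of Neo4j's actual index structures.

```python
class PersonStore:
    """Toy store: a label 'scan' list, a name index, and an email uniqueness check."""

    def __init__(self) -> None:
        self.nodes: list[dict] = []                  # every :Person node
        self.name_index: dict[str, list[dict]] = {}  # plays the role of person_name
        self.emails: set[str] = set()                # backs the unique_email idea

    def create(self, name: str, email: str) -> dict:
        if email in self.emails:
            # A uniqueness constraint makes the whole write fail.
            raise ValueError(f"constraint violated: email {email!r} already exists")
        node = {"name": name, "email": email}
        self.nodes.append(node)
        self.emails.add(email)
        self.name_index.setdefault(name, []).append(node)
        return node

    def find_by_scan(self, name: str) -> list[dict]:
        # Without an index: look at every node (cost grows with node count).
        return [n for n in self.nodes if n["name"] == name]

    def find_by_index(self, name: str) -> list[dict]:
        # With an index: a single dictionary lookup.
        return self.name_index.get(name, [])


store = PersonStore()
store.create("Alice", "alice@example.com")
store.create("Bob", "bob@example.com")
assert store.find_by_scan("Alice") == store.find_by_index("Alice")
try:
    store.create("Alice2", "alice@example.com")
except ValueError as err:
    print(err)  # constraint violated: email 'alice@example.com' already exists
```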
```cypher
// Create index for faster lookups
CREATE INDEX person_name FOR (p:Person) ON (p.name);
// Create unique constraint
CREATE CONSTRAINT unique_email FOR (p:Person)
REQUIRE p.email IS UNIQUE;
// Create node key (composite uniqueness plus existence; Enterprise Edition only)
CREATE CONSTRAINT movie_key FOR (m:Movie)
REQUIRE (m.title, m.released) IS NODE KEY;
```
5. Python Integration with Neo4j
Using the Official Driver
```python
from neo4j import GraphDatabase


class Neo4jConnection:
    """Thin wrapper around the official Neo4j Python driver."""

    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def query(self, query, parameters=None):
        # Open a session, run one Cypher statement, and materialize the rows
        # before the session closes.
        with self.driver.session() as session:
            result = session.run(query, parameters)
            return [record.data() for record in result]

    def create_person(self, name, age):
        # Pass values as query parameters ($name, $age), not via string formatting.
        query = """
        CREATE (p:Person {name: $name, age: $age})
        RETURN p
        """
        return self.query(query, {"name": name, "age": age})


# Usage
conn = Neo4jConnection("bolt://localhost:7687", "neo4j", "password")
try:
    # Create data
    conn.create_person("Alice", 30)
    # Query data
    result = conn.query("""
    MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
    RETURN p.name AS person, c.name AS company
    """)
    for row in result:
        print(f"{row['person']} works at {row['company']}")
finally:
    conn.close()
```
Using py2neo
Note that py2neo is no longer actively maintained; prefer the official driver for new projects. The example shows its object-style API for comparison.
```python
from py2neo import Graph, Node, Relationship

# Connect to Neo4j
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
# Create nodes
alice = Node("Person", name="Alice", age=30)
acme = Node("Company", name="Acme Corp")
# Create relationship
works_for = Relationship(alice, "WORKS_FOR", acme, since=2020)
# Commit to database (creates both nodes and the relationship)
graph.create(works_for)
# Query with Cypher
results = graph.run("""
MATCH (p:Person)-[r:WORKS_FOR]->(c:Company)
RETURN p.name, r.since, c.name
""")
for record in results:
    print(record)
```
6. Graph Algorithms
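PageRank, the first algorithm used in this section, scores a node by the rank of the nodes that point to it. The sketch below runs the textbook power iteration in plain Python on a small hypothetical graph so you can see what the score column represents. The damping factor of 0.85, the fixed iteration count and the "scores sum to 1" normalisation are assumptions of this sketch; GDS has its own configuration and normalisation, so its exact numbers can differ.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration; scores sum to 1."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node gets the "random jump" share first.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph[v]
            if targets:
                # Split this node's rank evenly over its outgoing edges.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


# Hypothetical KNOWS graph: most people point at Alice.
knows = {"Alice": ["Bob"], "Bob": ["Alice"], "Carol": ["Alice"], "Dave": ["Alice"]}
scores = pagerank(knows)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```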
Using the Graph Data Science Library
```cypher
// These calls assume the in-memory projection 'myGraph' already exists
// (see Creating Graph Projections).
// PageRank - find influential nodes
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC;
// Community Detection
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
RETURN communityId, collect(gds.util.asNode(nodeId).name) AS members;
// Shortest Path, weighted by the projected 'weight' property
MATCH (source:Person {name: "Alice"}), (target:Person {name: "Bob"})
CALL gds.shortestPath.dijkstra.stream('myGraph', {
  sourceNode: source,
  targetNode: target,
  relationshipWeightProperty: 'weight'
})
YIELD path
RETURN path;
```
Creating Graph Projections
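A projection copies only the labels, relationship types and properties you name into a compact in-memory graph, and algorithms such as Dijkstra then run against that copy rather than the stored database. The plain-Python analogue below "projects" weighted KNOWS relationships out of a hypothetical list of stored records (keeping their direction) and runs Dijkstra with heapq. It mirrors the idea behind gds.graph.project and relationshipWeightProperty, not GDS's actual data structures.

```python
import heapq

# Hypothetical stored graph: (start, type, end, properties)
stored = [
    ("Alice", "KNOWS", "Bob", {"weight": 4}),
    ("Alice", "KNOWS", "Carol", {"weight": 1}),
    ("Carol", "KNOWS", "Bob", {"weight": 2}),
    ("Bob", "KNOWS", "Dave", {"weight": 1}),
    ("Alice", "WORKS_FOR", "Acme", {}),  # not part of the projection
]


def project(records, rel_type, weight_prop):
    """Keep one relationship type and one weight property, preserving direction."""
    graph: dict[str, list[tuple[str, float]]] = {}
    for start, typ, end, props in records:
        if typ == rel_type:
            graph.setdefault(start, []).append((end, props[weight_prop]))
            graph.setdefault(end, [])
    return graph


def dijkstra(graph, source, target):
    """Return (total_cost, path) for the cheapest path, or (inf, []) if none."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)  # cheapest frontier entry
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []


my_graph = project(stored, "KNOWS", "weight")
print(dijkstra(my_graph, "Alice", "Dave"))  # (4.0, ['Alice', 'Carol', 'Bob', 'Dave'])
```

The direct Alice to Bob edge costs 4, so the cheaper route goes through Carol; the WORKS_FOR record never reaches the projected graph at all.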
```cypher
// Create in-memory graph projection
CALL gds.graph.project(
  'myGraph',
  'Person',
  'KNOWS',
  {
    relationshipProperties: 'weight'
  }
);
// Run algorithm on projection
CALL gds.betweennessCentrality.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10;
```
Project: Movie Recommendation Knowledge Graph
Progress
| Week | Topic | Project Milestone |
|---|---|---|
| 1 | Ontology Introduction | Movie domain design completed |
| 2 | RDF & RDFS | 10 movies converted to RDF |
| 3 | OWL & Reasoning | Inference rules applied |
| 4 | Knowledge Extraction | 100 movies auto-collected |
| 5 | Neo4j | Store in graph DB and query |
| 6 | GraphRAG | Natural language queries |
| 7 | Ontology Agent | Automatic updates for new movies |
| 8 | Domain Extension | Medical/Legal/Finance cases |
| 9 | Service Deployment | API + Dashboard |
Week 5 Milestone: Storing Movie Knowledge Graph in Neo4j
This week, you will migrate the RDF data to a Neo4j graph database and write recommendation queries in Cypher.
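The core of the migration is a mapping rule: RDF resources with an rdf:type become labeled nodes, triples whose object is a literal become node properties, and triples whose object is another resource become relationships. The sketch below applies that rule to a few hand-written triples. The ex: prefix, the predicate-to-type mapping (ex:directedBy becoming DIRECTED with its direction reversed) and the helper names are assumptions for illustration, not the project notebook's code; in practice you would first parse the Week 2 RDF files with a library such as rdflib.

```python
# Hypothetical triples in the style of the Week 2 RDF data.
triples = [
    ("ex:Inception", "rdf:type", "ex:Movie"),
    ("ex:Inception", "ex:title", "Inception"),
    ("ex:Inception", "ex:hasGenre", "ex:SciFi"),
    ("ex:Inception", "ex:directedBy", "ex:Nolan"),
    ("ex:SciFi", "rdf:type", "ex:Genre"),
    ("ex:SciFi", "ex:name", "Sci-Fi"),
    ("ex:Nolan", "rdf:type", "ex:Person"),
    ("ex:Nolan", "ex:name", "Christopher Nolan"),
]

# Assumed mapping from RDF predicates to LPG relationship types.
# True means the LPG edge points the other way (Person -> Movie).
REL_MAP = {"ex:hasGenre": ("HAS_GENRE", False), "ex:directedBy": ("DIRECTED", True)}


def is_resource(value) -> bool:
    return isinstance(value, str) and value.startswith("ex:")


def rdf_to_lpg(triples):
    nodes: dict[str, dict] = {}
    rels: list[tuple[str, str, str]] = []
    for s, p, o in triples:
        node = nodes.setdefault(s, {"labels": set(), "props": {}})
        if p == "rdf:type":
            node["labels"].add(o.removeprefix("ex:"))           # class -> label
        elif p in REL_MAP:
            rel_type, reverse = REL_MAP[p]
            rels.append((o, rel_type, s) if reverse else (s, rel_type, o))
        elif not is_resource(o):
            node["props"][p.removeprefix("ex:")] = o            # literal -> property
        # Other resource-valued predicates are ignored in this sketch.
    return nodes, rels


nodes, rels = rdf_to_lpg(triples)
print(nodes["ex:Inception"])  # {'labels': {'Movie'}, 'props': {'title': 'Inception'}}
print(rels)  # [('ex:Inception', 'HAS_GENRE', 'ex:SciFi'), ('ex:Nolan', 'DIRECTED', 'ex:Inception')]
```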
Neo4j Schema:
```cypher
// Nodes
(:Movie {title, releaseDate, rating, runtime})
(:Person {name, birthDate})
(:Genre {name})
// Relationships
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)
(:Movie)-[:HAS_GENRE]->(:Genre)
```
Recommendation Query Example:
```cypher
// Recommend movies that share a genre AND an actor with Inception.
// count(*) counts one row per (shared genre, shared actor) pair,
// so score = shared genres * shared actors.
MATCH (m:Movie {title: "Inception"})-[:HAS_GENRE]->(g:Genre),
      (m)<-[:ACTED_IN]-(a:Person)-[:ACTED_IN]->(rec:Movie)
WHERE rec <> m AND (rec)-[:HAS_GENRE]->(g)
RETURN rec.title, count(*) AS score
ORDER BY score DESC LIMIT 5
```
Performance Optimization:
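- Index:
  CREATE INDEX movie_title FOR (m:Movie) ON (m.title)
- Constraint:
  CREATE CONSTRAINT movie_title_unique FOR (m:Movie) REQUIRE m.title IS UNIQUE
A uniqueness constraint is backed by its own index, so you do not need a separate index on the same property. The older CREATE INDEX ON :Movie(title) and CREATE CONSTRAINT ON ... ASSERT forms were removed in Neo4j 5.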
In the project notebook, you will store the movie data in a graph database and implement:
- Run Neo4j with Docker and connect via Python
- Create 100 movies + director/actor nodes
- Write "Movies similar to Inception" Cypher recommendation query
- Create indexes for 10x query performance improvement
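Before writing the Cypher, it helps to pin down what the recommendation query example scores. Its count(*) produces one row per (shared genre, shared actor) pair, so a candidate's score is the number of genres it shares with the seed movie multiplied by the number of actors it shares, and movies sharing no genre or no actor never appear. The sketch below reproduces that scoring in plain Python; all titles, genres and actors are made-up placeholders, not project data.

```python
from collections import Counter

# Hypothetical placeholder data: movie -> genres, movie -> actors.
genres = {
    "Seed": {"Sci-Fi", "Thriller"},
    "Movie 1": {"Sci-Fi", "Thriller"},
    "Movie 2": {"Sci-Fi", "Drama"},
    "Movie 3": {"Romance"},
}
actors = {
    "Seed": {"Actor A", "Actor B", "Actor C"},
    "Movie 1": {"Actor A", "Actor C"},
    "Movie 2": {"Actor B"},
    "Movie 3": {"Actor C"},
}


def recommend(seed: str, limit: int = 5) -> list[tuple[str, int]]:
    scores = Counter()
    for rec in genres:
        if rec == seed:
            continue  # WHERE rec <> m
        shared_genres = genres[seed] & genres[rec]
        shared_actors = actors[seed] & actors[rec]
        # One Cypher row per (shared genre, shared actor) pair -> count(*)
        score = len(shared_genres) * len(shared_actors)
        if score:  # no matching rows means no result row for this movie
            scores[rec] = score
    return scores.most_common(limit)  # ORDER BY score DESC LIMIT 5


print(recommend("Seed"))  # [('Movie 1', 4), ('Movie 2', 1)]
```

Movie 3 shares an actor but no genre, so it is dropped, which is the "AND" in the query's comment. If you want genre and cast overlap to add rather than multiply, you would need a different query shape.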
What you'll build by Week 9: An AI agent that answers "Recommend sci-fi movies like Nolan's style" by reasoning over director-genre-rating relationships in the knowledge graph
Practice Notebook
For deeper exploration of the theory, the practice notebook covers additional topics:
- Neo4j Aura cloud setup
- APOC library utilities
- Query optimization with EXPLAIN/PROFILE
- Graph algorithms (PageRank, Community Detection)
Interview Questions
When would you choose Neo4j over a relational database?
Choose Neo4j when:
- Data is highly connected with complex relationships
- Queries involve multiple joins or path traversals
- Schema is flexible and evolving
- Real-time recommendations or fraud detection
- Social network analysis or knowledge graphs
Stick with RDBMS when:
- Data is tabular with simple relationships
- The workload is dominated by high-volume, row-oriented transactions rather than relationship traversals
- Reporting needs are primarily aggregate-based
Premium Content
Want complete solutions with detailed explanations and production-ready code?
Check out the Ontology & Knowledge Graph Cookbook Premium for:
- Complete notebook solutions with step-by-step explanations
- Real-world case studies and best practices
- Interview preparation materials
- Production deployment guides
Next Steps
In Week 6: GraphRAG, you will learn how to combine graph databases with retrieval-augmented generation.