Week 5: Neo4j

Learning Objectives

This week introduces Neo4j, a widely used native graph database. You will learn the Labeled Property Graph model, write Cypher queries, and integrate Neo4j with Python applications.

1. Introduction to Neo4j

Theory vs. Practice: "Esperanto" vs. "Native Speed"

In Weeks 1-4, we worked with RDF and OWL, W3C standards designed for data exchange.

Think of RDF as Esperanto. It's the universal language for sharing data between organizations. It's precise but can be verbose.

Neo4j is like speaking your native language. It's optimized for speed and application building. When building a real-time recommendation engine, you prioritize performance over universal exchange standards.

Neo4j is a native graph database that stores and queries data as a graph structure.

Why Neo4j?

Feature	Benefit
Native Graph Storage	Optimized for connected data
Index-Free Adjacency	Fast traversals without index lookups
ACID Compliant	Reliable transactions
Cypher Query Language	Intuitive pattern matching
Scalability	Enterprise clustering support
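Index-free adjacency means each node record points directly at its neighbors, so expanding one node costs time proportional to that node's degree instead of a lookup in a global index over all relationships. The sketch below contrasts a join-table style scan with pointer chasing on a toy graph. The class and function names are hypothetical and model only the idea, not Neo4j's actual storage format.

```python
# Toy KNOWS edges as a relational join table would store them (hypothetical data).
EDGES = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Dave"), ("Carol", "Dave")]


def neighbors_by_scan(edges, name):
    """Relational-style expansion: search the whole edge table for matches."""
    return [dst for src, dst in edges if src == name]


class GraphNode:
    """Hypothetical node record holding direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.out = []  # pointers to adjacent GraphNode objects


def build_graph(edges):
    nodes = {}
    for src, dst in edges:
        for name in (src, dst):
            if name not in nodes:
                nodes[name] = GraphNode(name)
        nodes[src].out.append(nodes[dst])  # the pointer is stored once, at write time
    return nodes


graph = build_graph(EDGES)
# Same answer; the second version only touches Alice's own neighbor list.
print(neighbors_by_scan(EDGES, "Alice"))      # ['Bob', 'Carol']
print([n.name for n in graph["Alice"].out])   # ['Bob', 'Carol']
# A two-hop traversal is pointer chasing, with no lookup per hop.
print([f.name for n in graph["Alice"].out for f in n.out])  # ['Dave', 'Dave']
```

A real RDBMS would use an index seek rather than a full scan, but that lookup still grows with the size of the table, while the per-hop cost of the adjacency version depends only on the local neighborhood.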

RDF vs Labeled Property Graph

Aspect	RDF	LPG (Neo4j)
Model	Subject-Predicate-Object	Nodes-Relationships-Properties
Schema	Flexible, ontology-based	Optional schema constraints
Properties	Literals on resources; edge properties need reification (or RDF-star)	Native on nodes and relationships
Query Language	SPARQL	Cypher
Use Case	Semantic web, linked data	Application databases
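The "Properties" row is easiest to see with a single fact: Alice has worked for Acme since 2020. In RDF, attaching `since` to the worksFor link needs an extra statement resource (standard reification is shown; RDF-star is an alternative), while in an LPG the value sits directly on the relationship. The `ex:` names and the dict layout below are illustrative assumptions.

```python
# RDF: plain (subject, predicate, object) triples; the ex: IRIs are placeholders.
rdf_triples = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    # Standard RDF reification: a statement resource describes the triple above...
    ("ex:stmt1", "rdf:type", "rdf:Statement"),
    ("ex:stmt1", "rdf:subject", "ex:alice"),
    ("ex:stmt1", "rdf:predicate", "ex:worksFor"),
    ("ex:stmt1", "rdf:object", "ex:acme"),
    # ...so that it can carry the edge attribute.
    ("ex:stmt1", "ex:since", 2020),
]

# LPG: the same fact is one relationship with a property map.
lpg_relationship = {
    "start": "alice", "type": "WORKS_FOR", "end": "acme",
    "properties": {"since": 2020},
}


def since_from_rdf(triples, s, p, o):
    """Find the ex:since value attached to the reified (s, p, o) statement."""
    stmts = {t[0] for t in triples if t[1] == "rdf:subject" and t[2] == s}
    stmts &= {t[0] for t in triples if t[1] == "rdf:predicate" and t[2] == p}
    stmts &= {t[0] for t in triples if t[1] == "rdf:object" and t[2] == o}
    return [t[2] for t in triples if t[0] in stmts and t[1] == "ex:since"]


print(len(rdf_triples))                                                   # 6
print(since_from_rdf(rdf_triples, "ex:alice", "ex:worksFor", "ex:acme"))  # [2020]
print(lpg_relationship["properties"]["since"])                            # 2020
```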

2. The Labeled Property Graph Model

Core Concepts

Node: An entity with labels and properties
  - Labels categorize nodes (e.g., :Person, :Company)
  - Properties store data (e.g., name, age)

Relationship: A connection between nodes
  - Has a type (e.g., WORKS_FOR, KNOWS)
  - Has direction (start → end)
  - Can have properties (e.g., since, role)

Visual Representation

(:Person {name: "Alice", age: 30})-[:WORKS_FOR {since: 2020}]->(:Company {name: "Acme"})
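The core concepts map onto two small record types: a node carries labels and properties, and a relationship carries exactly one type, a direction (start to end), and its own properties. The sketch below models them with dataclasses and renders the pattern above as Cypher text. `Node`, `Relationship`, and `render` are hypothetical helpers, and the `json.dumps` rendering is an assumption that only suits simple strings and numbers.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: list                                    # e.g. ["Person"]
    properties: dict = field(default_factory=dict)  # e.g. {"name": "Alice"}


@dataclass
class Relationship:
    start: Node                                     # direction is start -> end
    rel_type: str                                   # exactly one type per relationship
    end: Node
    properties: dict = field(default_factory=dict)


def render_props(props):
    """Render a property map as Cypher literal text (strings and numbers only)."""
    if not props:
        return ""
    return " {" + ", ".join(f"{k}: {json.dumps(v)}" for k, v in props.items()) + "}"


def render_node(node):
    return "(" + "".join(":" + label for label in node.labels) + render_props(node.properties) + ")"


def render(rel):
    return f"{render_node(rel.start)}-[:{rel.rel_type}{render_props(rel.properties)}]->{render_node(rel.end)}"


alice = Node(["Person"], {"name": "Alice", "age": 30})
acme = Node(["Company"], {"name": "Acme"})
works_for = Relationship(alice, "WORKS_FOR", acme, {"since": 2020})
print(render(works_for))
# (:Person {name: "Alice", age: 30})-[:WORKS_FOR {since: 2020}]->(:Company {name: "Acme"})
```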

3. Cypher Query Language

Basic Patterns

// Create nodes
CREATE (p:Person {name: "Alice", age: 30})
CREATE (c:Company {name: "Acme Corp"})
 
// Create relationships
MATCH (p:Person {name: "Alice"})
MATCH (c:Company {name: "Acme Corp"})
CREATE (p)-[:WORKS_FOR {since: 2020}]->(c)
 
// Query patterns
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN p.name, c.name

Common Query Patterns

// Find all employees of a company
MATCH (p:Person)-[:WORKS_FOR]->(c:Company {name: "Acme Corp"})
RETURN p.name
 
// Find friends of friends
MATCH (me:Person {name: "Alice"})-[:KNOWS*2]->(fof:Person)
WHERE me <> fof
RETURN DISTINCT fof.name
 
// Shortest path
MATCH path = shortestPath(
    (a:Person {name: "Alice"})-[:KNOWS*]-(b:Person {name: "Bob"})
)
RETURN path
 
// Aggregate data
MATCH (c:Company)<-[:WORKS_FOR]-(p:Person)
RETURN c.name, count(p) as employees
ORDER BY employees DESC
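To make the traversal semantics concrete, here is a plain-Python analogue of two of the queries above: `[:KNOWS*2]->` follows exactly two directed hops (so `WHERE me <> fof` is needed to drop paths that lead back to Alice), and `shortestPath` over `-[:KNOWS*]-` is an unweighted breadth-first search that ignores direction. The sample graph and function names are hypothetical; this is a sketch of the semantics, not how Neo4j executes them.

```python
from collections import deque

# Directed KNOWS edges (hypothetical sample data).
KNOWS = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Dave", "Erin"],
    "Dave": [],
    "Erin": ["Bob"],
}


def friends_of_friends(graph, me):
    """Analogue of MATCH (me)-[:KNOWS*2]->(fof) WHERE me <> fof RETURN DISTINCT fof."""
    result = set()                      # the set plays the role of DISTINCT
    for friend in graph.get(me, []):
        for fof in graph.get(friend, []):
            if fof != me:               # WHERE me <> fof (drops Alice -> Bob -> Alice)
                result.add(fof)
    return sorted(result)


def shortest_path(graph, start, goal):
    """Unweighted BFS ignoring direction, like shortestPath((a)-[:KNOWS*]-(b))."""
    undirected = {}
    for src, dsts in graph.items():
        for dst in dsts:
            undirected.setdefault(src, set()).add(dst)
            undirected.setdefault(dst, set()).add(src)
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:                # walk predecessors back to the start
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(undirected.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None                         # no path exists


print(friends_of_friends(KNOWS, "Alice"))    # ['Dave', 'Erin']
print(shortest_path(KNOWS, "Alice", "Erin"))  # ['Alice', 'Bob', 'Erin']
```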

Filtering and Conditions

// WHERE clause
MATCH (p:Person)
WHERE p.age > 25 AND p.name STARTS WITH "A"
RETURN p
 
// Pattern predicates
MATCH (p:Person)
WHERE EXISTS { (p)-[:WORKS_FOR]->(:Company) }
RETURN p.name
 
// List operations
MATCH (p:Person)
WHERE p.skills IS NOT NULL AND "Python" IN p.skills
RETURN p.name
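The filters above rely on Cypher's null handling: a missing property reads as null, and a comparison or `IN` test against null yields null, which `WHERE` treats as false. The Python analogue below spells out that behavior on hypothetical property maps; the guard is explicit because Python's `in None` raises an error instead of returning null.

```python
# Hypothetical :Person property maps; Amber has no `skills` property at all.
people = [
    {"name": "Alice", "age": 30, "skills": ["Python", "Cypher"]},
    {"name": "Andy", "age": 22, "skills": ["Java"]},
    {"name": "Bob", "age": 41, "skills": ["Python"]},
    {"name": "Amber", "age": 35},
]


def age_and_prefix(p):
    # WHERE p.age > 25 AND p.name STARTS WITH "A"
    age = p.get("age")  # missing property -> None, like null
    return age is not None and age > 25 and p.get("name", "").startswith("A")


def has_skill(p, skill):
    # WHERE p.skills IS NOT NULL AND skill IN p.skills
    # In Cypher, `skill IN null` is null and the row is dropped;
    # Python would raise TypeError on `in None`, so guard explicitly.
    skills = p.get("skills")
    return skills is not None and skill in skills


print([p["name"] for p in people if age_and_prefix(p)])      # ['Alice', 'Amber']
print([p["name"] for p in people if has_skill(p, "Python")])  # ['Alice', 'Bob']
```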

4. Data Modeling in Neo4j

Best Practices

Model entities as nodes with descriptive labels
Model connections as relationships with verb-phrase types
Store entity attributes as node properties
Store relationship metadata as relationship properties
Avoid over-connecting: Not every association needs a relationship

Example: Movie Database

// Create movie data model
CREATE (m:Movie {title: "The Matrix", released: 1999})
CREATE (k:Person {name: "Keanu Reeves", born: 1964})
CREATE (l:Person {name: "Lana Wachowski", born: 1965})
 
CREATE (k)-[:ACTED_IN {role: "Neo"}]->(m)
CREATE (l)-[:DIRECTED]->(m)
 
// Query actors and their movies
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, r.role, m.title

Indexes and Constraints

// Create index for faster lookups
CREATE INDEX person_name FOR (p:Person) ON (p.name)
 
// Create unique constraint
CREATE CONSTRAINT unique_email FOR (p:Person)
REQUIRE p.email IS UNIQUE
 
// Create node key (composite uniqueness plus property existence; Enterprise Edition only)
CREATE CONSTRAINT movie_key FOR (m:Movie)
REQUIRE (m.title, m.released) IS NODE KEY
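Conceptually, a uniqueness constraint is a hash lookup keyed on the constrained properties that rejects a second node with the same key, and a node key additionally requires every key property to exist. The class below is a hypothetical in-memory analogue written for illustration, not Neo4j's implementation. The two *Solaris* films (1972 and 2002) show why `(title, released)` is a safer movie key than `title` alone.

```python
class UniqueKeyIndex:
    """Hypothetical in-memory analogue of a uniqueness constraint / node key."""

    def __init__(self, label, keys, require_existence=False):
        self.label = label
        self.keys = tuple(keys)                     # one key: IS UNIQUE; several: composite
        self.require_existence = require_existence  # True models IS NODE KEY
        self._entries = {}                          # key tuple -> properties (hash lookup)

    def add(self, labels, props):
        if self.label not in labels:
            return                                  # constraint applies to one label only
        key = tuple(props.get(k) for k in self.keys)
        if None in key:
            if self.require_existence:
                raise ValueError(f"node key {self.keys} requires every property")
            return                                  # plain uniqueness skips missing values
        if key in self._entries:
            raise ValueError(f"duplicate {self.label} key {key}")
        self._entries[key] = props

    def lookup(self, *values):
        return self._entries.get(tuple(values))


movie_key = UniqueKeyIndex("Movie", ["title", "released"], require_existence=True)
movie_key.add({"Movie"}, {"title": "Solaris", "released": 1972})
movie_key.add({"Movie"}, {"title": "Solaris", "released": 2002})  # different key: allowed
try:
    movie_key.add({"Movie"}, {"title": "Solaris", "released": 2002})
except ValueError as err:
    print(err)  # duplicate Movie key ('Solaris', 2002)
print(movie_key.lookup("Solaris", 1972))
```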

5. Python Integration with Neo4j

Using the Official Driver

from neo4j import GraphDatabase
 
class Neo4jConnection:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))
 
    def close(self):
        self.driver.close()
 
    def query(self, query, parameters=None):
        with self.driver.session() as session:
            result = session.run(query, parameters)
            return [record.data() for record in result]
 
    def create_person(self, name, age):
        query = """
        CREATE (p:Person {name: $name, age: $age})
        RETURN p
        """
        return self.query(query, {"name": name, "age": age})
 
# Usage
conn = Neo4jConnection("bolt://localhost:7687", "neo4j", "password")
 
# Create data
conn.create_person("Alice", 30)
 
# Query data
result = conn.query("""
    MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
    RETURN p.name as person, c.name as company
""")
 
for row in result:
    print(f"{row['person']} works at {row['company']}")
 
conn.close()
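When loading many records (for example, the 100 movies in this week's milestone), a common pattern is to send one parameterized `UNWIND` query with a list of rows instead of one `CREATE` per record. Values travel as parameters rather than being spliced into the query string, which avoids Cypher injection and lets Neo4j reuse the cached query plan. `build_batch_create` is a hypothetical helper; its output is meant to be passed to `conn.query(query, params)` from the class above.

```python
def build_batch_create(people):
    """Return (query, parameters) that create many :Person nodes in one round trip."""
    query = (
        "UNWIND $rows AS row\n"
        "CREATE (p:Person {name: row.name, age: row.age})"
    )
    rows = [{"name": name, "age": age} for name, age in people]
    return query, {"rows": rows}


query, params = build_batch_create([("Alice", 30), ("Bob", 35)])
print(query)
print(params)  # {'rows': [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 35}]}
# With a live database: conn.query(query, params)
```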

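Using py2neo

Note: py2neo has reached end of life and is no longer maintained. Prefer the official neo4j driver for new projects; the example below is kept for reference.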

from py2neo import Graph, Node, Relationship
 
# Connect to Neo4j
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
 
# Create nodes
alice = Node("Person", name="Alice", age=30)
acme = Node("Company", name="Acme Corp")
 
# Create relationship
works_for = Relationship(alice, "WORKS_FOR", acme, since=2020)
 
# Commit to database
graph.create(works_for)
 
# Query with Cypher
results = graph.run("""
    MATCH (p:Person)-[r:WORKS_FOR]->(c:Company)
    RETURN p.name, r.since, c.name
""")
 
for record in results:
    print(record)

6. Graph Algorithms

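Using the Graph Data Science Library

The calls below run on an in-memory projection named 'myGraph', which must be created first with gds.graph.project (see Creating Graph Projections below).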

// PageRank - find influential nodes
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
 
// Community Detection
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
RETURN communityId, collect(gds.util.asNode(nodeId).name) as members
 
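// Shortest Path (weighted by the projected 'weight' property)
MATCH (source:Person {name: "Alice"}), (target:Person {name: "Bob"})
CALL gds.shortestPath.dijkstra.stream('myGraph', {
    sourceNode: source,
    targetNode: target,
    relationshipWeightProperty: 'weight'
})
YIELD path
RETURN path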
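For intuition about what `gds.pageRank.stream` computes, here is a plain power-iteration PageRank on a hypothetical directed graph. The damping factor 0.85 and 20 iterations are assumed settings for this sketch, and scores are normalized to sum to 1; GDS may scale scores differently, so compare rankings rather than raw values.

```python
def pagerank(graph, damping=0.85, iterations=20):
    """Power-iteration PageRank on {node: [out-neighbors]}; scores sum to 1."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, outs in graph.items():
            for dst in outs:
                new[dst] += damping * rank[src] / len(outs)  # split rank over out-links
        rank = new
    return rank


# Hypothetical KNOWS-style links: most edges point at Bob, who has none.
LINKS = {"Alice": ["Bob"], "Carol": ["Bob", "Alice"], "Dave": ["Bob"], "Bob": []}
scores = pagerank(LINKS)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```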

Creating Graph Projections

// Create in-memory graph projection
CALL gds.graph.project(
    'myGraph',
    'Person',
    'KNOWS',
    {
        relationshipProperties: 'weight'
    }
)
 
// Run algorithm on projection
CALL gds.betweennessCentrality.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
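A projection selects only the labels, relationship types, and properties an algorithm needs and loads them into memory with dense integer node ids, which is why the algorithms above return a `nodeId` that `gds.util.asNode` maps back to a stored node. GDS uses its own compressed in-memory format; the sketch below uses a compressed sparse row (CSR) layout as a stand-in, and `project` is a hypothetical helper, not the GDS API.

```python
def project(nodes, relationships, label, rel_type, weight_property):
    """Hypothetical analogue of gds.graph.project(name, label, rel_type, {relationshipProperties: ...})."""
    ids = {}                                   # stored node id -> dense integer id
    for node_id, labels in nodes:
        if label in labels:
            ids[node_id] = len(ids)
    adj = [[] for _ in ids]
    for start, rtype, end, props in relationships:
        if rtype == rel_type and start in ids and end in ids:
            adj[ids[start]].append((ids[end], props[weight_property]))
    # CSR: neighbors of node i are targets[offsets[i]:offsets[i + 1]].
    offsets, targets, weights = [0], [], []
    for edges in adj:
        for target, weight in edges:
            targets.append(target)
            weights.append(weight)
        offsets.append(len(targets))
    return ids, offsets, targets, weights


# Hypothetical stored graph: three :Person nodes and one :Movie that is filtered out.
NODES = [("a", ["Person"]), ("b", ["Person"]), ("c", ["Person"]), ("m", ["Movie"])]
RELS = [
    ("a", "KNOWS", "b", {"weight": 0.5}),
    ("a", "KNOWS", "c", {"weight": 2.0}),
    ("b", "KNOWS", "c", {"weight": 1.0}),
    ("a", "ACTED_IN", "m", {}),
]
ids, offsets, targets, weights = project(NODES, RELS, "Person", "KNOWS", "weight")
print(ids)      # {'a': 0, 'b': 1, 'c': 2}
print(offsets)  # [0, 2, 3, 3]
print(targets)  # [1, 2, 2]
print(weights)  # [0.5, 2.0, 1.0]
```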

Project: Movie Recommendation Knowledge Graph

Progress

Week	Topic	Project Milestone
1	Ontology Introduction	Movie domain design completed
2	RDF & RDFS	10 movies converted to RDF
3	OWL & Reasoning	Inference rules applied
4	Knowledge Extraction	100 movies auto-collected
5	Neo4j	Store in graph DB and query
6	GraphRAG	Natural language queries
7	Ontology Agent	Automatic updates for new movies
8	Domain Extension	Medical/Legal/Finance cases
9	Service Deployment	API + Dashboard

Week 5 Milestone: Storing Movie Knowledge Graph in Neo4j

This week, you will migrate the RDF data to a Neo4j graph database and write recommendation queries in Cypher.

Neo4j Schema:

// Nodes
(:Movie {title, releaseDate, rating, runtime})
(:Person {name, birthDate})
(:Genre {name})
 
// Relationships
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)
(:Movie)-[:HAS_GENRE]->(:Genre)
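One way to plan the RDF-to-Neo4j migration is a simple mapping: `rdf:type` becomes a node label, predicates whose object is another resource become relationships, and literal-valued predicates become properties. The sketch below applies that mapping to a few hypothetical triples; the `ex:` names and the predicate-to-type table are assumptions that should be replaced with your own Week 2 vocabulary.

```python
# Hypothetical triples in the style of the Week 2 RDF data; ex: names are placeholders.
TRIPLES = [
    ("ex:Inception", "rdf:type", "ex:Movie"),
    ("ex:Inception", "ex:title", "Inception"),
    ("ex:Nolan", "rdf:type", "ex:Person"),
    ("ex:Nolan", "ex:name", "Christopher Nolan"),
    ("ex:Nolan", "ex:directed", "ex:Inception"),
    ("ex:SciFi", "rdf:type", "ex:Genre"),
    ("ex:SciFi", "ex:name", "Sci-Fi"),
    ("ex:Inception", "ex:hasGenre", "ex:SciFi"),
]

# Assumed mapping from RDF predicates to the relationship types in the schema above.
REL_TYPES = {"ex:directed": "DIRECTED", "ex:actedIn": "ACTED_IN", "ex:hasGenre": "HAS_GENRE"}


def local_name(iri):
    """ex:Movie -> Movie (assumes prefixed names, not full IRIs)."""
    return iri.split(":", 1)[1]


def rdf_to_lpg(triples):
    nodes, rels = {}, []

    def node(iri):
        return nodes.setdefault(iri, {"labels": set(), "props": {}})

    for s, p, o in triples:
        if p == "rdf:type":
            node(s)["labels"].add(local_name(o))   # class -> node label
        elif p in REL_TYPES:
            node(s)
            node(o)                                # both ends become nodes
            rels.append((s, REL_TYPES[p], o))      # resource object -> relationship
        else:
            node(s)["props"][local_name(p)] = o    # literal -> property
    return nodes, rels


nodes, rels = rdf_to_lpg(TRIPLES)
print(nodes["ex:Inception"])  # {'labels': {'Movie'}, 'props': {'title': 'Inception'}}
print(rels)  # [('ex:Nolan', 'DIRECTED', 'ex:Inception'), ('ex:Inception', 'HAS_GENRE', 'ex:SciFi')]
```

The resulting node and relationship lists can then be written in batches with parameterized queries through the Python driver.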

Recommendation Query Example:

// Recommend movies with the same genre AND same actor as Inception
MATCH (m:Movie {title: "Inception"})-[:HAS_GENRE]->(g:Genre),
      (m)<-[:ACTED_IN]-(a:Person)-[:ACTED_IN]->(rec:Movie)
WHERE rec <> m AND (rec)-[:HAS_GENRE]->(g)
RETURN rec.title, count(*) as score
ORDER BY score DESC LIMIT 5
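In the query above, each result row is one (shared genre, shared actor) combination, so `count(*)` rewards movies that overlap with Inception on both axes. The Python sketch below reproduces that scoring on hypothetical toy data (the actor names and genre sets are placeholders) to make the arithmetic explicit.

```python
from collections import Counter

# Hypothetical toy data: movie -> genres, actor -> movies.
GENRES = {
    "Inception": {"Sci-Fi", "Thriller"},
    "Interstellar": {"Sci-Fi", "Drama"},
    "The Dark Knight": {"Action", "Thriller"},
    "Titanic": {"Romance", "Drama"},
}
ACTED_IN = {
    "Actor A": {"Inception", "Interstellar", "The Dark Knight"},
    "Actor B": {"Inception", "The Dark Knight"},
    "Actor C": {"Titanic", "Interstellar"},
}


def recommend(seed, limit=5):
    """Score = number of (shared genre, shared actor) rows, like count(*)."""
    scores = Counter()
    seed_genres = GENRES[seed]
    for actor, movies in ACTED_IN.items():
        if seed not in movies:
            continue                                        # (m)<-[:ACTED_IN]-(a)
        for rec in movies - {seed}:                         # (a)-[:ACTED_IN]->(rec), rec <> m
            shared = seed_genres & GENRES.get(rec, set())   # (rec)-[:HAS_GENRE]->(g)
            if shared:
                scores[rec] += len(shared)                  # one row per shared genre
    return scores.most_common(limit)                        # ORDER BY score DESC LIMIT 5


print(recommend("Inception"))  # [('The Dark Knight', 2), ('Interstellar', 1)]
```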

Performance Optimization:

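Index: CREATE INDEX movie_title FOR (m:Movie) ON (m.title)
Constraint: CREATE CONSTRAINT movie_title_unique FOR (m:Movie) REQUIRE m.title IS UNIQUE

The older CREATE INDEX ON ... and CREATE CONSTRAINT ON ... ASSERT forms were removed in Neo4j 5. Use one of the two statements above for the same property: a uniqueness constraint already creates a backing index on :Movie(title). If titles can repeat (remakes), use the node key on (title, released) from section 4 instead.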

In the project notebook, you will store the movie data in Neo4j and implement:

Run Neo4j with Docker and connect via Python
Create 100 movies + director/actor nodes
Write "Movies similar to Inception" Cypher recommendation query
Create indexes and measure the resulting query speed-up (for example, with PROFILE)

What you'll build by Week 9: An AI agent that answers "Recommend sci-fi movies in Nolan's style" by reasoning over director-genre-rating relationships in the knowledge graph

Practice Notebook

For deeper exploration of the theory, the practice notebook covers additional topics:

Neo4j Aura cloud setup
APOC library utilities
Query optimization with EXPLAIN/PROFILE
Graph algorithms (PageRank, Community Detection)

Interview Questions

When would you choose Neo4j over a relational database?

Choose Neo4j when:

Data is highly connected with complex relationships
Queries involve multiple joins or path traversals
Schema is flexible and evolving
The use case is real-time recommendation or fraud detection
The workload is social network analysis or a knowledge graph

Stick with RDBMS when:

Data is tabular with simple relationships
The workload is mostly simple, row-oriented transactions (Neo4j is also ACID-compliant, so ACID guarantees alone are not a reason to prefer an RDBMS)
Reporting needs are primarily aggregate-based

Premium Content

Want complete solutions with detailed explanations and production-ready code?

Check out the Ontology & Knowledge Graph Cookbook Premium for:

Complete notebook solutions with step-by-step explanations
Real-world case studies and best practices
Interview preparation materials
Production deployment guides

Next Steps

In Week 6: GraphRAG, you will learn how to combine graph databases with retrieval-augmented generation.
