Week 5: Neo4j

Learning Objectives

This week introduces Neo4j, a widely used native graph database. You will learn the Labeled Property Graph (LPG) model, write Cypher queries, and integrate Neo4j with Python applications.


1. Introduction to Neo4j

Theory vs. Practice: "Esperanto" vs. "Native Speed"

In Weeks 1-4, we worked with RDF and OWL: W3C standards designed for exchanging data.

Think of RDF as Esperanto: a universal language for sharing data between organizations. It is precise but can be verbose.

Neo4j is more like speaking your native language: optimized for speed and for building applications. When you build a real-time recommendation engine, query performance usually matters more than a universal exchange format.

Neo4j is a native graph database that stores and queries data as a graph structure.

Why Neo4j?

| Feature | Benefit |
|---|---|
| Native Graph Storage | Optimized for connected data |
| Index-Free Adjacency | Fast traversals without index lookups |
| ACID Compliant | Reliable transactions |
| Cypher Query Language | Intuitive pattern matching |
| Scalability | Enterprise clustering support |
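To make "index-free adjacency" concrete, here is a toy in-memory sketch. The `Node` class, `neighbours`, and `neighbours_by_scan` are hypothetical helpers, not Neo4j's storage format: each node keeps direct references to its neighbours, so one hop costs time proportional to that node's degree, whereas the contrasting function has to scan a global edge table (or consult an index) on every hop.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Toy node that keeps direct references to its outgoing neighbours."""
    name: str
    out: list["Node"] = field(default_factory=list)

    def connect(self, other: "Node") -> None:
        # The adjacency is stored on the node itself; no lookup structure is consulted.
        self.out.append(other)


def neighbours(node: Node) -> list[str]:
    # Work is proportional to this node's degree, not to the size of the graph.
    return [n.name for n in node.out]


# For contrast: neighbours found by scanning a global edge table,
# which touches every edge (or needs an index) on each hop.
EDGES = [("Alice", "Bob"), ("Bob", "Carol")]


def neighbours_by_scan(name: str) -> list[str]:
    return [dst for src, dst in EDGES if src == name]


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.connect(bob)
bob.connect(carol)

print(neighbours(alice))          # ['Bob']
print(neighbours_by_scan("Bob"))  # ['Carol']
```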

RDF vs Labeled Property Graph

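| Aspect | RDF | LPG (Neo4j) |
|---|---|---|
| Model | Subject-Predicate-Object triples | Nodes, relationships, properties |
| Schema | Flexible, ontology-based (RDFS/OWL) | Optional schema constraints |
| Properties | Literal-valued triples on nodes; edge properties need reification (or RDF-star) | Native on nodes and relationships |
| Query Language | SPARQL | Cypher |
| Use Case | Semantic web, linked data | Application databases |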
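The "Properties" row is where the two models differ most in practice. The sketch below encodes "Alice works for Acme since 2020" both ways using plain Python tuples and dicts; the `ex:` identifiers and `ex:since` predicate are made up for illustration. With standard RDF reification, attaching `since` to the edge takes a statement resource described by four triples (`rdf:type rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) plus one triple for the property, while the LPG keeps it on the relationship itself.

```python
# RDF: the plain fact is a single triple ...
fact = ("ex:Alice", "ex:worksFor", "ex:Acme")

# ... but a property *of that edge* needs a reified statement resource.
reified = [
    ("ex:stmt1", "rdf:type", "rdf:Statement"),
    ("ex:stmt1", "rdf:subject", "ex:Alice"),
    ("ex:stmt1", "rdf:predicate", "ex:worksFor"),
    ("ex:stmt1", "rdf:object", "ex:Acme"),
    ("ex:stmt1", "ex:since", 2020),  # the edge property itself
]
rdf_triples = [fact] + reified

# LPG: the same information is one relationship that carries a property.
lpg_relationship = {
    "start": {"labels": ["Person"], "name": "Alice"},
    "type": "WORKS_FOR",
    "end": {"labels": ["Company"], "name": "Acme"},
    "properties": {"since": 2020},
}

print(len(rdf_triples))                          # 6
print(lpg_relationship["properties"]["since"])   # 2020
```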

2. The Labeled Property Graph Model

Core Concepts

Node: An entity with labels and properties
  - Labels categorize nodes (e.g., :Person, :Company); a node can have zero or more labels
  - Properties store data as key-value pairs (e.g., name, age)

Relationship: A connection between two nodes
  - Has exactly one type (e.g., WORKS_FOR, KNOWS)
  - Always has a direction (start → end), although queries may ignore it
  - Can have properties (e.g., since, role)

Visual Representation

(:Person {name: "Alice", age: 30})-[:WORKS_FOR {since: 2020}]->(:Company {name: "Acme"})
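The same structure can be written as a minimal Python data model. These are hypothetical dataclasses of our own, not the types returned by the official driver: a node has a set of labels and a property map, and a relationship has exactly one type, a start and an end node (its direction), and its own properties.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)
class Node:
    labels: set[str]                                   # zero or more labels
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass(eq=False)
class Relationship:
    type: str                                          # exactly one type
    start: Node                                        # direction: start -> end
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)


# (:Person {name: "Alice", age: 30})-[:WORKS_FOR {since: 2020}]->(:Company {name: "Acme"})
alice = Node({"Person"}, {"name": "Alice", "age": 30})
acme = Node({"Company"}, {"name": "Acme"})
works_for = Relationship("WORKS_FOR", alice, acme, {"since": 2020})

print(works_for.start.properties["name"], works_for.type, works_for.end.properties["name"])
# Alice WORKS_FOR Acme
```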

3. Cypher Query Language

Basic Patterns

// Create nodes
CREATE (p:Person {name: "Alice", age: 30})
CREATE (c:Company {name: "Acme Corp"});
 
// Create relationships (a separate statement: Cypher requires WITH between CREATE and MATCH)
MATCH (p:Person {name: "Alice"})
MATCH (c:Company {name: "Acme Corp"})
CREATE (p)-[:WORKS_FOR {since: 2020}]->(c);
 
// Query patterns
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN p.name, c.name;
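Conceptually, `MATCH (p:Person)-[:WORKS_FOR]->(c:Company)` returns one row for every WORKS_FOR relationship whose start node has the label Person and whose end node has the label Company. The naive Python sketch below spells out that meaning over a hypothetical list of relationship tuples. The database does not evaluate it this way; its query planner typically starts from one side (for example via a label scan or an index) and expands along each node's stored relationships.

```python
# Hypothetical data. Each relationship: (start_labels, start_props, type, end_labels, end_props)
relationships = [
    ({"Person"}, {"name": "Alice"}, "WORKS_FOR", {"Company"}, {"name": "Acme Corp"}),
    ({"Person"}, {"name": "Bob"}, "KNOWS", {"Person"}, {"name": "Alice"}),
    ({"Person"}, {"name": "Carol"}, "WORKS_FOR", {"Company"}, {"name": "Globex"}),
]


def match_works_for(rels):
    """Rows of MATCH (p:Person)-[:WORKS_FOR]->(c:Company) RETURN p.name, c.name."""
    return [
        (start_props["name"], end_props["name"])
        for start_labels, start_props, rel_type, end_labels, end_props in rels
        # type, start label and end label must all match the pattern
        if rel_type == "WORKS_FOR" and "Person" in start_labels and "Company" in end_labels
    ]


print(match_works_for(relationships))
# [('Alice', 'Acme Corp'), ('Carol', 'Globex')]
```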

Common Query Patterns

// Find all employees of a company
MATCH (p:Person)-[:WORKS_FOR]->(c:Company {name: "Acme Corp"})
RETURN p.name;
 
// Find friends of friends (exactly two KNOWS hops away)
MATCH (me:Person {name: "Alice"})-[:KNOWS*2]->(fof:Person)
WHERE me <> fof
RETURN DISTINCT fof.name;
 
// Shortest path (KNOWS in either direction, any number of hops)
MATCH path = shortestPath(
    (a:Person {name: "Alice"})-[:KNOWS*]-(b:Person {name: "Bob"})
)
RETURN path;
 
// Aggregate data: number of employees per company
MATCH (c:Company)<-[:WORKS_FOR]-(p:Person)
RETURN c.name, count(p) AS employees
ORDER BY employees DESC;
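The traversal queries above can be mirrored in plain Python to show what they compute. This is a sketch over a hypothetical adjacency map of directed KNOWS relationships: `friends_of_friends` follows exactly two outgoing hops like `[:KNOWS*2]->`, and `shortest_path` runs an unweighted breadth-first search that treats KNOWS as undirected, like `-[:KNOWS*]-` inside `shortestPath()`. Neighbours are visited in sorted order only to make the result deterministic.

```python
from collections import deque

# Hypothetical directed KNOWS relationships: person -> people they know.
knows = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave", "Alice"],
    "Carol": ["Dave", "Erin"],
    "Dave": [],
    "Erin": ["Bob"],
}


def friends_of_friends(graph, me):
    """Like MATCH (me)-[:KNOWS*2]->(fof) WHERE me <> fof RETURN DISTINCT fof."""
    result = set()
    for friend in graph.get(me, []):
        for fof in graph.get(friend, []):
            if fof != me:
                result.add(fof)  # the set plays the role of DISTINCT
    return sorted(result)


def shortest_path(graph, source, target):
    """Unweighted BFS ignoring direction; returns a list of names or None."""
    undirected = {}
    for a, targets in graph.items():
        for b in targets:
            undirected.setdefault(a, set()).add(b)
            undirected.setdefault(b, set()).add(a)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(undirected.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(friends_of_friends(knows, "Alice"))     # ['Dave', 'Erin']
print(shortest_path(knows, "Alice", "Dave"))  # ['Alice', 'Bob', 'Dave']
```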

Filtering and Conditions

// WHERE clause
MATCH (p:Person)
WHERE p.age > 25 AND p.name STARTS WITH "A"
RETURN p;
 
// Pattern predicates (existential subquery)
MATCH (p:Person)
WHERE EXISTS { (p)-[:WORKS_FOR]->(:Company) }
RETURN p.name;
 
// List operations
MATCH (p:Person)
WHERE p.skills IS NOT NULL AND "Python" IN p.skills
RETURN p.name;
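In Cypher, reading a property that a node does not have yields null, comparisons involving null yield null, and WHERE keeps a row only when its predicate is true, so nodes without `age` or `skills` simply drop out. Python's `None > 25` raises an error instead, so the hypothetical filters below make those null checks explicit while mirroring the first and last queries above.

```python
# Hypothetical Person nodes as property dicts (missing keys = missing properties).
people = [
    {"name": "Alice", "age": 30, "skills": ["Python", "Cypher"]},
    {"name": "Anna", "age": 28},
    {"name": "Andy", "age": 22},
    {"name": "Bob", "age": 40, "skills": ["Java", "Python"]},
    {"name": "Amy"},  # no age: p.age is null in Cypher, so the filter fails
]


def age_and_prefix(p):
    # WHERE p.age > 25 AND p.name STARTS WITH "A"
    age = p.get("age")
    return age is not None and age > 25 and p["name"].startswith("A")


def knows_python(p):
    # WHERE p.skills IS NOT NULL AND "Python" IN p.skills
    skills = p.get("skills")
    return skills is not None and "Python" in skills


print([p["name"] for p in people if age_and_prefix(p)])  # ['Alice', 'Anna']
print([p["name"] for p in people if knows_python(p)])    # ['Alice', 'Bob']
```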

4. Data Modeling in Neo4j

Best Practices

  • Model entities as nodes with descriptive labels (by convention CamelCase, e.g., :Movie)
  • Model connections as relationships with verb-phrase types (by convention UPPER_SNAKE_CASE, e.g., ACTED_IN)
  • Store entity attributes as node properties
  • Store relationship metadata as relationship properties
  • Avoid over-connecting: not every association needs its own relationship

Example: Movie Database

// Create movie data model
CREATE (m:Movie {title: "The Matrix", released: 1999})
CREATE (k:Person {name: "Keanu Reeves", born: 1964})
CREATE (l:Person {name: "Lana Wachowski", born: 1965})
 
CREATE (k)-[:ACTED_IN {role: "Neo"}]->(m)
CREATE (l)-[:DIRECTED]->(m);
 
// Query actors and their movies (a separate statement)
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, r.role, m.title;

Indexes and Constraints

// Create index for faster lookups
CREATE INDEX person_name FOR (p:Person) ON (p.name);
 
// Create unique constraint (it also creates a backing index on p.email)
CREATE CONSTRAINT unique_email FOR (p:Person)
REQUIRE p.email IS UNIQUE;
 
// Create node key: the properties must exist and their combination must be unique
// (node key constraints require Neo4j Enterprise Edition)
CREATE CONSTRAINT movie_key FOR (m:Movie)
REQUIRE (m.title, m.released) IS NODE KEY;
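A node key combines two guarantees: every listed property must be present, and the combination of their values must be unique among nodes with that label. The dict-backed class below is a hypothetical toy that enforces the same two rules in Python; the dict lookup in `find` loosely mirrors how the constraint's backing index makes lookups by key fast. Two real films titled "Dune" (1984 and 2021) show why a composite key on `(title, released)` is safer than `title` alone.

```python
class NodeKeyStore:
    """Toy stand-in for a NODE KEY constraint on a fixed tuple of properties."""

    def __init__(self, key_props):
        self.key_props = tuple(key_props)
        self.nodes = {}  # key tuple -> property dict

    def create(self, props):
        # Rule 1: every key property must exist (a null/missing value is rejected).
        missing = [k for k in self.key_props if props.get(k) is None]
        if missing:
            raise ValueError(f"node key property missing: {missing}")
        # Rule 2: the combination of key values must be unique.
        key = tuple(props[k] for k in self.key_props)
        if key in self.nodes:
            raise ValueError(f"node key already exists: {key}")
        self.nodes[key] = props
        return props

    def find(self, **key):
        return self.nodes.get(tuple(key[k] for k in self.key_props))


movies = NodeKeyStore(["title", "released"])
movies.create({"title": "Dune", "released": 1984})
movies.create({"title": "Dune", "released": 2021})  # same title, different year: allowed

for bad in ({"title": "Dune", "released": 2021}, {"title": "Dune"}):
    try:
        movies.create(bad)
    except ValueError as err:
        print(err)
# node key already exists: ('Dune', 2021)
# node key property missing: ['released']

print(movies.find(title="Dune", released=1984))  # {'title': 'Dune', 'released': 1984}
```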

5. Python Integration with Neo4j

Using the Official Driver

from neo4j import GraphDatabase
 
class Neo4jConnection:
    def __init__(self, uri, user, password):
        # Create one driver per application; it manages a pool of connections.
        self.driver = GraphDatabase.driver(uri, auth=(user, password))
        self.driver.verify_connectivity()  # fail fast on a wrong URI or credentials
 
    def close(self):
        self.driver.close()
 
    def query(self, query, parameters=None):
        # Consume the result inside the session: it cannot be read after the session closes.
        with self.driver.session() as session:
            result = session.run(query, parameters or {})
            return [record.data() for record in result]
 
    def create_person(self, name, age):
        # Values are passed as parameters ($name, $age), never formatted into the query string.
        query = """
        CREATE (p:Person {name: $name, age: $age})
        RETURN p
        """
        return self.query(query, {"name": name, "age": age})
 
# Usage (adjust the URI and credentials to your setup)
conn = Neo4jConnection("bolt://localhost:7687", "neo4j", "password")
 
try:
    # Create data
    conn.create_person("Alice", 30)
 
    # Query data
    result = conn.query("""
        MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
        RETURN p.name AS person, c.name AS company
    """)
 
    for row in result:
        print(f"{row['person']} works at {row['company']}")
finally:
    conn.close()
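`create_person` sends one query per node. For bulk loads such as the 100-movie milestone, a common pattern is to pass a list of rows as a single parameter and expand it on the server with `UNWIND`, so each round trip writes many nodes. The sketch below only builds the parameter batches, so it runs without a database; the `CREATE_MOVIES` query, the `batches` helper, and the batch size of 40 are hypothetical choices for illustration. Using `MERGE` on `title` means re-running the import does not create duplicate movies with the same title.

```python
from itertools import islice

# Hypothetical batch-import query: one round trip creates or updates many movies.
CREATE_MOVIES = """
UNWIND $rows AS row
MERGE (m:Movie {title: row.title})
SET m.released = row.released
"""


def batches(rows, size):
    """Yield parameter dicts holding at most `size` rows each for the UNWIND query."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield {"rows": chunk}


movies = [{"title": f"Movie {i}", "released": 2000 + i % 20} for i in range(100)]
params = list(batches(movies, 40))
print([len(p["rows"]) for p in params])  # [40, 40, 20]

# With the Neo4jConnection class above, each batch could be sent as:
# for p in params:
#     conn.query(CREATE_MOVIES, p)
```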

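Using py2neo

Note: py2neo is no longer maintained and has reached end of life; for new projects, prefer the official neo4j driver.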

from py2neo import Graph, Node, Relationship
 
# Connect to Neo4j
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
 
# Create nodes
alice = Node("Person", name="Alice", age=30)
acme = Node("Company", name="Acme Corp")
 
# Create relationship
works_for = Relationship(alice, "WORKS_FOR", acme, since=2020)
 
# Commit to database
graph.create(works_for)
 
# Query with Cypher
results = graph.run("""
    MATCH (p:Person)-[r:WORKS_FOR]->(c:Company)
    RETURN p.name, r.since, c.name
""")
 
for record in results:
    print(record)

6. Graph Algorithms

Using the Graph Data Science Library

// Requires the GDS plugin and the in-memory projection 'myGraph' (created in the next subsection)
 
// PageRank - find influential nodes
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC;
 
// Community Detection
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
RETURN communityId, collect(gds.util.asNode(nodeId).name) AS members;
 
// Shortest Path (weighted by the projected 'weight' relationship property)
MATCH (source:Person {name: "Alice"}), (target:Person {name: "Bob"})
CALL gds.shortestPath.dijkstra.stream('myGraph', {
    sourceNode: source,
    targetNode: target,
    relationshipWeightProperty: 'weight'
})
YIELD path
RETURN path;
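PageRank scores a node by the scores of the nodes that point to it. The pure-Python power iteration below illustrates the idea on a hypothetical four-person graph; it is the textbook normalized formulation (scores sum to 1, and a node with no outgoing links spreads its score evenly). GDS may report scores on a different scale, so compare rankings rather than raw values. The damping factor 0.85 is the usual default; 100 iterations is an arbitrary choice that is plenty for this small graph.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration over an adjacency map; scores sum to 1."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}  # random-jump share
        for v in nodes:
            targets = graph.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


# Hypothetical directed KNOWS links.
links = {
    "Alice": ["Bob"],
    "Bob": ["Alice"],
    "Carol": ["Bob"],
    "Dave": ["Bob", "Alice"],
}
scores = pagerank(links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['Bob', 'Alice', 'Carol', 'Dave']
```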

Creating Graph Projections

// Create in-memory graph projection (Person nodes and KNOWS relationships with their 'weight' property)
CALL gds.graph.project(
    'myGraph',
    'Person',
    'KNOWS',
    {
        relationshipProperties: 'weight'
    }
);
 
// Run algorithm on projection: top 10 nodes by betweenness centrality
CALL gds.betweennessCentrality.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10;
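A projection copies just the requested parts of the stored graph (here a relationship type and one property) into an in-memory structure that algorithms then run on. The sketch below mimics that idea in plain Python with hypothetical data and no GDS calls: `project` keeps only KNOWS relationships and their `weight` (label filtering is omitted for brevity), and `dijkstra` is a standard heap-based Dijkstra, the algorithm that `gds.shortestPath.dijkstra` is named after. Weights must be non-negative for Dijkstra to be correct.

```python
import heapq

# Hypothetical stored relationships: (start, type, end, properties)
stored = [
    ("Alice", "KNOWS", "Bob", {"weight": 4.0}),
    ("Alice", "KNOWS", "Carol", {"weight": 1.0}),
    ("Carol", "KNOWS", "Bob", {"weight": 1.5}),
    ("Bob", "KNOWS", "Dave", {"weight": 1.0}),
    ("Alice", "WORKS_FOR", "Acme", {}),  # not projected: different type
]


def project(rels, rel_type, weight_prop):
    """Keep only `rel_type` relationships as an adjacency map of (target, weight)."""
    graph = {}
    for start, rtype, end, props in rels:
        if rtype == rel_type:
            graph.setdefault(start, []).append((end, props[weight_prop]))
    return graph


def dijkstra(graph, source, target):
    """Return (total_cost, path) of the cheapest directed path, or None."""
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)  # cheapest frontier entry first
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None


my_graph = project(stored, "KNOWS", "weight")
print(dijkstra(my_graph, "Alice", "Dave"))  # (3.5, ['Alice', 'Carol', 'Bob', 'Dave'])
```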

Project: Movie Recommendation Knowledge Graph

Progress

| Week | Topic | Project Milestone |
|---|---|---|
| 1 | Ontology Introduction | Movie domain design completed |
| 2 | RDF & RDFS | 10 movies converted to RDF |
| 3 | OWL & Reasoning | Inference rules applied |
| 4 | Knowledge Extraction | 100 movies auto-collected |
| 5 | Neo4j | Store in graph DB and query |
| 6 | GraphRAG | Natural language queries |
| 7 | Ontology Agent | Automatic updates for new movies |
| 8 | Domain Extension | Medical/Legal/Finance cases |
| 9 | Service Deployment | API + Dashboard |

Week 5 Milestone: Storing Movie Knowledge Graph in Neo4j

This week, you will migrate the RDF data to a Neo4j graph database and write recommendation queries in Cypher.

Neo4j Schema:

// Nodes
(:Movie {title, releaseDate, rating, runtime})
(:Person {name, birthDate})
(:Genre {name})
 
// Relationships
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)
(:Movie)-[:HAS_GENRE]->(:Genre)
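Migrating the RDF data from earlier weeks means turning triples into nodes and relationships that fit this schema. A minimal sketch, assuming hypothetical `ex:` predicate names (substitute the IRIs from your own ontology): `rdf:type` triples become labels, literal-valued triples become node properties, and object-valued triples become relationships, with `ex:directedBy` reversed so the edge points Person → Movie as the schema requires. Unknown predicates are silently skipped.

```python
# Hypothetical triples from the earlier RDF data.
triples = [
    ("ex:Inception", "rdf:type", "ex:Movie"),
    ("ex:Inception", "ex:title", "Inception"),
    ("ex:Inception", "ex:releaseDate", "2010-07-16"),
    ("ex:Inception", "ex:directedBy", "ex:ChristopherNolan"),
    ("ex:Inception", "ex:hasGenre", "ex:SciFi"),
    ("ex:LeonardoDiCaprio", "ex:actedIn", "ex:Inception"),
    ("ex:ChristopherNolan", "rdf:type", "ex:Person"),
    ("ex:ChristopherNolan", "ex:name", "Christopher Nolan"),
]

LITERAL_PROPS = {"ex:title": "title", "ex:releaseDate": "releaseDate", "ex:name": "name"}
# predicate -> (relationship type, reverse the direction?)
REL_TYPES = {
    "ex:directedBy": ("DIRECTED", True),   # becomes (:Person)-[:DIRECTED]->(:Movie)
    "ex:actedIn": ("ACTED_IN", False),
    "ex:hasGenre": ("HAS_GENRE", False),
}


def to_property_graph(triples):
    nodes, rels = {}, []
    for s, p, o in triples:
        node = nodes.setdefault(s, {"labels": set(), "props": {}})
        if p == "rdf:type":
            node["labels"].add(o.split(":", 1)[1])  # "ex:Movie" -> "Movie"
        elif p in LITERAL_PROPS:
            node["props"][LITERAL_PROPS[p]] = o
        elif p in REL_TYPES:
            rel_type, reverse = REL_TYPES[p]
            nodes.setdefault(o, {"labels": set(), "props": {}})
            rels.append((o, rel_type, s) if reverse else (s, rel_type, o))
    return nodes, rels


nodes, rels = to_property_graph(triples)
for start, rel_type, end in rels:
    print(f"({start})-[:{rel_type}]->({end})")
# (ex:ChristopherNolan)-[:DIRECTED]->(ex:Inception)
# (ex:Inception)-[:HAS_GENRE]->(ex:SciFi)
# (ex:LeonardoDiCaprio)-[:ACTED_IN]->(ex:Inception)
```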

Recommendation Query Example:

// Recommend movies that share at least one genre AND at least one actor with Inception
MATCH (m:Movie {title: "Inception"})-[:HAS_GENRE]->(g:Genre),
      (m)<-[:ACTED_IN]-(a:Person)-[:ACTED_IN]->(rec:Movie)
WHERE rec <> m AND (rec)-[:HAS_GENRE]->(g)
// score = number of matching (shared genre, shared actor) rows
RETURN rec.title, count(*) AS score
ORDER BY score DESC LIMIT 5
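Because the genre pattern and the actor pattern are matched independently, each row after the WHERE clause is one (shared genre, shared actor) combination, so `count(*)` works out to (number of Inception's actors in the candidate) × (number of Inception's genres it shares). The pure-Python sketch below reproduces that scoring with nested loops over hypothetical placeholder movies, actors and genres. If you prefer an additive score, count shared actors and shared genres separately and sum them.

```python
from collections import Counter

# Hypothetical data: movie -> actors, movie -> genres.
actors = {
    "Inception": {"Actor 1", "Actor 2", "Actor 3"},
    "Movie A": {"Actor 1", "Actor 2"},
    "Movie B": {"Actor 3"},
    "Movie C": {"Actor 2"},
}
genres = {
    "Inception": {"Sci-Fi", "Thriller"},
    "Movie A": {"Sci-Fi"},
    "Movie B": {"Thriller"},
    "Movie C": {"Comedy"},
}


def recommend(seed, limit=5):
    scores = Counter()
    for g in genres[seed]:                     # (m)-[:HAS_GENRE]->(g)
        for a in actors[seed]:                 # (m)<-[:ACTED_IN]-(a)
            for rec, cast in actors.items():   # (a)-[:ACTED_IN]->(rec)
                if rec != seed and a in cast and g in genres[rec]:
                    scores[rec] += 1           # one row per (g, a) pair
    return scores.most_common(limit)           # ORDER BY score DESC LIMIT n


print(recommend("Inception"))  # [('Movie A', 2), ('Movie B', 1)]
```

Movie C shares an actor but no genre, so it never produces a row, matching the AND in the query.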

Performance Optimization:

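  • Index: CREATE INDEX movie_title IF NOT EXISTS FOR (m:Movie) ON (m.title)
  • Constraint: CREATE CONSTRAINT movie_title_unique IF NOT EXISTS FOR (m:Movie) REQUIRE m.title IS UNIQUE (a uniqueness constraint comes with its own backing index, so use one or the other for title)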

In the project notebook, you will implement:

  • Run Neo4j with Docker and connect via Python
  • Create 100 movies + director/actor nodes
  • Write "Movies similar to Inception" Cypher recommendation query
  • Create indexes for 10x query performance improvement

What you'll build by Week 9: an AI agent that answers "Recommend sci-fi movies in Nolan's style" by reasoning over director-genre-rating relationships in the knowledge graph.


Practice Notebook

The practice notebook explores the theory further and covers additional topics:

  • Neo4j Aura cloud setup
  • APOC library utilities
  • Query optimization with EXPLAIN/PROFILE
  • Graph algorithms (PageRank, Community Detection)

Interview Questions

When would you choose Neo4j over a relational database?

Choose Neo4j when:

  • Data is highly connected with complex relationships
  • Queries involve many joins or variable-length path traversals
  • The schema is flexible and evolving
  • The use case is real-time recommendations or fraud detection
  • The task is social network analysis or knowledge graph work

Stick with RDBMS when:

  • Data is tabular with simple relationships
  • The workload is dominated by high-volume, set-based transactions over tabular records (both systems are ACID-compliant, so the data shape, not transactional guarantees, decides)
  • Reporting needs are primarily aggregate-based

Premium Content

Want complete solutions with detailed explanations and production-ready code?

Check out the Ontology & Knowledge Graph Cookbook Premium for:

  • Complete notebook solutions with step-by-step explanations
  • Real-world case studies and best practices
  • Interview preparation materials
  • Production deployment guides

Next Steps

In Week 6: GraphRAG, you will learn how to combine graph databases with retrieval-augmented generation.