Introduction


apoc

License


Apache License 2.0

"APOC" Name history


Apoc was the technician and driver on board of the Nebuchadnezzar in the Matrix movie. He was killed by Cypher.

APOC was also the first bundled A Package Of Components for Neo4j in 2009.

APOC also stands for "Awesome Procedures On Cypher"

Download latest release (for the regular user)


Go to http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/latest to find the latest release and download the binary jar to place into your $NEO4J_HOME/plugins folder.

Build & install the current development branch from source


git clone http://github.com/neo4j-contrib/neo4j-apoc-procedures
./gradlew shadow
cp build/libs/apoc-<version>-SNAPSHOT-extended.jar $NEO4J_HOME/plugins/
$NEO4J_HOME/bin/neo4j restart

A full build including running the tests can be run by ./gradlew build.

Calling Procedures within Cypher


Procedures can be called stand-alone with CALL procedure.name();

But you can also integrate them into your Cypher statements which makes them so much more powerful.

Load JSON example
WITH 'https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/master/src/test/resources/person.json' AS url

CALL apoc.load.json(url) YIELD value as person

MERGE (p:Person {name:person.name})
   ON CREATE SET p.age = person.age, p.children = size(person.children)

Overview of APOC Procedures


Manual Indexes


Index Queries

Procedures to add to and query manual indexes

Please note that there are (case-sensitive) automatic schema indexes, for equality, non-equality, existence, range queries, starts with, ends-with and contains!

apoc.index.addAllNodes('index-name',{label1:['prop1',…​],…​})

add all nodes to this full text index with the given fields, additionally populates a 'search' index field with all of them in one place

apoc.index.addNode(node,['prop1',…​])

add node to an index for each label it has

apoc.index.addNodeByLabel(node,'Label',['prop1',…​])

add node to an index for the given label

apoc.index.addRelationship(rel,['prop1',…​])

add relationship to an index for its type

apoc.index.nodes with score

apoc.index.search('index-name', 'query') YIELD node, weight

search for the first 100 nodes in the given full text index matching the given lucene query returned by relevance

apoc.index.nodes('Label','prop:value*') YIELD node, weight

lucene query on node index with the given label name

apoc.index.relationships('TYPE','prop:value*') YIELD rel, weight

lucene query on relationship index with the given type name

apoc.index.between(node1,'TYPE',node2,'prop:value*') YIELD rel, weight

lucene query on relationship index with the given type name bound by either or both sides (each node parameter can be null)

apoc.index.out(node,'TYPE','prop:value*') YIELD node, weight

lucene query on relationship index with the given type name for outgoing relationship of the given node, returns end-nodes

apoc.index.in(node,'TYPE','prop:value*') YIELD node, weight

lucene query on relationship index with the given type name for incoming relationship of the given node, returns start-nodes

Index Management

CALL apoc.index.list() YIELD type,name,config

lists all manual indexes

CALL apoc.index.remove('name') YIELD type,name,config

removes manual indexes

CALL apoc.index.forNodes('name',{config}) YIELD type,name,config

gets or creates manual node index

CALL apoc.index.forRelationships('name',{config}) YIELD type,name,config

gets or creates manual relationship index

Add node to index example
match (p:Person) call apoc.index.addNode(p,["name","age"]) RETURN count(*);
// 129s for 1M People
call apoc.index.nodes('Person','name:name100*') YIELD node, weight return * limit 2

Schema Index Queries

Schema Index lookups that keep order and can apply limits

apoc.index.orderedRange(label,key,min,max,sort-relevance,limit) yield node

schema range scan which keeps index order and adds limit, values can be null, boundaries are inclusive

apoc.index.orderedByText(label,key,operator,value,sort-relevance,limit) yield node

schema string search which keeps index order and adds limit, operator is 'STARTS WITH' or 'CONTAINS'

Export / Import


Data is exported as cypher statements (for neo4j-shell, and partly apoc.cypher.runFile to the given file.

Loading Data from RDBMS


apoc jdbc northwind load

CALL apoc.load.jdbc('jdbc:derby:derbyDB','PERSON') YIELD row CREATE (:Person {name:row.name})

load from relational database, either a full table or a sql statement

CALL apoc.load.jdbc('jdbc:derby:derbyDB','SELECT * FROM PERSON WHERE AGE > 18')

load from relational database, either a full table or a sql statement

CALL apoc.load.driver('org.apache.derby.jdbc.EmbeddedDriver')

register JDBC driver of source database

To simplify the JDBC URL syntax, for example 'jdbc:derby:derbyDB', and so as to mask the details one can define in conf/neo4j.conf

apoc.jdbc.derby.url=jdbc:derby:derbyDB

and then

CALL apoc.load.jdbc('jdbc:derby:derbyDB','PERSON') YIELD row CREATE (:Person {name:row.name})

becomes

CALL apoc.load.jdbc('derby','PERSON') YIELD row CREATE (:Person {name:row.name})

The 3rd value in the 'apoc.jdbc.derby.url=' effectively defines an <alias> to be used in the apoc.load.jdbc('<alias>',…​.

Loading Data from Web-APIs (JSON, XML, CSV)


CALL apoc.load.xml('http://example.com/test.xml') YIELD value as doc CREATE (p:Person) SET p.name = doc.name

load from XML URL (e.g. web-api) to import XML as single nested map with attributes and _type, _text and _children fields.

CALL apoc.load.csv('url',{sep:";"}) YIELD lineNo, list, map

load CSV fom URL as stream of values
config contains any of: {skip:1,limit:5,header:false,sep:'TAB',ignore:['tmp'],arraySep:';',mapping:{years:{type:'int',arraySep:'-',array:false,name:'age',ignore:false}}

Interacting with Elastic Search


apoc.es.stats(host-url-Key)

elastic search statistics

apoc.es.get(host-or-port,index-or-null,type-or-null,id-or-null,query-or-null,payload-or-null) yield value

perform a GET operation

apoc.es.query(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a SEARCH operation

apoc.es.getRaw(host-or-port,path,payload-or-null) yield value

perform a raw GET operation

apoc.es.postRaw(host-or-port,path,payload-or-null) yield value

perform a raw POST operation

apoc.es.post(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a POST operation

apoc.es.put(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a PUT operation

Interacting with Couchbase


CALL apoc.couchbase.get(nodes, bucket, documentId) yield id, expiry, cas, mutationToken, content

Retrieves a couchbase json document by its unique ID

CALL apoc.couchbase.exists(nodes, bucket, documentId) yield value

Check whether a couchbase json document with the given ID does exist

CALL apoc.couchbase.insert(nodes, bucket, documentId, jsonDocument) yield id, expiry, cas, mutationToken, content

Insert a couchbase json document with its unique ID

CALL apoc.couchbase.upsert(nodes, bucket, documentId, jsonDocument) yield id, expiry, cas, mutationToken, content

Insert or overwrite a couchbase json document with its unique ID

CALL apoc.couchbase.append(nodes, bucket, documentId, jsonDocument) yield id, expiry, cas, mutationToken, content

Append a couchbase json document to an existing one

CALL apoc.couchbase.prepend(nodes, bucket, documentId, jsonDocument) yield id, expiry, cas, mutationToken, content

Prepend a couchbase json document to an existing one

CALL apoc.couchbase.remove(nodes, bucket, documentId) yield id, expiry, cas, mutationToken, content

Remove the couchbase json document identified by its unique ID

CALL apoc.couchbase.replace(nodes, bucket, documentId, jsonDocument) yield id, expiry, cas, mutationToken, content

Replace the content of the couchbase json document identified by its unique ID.

CALL apoc.couchbase.query(nodes, bucket, statement) yield queryResult

Executes a plain un-parameterized N1QL statement.

CALL apoc.couchbase.posParamsQuery(nodes, bucket, statement, params) yield queryResult

Executes a N1QL statement with positional parameters.

CALL apoc.couchbase.namedParamsQuery(nodes, bucket, statement, paramNames, paramValues) yield queryResult

Executes a N1QL statement with named parameters.

Copy these jars into the plugins directory:

mvn dependency:copy-dependencies
cp target/dependency/java-client-2.3.1.jar target/dependency/core-io-1.3.1.jar target/dependency/rxjava-1.1.5.jar $NEO4J_HOME/plugins/
CALL apoc.couchbase.get(['localhost'], 'default', 'artist:vincent_van_gogh')

Streaming Data to Gephi


apoc.gephi.add(url-or-key, workspace, data)

streams provided data to Gephi

Notes

Gephi has a streaming plugin, that can provide and accept JSON-graph-data in a streaming fashion.

Make sure to install the plugin firsrt and activate it for your workspace (there is a new "Streaming"-tab besides "Layout"), right-click "Master"→"start" to start the server.

You can provide your workspace name (you might want to rename it before you start thes streaming), otherwise it defaults to workspace0

You can also configure it in conf/neo4j.conf via apoc.gephi.url=url or apoc.gephi.<key>.url=url

Example

match path = (:Person)-[:ACTED_IN]->(:Movie)
WITH path LIMIT 1000
with collect(path) as paths
call apoc.gephi.add(null,'workspace0', paths) yield nodes, relationships, time
return nodes, relationships, time

Generating Graphs


Generate undirected (random direction) graphs with semi-real random distributions based on theoretical models.

apoc.generate.er(noNodes, noEdges, 'label', 'type')

generates a graph according to Erdos-Renyi model (uniform)

apoc.generate.ws(noNodes, degree, beta, 'label', 'type')

generates a graph according to Watts-Strogatz model (clusters)

apoc.generate.ba(noNodes, edgesPerNode, 'label', 'type')

generates a graph according to Barabasi-Albert model (preferential attachment)

apoc.generate.complete(noNodes, 'label', 'type')

generates a complete graph (all nodes connected to all other nodes)

apoc.generate.simple([degrees], 'label', 'type')

generates a graph with the given degree distribution

Example

CALL apoc.generate.ba(1000, 2, 'TestLabel', 'TEST_REL_TYPE')
CALL apoc.generate.ws(1000, null, null, null)
CALL apoc.generate.simple([2,2,2,2], null, null)

Monitoring


(thanks @ikwattro)

apoc.monitor.ids

node and relationships-ids in total and in use

apoc.monitor.kernel

store information such as kernel version, start time, read-only, database-name, store-log-version etc.

apoc.monitor.store

store size information for the different types of stores

apoc.monitor.tx

number of transactions total,opened,committed,concurrent,rolled-back,last-tx-id

apoc.monitor.locks(minWaitTime long)

db locking information such as avertedDeadLocks, lockCount, contendedLockCount and contendedLocks etc. (enterprise)

Cypher Execution


CALL apoc.cypher.runFile(file or url) yield row, result

runs each statement in the file, all semicolon separated - currently no schema operations

CALL apoc.cypher.mapParallel(fragment, params, list-to-parallelize) yield value

executes fragment in parallel batches with the list segments being assigned to _

Helpers


Lookup Functions

apoc.get.nodes(node

id

[ids]) yield node

quickly returns all nodes with these id’s

apoc.get.rels(rels

id

[ids]) yield rel

quickly returns all relationships with these id’s

Relationship Filter

Syntax: [<]RELATIONSHIP_TYPE1[>]|[<]RELATIONSHIP_TYPE2[>]|…​

input type direction

LIKES>

LIKES

OUTGOING

<FOLLOWS

FOLLOWS

INCOMING

KNOWS

KNOWS

BOTH

Label Filter

Syntax: [+-/]LABEL1|LABEL2|…​

input label result

+Friend

Friend

include label (whitelist)

-Foe

Foe

exclude label (blacklist)

/Friend

Friend

stop traversal after reaching a friend (but include him)

Text and Lookup Indexes


Index Queries

Procedures to add to and query manual indexes

Please note that there are (case-sensitive) automatic schema indexes, for equality, non-equality, existence, range queries, starts with, ends-with and contains!

apoc.index.addAllNodes('index-name',{label1:['prop1',…​],…​})

add all nodes to this full text index with the given fields, additionally populates a 'search' index field with all of them in one place

apoc.index.addNode(node,['prop1',…​])

add node to an index for each label it has

apoc.index.addNodeByLabel(node,'Label',['prop1',…​])

add node to an index for the given label

apoc.index.addRelationship(rel,['prop1',…​])

add relationship to an index for its type

apoc.index.nodes with score

apoc.index.search('index-name', 'query') YIELD node, weight

search for the first 100 nodes in the given full text index matching the given lucene query returned by relevance

apoc.index.nodes('Label','prop:value*') YIELD node, weight

lucene query on node index with the given label name

apoc.index.relationships('TYPE','prop:value*') YIELD rel, weight

lucene query on relationship index with the given type name

apoc.index.between(node1,'TYPE',node2,'prop:value*') YIELD rel, weight

lucene query on relationship index with the given type name bound by either or both sides (each node parameter can be null)

apoc.index.out(node,'TYPE','prop:value*') YIELD node, weight

lucene query on relationship index with the given type name for outgoing relationship of the given node, returns end-nodes

apoc.index.in(node,'TYPE','prop:value*') YIELD node, weight

lucene query on relationship index with the given type name for incoming relationship of the given node, returns start-nodes

Index Management

CALL apoc.index.list() YIELD type,name,config

lists all manual indexes

CALL apoc.index.remove('name') YIELD type,name,config

removes manual indexes

CALL apoc.index.forNodes('name',{config}) YIELD type,name,config

gets or creates manual node index

CALL apoc.index.forRelationships('name',{config}) YIELD type,name,config

gets or creates manual relationship index

Add node to index example
match (p:Person) call apoc.index.addNode(p,["name","age"]) RETURN count(*);
// 129s for 1M People
call apoc.index.nodes('Person','name:name100*') YIELD node, weight return * limit 2

Manual Indexes

Data Used


The below examples use flight data.

Here is a sample subset of the data that can be load to try the procedures:

CREATE (slc:Airport {abbr:'SLC', id:14869, name:'SALT LAKE CITY INTERNATIONAL'})
CREATE (oak:Airport {abbr:'OAK', id:13796, name:'METROPOLITAN OAKLAND INTERNATIONAL'})
CREATE (bur:Airport {abbr:'BUR', id:10800, name:'BOB HOPE'})
CREATE (f2:Flight {flight_num:6147, day:2, month:1, weekday:6, year:2016})
CREATE (f9:Flight {flight_num:6147, day:9, month:1, weekday:6, year:2016})
CREATE (f16:Flight {flight_num:6147, day:16, month:1, weekday:6, year:2016})
CREATE (f23:Flight {flight_num:6147, day:23, month:1, weekday:6, year:2016})
CREATE (f30:Flight {flight_num:6147, day:30, month:1, weekday:6, year:2016})
CREATE (f2)-[:DESTINATION {arr_delay:-13, taxi_time:9}]->(oak)
CREATE (f9)-[:DESTINATION {arr_delay:-8, taxi_time:4}]->(bur)
CREATE (f16)-[:DESTINATION {arr_delay:-30, taxi_time:4}]->(slc)
CREATE (f23)-[:DESTINATION {arr_delay:-21, taxi_time:3}]->(slc)
CREATE (f30)-[:DESTINATION]->(slc)

Using Manual Index on Node Properties


In order to create manual index on a node property, you call apoc.index.addNode with the node, providing the properties to be indexed.

MATCH (a:Airport)
CALL apoc.index.addNode(a,['name'])
RETURN count(*)

The statement will create the node index with the same name as the Label name(s) of the node in this case Airport and add the node by their properties to the index.

Once this has been added check if the node index exists using apoc.index.list.

CALL apoc.index.list()

Usually apoc.index.addNode would be used as part of node-creation, e.g. during LOAD CSV. There is also apoc.index.addNodes for adding a list of multiple nodes at once.

Once the node index is created we can start using it.

Here are some examples:

The apoc.index.nodes finds nodes in a manual index using the given lucene query.

That makes only sense if you combine multiple properties in one lookup or use case insensitive or fuzzy matching full-text queries. In all other cases the built in schema indexes should be used.
CALL apoc.index.nodes('Airport','name:inter*') YIELD node AS airport, weight
RETURN airport.name, weight
LIMIT 10
Apoc index queries not only return nodes and relationships but also a weight, which is the score returned from the underlying Lucene index. The results are also sorted by that score. That’s especially helpful for partial and fuzzy text searches.

To remove the node index Airport created, use:

CALL apoc.index.remove('Airport')

Using Manual Index on Relationship Properties


The procedure apoc.index.addRelationship is used to create a manual index on relationship properties.

As there are no schema indexes for relationships, these manual indexes can be quite useful.

MATCH (:Flight)-[r:DESTINATION]->(:Airport)
CALL apoc.index.addRelationship(r,['taxi_time'])
RETURN count(*)

The statement will create the relationship index with the same name as relationship-type, in this case DESTINATION and add the relationship by its properties to the index.

Using apoc.index.relationships, we can find the relationship of type DESTINATION with the property taxi_time of 11 minutes. We can chose to also return the start and end-node.

CALL apoc.index.relationships('DESTINATION','taxi_time:11') YIELD rel, start AS flight, end AS airport
RETURN flight_num.flight_num, airport.name;
Manual relationship indexed do not only store the relationship by its properties but also the start- and end-node.

That’s why we can use that information to subselect relationships not only by property but also by those nodes, which is quite powerful.

With apoc.index.in we can pin the node with incoming relationships (end-node) to get the start nodes for all the DESTINATION relationships. For instance to find all flights arriving in 'SALT LAKE CITY INTERNATIONAL' with a taxi_time of 7 minutes we’d use:

MATCH (a:Airport {name:'SALT LAKE CITY INTERNATIONAL'})
CALL apoc.index.in(a,'DESTINATION','taxi_time:7') YIELD node AS flight
RETURN flight

The opposite is apoc.index.out, which takes and binds end-nodes and returns start-nodes of relationships.

Really useful to quickly find a subset of relationships between nodes with many relationships (tens of thousands to millions) is apoc.index.between. Here you bind both the start and end-node and provide (or not) properties of the relationships.

MATCH (f:Flight {flight_num:6147})
MATCH (a:Airport {name:'SALT LAKE CITY INTERNATIONAL'})
CALL apoc.index.between(f,'DESTINATION',a,'taxi_time:7') YIELD rel, weight
RETURN *

To remove the relationship index DESTINATION that was created, use.

CALL apoc.index.remove('DESTINATION')

Full Text Search

Indexes are used for finding nodes in the graph that further operations can then continue from. Just like in a book where you look at the index to find a section that interest you, and then start reading from there. A full text index allows you to find occurrences of individual words or phrases across all attributes.

In order to use the full text search feature, we have to first index our data by specifying all the attributes we want to index. Here we create a full text index called “locations” (we will use this name when searching in the index) with our data.

CALL apoc.index.addAllNodes('locations',{
  Company: ["name", "description"],
  Person:  ["name","address"],
  Address: ["address"]})

Creating the index will take a little while since the procedure has to read through the entire database to create the index.

We can now use this index to search for nodes in the database. The most simple case would be to search across all data for a particular word.

It does not matter which property that word exists in, any node that has that word in any of its indexed properties will be found.

If you enter a name into the form, all occurrences will be found (but limited to 100 results).

CALL apoc.index.search("locations", '')

Advanced Search


We can further restrict our search to only searching in a particular attribute. In order to search for a Person with an address in France, we use the following.

CALL apoc.index.search("locations", "Person.address:France")

Test it yourself:

CALL apoc.index.search("locations", ".:")

Now we can search for nodes with a specific property value, and then explore their neighbourhoods visually.

But integrating it with an graph query is so much more powerful.

Fulltext and Graph Search


We could for instance search for addresses in the database that contain the word "Paris", and then find all companies registered at those addresses:

CALL apoc.index.search("locations", "Address.address:Paris~") YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
RETURN company LIMIT 50

The tilde (~) instructs the index search procedure to do a fuzzy match, allowing us to find “Paris” even if the spelling is slightly off.

We might notice that there are addresses that contain the word “Paris” that are not in Paris, France. For example there might be a Paris Street somewhere.

We can further specify that we want the text to contain both the word Paris, and the word France:

CALL apoc.index.search("locations", "+Address.address:Paris~ +France~")
YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
RETURN company LIMIT 50

Complex Searches


Things start to get interesting when we look at how the different entities in Paris are connected to one another. We can do that by finding all the entities with addresses in Paris, then creating all pairs of such entities and finding the shortest path between each such pair:

CALL apoc.index.search("locations", "+Address.address:Paris~ +France~") YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
WITH collect(company) AS companies

// create unique pairs
UNWIND companies AS x UNWIND companies AS y
WITH x, y WHERE ID(x) < ID(y)

MATCH path = shortestPath((x)-[*..10]-(y))
RETURN path

For more details on the query syntax used in the second parameter of the search procedure, please see this Lucene query tutorial

Utility Functions


TimeToLive (TTL) - Expire Nodes


Enable cleanup of expired nodes in neo4j.conf with apoc.ttl.enabled=true

30s after startup an index is created:

CREATE INDEX FOR (n:TTL) ON (n.ttl)

At startup a statement is scheduled to run every 60s (or configure in neo4j.conf - apoc.ttl.schedule=120)

MATCH (t:TTL) where t.ttl > timestamp() WITH t LIMIT 1000 DETACH DELETE t

The ttl property holds the time when the node is expired in milliseconds since epoch.

You can expire your nodes by setting the :TTL label and the ttl property:

MATCH (n:Foo) WHERE n.bar SET n:TTL, n.ttl = timestamp() + 10000;

Graph Algorithms


Centrality Algorithms

Setup


Let’s create some test data to run the Centrality algorithms on.

// create 100 nodes
FOREACH (id IN range(0,1000) | CREATE (:Node {id:id}))

// over the cross product (1M) create 100.000 relationships
MATCH (n1:Node),(n2:Node) WITH n1,n2 LIMIT 1000000 WHERE rand() < 0.1

CREATE (n1)-[:TYPE]->(n2)

PageRank Algorithm

Setup


Let’s create some test data to run the PageRank algorithm on.

// create 100 nodes
FOREACH (id IN range(0,1000) | CREATE (:Node {id:id}))

// over the cross product (1M) create 100.000 relationships
MATCH (n1:Node),(n2:Node) WITH n1,n2 LIMIT 1000000 WHERE rand() < 0.1

CREATE (n1)-[:TYPE_1]->(n2)

Graph Refactoring


In order not to have to repeatedly geocode the same thing in multiple queries, especially if the database will be used by many people, it might be a good idea to persist the results in the database so that subsequent calls can use the saved results.

Geocode and persist the result

MATCH (a:Place)
WHERE a.address IS NOT NULL AND a.latitude IS NULL
WITH a LIMIT 1000
CALL apoc.spatial.geocodeOnce(a.address) YIELD location
SET a.latitude = location.latitude
SET a.longitude = location.longitude

Note that the above command only geocodes the first 1000 ‘Place’ nodes that have not already been geocoded. This query can be run multiple times until all places are geocoded. Why would we want to do this? Two good reasons:

  • The geocoding service is a public service that can throttle or blacklist sites that hit the service too heavily, so controlling how much we do is useful.

  • The transaction is updating the database, and it is wise not to update the database with too many things in the same transaction, to avoid using up too much memory. This trick will keep the memory usage very low.

Now make use of the results in distance queries

WITH point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (a:Place)
WHERE a.latitude IS NOT NULL AND a.longitude IS NOT NULL
WITH a, point.distance(point(a), eiffel) AS distance
WHERE distance < 5000
RETURN a.name, distance
ORDER BY distance
LIMIT 100

Data Integration


Load JDBC

CALL apoc.load.jdbc('jdbc:derby:derbyDB','PERSON') YIELD row CREATE (:Person {name:row.name})

load from relational database, either a full table or a sql statement

CALL apoc.load.jdbc('jdbc:derby:derbyDB','SELECT * FROM PERSON WHERE AGE > 18')

load from relational database, either a full table or a sql statement

CALL apoc.load.driver('org.apache.derby.jdbc.EmbeddedDriver')

register JDBC driver of source database

To simplify the JDBC URL syntax, for example 'jdbc:derby:derbyDB', and so as to mask the details one can define in conf/neo4j.conf

apoc.jdbc.derby.url=jdbc:derby:derbyDB

and then

CALL apoc.load.jdbc('jdbc:derby:derbyDB','PERSON') YIELD row CREATE (:Person {name:row.name})

becomes

CALL apoc.load.jdbc('derby','PERSON') YIELD row CREATE (:Person {name:row.name})

The 3rd value in the 'apoc.jdbc.derby.url=' effectively defines an <alias> to be used in the apoc.load.jdbc('<alias>',…​.

MySQL Example


MySQL Northwind Data

select count(*) from products;
+----------+
| count(*) |
+----------+
|       77 |
+----------+
1 row in set (0,00 sec)

describe products;
+-----------------+---------------+------+-----+---------+----------------+
| Field           | Type          | Null | Key | Default | Extra          |
+-----------------+---------------+------+-----+---------+----------------+
| ProductID       | int(11)       | NO   | PRI | NULL    | auto_increment |
| ProductName     | varchar(40)   | NO   | MUL | NULL    |                |
| SupplierID      | int(11)       | YES  | MUL | NULL    |                |
| CategoryID      | int(11)       | YES  | MUL | NULL    |                |
| QuantityPerUnit | varchar(20)   | YES  |     | NULL    |                |
| UnitPrice       | decimal(10,4) | YES  |     | 0.0000  |                |
| UnitsInStock    | smallint(2)   | YES  |     | 0       |                |
| UnitsOnOrder    | smallint(2)   | YES  |     | 0       |                |
| ReorderLevel    | smallint(2)   | YES  |     | 0       |                |
| Discontinued    | bit(1)        | NO   |     | b'0'    |                |
+-----------------+---------------+------+-----+---------+----------------+
10 rows in set (0,00 sec)

Load JDBC Examples


Load the JDBC driver
cypher CALL apoc.load.driver("com.mysql.jdbc.Driver");
Count rows in products table
with "jdbc:mysql://localhost:3306/northwind?user=root" as url
cypher CALL apoc.load.jdbc(url,"products") YIELD row
RETURN count(*);
+----------+
| count(*) |
+----------+
| 77       |
+----------+
1 row
23 ms
Return row from products table
with "jdbc:mysql://localhost:3306/northwind?user=root" as url
cypher CALL apoc.load.jdbc(url,"products") YIELD row
RETURN row limit 1;
+--------------------------------------------------------------------------------+
| row                                                                            |
+--------------------------------------------------------------------------------+
| {UnitPrice -> 18.0000, UnitsOnOrder -> 0, CategoryID -> 1, UnitsInStock -> 39} |
+--------------------------------------------------------------------------------+
1 row
10 ms
apoc load jdbc
Create data from jdbc query results in batches (and in parallel).
CALL apoc.periodic.iterate('
call apoc.load.jdbc("jdbc:mysql://localhost:3306/northwind?user=root","company")',
'CREATE (p:Person) SET p += value', {batchSize:10000, parallel:true})
RETURN batches, total

Cassandra Example


Setup Song database as initial dataset

curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/playlist.cql
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/artists.csv
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/songs.csv
$CASSANDRA_HOME/bin/cassandra
$CASSANDRA_HOME/bin/cqlsh -f playlist.cql

Download the Cassandra JDBC Wrapper, and put it into your $NEO4J_HOME/plugins directory. Add this config option to $NEO4J_HOME/conf/neo4j.conf to make it easier to interact with the cassandra instance.

Add to conf/neo4j.conf
apoc.jdbc.cassandra_songs.url=jdbc:cassandra://localhost:9042/playlist

Restart the server.

Now you can inspect the data in Cassandra with.

CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN count(*);
╒════════╕
│count(*)│
╞════════╡
│3605    │
└────────┘
CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN row LIMIT 5;

CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN row.first_letter, row.artist LIMIT 5;
╒════════════════╤═══════════════════════════════╕
│row.first_letter│row.artist                     │
╞════════════════╪═══════════════════════════════╡
│C               │C.W. Stoneking                 │
├────────────────┼───────────────────────────────┤
│C               │CH2K                           │
├────────────────┼───────────────────────────────┤
│C               │CHARLIE HUNTER WITH LEON PARKER│
├────────────────┼───────────────────────────────┤
│C               │Calvin Harris                  │
├────────────────┼───────────────────────────────┤
│C               │Camané                         │
└────────────────┴───────────────────────────────┘

Let’s create some graph data, we have a look at the track_by_artist table, which contains about 60k records.

CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row RETURN count(*);
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
RETURN row LIMIT 5;
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
RETURN row.track_id, row.track_length_in_seconds, row.track, row.music_file, row.genre, row.artist, row.starred LIMIT 2;
╒════════════════════════════════════╤══════╤════════════════╤══════════════════╤═════════╤════════════════════════════╤═══════════╕
│row.track_id                        │length│row.track       │row.music_file    │row.genre│row.artist                  │row.starred│
╞════════════════════════════════════╪══════╪════════════════╪══════════════════╪═════════╪════════════════════════════╪═══════════╡
│c0693b1e-0eaa-4e81-b23f-b083db303842│219   │1913 Massacre   │TRYKHMD128F934154C│folk     │Woody Guthrie & Jack Elliott│false      │
├────────────────────────────────────┼──────┼────────────────┼──────────────────┼─────────┼────────────────────────────┼───────────┤
│7d114937-0bc7-41c7-8e0c-94b5654ac77f│178   │Alabammy Bound  │TRMQLPV128F934152B│folk     │Woody Guthrie & Jack Elliott│false      │
└────────────────────────────────────┴──────┴────────────────┴──────────────────┴─────────┴────────────────────────────┴───────────┘

Let’s create some indexes and constraints, note that other indexes and constraints will be dropped by this.

CALL apoc.schema.assert(
  {Track:['title','length']},
  {Artist:['name'],Track:['id'],Genre:['name']});
╒════════════╤═══════╤══════╤═══════╕
│label       │key    │unique│action │
╞════════════╪═══════╪══════╪═══════╡
│Track       │title  │false │CREATED│
├────────────┼───────┼──────┼───────┤
│Track       │length │false │CREATED│
├────────────┼───────┼──────┼───────┤
│Artist      │name   │true  │CREATED│
├────────────┼───────┼──────┼───────┤
│Genre       │name   │true  │CREATED│
├────────────┼───────┼──────┼───────┤
│Track       │id     │true  │CREATED│
└────────────┴───────┴──────┴───────┘
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
MERGE (a:Artist {name:row.artist})
MERGE (g:Genre {name:row.genre})
CREATE (t:Track {id:toString(row.track_id), title:row.track, length:row.track_length_in_seconds})
CREATE (a)-[:PERFORMED]->(t)
CREATE (t)-[:GENRE]->(g);
Added 63213 labels, created 63213 nodes, set 182413 properties, created 119200 relationships, statement executed in 40076 ms.

Streaming Data to Gephi

apoc.gephi.add(url-or-key, workspace, data)

streams provided data to Gephi

Notes

Gephi has a streaming plugin, that can provide and accept JSON-graph-data in a streaming fashion.

Make sure to install the plugin firsrt and activate it for your workspace (there is a new "Streaming"-tab besides "Layout"), right-click "Master"→"start" to start the server.

You can provide your workspace name (you might want to rename it before you start thes streaming), otherwise it defaults to workspace0

You can also configure it in conf/neo4j.conf via apoc.gephi.url=url or apoc.gephi.<key>.url=url

Example

match path = (:Person)-[:ACTED_IN]->(:Movie)
WITH path LIMIT 1000
with collect(path) as paths
call apoc.gephi.add(null,'workspace0', paths) yield nodes, relationships, time
return nodes, relationships, time
apoc.gephi

Format


We send all nodes and relationships of the passed in data convert into individual Gephi-Streaming JSON fragements, separated by \r\n.

{"an":{"123":{"TYPE":"Person:Actor","label":"Tom Hanks",                           x:333,y:222,r:0.1,g:0.3,b:0.5}}}\r\n
{"an":{"345":{"TYPE":"Movie","label":"Forrest Gump",                               x:234,y:122,r:0.2,g:0.2,b:0.7}}}\r\n
{"ae":{"3344":{"TYPE":"ACTED_IN","label":"Tom Hanks",source:"123",target:"345","directed":true,r:0.1,g:0.3,b:0.5}}}

Specifics Details


Gephi doesn’t render the graph data unless you also provide x,y coordinates in the payload, so we just send random ones within a 1000x1000 grid.

We also generate colors per label combination and relationship-type, both of which are also transferred as TYPE property.

ElasticSearch Integration

apoc.es.stats(host-url-Key)

elastic search statistics

apoc.es.get(host-or-port,index-or-null,type-or-null,id-or-null,query-or-null,payload-or-null) yield value

perform a GET operation

apoc.es.query(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a SEARCH operation

apoc.es.getRaw(host-or-port,path,payload-or-null) yield value

perform a raw GET operation

apoc.es.postRaw(host-or-port,path,payload-or-null) yield value

perform a raw POST operation

apoc.es.post(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a POST operation

apoc.es.put(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a PUT operation

Example


call apoc.es.post("localhost","tweets","users","1",null,{name:"Chris"})
call apoc.es.get("localhost","tweets","users","1",null,null)
call apoc.es.stats("localhost")
qHAj9ma

General Structure and Parameters


call apoc.es.post(host-or-port,index-or-null,type-or-null,id-or-null,query-or-null,payload-or-null) yield value

// GET/PUT/POST url/index/type/id?query -d payload

host or port parameter

The parameter can be a direct host or url, or an entry to be lookup up in neo4j.conf

  • host

  • host:port

  • http://host:port

  • lookup via key to apoc.es.<key>.url

  • lookup via key apoc.es.<key>.host

  • lookup apoc.es.url

  • lookup apoc.es.host

index parameter

Main ES index, will be sent directly, if null then "_all" multiple indexes can be separated by comma in the string.

type parameter

Document type, will be sent directly, if null then "_all" multiple types can be separated by comma in the string.

id parameter

Document id, will be left off when null.

query parameter

Query can be a map which is turned into a query string, a direct string or null then it is left off.

payload parameter

Payload can be a map which will be turned into a json payload or a string which will be sent directly or null.

Results

Results are stream of map in value.

Load XML

Example: How do you access xml doc attributes in children fields ?


(Thanks Nicolas Rouyer)

For example, if my XML file is the example book.xml provided by microsoft.

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
...

Let’s just load it and see what it looks like, it’s returned as value map with nested _type and _children fields, per group of elements. Attributes are turned into map-entries. And elements into their own little map with _type, attributes and _children if applicable.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/master/src/test/resources/books.xml")
{_type: catalog, _children: [
  {_type: book, id: bk101, _children: [
    {_type: author, _text: Gambardella, Matthew},
    {_type: title, _text: XML Developer's Guide},
    {_type: genre, _text: Computer},
    {_type: price, _text: 44.95},
    {_type: publish_date, _text: 2000-10-01},
    {_type: description, _text: An in-depth look at creating applications ....

For each book, how do I access book id ?

You can access attributes per element directly.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/master/src/test/resources/books.xml") yield value as catalog
UNWIND catalog._children as book
RETURN book.id
╒═══════╕
│book.id│
╞═══════╡
│bk101  │
├───────┤
│bk102  │

For each book, how do I access book author and title ?

Filter into collection

You have to filter over the sub-elements in the _childrens array in this case.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/master/src/test/resources/books.xml") yield value as catalog
UNWIND catalog._children as book
RETURN book.id, [attr IN book._children WHERE attr._type IN ['author','title'] | [attr._type, attr._text]] as pairs
╒═══════╤════════════════════════════════════════════════════════════════════════╕
│book.id│pairs                                                                   │
╞═══════╪════════════════════════════════════════════════════════════════════════╡
│bk101  │[[author, Gambardella, Matthew], [title, XML Developer's Guide]]        │
├───────┼────────────────────────────────────────────────────────────────────────┤
│bk102  │[[author, Ralls, Kim], [title, Midnight Rain]]                          │

Return Collection Elements

This is not too nice, we could also just have returned the values and then grabbed them out of the list, but that relies on element-order.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/master/src/test/resources/books.xml") yield value as catalog
UNWIND catalog._children as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['author','title'] | attr._text] as pairs
RETURN id, pairs[0] as author, pairs[1] as title
╒═════╤════════════════════╤══════════════════════════════╕
│id   │author              │title                         │
╞═════╪════════════════════╪══════════════════════════════╡
│bk101│Gambardella, Matthew│XML Developer's Guide         │
├─────┼────────────────────┼──────────────────────────────┤
│bk102│Ralls, Kim          │Midnight Rain                 │

Cypher Operations


Running Cypher fragments

We can use Cypher as safe, graph-aware, partially compiled scripting language within APOC.

Cypher Execution


CALL apoc.cypher.runFile(file or url) yield row, result

runs each statement in the file, all semicolon separated - currently no schema operations

CALL apoc.cypher.mapParallel(fragment, params, list-to-parallelize) yield value

executes fragment in parallel batches with the list segments being assigned to _

Example: Fast Node-Counts by Label


Counts by label are quickly provided by the counts-store, but only if they are the only thing in the query, like

MATCH (:Person) RETURN count(*);

It also works to combine several with UNION ALL, but not via WITH

Doesn’t work
MATCH (:Person) WITH count(*) as people
MATCH (:Movie) RETURN people, count(*) as movies;
Works
MATCH (:Person) RETURN count(*)
UNION ALL
MATCH (:Movie) RETURN count(*);

But with apoc.cypher.run we can construct the statement and run each of them individually, so it completes in a few ms.

call db.labels() yield label
call apoc.cypher.run("match (:`"+label+"`) return count(*) as count", null) yield value
return label, value.count as count

Job management and periodic execution

this document is work in progress
call apoc.periodic.commit("
match (u:User)
where not u:Processed
with u limit {limit}
SET u:Processed RETURN count(*)
",{limit:10000})
+=======+==========+
|updates|executions|
+=======+==========+
|2000000|200       |
+-------+----------+