Introduction

Note
Go here for documentation for APOC for Neo4j version 3.0.x 3.1.x 3.2.x
apoc

Neo4j 3.0 introduced the concept of user defined procedures. Those are custom implementations of certain functionality, that can’t be (easily) expressed in Cypher itself. Those procedures are implemented in Java and can be easily deployed into your Neo4j instance, and then be called from Cypher directly.

The APOC library consists of many (about 300) procedures to help with many different tasks in areas like data integration, graph algorithms or data conversion.

License

Apache License 2.0

"APOC" Name history

Apoc was the technician and driver on board of the Nebuchadnezzar in the Matrix movie. He was killed by Cypher.

APOC was also the first bundled A Package Of Components for Neo4j in 2009.

APOC also stands for "Awesome Procedures On Cypher"

Installation

Download latest release

Go to http://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/3.1.2.5 to find the latest release and download the binary jar to place into your $NEO4J_HOME/plugins folder.

Version Compatibility Matrix

Since APOC relies in some places on Neo4j’s internal APIs you need to use the right APOC version for your Neo4j installaton.

Any version to be released after 1.1.0 will use a different, consistent versioning scheme: <neo4j-version>.<apoc> version. The trailing <apoc> part of the version number will be incremented with every apoc release.

apoc version

neo4j version

3.1.0.3

3.1.0 (3.1.x)

3.0.8.4

3.0.8 (3.0.x)

3.0.4.3

3.0.4 (3.0.x)

1.1.0

3.0.0 - 3.0.3

1.0.0

3.0.0 - 3.0.3

using APOC with Neo4j Docker image

The Neo4j Docker image allows to supply a volume for the /plugins folder. Download the APOC release fitting your Neo4j version to local folder plugins and provide it as a data volume:

mkdir plugins
pushd plugins
wget https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.1.2.5/apoc-3.1.2.5-extended.jar
popd
docker run --rm -e NEO4J_AUTH=none -p 7474:7474 -v $PWD/plugins:/plugins -p 7687:7687 neo4j:3.1.2

Build & install the current development branch from source

git clone http://github.com/neo4j-contrib/neo4j-apoc-procedures
./gradlew shadow
cp build/libs/apoc-<version>-SNAPSHOT-extended.jar $NEO4J_HOME/plugins/
$NEO4J_HOME/bin/neo4j restart

A full build including running the tests can be run by ./gradlew build.

Calling Procedures & Functions within Cypher

User defined Functions can be used in any expression or predicate, just like built-in functions.

Procedures can be called stand-alone with CALL procedure.name();

But you can also integrate them into your Cypher statements which makes them so much more powerful.

Load JSON example
WITH 'https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/master/src/test/resources/person.json' AS url

CALL apoc.load.json(url) YIELD value as person

MERGE (p:Person {name:person.name})
   ON CREATE SET p.age = person.age, p.children = size(person.children)

Configuration Options

Set these config options in $NEO4J_HOME/neo4j.conf

All boolean options default to false, i.e. they are disabled, unless mentioned otherwise.

apoc.trigger.enabled=true

Enable triggers

apoc.ttl.enabled=true

Enable time to live background task

apoc.ttl.schedule=5

Set frequency in seconds to run ttl background task (default 60)

apoc.import.file.enabled=true

Enable reading local files from disk

apoc.export.file.enabled=true

Enable writing local files to disk

apoc.jdbc.<key>.uri=jdbc-url-with-credentials

store jdbc-urls under a key to be used by apoc.load.jdbc

apoc.es.<key>.uri=es-url-with-credentials

store es-urls under a key to be used by elasticsearch procedures

apoc.mongodb.<key>.uri=mongodb-url-with-credentials

store mongodb-urls under a key to be used by mongodb procedures

apoc.couchbase.<key>.uri=couchbase-url-with-credentials

store couchbase-urls under a key to be used by couchbase procedures

Manual Indexes

Index Queries

Procedures to add to and query manual indexes

Note
Please note that there are (case-sensitive) automatic schema indexes, for equality, non-equality, existence, range queries, starts with, ends-with and contains!

apoc.index.addAllNodes('index-name',{label1:['prop1',…​],…​})

add all nodes to this full text index with the given fields, additionally populates a 'search' index field with all of them in one place

apoc.index.addNode(node,['prop1',…​])

add node to an index for each label it has

apoc.index.addNodeByLabel('Label',node,['prop1',…​])

add node to an index for the given label

apoc.index.addNodeByName('name',node,['prop1',…​])

add node to an index for the given name

apoc.index.addRelationship(rel,['prop1',…​])

add relationship to an index for its type

apoc.index.addRelationshipByName('name',rel,['prop1',…​])

add relationship to an index for the given name

apoc.index.removeNodeByName('name',node) remove node from an index for the given name

apoc.index.removeRelationshipByName('name',rel) remove relationship from an index for the given name

apoc.index.nodes with score

apoc.index.search('index-name', 'query') YIELD node, weight

search for the first 100 nodes in the given full text index matching the given lucene query returned by relevance

apoc.index.nodes('Label','prop:value*') YIELD node, weight

lucene query on node index with the given label name

apoc.index.relationships('TYPE','prop:value*') YIELD rel, weight

lucene query on relationship index with the given type name

apoc.index.between(node1,'TYPE',node2,'prop:value*') YIELD rel, weight

lucene query on relationship index with the given type name bound by either or both sides (each node parameter can be null)

apoc.index.out(node,'TYPE','prop:value*') YIELD node, weight

lucene query on relationship index with the given type name for outgoing relationship of the given node, returns end-nodes

apoc.index.in(node,'TYPE','prop:value*') YIELD node, weight

lucene query on relationship index with the given type name for incoming relationship of the given node, returns start-nodes

Index Management

CALL apoc.index.list() YIELD type,name,config

lists all manual indexes

CALL apoc.index.remove('name') YIELD type,name,config

removes manual indexes

CALL apoc.index.forNodes('name',{config}) YIELD type,name,config

gets or creates manual node index

CALL apoc.index.forRelationships('name',{config}) YIELD type,name,config

gets or creates manual relationship index

Add node to index example
match (p:Person) call apoc.index.addNode(p,["name","age"]) RETURN count(*);
// 129s for 1M People
call apoc.index.nodes('Person','name:name100*') YIELD node, weight return * limit 2

Schema Index Queries

Schema Index lookups that keep order and can apply limits

apoc.index.orderedRange(label,key,min,max,sort-relevance,limit) yield node

schema range scan which keeps index order and adds limit, values can be null, boundaries are inclusive

apoc.index.orderedByText(label,key,operator,value,sort-relevance,limit) yield node

schema string search which keeps index order and adds limit, operator is 'STARTS WITH' or 'CONTAINS'

Loading Data from RDBMS

apoc jdbc northwind load

CALL apoc.load.jdbc('jdbc:derby:derbyDB','PERSON') YIELD row CREATE (:Person {name:row.name})

load from relational database, either a full table or a sql statement

CALL apoc.load.jdbc('jdbc:derby:derbyDB','SELECT * FROM PERSON WHERE AGE > 18')

load from relational database, either a full table or a sql statement

CALL apoc.load.driver('org.apache.derby.jdbc.EmbeddedDriver')

register JDBC driver of source database

To simplify the JDBC URL syntax and protect credentials, you can configure aliases in conf/neo4j.conf:

apoc.jdbc.myDB.url=jdbc:derby:derbyDB
CALL apoc.load.jdbc('jdbc:derby:derbyDB','PERSON')

becomes

CALL apoc.load.jdbc('myDB','PERSON')

The 3rd value in the apoc.jdbc.<alias>.url= effectively defines an alias to be used in apoc.load.jdbc('<alias>',…​.

Loading Data from Web-APIs (JSON, XML, CSV)

CALL apoc.load.json('http://example.com/map.json') YIELD value as person CREATE (p:Person) SET p = person

load from JSON URL (e.g. web-api) to import JSON as stream of values if the JSON was an array or a single value if it was a map

CALL apoc.load.xml('http://example.com/test.xml') YIELD value as doc CREATE (p:Person) SET p.name = doc.name

load from XML URL (e.g. web-api) to import XML as single nested map with attributes and _type, _text and _children fields.

CALL apoc.load.csv('url',{sep:";"}) YIELD lineNo, list, map

load CSV fom URL as stream of values
config contains any of: {skip:1,limit:5,header:false,sep:'TAB',ignore:['tmp'],arraySep:';',mapping:{years:{type:'int',arraySep:'-',array:false,name:'age',ignore:false}}

apoc.es.stats(host-url-Key)

elastic search statistics

apoc.es.get(host-or-port,index-or-null,type-or-null,id-or-null,query-or-null,payload-or-null) yield value

perform a GET operation

apoc.es.query(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a SEARCH operation

apoc.es.getRaw(host-or-port,path,payload-or-null) yield value

perform a raw GET operation

apoc.es.postRaw(host-or-port,path,payload-or-null) yield value

perform a raw POST operation

apoc.es.post(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a POST operation

apoc.es.put(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a PUT operation

Interacting with Couchbase

CALL apoc.couchbase.get(nodes, bucket, documentId) yield id, expiry, cas, mutationToken, content

Retrieves a couchbase json document by its unique ID

CALL apoc.couchbase.exists(nodes, bucket, documentId) yield value

Check whether a couchbase json document with the given ID does exist

CALL apoc.couchbase.insert(nodes, bucket, documentId, jsonDocument) yield id, expiry, cas, mutationToken, content

Insert a couchbase json document with its unique ID

CALL apoc.couchbase.upsert(nodes, bucket, documentId, jsonDocument) yield id, expiry, cas, mutationToken, content

Insert or overwrite a couchbase json document with its unique ID

CALL apoc.couchbase.append(nodes, bucket, documentId, jsonDocument) yield id, expiry, cas, mutationToken, content

Append a couchbase json document to an existing one

CALL apoc.couchbase.prepend(nodes, bucket, documentId, jsonDocument) yield id, expiry, cas, mutationToken, content

Prepend a couchbase json document to an existing one

CALL apoc.couchbase.remove(nodes, bucket, documentId) yield id, expiry, cas, mutationToken, content

Remove the couchbase json document identified by its unique ID

CALL apoc.couchbase.replace(nodes, bucket, documentId, jsonDocument) yield id, expiry, cas, mutationToken, content

Replace the content of the couchbase json document identified by its unique ID.

CALL apoc.couchbase.query(nodes, bucket, statement) yield queryResult

Executes a plain un-parameterized N1QL statement.

CALL apoc.couchbase.posParamsQuery(nodes, bucket, statement, params) yield queryResult

Executes a N1QL statement with positional parameters.

CALL apoc.couchbase.namedParamsQuery(nodes, bucket, statement, paramNames, paramValues) yield queryResult

Executes a N1QL statement with named parameters.

Copy these jars into the plugins directory:

mvn dependency:copy-dependencies
cp target/dependency/java-client-2.3.1.jar target/dependency/core-io-1.3.1.jar target/dependency/rxjava-1.1.5.jar $NEO4J_HOME/plugins/
CALL apoc.couchbase.get(['localhost'], 'default', 'artist:vincent_van_gogh')

Streaming Data to Gephi

apoc.gephi.add(url-or-key, workspace, data)

streams provided data to Gephi

Notes

Gephi has a streaming plugin, that can provide and accept JSON-graph-data in a streaming fashion.

Make sure to install the plugin firsrt and activate it for your workspace (there is a new "Streaming"-tab besides "Layout"), right-click "Master"→"start" to start the server.

You can provide your workspace name (you might want to rename it before you start thes streaming), otherwise it defaults to workspace0

You can also configure it in conf/neo4j.conf via apoc.gephi.url=url or apoc.gephi.<key>.url=url

Example

match path = (:Person)-[:ACTED_IN]->(:Movie)
WITH path LIMIT 1000
with collect(path) as paths
call apoc.gephi.add(null,'workspace0', paths) yield nodes, relationships, time
return nodes, relationships, time

Generating Graphs

Generate undirected (random direction) graphs with semi-real random distributions based on theoretical models.

apoc.generate.er(noNodes, noEdges, 'label', 'type')

generates a graph according to Erdos-Renyi model (uniform)

apoc.generate.ws(noNodes, degree, beta, 'label', 'type')

generates a graph according to Watts-Strogatz model (clusters)

apoc.generate.ba(noNodes, edgesPerNode, 'label', 'type')

generates a graph according to Barabasi-Albert model (preferential attachment)

apoc.generate.complete(noNodes, 'label', 'type')

generates a complete graph (all nodes connected to all other nodes)

apoc.generate.simple([degrees], 'label', 'type')

generates a graph with the given degree distribution

Example

CALL apoc.generate.ba(1000, 2, 'TestLabel', 'TEST_REL_TYPE')
CALL apoc.generate.ws(1000, null, null, null)
CALL apoc.generate.simple([2,2,2,2], null, null)

Monitoring

(thanks @ikwattro)

apoc.monitor.ids

node and relationships-ids in total and in use

apoc.monitor.kernel

store information such as kernel version, start time, read-only, database-name, store-log-version etc.

apoc.monitor.store

store size information for the different types of stores

apoc.monitor.tx

number of transactions total,opened,committed,concurrent,rolled-back,last-tx-id

apoc.monitor.locks(minWaitTime long)

db locking information such as avertedDeadLocks, lockCount, contendedLockCount and contendedLocks etc. (enterprise)

Cypher Execution

CALL apoc.cypher.runFile(file or url) yield row, result

runs each statement in the file, all semicolon separated - currently no schema operations

CALL apoc.cypher.mapParallel(fragment, params, list-to-parallelize) yield value

executes fragment in parallel batches with the list segments being assigned to _

Triggers

Enable apoc.trigger.enabled=true in $NEO4J_HOME/config/neo4j.conf first.

Helper Functions

apoc.trigger.nodesByLabel({assignedLabels},'Label')

function to filter labelEntries by label, to be used within a trigger statement with {assignedLabels} and {removedLabels} {phase:'before/after/rollback'} returns previous and new trigger information

apoc.trigger.propertiesByKey({assignedNodeProperties},'key')

function to filter propertyEntries by property-key, to be used within a trigger statement with {assignedNode/RelationshipProperties} and {removedNode/RelationshipProperties}. Returns [{old,new,key,node,relationship}]

Helpers

Static Value Storage

apoc.static.get(name)

returns statically stored value from config (apoc.static.<key>) or server lifetime storage

apoc.static.getAll(prefix)

returns statically stored values from config (apoc.static.<prefix>) or server lifetime storage

apoc.static.set(name, value)

stores value under key for server livetime storage, returns previously stored or configured value

Lookup Functions

CALL apoc.get.nodes(node

id

[ids]) yield node

quickly returns all nodes with these id’s

CALL apoc.get.rels(rels

id

[ids]) yield rel

quickly returns all relationships with these id’s

Utilities

apoc.util.sha1([values])

computes the sha1 of the concatenation of all string values of the list

apoc.util.md5([values])

computes the md5 of the concatenation of all string values of the list

apoc.util.sleep({duration})

sleeps for <duration> millis, transaction termination is honored

apoc.util.validate(predicate, message,[params])

raises exception if prediate evaluates to true

Config

apoc.config.list

Lists the Neo4j configuration as key,value table

apoc.config.map

Lists the Neo4j configuration as map

Time to Live (TTL)

Enable TTL with setting in neo4j.conf : apoc.ttl.enabled=true

There are some convenience procedures to expire nodes.

You can also do it yourself by running

SET n:TTL
SET n.ttl = timestamp() + 3600

Optionally set apoc.ttl.schedule=5 as repeat frequency.

Relationship Filter

Syntax: [<]RELATIONSHIP_TYPE1[>]|[<]RELATIONSHIP_TYPE2[>]|…​

input type direction

LIKES>

LIKES

Table of Contents

OUTGOING

<FOLLOWS

FOLLOWS

Table of Contents

INCOMING

KNOWS

KNOWS

Table of Contents

BOTH

Label Filter

Syntax: [+-/]LABEL1|LABEL2|…​

input label result

+Friend

Friend

Table of Contents

include label (whitelist)

-Foe

Foe

Table of Contents

exclude label (blacklist)

/Friend

Friend

Table of Contents

stop traversal after reaching a friend (but include him)

Uniqueness

Uniqueness of nodes and relationships guides the expansion and the returned results. Uniqueness is only configurable using expandConfig().

subgraphNodes(), subgraphAll(), and spanningTree() all use 'NODE_GLOBAL' uniqueness.

value description

RELATIONSHIP_PATH

Table of Contents

For each returned node there’s a (relationship wise) unique path from the start node to it. This is Cypher’s default expansion mode.

NODE_GLOBAL

Table of Contents

A node cannot be traversed more than once. This is what the legacy traversal framework does.

NODE_LEVEL

Table of Contents

Entities on the same level are guaranteed to be unique.

NODE_PATH

Table of Contents

For each returned node there’s a unique path from the start node to it.

NODE_RECENT

Table of Contents

This is like NODE_GLOBAL, but only guarantees uniqueness among the most recent visited nodes, with a configurable count. Traversing a huge graph is quite memory intensive in that it keeps track of all the nodes it has visited. For huge graphs a traverser can hog all the memory in the JVM, causing OutOfMemoryError. Together with this Uniqueness you can supply a count, which is the number of most recent visited nodes. This can cause a node to be visited more than once, but scales infinitely.

RELATIONSHIP_GLOBAL

Table of Contents

A relationship cannot be traversed more than once, whereas nodes can.

RELATIONSHIP_LEVEL

Table of Contents

Entities on the same level are guaranteed to be unique.

RELATIONSHIP_RECENT

Table of Contents

Same as for NODE_RECENT, but for relationships.

NONE

Table of Contents

No restriction (the user will have to manage it)

User Defined Functions

Introduced in Neo4j 3.1.0-M10

Neo4j 3.1 brings some really neat improvements in Cypher alongside other cool features

If you used or wrote procedures in the past, you most probably came across instances where it felt quite unwieldy to call a procedure just to compute something, convert a value or provide a boolean decision.

For example:

CREATE (v:Value {id:{id}, data:{data}})
WITH v
CALL apoc.date.formatDefault(timestamp(), "ms") YIELD value as created
SET v.created = created

You’d rather write it as a function:

CREATE (v:Value {id:{id}, data:{data}, created: apoc.date.format(timestamp()) })

Now in 3.1 that’s possible, and you can also leave off the "ms" and use a single function name, because the unit and format parameters have a default value.

Functions are more limited than procedures: they can’t execute writes or schema operations and are expected to return a single value, not a stream of values. But this makes it also easier to write and use them.

By having information about their types, the Cypher Compiler can also check for applicability.

The signature of the procedure above changed from:

@Procedure("apoc.date.format")
public Stream<StringResult> formatDefault(@Name("time") long time, @Name("unit") String unit) {
   return Stream.of(format(time, unit, DEFAULT_FORMAT));
}

to the much simpler function signature (ignoring the parameter name and value annotations):

@UserFunction("apoc.date.format")
public String format(@Name("time") long time,
                     @Name(value="unit", defaultValue="ms") String unit,
                     @Name(value="format", defaultValue=DEFAULT_FORMAT) String format) {
   return getFormatter().format(time, unit, format);
}

This can then be called in the manner outlined above.

In our APOC procedure library we already converted about 50 procedures into functions from the following areas:

package # of functions example function
Table of Contents

date & time conversion

Table of Contents

3

apoc.date.parse("time",["unit"],["format"])

Table of Contents

number conversion

Table of Contents
Table of Contents

collection and map functions

Table of Contents

JSON conversion

Table of Contents

string functions

Table of Contents

hash functions

Table of Contents

2

apoc.util.md5(value)

You can list user defined functions with SHOW FUNCTIONS

show_functions

Text and Lookup Indexes

Index Queries

Procedures to add to and query manual indexes

Note
Please note that there are (case-sensitive) automatic schema indexes, for equality, non-equality, existence, range queries, starts with, ends-with and contains!

apoc.index.addAllNodes('index-name',{label1:['prop1',…​],…​})

add all nodes to this full text index with the given fields, additionally populates a 'search' index field with all of them in one place

apoc.index.addNode(node,['prop1',…​])

add node to an index for each label it has

apoc.index.addNodeByLabel('Label',node,['prop1',…​])

add node to an index for the given label

apoc.index.addNodeByName('name',node,['prop1',…​])

add node to an index for the given name

apoc.index.addRelationship(rel,['prop1',…​])

add relationship to an index for its type

apoc.index.addRelationshipByName('name',rel,['prop1',…​])

add relationship to an index for the given name

apoc.index.removeNodeByName('name',node) remove node from an index for the given name

apoc.index.removeRelationshipByName('name',rel) remove relationship from an index for the given name

apoc.index.nodes with score

apoc.index.search('index-name', 'query') YIELD node, weight

search for the first 100 nodes in the given full text index matching the given lucene query returned by relevance

apoc.index.nodes('Label','prop:value*') YIELD node, weight

lucene query on node index with the given label name

apoc.index.relationships('TYPE','prop:value*') YIELD rel, weight

lucene query on relationship index with the given type name

apoc.index.between(node1,'TYPE',node2,'prop:value*') YIELD rel, weight

lucene query on relationship index with the given type name bound by either or both sides (each node parameter can be null)

apoc.index.out(node,'TYPE','prop:value*') YIELD node, weight

lucene query on relationship index with the given type name for outgoing relationship of the given node, returns end-nodes

apoc.index.in(node,'TYPE','prop:value*') YIELD node, weight

lucene query on relationship index with the given type name for incoming relationship of the given node, returns start-nodes

Index Management

CALL apoc.index.list() YIELD type,name,config

lists all manual indexes

CALL apoc.index.remove('name') YIELD type,name,config

removes manual indexes

CALL apoc.index.forNodes('name',{config}) YIELD type,name,config

gets or creates manual node index

CALL apoc.index.forRelationships('name',{config}) YIELD type,name,config

gets or creates manual relationship index

Add node to index example
match (p:Person) call apoc.index.addNode(p,["name","age"]) RETURN count(*);
// 129s for 1M People
call apoc.index.nodes('Person','name:name100*') YIELD node, weight return * limit 2

Manual Indexes

Data Used

The below examples use flight data.

Here is a sample subset of the data that can be load to try the procedures:

CREATE (slc:Airport {abbr:'SLC', id:14869, name:'SALT LAKE CITY INTERNATIONAL'})
CREATE (oak:Airport {abbr:'OAK', id:13796, name:'METROPOLITAN OAKLAND INTERNATIONAL'})
CREATE (bur:Airport {abbr:'BUR', id:10800, name:'BOB HOPE'})
CREATE (f2:Flight {flight_num:6147, day:2, month:1, weekday:6, year:2016})
CREATE (f9:Flight {flight_num:6147, day:9, month:1, weekday:6, year:2016})
CREATE (f16:Flight {flight_num:6147, day:16, month:1, weekday:6, year:2016})
CREATE (f23:Flight {flight_num:6147, day:23, month:1, weekday:6, year:2016})
CREATE (f30:Flight {flight_num:6147, day:30, month:1, weekday:6, year:2016})
CREATE (f2)-[:DESTINATION {arr_delay:-13, taxi_time:9}]->(oak)
CREATE (f9)-[:DESTINATION {arr_delay:-8, taxi_time:4}]->(bur)
CREATE (f16)-[:DESTINATION {arr_delay:-30, taxi_time:4}]->(slc)
CREATE (f23)-[:DESTINATION {arr_delay:-21, taxi_time:3}]->(slc)
CREATE (f30)-[:DESTINATION]->(slc)

Using Manual Index on Node Properties

In order to create manual index on a node property, you call apoc.index.addNode with the node, providing the properties to be indexed.

MATCH (a:Airport)
CALL apoc.index.addNode(a,['name'])
RETURN count(*)

The statement will create the node index with the same name as the Label name(s) of the node in this case Airport and add the node by their properties to the index.

Once this has been added check if the node index exists using apoc.index.list.

CALL apoc.index.list()

Usually apoc.index.addNode would be used as part of node-creation, e.g. during LOAD CSV. There is also apoc.index.addNodes for adding a list of multiple nodes at once.

Once the node index is created we can start using it.

Here are some examples:

The apoc.index.nodes finds nodes in a manual index using the given lucene query.

Note
That makes only sense if you combine multiple properties in one lookup or use case insensitive or fuzzy matching full-text queries. In all other cases the built in schema indexes should be used.
CALL apoc.index.nodes('Airport','name:inter*') YIELD node AS airport, weight
RETURN airport.name, weight
LIMIT 10
Note
Apoc index queries not only return nodes and relationships but also a weight, which is the score returned from the underlying Lucene index. The results are also sorted by that score. That’s especially helpful for partial and fuzzy text searches.

To remove the node index Airport created, use:

CALL apoc.index.remove('Airport')

Using Manual Index on Relationship Properties

The procedure apoc.index.addRelationship is used to create a manual index on relationship properties.

As there are no schema indexes for relationships, these manual indexes can be quite useful.

MATCH (:Flight)-[r:DESTINATION]->(:Airport)
CALL apoc.index.addRelationship(r,['taxi_time'])
RETURN count(*)

The statement will create the relationship index with the same name as relationship-type, in this case DESTINATION and add the relationship by its properties to the index.

Using apoc.index.relationships, we can find the relationship of type DESTINATION with the property taxi_time of 11 minutes. We can chose to also return the start and end-node.

CALL apoc.index.relationships('DESTINATION','taxi_time:11') YIELD rel, start AS flight, end AS airport
RETURN flight_num.flight_num, airport.name;
Note
Manual relationship indexed do not only store the relationship by its properties but also the start- and end-node.

That’s why we can use that information to subselect relationships not only by property but also by those nodes, which is quite powerful.

With apoc.index.in we can pin the node with incoming relationships (end-node) to get the start nodes for all the DESTINATION relationships. For instance to find all flights arriving in 'SALT LAKE CITY INTERNATIONAL' with a taxi_time of 7 minutes we’d use:

MATCH (a:Airport {name:'SALT LAKE CITY INTERNATIONAL'})
CALL apoc.index.in(a,'DESTINATION','taxi_time:7') YIELD node AS flight
RETURN flight

The opposite is apoc.index.out, which takes and binds end-nodes and returns start-nodes of relationships.

Really useful to quickly find a subset of relationships between nodes with many relationships (tens of thousands to millions) is apoc.index.between. Here you bind both the start and end-node and provide (or not) properties of the relationships.

MATCH (f:Flight {flight_num:6147})
MATCH (a:Airport {name:'SALT LAKE CITY INTERNATIONAL'})
CALL apoc.index.between(f,'DESTINATION',a,'taxi_time:7') YIELD rel, weight
RETURN *

To remove the relationship index DESTINATION that was created, use.

CALL apoc.index.remove('DESTINATION')

Indexes are used for finding nodes in the graph that further operations can then continue from. Just like in a book where you look at the index to find a section that interest you, and then start reading from there. A full text index allows you to find occurrences of individual words or phrases across all attributes.

In order to use the full text search feature, we have to first index our data by specifying all the attributes we want to index. Here we create a full text index called “locations” (we will use this name when searching in the index) with our data.

Optionally you can enable tracking changes to the graph on a per index level. To do so, you need to take two actions:

  1. set apoc.autoUpdate.enabled=true in your neo4j.conf. With that setting a TransactionEventHandler is registered upon startup of your graph database that reflects property changes to the respective fulltext index.

  2. By default index tracking is done synchronously within the same transaction. Optionally this can be done asynchronous by setting apoc.autoUpdate.async=true.

  3. indexing is started with:

CALL apoc.index.addAllNodes('locations',{
  Company: ["name", "description"],
  Person:  ["name","address"],
  Address: ["address"]})

Creating the index will take a little while since the procedure has to read through the entire database to create the index.

We can now use this index to search for nodes in the database. The most simple case would be to search across all data for a particular word.

It does not matter which property that word exists in, any node that has that word in any of its indexed properties will be found.

If you use a name in the call, all occurrences will be found (but limited to 100 results).

CALL apoc.index.search("locations", 'name')

We can further restrict our search to only searching in a particular attribute. In order to search for a Person with an address in France, we use the following.

CALL apoc.index.search("locations", "Person.address:France")

Now we can search for nodes with a specific property value, and then explore their neighbourhoods visually.

But integrating it with an graph query is so much more powerful.

We could for instance search for addresses in the database that contain the word "Paris", and then find all companies registered at those addresses:

CALL apoc.index.search("locations", "Address.address:Paris~") YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
RETURN company LIMIT 50

The tilde (~) instructs the index search procedure to do a fuzzy match, allowing us to find “Paris” even if the spelling is slightly off.

We might notice that there are addresses that contain the word “Paris” that are not in Paris, France. For example there might be a Paris Street somewhere.

We can further specify that we want the text to contain both the word Paris, and the word France:

CALL apoc.index.search("locations", "+Address.address:Paris~ +France~")
YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
RETURN company LIMIT 50

Complex Searches

Things start to get interesting when we look at how the different entities in Paris are connected to one another. We can do that by finding all the entities with addresses in Paris, then creating all pairs of such entities and finding the shortest path between each such pair:

CALL apoc.index.search("locations", "+Address.address:Paris~ +France~") YIELD node AS addr
MATCH (addr)<-[:HAS_ADDRESS]-(company:Company)
WITH collect(company) AS companies

// create unique pairs
UNWIND companies AS x UNWIND companies AS y
WITH x, y WHERE ID(x) < ID(y)

MATCH path = shortestPath((x)-[*..10]-(y))
RETURN path

For more details on the query syntax used in the second parameter of the search procedure, please see this Lucene query tutorial

Utility Functions

TimeToLive (TTL) - Expire Nodes

Enable cleanup of expired nodes in neo4j.conf with apoc.ttl.enabled=true

30s after startup an index is created:

CREATE INDEX FOR (n:TTL) ON (n.ttl)

At startup a statement is scheduled to run every 60s (or configure in neo4j.conf - apoc.ttl.schedule=120)

MATCH (t:TTL) where t.ttl < timestamp() WITH t LIMIT 1000 DETACH DELETE t

The ttl property holds the time when the node is expired in milliseconds since epoch.

You can expire your nodes by setting the :TTL label and the ttl property:

MATCH (n:Foo) WHERE n.bar SET n:TTL, n.ttl = timestamp() + 10000;

Graph Algorithms

Relationship Filter

Syntax: [<]RELATIONSHIP_TYPE1[>]|[<]RELATIONSHIP_TYPE2[>]|…​

input type direction

LIKES>

LIKES

Table of Contents

OUTGOING

<FOLLOWS

FOLLOWS

Table of Contents

INCOMING

KNOWS

KNOWS

Table of Contents

BOTH

Label Filter

Syntax: [+-/]LABEL1|LABEL2|…​

input label result

+Friend

Friend

Table of Contents

include label (whitelist)

-Foe

Foe

Table of Contents

exclude label (blacklist)

/Friend

Friend

Table of Contents

stop traversal after reaching a friend (but include him)

Centrality Algorithms

Setup

Let’s create some test data to run the Centrality algorithms on.

// create 100 nodes
FOREACH (id IN range(0,1000) | CREATE (:Node {id:id}))

// over the cross product (1M) create 100.000 relationships
MATCH (n1:Node),(n2:Node) WITH n1,n2 LIMIT 1000000 WHERE rand() < 0.1

CREATE (n1)-[:TYPE]->(n2)

PageRank Algorithm

Setup

Let’s create some test data to run the PageRank algorithm on.

// create 100 nodes
FOREACH (id IN range(0,1000) | CREATE (:Node {id:id}))

// over the cross product (1M) create 100.000 relationships
MATCH (n1:Node),(n2:Node) WITH n1,n2 LIMIT 1000000 WHERE rand() < 0.1

CREATE (n1)-[:TYPE_1]->(n2)

Graph Refactoring

In order not to have to repeatedly geocode the same thing in multiple queries, especially if the database will be used by many people, it might be a good idea to persist the results in the database so that subsequent calls can use the saved results.

Geocode and persist the result

MATCH (a:Place)
WHERE a.address IS NOT NULL AND a.latitude IS NULL
WITH a LIMIT 1000
CALL apoc.spatial.geocodeOnce(a.address) YIELD location
SET a.latitude = location.latitude
SET a.longitude = location.longitude

Note that the above command only geocodes the first 1000 ‘Place’ nodes that have not already been geocoded. This query can be run multiple times until all places are geocoded. Why would we want to do this? Two good reasons:

  • The geocoding service is a public service that can throttle or blacklist sites that hit the service too heavily, so controlling how much we do is useful.

  • The transaction is updating the database, and it is wise not to update the database with too many things in the same transaction, to avoid using up too much memory. This trick will keep the memory usage very low.

Now make use of the results in distance queries

WITH point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (a:Place)
WHERE a.latitude IS NOT NULL AND a.longitude IS NOT NULL
WITH a, point.distance(point(a), eiffel) AS distance
WHERE distance < 5000
RETURN a.name, distance
ORDER BY distance
LIMIT 100

Combining spatial and date-time procedures can allow for more complex queries:

CALL apoc.date.parse('2016-06-01 00:00:00','h') YIELD value AS due_date
WITH due_date,
point({latitude: 48.8582532, longitude: 2.294287}) AS eiffel
MATCH (e:Event)
WHERE e.address IS NOT NULL AND e.datetime IS NOT NULL
CALL apoc.spatial.geocodeOnce(e.address) YIELD location
CALL apoc.date.parse(e.datetime,'h') YIELD value AS hours
WITH e, location,
distance(point(location), eiffel) AS distance,
            (due_date - hours)/24.0 AS days_before_due
WHERE distance < 5000 AND days_before_due < 14 AND hours < due_date
RETURN e.name AS event, e.datetime AS date,
location.description AS description, distance
ORDER BY distance

Load JDBC

Overview: Database Integration

Data Integration is an important topic. Reading data from relational databases to create and augment data models is a very helpful exercise.

With apoc.load.jdbc you can access any database that provides a JDBC driver, and execute queries whose results are turned into streams of rows. Those rows can then be used to update or create graph structures.

CALL apoc.load.jdbc('jdbc:derby:derbyDB','PERSON') YIELD row CREATE (:Person {name:row.name})

load from relational database, either a full table or a sql statement

CALL apoc.load.jdbc('jdbc:derby:derbyDB','SELECT * FROM PERSON WHERE AGE > 18')

load from relational database, either a full table or a sql statement

CALL apoc.load.driver('org.apache.derby.jdbc.EmbeddedDriver')

register JDBC driver of source database

To simplify the JDBC URL syntax and protect credentials, you can configure aliases in conf/neo4j.conf:

apoc.jdbc.myDB.url=jdbc:derby:derbyDB
CALL apoc.load.jdbc('jdbc:derby:derbyDB','PERSON')

becomes

CALL apoc.load.jdbc('myDB','PERSON')

The 3rd value in the apoc.jdbc.<alias>.url= effectively defines an alias to be used in apoc.load.jdbc('<alias>',…​.

MySQL Example

Northwind is a common example set for relational databases, which is also covered in our import guides, e.g. :play northwind graph in the Neo4j browser.

MySQL Northwind Data
select count(*) from products;
+----------+
| count(*) |
+----------+
|       77 |
+----------+
1 row in set (0,00 sec)
describe products;
+-----------------+---------------+------+-----+---------+----------------+
| Field           | Type          | Null | Key | Default | Extra          |
+-----------------+---------------+------+-----+---------+----------------+
| ProductID       | int(11)       | NO   | PRI | NULL    | auto_increment |
| ProductName     | varchar(40)   | NO   | MUL | NULL    |                |
| SupplierID      | int(11)       | YES  | MUL | NULL    |                |
| CategoryID      | int(11)       | YES  | MUL | NULL    |                |
| QuantityPerUnit | varchar(20)   | YES  |     | NULL    |                |
| UnitPrice       | decimal(10,4) | YES  |     | 0.0000  |                |
| UnitsInStock    | smallint(2)   | YES  |     | 0       |                |
| UnitsOnOrder    | smallint(2)   | YES  |     | 0       |                |
| ReorderLevel    | smallint(2)   | YES  |     | 0       |                |
| Discontinued    | bit(1)        | NO   |     | b'0'    |                |
+-----------------+---------------+------+-----+---------+----------------+
10 rows in set (0,00 sec)

Load JDBC Examples

Load the JDBC driver
cypher CALL apoc.load.driver("com.mysql.jdbc.Driver");
Count rows in products table
with "jdbc:mysql://localhost:3306/northwind?user=root" as url
cypher CALL apoc.load.jdbc(url,"products") YIELD row
RETURN count(*);
+----------+
| count(*) |
+----------+
| 77       |
+----------+
1 row
23 ms
Return row from products table
with "jdbc:mysql://localhost:3306/northwind?user=root" as url
cypher CALL apoc.load.jdbc(url,"products") YIELD row
RETURN row limit 1;
+--------------------------------------------------------------------------------+
| row                                                                            |
+--------------------------------------------------------------------------------+
| {UnitPrice -> 18.0000, UnitsOnOrder -> 0, CategoryID -> 1, UnitsInStock -> 39} |
+--------------------------------------------------------------------------------+
1 row
10 ms
apoc load jdbc

Load data in transactional batches

You can load data from jdbc and create/update the graph using the query results in batches (and in parallel).

CALL apoc.periodic.iterate('
call apoc.load.jdbc("jdbc:mysql://localhost:3306/northwind?user=root","company")',
'CREATE (p:Person) SET p += value', {batchSize:10000, parallel:true})
RETURN batches, total

Cassandra Example

Setup Song database as initial dataset

curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/playlist.cql
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/artists.csv
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/songs.csv
$CASSANDRA_HOME/bin/cassandra
$CASSANDRA_HOME/bin/cqlsh -f playlist.cql

Download the Cassandra JDBC Wrapper, and put it into your $NEO4J_HOME/plugins directory. Add this config option to $NEO4J_HOME/conf/neo4j.conf to make it easier to interact with the cassandra instance.

Add to conf/neo4j.conf
apoc.jdbc.cassandra_songs.url=jdbc:cassandra://localhost:9042/playlist

Restart the server.

Now you can inspect the data in Cassandra with.

CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN count(*);
╒════════╕
│count(*)│
╞════════╡
│3605    │
└────────┘
CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN row LIMIT 5;
CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN row.first_letter, row.artist LIMIT 5;
╒════════════════╤═══════════════════════════════╕
│row.first_letter│row.artist                     │
╞════════════════╪═══════════════════════════════╡
│C               │C.W. Stoneking                 │
├────────────────┼───────────────────────────────┤
│C               │CH2K                           │
├────────────────┼───────────────────────────────┤
│C               │CHARLIE HUNTER WITH LEON PARKER│
├────────────────┼───────────────────────────────┤
│C               │Calvin Harris                  │
├────────────────┼───────────────────────────────┤
│C               │Camané                         │
└────────────────┴───────────────────────────────┘

Let’s create some graph data, we have a look at the track_by_artist table, which contains about 60k records.

CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row RETURN count(*);
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
RETURN row LIMIT 5;
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
RETURN row.track_id, row.track_length_in_seconds, row.track, row.music_file, row.genre, row.artist, row.starred LIMIT 2;
╒════════════════════════════════════╤══════╤════════════════╤══════════════════╤═════════╤════════════════════════════╤═══════════╕
│row.track_id                        │length│row.track       │row.music_file    │row.genre│row.artist                  │row.starred│
╞════════════════════════════════════╪══════╪════════════════╪══════════════════╪═════════╪════════════════════════════╪═══════════╡
│c0693b1e-0eaa-4e81-b23f-b083db303842│219   │1913 Massacre   │TRYKHMD128F934154C│folk     │Woody Guthrie & Jack Elliott│false      │
├────────────────────────────────────┼──────┼────────────────┼──────────────────┼─────────┼────────────────────────────┼───────────┤
│7d114937-0bc7-41c7-8e0c-94b5654ac77f│178   │Alabammy Bound  │TRMQLPV128F934152B│folk     │Woody Guthrie & Jack Elliott│false      │
└────────────────────────────────────┴──────┴────────────────┴──────────────────┴─────────┴────────────────────────────┴───────────┘

Let’s create some indexes and constraints, note that other indexes and constraints will be dropped by this.

CALL apoc.schema.assert(
  {Track:['title','length']},
  {Artist:['name'],Track:['id'],Genre:['name']});
╒════════════╤═══════╤══════╤═══════╕
│label       │key    │unique│action │
╞════════════╪═══════╪══════╪═══════╡
│Track       │title  │false │CREATED│
├────────────┼───────┼──────┼───────┤
│Track       │length │false │CREATED│
├────────────┼───────┼──────┼───────┤
│Artist      │name   │true  │CREATED│
├────────────┼───────┼──────┼───────┤
│Genre       │name   │true  │CREATED│
├────────────┼───────┼──────┼───────┤
│Track       │id     │true  │CREATED│
└────────────┴───────┴──────┴───────┘
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
MERGE (a:Artist {name:row.artist})
MERGE (g:Genre {name:row.genre})
CREATE (t:Track {id:toString(row.track_id), title:row.track, length:row.track_length_in_seconds})
CREATE (a)-[:PERFORMED]->(t)
CREATE (t)-[:GENRE]->(g);
Added 63213 labels, created 63213 nodes, set 182413 properties, created 119200 relationships, statement executed in 40076 ms.

LOAD JDBC - Resources

To use other JDBC drivers use these download links and JDBC URL. Put the JDBC driver into the $NEO4J_HOME/plugins directory and configure the JDBC-URL in $NEO4J_HOME/conf/neo4j.conf with apoc.jdbc.<alias>.url=<jdbc-url>

Database JDBC-URL  Driver Source
Table of Contents

MySQL

jdbc:mysql://<hostname>:<port/3306>/<database>?user=<user>&password=<pass>

Table of Contents
Table of Contents

Postgres

jdbc:postgresql://<hostname>/<database>?user=<user>&password=<pass>

Table of Contents
Table of Contents

Oracle

jdbc:oracle:thin:<user>/<pass>@<host>:<port>/<service_name>

Table of Contents
Table of Contents

MS SQLServer

jdbc:sqlserver://;servername=<servername>;databaseName=<database>;user=<user>;password=<pass>

Table of Contents
Table of Contents

IBM DB2

jdbc:db2://<host>:<port/5021>/<database>:user=<user>;password=<pass>;

Table of Contents
Table of Contents

Derby

jdbc:derby:derbyDB

Table of Contents

Included in JDK6-8

Table of Contents

Cassandra

jdbc:cassandra://<host>:<port/9042>/<database>

Table of Contents

There are a number of blog posts / examples that details usage of apoc.load.jdbc

Streaming Data to Gephi

apoc.gephi.add(url-or-key, workspace, data)

streams provided data to Gephi

Notes

Gephi has a streaming plugin, that can provide and accept JSON-graph-data in a streaming fashion.

Make sure to install the plugin firsrt and activate it for your workspace (there is a new "Streaming"-tab besides "Layout"), right-click "Master"→"start" to start the server.

You can provide your workspace name (you might want to rename it before you start thes streaming), otherwise it defaults to workspace0

You can also configure it in conf/neo4j.conf via apoc.gephi.url=url or apoc.gephi.<key>.url=url

Example

match path = (:Person)-[:ACTED_IN]->(:Movie)
WITH path LIMIT 1000
with collect(path) as paths
call apoc.gephi.add(null,'workspace0', paths) yield nodes, relationships, time
return nodes, relationships, time
apoc gephi

Format

We send all nodes and relationships of the passed in data convert into individual Gephi-Streaming JSON fragements, separated by \r\n.

{"an":{"123":{"TYPE":"Person:Actor","label":"Tom Hanks",                           x:333,y:222,r:0.1,g:0.3,b:0.5}}}\r\n
{"an":{"345":{"TYPE":"Movie","label":"Forrest Gump",                               x:234,y:122,r:0.2,g:0.2,b:0.7}}}\r\n
{"ae":{"3344":{"TYPE":"ACTED_IN","label":"Tom Hanks",source:"123",target:"345","directed":true,r:0.1,g:0.3,b:0.5}}}

Specifics Details

Gephi doesn’t render the graph data unless you also provide x,y coordinates in the payload, so we just send random ones within a 1000x1000 grid.

We also generate colors per label combination and relationship-type, both of which are also transferred as TYPE property.

ElasticSearch Integration

apoc.es.stats(host-url-Key)

elastic search statistics

apoc.es.get(host-or-port,index-or-null,type-or-null,id-or-null,query-or-null,payload-or-null) yield value

perform a GET operation

apoc.es.query(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a SEARCH operation

apoc.es.getRaw(host-or-port,path,payload-or-null) yield value

perform a raw GET operation

apoc.es.postRaw(host-or-port,path,payload-or-null) yield value

perform a raw POST operation

apoc.es.post(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a POST operation

apoc.es.put(host-or-port,index-or-null,type-or-null,query-or-null,payload-or-null) yield value

perform a PUT operation

Example

call apoc.es.post("localhost","tweets","users","1",null,{name:"Chris"})
call apoc.es.get("localhost","tweets","users","1",null,null)
call apoc.es.stats("localhost")
qHAj9ma

General Structure and Parameters

call apoc.es.post(host-or-port,index-or-null,type-or-null,id-or-null,query-or-null,payload-or-null) yield value

// GET/PUT/POST url/index/type/id?query -d payload
host or port parameter

The parameter can be a direct host or url, or an entry to be lookup up in neo4j.conf

  • host

  • host:port

  • http://host:port

  • lookup via key to apoc.es.<key>.url

  • lookup via key apoc.es.<key>.host

  • lookup apoc.es.url

  • lookup apoc.es.host

index parameter

Main ES index, will be sent directly, if null then "_all" multiple indexes can be separated by comma in the string.

type parameter

Document type, will be sent directly, if null then "_all" multiple types can be separated by comma in the string.

id parameter

Document id, will be left off when null.

query parameter

Query can be a map which is turned into a query string, a direct string or null then it is left off.

payload parameter

Payload can be a map which will be turned into a json payload or a string which will be sent directly or null.

Results

Results are stream of map in value.

Load XML

Load XML Introduction

Many existing (enterprise) applications, endpoints and files use XML as data exchange format.

To make these datastructures available to Cypher, you can use apoc.load.xml. It takes a file or http URL and parses the XML into a map datastructure.

See the following usage-examples for the procedures.

Example File

"How do you access XML doc attributes in children fields ?"

(Thanks Nicolas Rouyer)

For example, if my XML file is the example book.xml provided by Microsoft.

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
...

We have the file here, on GitHub.

Load XML and Introspect

Let’s just load it and see what it looks like. It’s returned as value map with nested _type and _children fields, per group of elements. Attributes are turned into map-entries. And each element into their own little map with _type, attributes and _children if applicable.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/master/src/test/resources/books.xml")
{_type: catalog, _children: [
  {_type: book, id: bk101, _children: [
    {_type: author, _text: Gambardella, Matthew},
    {_type: title, _text: XML Developer's Guide},
    {_type: genre, _text: Computer},
    {_type: price, _text: 44.95},
    {_type: publish_date, _text: 2000-10-01},
    {_type: description, _text: An in-depth look at creating applications ....
For each book, how do I access book id ?

You can access attributes per element directly.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/master/src/test/resources/books.xml") yield value as catalog
UNWIND catalog._children as book
RETURN book.id
╒═══════╕
│book.id│
╞═══════╡
│bk101  │
├───────┤
│bk102  │
For each book, how do I access book author and title ?
Filter into collection

You have to filter over the sub-elements in the _childrens array in this case.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/master/src/test/resources/books.xml") yield value as catalog
UNWIND catalog._children as book
RETURN book.id, [attr IN book._children WHERE attr._type IN ['author','title'] | [attr._type, attr._text]] as pairs
╒═══════╤════════════════════════════════════════════════════════════════════════╕
│book.id│pairs                                                                   │
╞═══════╪════════════════════════════════════════════════════════════════════════╡
│bk101  │[[author, Gambardella, Matthew], [title, XML Developer's Guide]]        │
├───────┼────────────────────────────────────────────────────────────────────────┤
│bk102  │[[author, Ralls, Kim], [title, Midnight Rain]]                          │
How do I return collection elements?

This is not too nice, we could also just have returned the values and then grabbed them out of the list, but that relies on element-order.

call apoc.load.xml("https://raw.githubusercontent.com/neo4j-contrib/neo4j-apoc-procedures/master/src/test/resources/books.xml") yield value as catalog
UNWIND catalog._children as book
WITH book.id as id, [attr IN book._children WHERE attr._type IN ['author','title'] | attr._text] as pairs
RETURN id, pairs[0] as author, pairs[1] as title
╒═════╤════════════════════╤══════════════════════════════╕
│id   │author              │title                         │
╞═════╪════════════════════╪══════════════════════════════╡
│bk101│Gambardella, Matthew│XML Developer's Guide         │
├─────┼────────────────────┼──────────────────────────────┤
│bk102│Ralls, Kim          │Midnight Rain                 │

Cypher Operations

Running Cypher fragments

We can use Cypher as safe, graph-aware, partially compiled scripting language within APOC.

Cypher Execution

CALL apoc.cypher.runFile(file or url) yield row, result

runs each statement in the file, all semicolon separated - currently no schema operations

CALL apoc.cypher.mapParallel(fragment, params, list-to-parallelize) yield value

executes fragment in parallel batches with the list segments being assigned to _

Example: Fast Node-Counts by Label

Counts by label are quickly provided by the counts-store, but only if they are the the single thing in the query, like

MATCH (:Person) RETURN count(*);

It also works to combine several with UNION ALL, but not via WITH

Doesn’t work
MATCH (:Person) WITH count(*) as people
MATCH (:Movie) RETURN people, count(*) as movies;
Works
MATCH (:Person) RETURN count(*)
UNION ALL
MATCH (:Movie) RETURN count(*);

But with apoc.cypher.run we can construct the statement and run each of them individually, so it completes in a few ms.

call db.labels() yield label
call apoc.cypher.run("match (:`"+label+"`) return count(*) as count", null) yield value
return label, value.count as count

You can use a similar approach to get the property-keys per label:

CALL db.labels() yield label
call apoc.cypher.run("MATCH (n:`"+label+"`) RETURN keys(n) as keys LIMIT 1",null) yield value
RETURN label, value.keys as keys

Job management and periodic execution

Introduction asynchronous transactional execution

Note
this document is work in progress

Cypher is great for querying graphs and importing and updating graph structures. While during imports you can use PERIODIC COMMIT to control transaction sizes in memory, for other graph refactorings it’s not that easy to commit transactions regularly to free memory for new update state.

Also sometimes you want to schedule execution of Cypher statements to run regularly in the background or asynchronously ("fire & forget").

The apoc.periodic.* procedures provide such capabilities.