Setting up Dockerized Neo4j for data analysis

Posted on Tue, 12 September 2017 in development

I came across an interesting article on analyzing the London Tube system with Neo4j, using the Graph Algorithms plugin. This is how I set up a Neo4j Docker instance for the journey.

Prerequisites

  • A recent macOS, Windows or Linux machine should work. At the time of writing I am running macOS Sierra 10.12.6.
  • Docker Community Edition is installed and running. At the time of writing I am running Version 17.06.2-ce-mac27 (19124).

Prepare host machine

We are going to keep the data, logs, configuration and plugins on the host rather than inside the container. To accomplish this, we need to create a few directories under our working directory (whichever you like it to be):

mkdir data
mkdir logs
mkdir conf
mkdir plugins
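
As a small convenience, the four commands above can be collapsed into one. This is only a sketch of an equivalent call; the -p flag makes it safe to run again if some of the directories already exist.

```shell
# Create all four bind-mount directories in the current working directory.
# -p: do not fail if a directory is already there.
mkdir -p data logs conf plugins
```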

Get the initial config

We will need to add a new parameter to neo4j.conf, so first we need to get our hands on the file as it is now. Luckily we can dump it from the container:

docker run --rm \
    --volume=$(pwd)/conf:/conf \
    neo4j:3.2.3 dump-config

… which will “download” the configuration file to the conf directory under the current host directory.

Download the plugin

In order to use the Graph Algorithms plugin, we need to download it from the releases page of its GitHub repository. At the time of writing, the file in question is graph-algorithms-algo-3.2.2.1.jar.

Store it in the plugins directory.

Since we are using Neo4j version 3.2.x, we need to add a line to the configuration. From the repo’s README, copy the line dbms.security.procedures.unrestricted=algo.* and add it to neo4j.conf in the conf directory.
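
If you prefer the command line over a text editor, the line can be appended from the shell. The snippet below is a minimal sketch that assumes the configuration was dumped to conf/neo4j.conf as described earlier; the variable names CONF_FILE and SETTING are mine, for illustration only. It checks for the line first, so running it twice does not add a duplicate.

```shell
# Assumption: neo4j.conf lives in ./conf (created by the dump-config step).
CONF_FILE="conf/neo4j.conf"
SETTING='dbms.security.procedures.unrestricted=algo.*'

# Make sure the file exists so grep has something to read.
mkdir -p "$(dirname "$CONF_FILE")"
touch "$CONF_FILE"

# -F: treat the setting as a fixed string (so '.' and '*' are literal),
# -x: match the whole line, -q: print nothing.
# Append only when the exact line is not present yet.
grep -qxF "$SETTING" "$CONF_FILE" || echo "$SETTING" >> "$CONF_FILE"
```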

Run the container

Now that all prerequisites are in order, it’s time to run the container. With the options set as below:

  • the container runs detached, in the background (-d),
  • no authentication is required (we could require it by using --env NEO4J_AUTH=neo4j/<password> instead),
  • data will be stored in data,
  • logs will be written to logs,
  • the configuration will be read from conf,
  • plugins will be loaded from plugins.

docker run -d \
    --publish=7474:7474 \
    --publish=7687:7687 \
    --volume=$(pwd)/data:/data \
    --volume=$(pwd)/logs:/logs \
    --volume=$(pwd)/conf:/conf \
    --volume=$(pwd)/plugins:/plugins \
    --env=NEO4J_AUTH=none \
    --name my_neo4j neo4j:3.2.3

With the container running (we can check its status with docker ps -a), it’s time to open a web browser and head to http://localhost:7474/browser/. If all went well, we should see Neo4j’s web GUI.
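
Neo4j can take a few seconds to start, so the browser may briefly show a connection error. A rough way to wait for it from the shell is sketched below. The wait_for_http function is a hypothetical helper of my own, not part of Docker or Neo4j; it only assumes curl is installed on the host.

```shell
# Hypothetical helper: poll a URL once per second until it answers
# successfully or the given number of attempts is used up.
wait_for_http() {
    url="$1"
    attempts="${2:-30}"   # default: try 30 times
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: treat HTTP errors as failure, -o: discard the body
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage, once the container from the previous step has been started:
#   docker ps --filter name=my_neo4j
#   wait_for_http http://localhost:7474/ && echo "Neo4j is up"
#   docker logs my_neo4j    # the place to look if it never comes up
```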

We can verify that the plugin has loaded correctly by running the following query:

CALL dbms.procedures() YIELD name, description, signature
WHERE name STARTS WITH "algo."
RETURN name, description, signature
ORDER BY name

… which should list the algorithm procedures.
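
The same check can also be run without the browser. The sketch below assumes that the cypher-shell tool bundled with the image is reachable as bin/cypher-shell from the container’s working directory, and that authentication is disabled as in the docker run command above; I have not verified this against every 3.2.x image. The function name count_algo_procedures is made up for this example.

```shell
# Hypothetical helper: count the loaded algo.* procedures through
# cypher-shell inside the running container.
# $1: container name (defaults to my_neo4j, the name used above).
count_algo_procedures() {
    container="${1:-my_neo4j}"
    # Single quotes around 'algo.' keep the Cypher string intact inside
    # the double-quoted shell argument.
    docker exec "$container" bin/cypher-shell \
        "CALL dbms.procedures() YIELD name WHERE name STARTS WITH 'algo.' RETURN count(*) AS algo_procedures"
}

# Usage:
#   count_algo_procedures            # uses my_neo4j
#   count_algo_procedures other_name
```

A count of zero would suggest the jar is not in plugins or the container was started before the jar was put there.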

Follow the original article

Now that we have Neo4j up and running — with plugins — it’s time to follow the aforementioned article and analyze the London Tube data.