In the previous post Combining Neo4J and Hadoop (part I) we described the way we combine Hadoop and Neo4J and how we are getting the data into Neo4J.
In this second part we will take you through the journey we took to implement a distributed way to create a Neo4J database. The idea is to use our Hadoop cluster for creating the underlying file structure of a Neo4J database.
To do this we must first understand this file-structure. Luckily Chris Gioran has done a great job describing this structure in his blog Neo4J internal file storage
The description was done for version 1.6 but largely still matches the 1.8 file-structure.
First I’ll start with a small recap of the file-structure.

The Neo4J Filestructure
Most important for the database are the files inside the graph.db directory
neostore
neostore.id
neostore.nodestore.db
neostore.nodestore.db.id
neostore.relationshipstore.db
neostore.relationshipstore.db.id
neostore.relationshiptypestore.db
neostore.relationshiptypestore.db.id
neostore.relationshiptypestore.db.names
neostore.relationshiptypestore.db.names.id
neostore.propertystore.db
neostore.propertystore.db.id
neostore.propertystore.db.arrays
neostore.propertystore.db.arrays.id
neostore.propertystore.db.strings
neostore.propertystore.db.strings.id
neostore.propertystore.db.index
neostore.propertystore.db.index.id
neostore.propertystore.db.index.keys
neostore.propertystore.db.index.keys.id
First file is the neostore file which encodes a creationtime, randomnumber, version, lastcommitted transaction id, the store version and the next propertyid
every value is encoded as a long together with a one byte inUse marker. This gives a file of 6 * 9 bytes + 15 bytes for the filetype and version string "NeoStore v0.A.0"
Every file is accompanied with a .id file. This is where the last free id is stored. For the neostore this will be 6 because we already use 5 records of 9 bytes.
The nodes:
Nodes are stored in the neostore.nodestore.db file. The nodes are encoded in records of 9 bytes total size.
1 byte use flag
4 byte id of first relationship
4 byte id of first property
The position in the file makes the nodeId so with these 9 bytes we have all information we need for a node.
At the end of the file the type and version are encoded ("NodeStore v0.A.0") in 16 bytes.
This looks something like this:
01 ff ff ff ff ff ff ff ff // root node, no relationships, no properties 01 00 00 00 00 00 00 00 01 // node 1, first relationship 0, first property 1 01 00 00 00 02 00 00 00 04 // node 2, first relationship 2, first property 4 4e 6f 64 65 53 74 6f 72 65 20 76 30 2e 41 2e 30 // NodeStore v0.A.0
The relationships:
relationships are stored in the neostore.relationshipstore.db file. The relationships are encoded in 33 bytes
1 byte use flag
4 byte from node id
4 byte to node id
4 byte relationship type
4 byte from node previous rel id
4 byte from node next rel id
4 byte to node previous rel id
4 byte to node next rel id
4 byte id of first property
The position in the file again makes the id and at the end of the file the type and version are encoded ("RelationshipStore v0.A.0") in 24 bytes
Apart from the nodeids, the previous and next ralationships and the first property id there is a pointer to the relationshiptype which is stored in the neostore.relationshiptypestore.db
So this makes the relationshipfile look like this:
01 00 00 00 02 00 00 00 01 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff // relationship 1, from node 2, to node 1, type 0, no prev, no next, 01 00 00 00 04 00 00 00 03 00 00 00 01 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 00 00 06 01 00 00 00 06 00 00 00 05 00 00 00 01 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 00 00 0a 52 65 6c 61 74 69 6f 6e 73 68 69 70 53 74 6f 72 65 20 76 30 2e 41 2e 30
The neostore.relationshiptypestore.db file is just a simple 5 byte encoding
1 byte use flag
4 byte id of typename
at the end of the file the type and version are encoded ("RelationshipTypeStore v0.A.0") in 28 bytes
The typname id is the record in the neostore.relationshiptypestore.db.names.
The rest of the files are perfectly explained in the formentioned blog from Chris Gioran.
In the previous blog we ended at phase III, so we’ll continue from there
Phase IV:
So how do we start using this information to produce these file in a distributed fashion. In this blog i will concentrate on the nodes, relationships and properties because the other file types only contain a small amount of information.
Since the Id of the nodes and edges is based on the position in the file we need a sequence number for the nodes and the edges, luckily my colleague Friso already made such an effort (see Monotonically increasing row ids)
As mentioned in the previous blog, we start out with two input files. One for nodes and one for the edges between the nodes.
So the first 2 jobs are easy, just take the node data and add a rownumber to it, same for the edges data
This will give us the following data (based on what we started with in the previous blog)
The nodes table/file looks something like this:
RowNum NodeId Property1 Property2 PropertyN 0    AAA    nameOfA   amountOfA someAThing 1      BBB    nameOfB   amountOfB someBThing 2      CCC    nameOfC   amountOfC someCThing 3      DDD    nameOfD   amountOfD someDThing
The edges table/file looks something like this:
RowNum fromNodeId ToNodeId EdgeProperty1 EdgePropertyN 0      AAA    BBB      someDate1 someNumber1 1    AAA    DDD      someDate2     someNumber2 2    BBB    DDD      someDate3     someNumber3 3    CCC    BBB      someDate4     someNumber4 4    DDD    BBB      someDate5     someNumber5 5    DDD    CCC      someDate6     someNumber6
Start with the NODES
While investigating the file formats we learned that nodes have a pointer to the first edge and a pointer to the first property
if we know the amount of properties a node has the pointer to the first property is easy right (just multiply the nodeid with the amount of node properties and we are good to go)
Next we need to perform the following steps:
1 – join nodes and edges on id and idfrom (properties can be ignored)
2 – join nodes and edges on id and idto (properties can be ignored)
3 – union those two joins
4 – sort ascending on nodenum, descending on relnum
5 – grab only the first of every id
6 – sort by rownum
7 – create byterecords per node
Step 1 will result in:
nodeNum nodeId relNum fromNodeId ToNodeId fromNodeNum 0       AAA    0      AAA        BBB    0 0 AAA    1    AAA        DDD    0 1 BBB    2    BBB        DDD      1 2 CCC    3    CCC    BBB    2 3 DDD    4    DDD    BBB    3 3 DDD    5    DDD        CCC      3
Step 2 will result in:
nodeNum nodeId relNum fromNodeId ToNodeId toNodeNum 1       BBB    0      AAA        BBB      1 3       DDD    1      AAA        DDD    3 3       DDD    2    BBB        DDD    3 1       BBB    3    CCC        BBB    1 1       BBB    4    DDD        BBB    1 2       CCC    5    DDD        CCC      2
Step 3 will result in:
nodeNum nodeId relNum fromNodeId ToNodeId fromNodeNum toNodeNum 0       AAA    0      AAA        BBB      0 1 0       AAA    1      AAA        DDD      0           3 1       BBB    2      BBB        DDD      1           3 2       CCC    3      CCC        BBB      2           1 3       DDD    4      DDD        BBB      3           1 3       DDD    5      DDD        CCC      3           2 1       BBB    0      AAA        BBB      0           1 3       DDD    1      AAA        DDD      0           3 3       DDD    2      BBB        DDD      1           3 1       BBB    3      CCC        BBB      2           1 1       BBB    4      DDD        BBB      3           1 2       CCC    5      DDD        CCC      3           2
Step 4 will result in:
nodeNum nodeId relNum fromNodeId ToNodeId fromNodeNum toNodeNum 0       AAA    1      AAA        DDD      0           3 0       AAA    0      AAA        BBB      0           1 1       BBB    4    DDD        BBB      3           1 1       BBB    3      CCC        BBB      2           1 1       BBB    2      BBB        DDD      1           3 1       BBB    0      AAA        BBB      0           1 2 CCC    5      DDD        CCC      3           2 2       CCC    3      CCC        BBB      2           1 3       DDD    5      DDD        CCC      3           2 3       DDD    4      DDD        BBB      3           1 3       DDD    2      BBB        DDD      1           3 3       DDD    1      AAA        DDD      0           3
Step 5 and 6 will result in:
nodeNum nodeId relNum fromNodeId ToNodeId fromNodeNum toNodeNum 0       AAA    1      AAA DDD 0 3 1       BBB    4 DDD BBB 3 1 2       CCC    5 DDD CCC      3 2 3       DDD    5 DDD CCC 3 2
So now we can output this to the neostore.nodestore.db file as this:
1 1 0 1 4 4 1 5 8 1 5 12
Next we process the EDGES
The first steps are the same as we needed to perform for the nodes
1 – join nodes and edges on id and idfrom (properties can be ignored)
2 – join nodes and edges on id and idto (properties can be ignored)
3 – union those two joins
4 – sort ascending on nodenum, descending on relnum
We will have the data from the Nodes step 4 to start with and we can leave out the nodeIds:
nodeNum relNum fromNodeNum toNodeNum 0       1      0 3 0       0      0 1 1       4      3       1 1 3      2           1 1       2      1           3 1       0      0 1 2 5      3 2 2 3      2           1 3 5      3 2 3       4      3 1 3       2      1 3 3       1      0 3
5 – determine the sequence of the relations per node
nodeNum relNum fromNodeNum toNodeNum next previous 0       1      0           3         0    x 0       0      0           1         x    1 1       4      3           1         3    x 1       3      2           1         2    4 1       2      1           3         0    3 1       0      0           1         x    2 2       5      3           2         3    x 2       3      2           1         x    5 3       5      3 2         4    x 3       4      3 1         2    5 3       2      1 3         1    4 3       1      0 3         x    2
6 – create a self join on relnum, fromNodeNum and toNodeNum to determine the previous/next for the toNode
nodeNum relNum fromNodeNum toNodeNum next previous nodeNum2 relNum2 fromNodeNum2 toNodeNum2 next2 previous2 0       1      0           3         0    x        0        1 0    3    0 x 0       1      0           3 0    x        3        1 0    3    x 2 0       0      0           1 x    1        0        0 0    1    x 1 0       0      0 1         x    1        1        0 0    1    x 2 1       4      3           1         3    x    1        4 3    1    3 x 1       4      3 1         3    x        3        4 3    1    2 5 1 3      2 1         2    4    1        3 2    1    2 4 1 3      2           1         2    4    2        3 2    1    x     5 1       2    1 3         0    3    1        2       1    3    0 3 1 2      1 3         0    3    3        2 1    3    1 4 1       0      0 1 x    2        1        0       0    1    x     2 1       0    0 1 x    2        0        0 0    1    x 1 2       5      3 2 3    x        2        5       3    2    3 x 2       5      3 2 3    x        3    5 3            2    4     x 2       3      2           1         x    5        2    3 2    1    x 5 2       3      2           1         x    5        1        3 2    1    2 4 3 5      3 2         4    x        3        5 3    2    4 x 3 5    3 2 4    x        2        5 3    2    3 x 3       4    3           1 2    5        3        4 3            1    2 5 3 4    3 1         2    5    1        4       3    1    3 x 3 2    1 3         1    4        3        2 1    3    1 4 3       2    1           3 1    4        1    2 1    3    0     3 3       1    0 3 x    2    3    1       0            3          x 2 3       1    0 3         x    2    0        1 0    3    0 x
7 – get out the duplicates on id, from, to and relnum (the selfjoined records)
nodeNum relNum fromNodeNum toNodeNum next previous nodeNum2 relNum2 fromNodeNum2 toNodeNum2 next2 previous2 0       1      0 3 0    x    3    1 0    3    x 2 0       0      0 1 x    1    1    0 0    1    x 2 1       4    3 1 3    x    3    4 3    1    2 5 1 3    2 1 2    4    2        3       2    1    x     5 1       2      1           3 0    3    3    2 1    3    1 4 1 0    0 1 x    2    0    0 0    1    x 1 2 5    3 2 3    x    3        5 3    2    4 x 2       3    2 1 x    5    1    3 2    1          2     4 3 5    3 2 4    x        2        5       3            2          3 x 3       4    3 1 2    5    1    4 3    1    3 x 3 2    1 3 1    4    1    2 1    3    0 3 3       1      0           3         x    2        0        1       0            3          0     x
8 – filter out the duplicates (on relnum, form, to), keeping the ones where nodeNum == fromNodeNum
nodeNum relNum fromNodeNum toNodeNum next previous nodeNum2 relNum2 fromNodeNum2 toNodeNum2 next2 previous2 0       1    0           3         0    x        3        1       0            3          x 2 0 0      0 1 x    1    1    0 0    1    x 2 1 2    1 3 0    3    3    2 1    3    1 4 2       3      2           1 x    5    1    3       2            1    2 4 3 5    3 2 4    x        2    5 3    2    3 x 3 4      3 1 2    5    1    4 3    1          3     x
9 – sort on relnum (prepare for output), removing unused stuff like nodeNum, nodenum2, relnum2, formNodeNum2 and toNodeNum2
relNum fromNodeNum toNodeNum fromnext fromprevious tonext toprevious 0      0 1 x    1    x    2 1      0 3 0    x    x    2 2      1 3 0    3    1    4 3    2 1 x    5    2    4 4    3 1 2        5            3    x 5      3           2 4    x    3    x
10 – create byterecord per rel
So now we only need to do the properties. To keep this post to become too lengthy (I guess it is a bit late for that), I will leave that to an exercise for the reader.
So this combination of jobs runs for about 1 hr but getting the files to the local machine took 6 hrs. What’s happened here.
We have ourselves a database with a size of 136 GB instead of the 80 GB with the batchimporter
Oops we missed the property optimalization somewhere. back to the drawing board
Properties are stored in a record with 4 longs (4 * 8 bytes) and for some properties thats more then enough room to store multiple properties in one record.
We implemented that too and got ourselves a nice 80GB database.
The code for all this can be found on github.
I used this project to get myself more familiar with the Map/Reduce paradigm so I coded all jobs in pure Java map/reduce parts.
I Hope this was of use for you and I surely hope the code is clean enough to understand the parts I didn’t describe in detail. Feel free to contact me if you want to use it and need some more explaining.
—- Update —-
Michael Hunger from neotechnology mailed me a nice picture explaining the file-storage format a bit more. Since a picture says more then the 1605 words I used, I didn’t want to withhold this from you.

store-format-neo4j

Combining Neo4J and Hadoop (part II)

Get in touch with us to learn more about the subject and related solutions

Explore related posts