The road so far….

August 9, 2010

Exploring Neo4J, the NoSQL graph database

Filed under: java — Tags: — Rahul Sharma @ 8:15 am

After hearing much about Neo4J, the graph database, I set out to explore it. My goals were simple: evaluate it and, if it fit, refactor one of our graph solutions to use it. Having finished the evaluation, I found it quite simple and efficient to use, but it does ask for certain changes, especially a shift in design, before it can be used properly. The Neo4J library is divided into components, and the kernel library is quite lightweight (a few KB) and can easily be shipped with the application.

Let us say we want to implement a use case where there are persons, and a person can be connected to other persons. In order to use Neo4J we must think about POJOs in terms of interfaces and corresponding implementations. This is because the database is a key-value store at the back, so it asks us to store the properties of a POJO as key-value pairs. Moreover, there are no foreign keys in Neo4J; objects in the db are connected to other objects using Relationships. So the connections between different persons will be represented by a custom connection enum which implements the RelationshipType interface.

First, create an interface for the Person POJO (it is called Name here), along with the enum of relationship types:

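import java.util.Collection;
import org.neo4j.graphdb.RelationshipType;

interface Name {
    int getId();
    void setId(int id);
    String getName();
    void setName(String name);
    // All persons linked to this one by the given relationship type
    Collection<Name> getConnections(Connections connection);
}

// Relationship types are plain enum constants implementing RelationshipType
enum Connections implements RelationshipType {
    Knows, Friend, Enemy, Spouse, Colleague, Sibblings
}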

The Person class, which implements this interface, holds a Node object, which is (among other things) a key-value store. All the properties of the Person class are stored in this Node object. We can only store properties of basic types (boolean, char, int, long etc.) and of the String class in the Node.

import java.util.ArrayList;
import java.util.Collection;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;

class Person implements Name {
    // Every property of the person lives in this node
    private final Node dataNode;

    // Property keys used on the node
    static enum NameKeys { ID, NAME }

    public Person(Node dataNode) {
        this.dataNode = dataNode;
    }

    @Override
    public int getId() {
        return (Integer) dataNode.getProperty(NameKeys.ID.name());
    }

    @Override
    public String getName() {
        return (String) dataNode.getProperty(NameKeys.NAME.name());
    }

    @Override
    public void setId(int id) {
        dataNode.setProperty(NameKeys.ID.name(), id);
    }

    @Override
    public void setName(String name) {
        dataNode.setProperty(NameKeys.NAME.name(), name);
    }

    @Override
    public Collection<Name> getConnections(Connections connection) {
        Iterable<Relationship> relationships = dataNode.getRelationships(connection);
        Collection<Name> persons = new ArrayList<Name>();
        for (Relationship relationship : relationships) {
            // The node at the other end of the relationship, whatever its direction
            Node otherNode = relationship.getOtherNode(dataNode);
            persons.add(new Person(otherNode));
        }
        return persons;
    }
}
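
The wrapper idea in the Person class can be tried without a Neo4J installation. The following is only a minimal sketch under stated assumptions: PropertyStore is a hypothetical stand-in for the property part of a Node (backed by a HashMap), MapBackedPerson is a hypothetical counterpart of Person, and the type check mirrors the restriction described above rather than Neo4J's actual validation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the property part of a Neo4J Node (not the real API).
class PropertyStore {
    private final Map<String, Object> properties = new HashMap<String, Object>();

    void setProperty(String key, Object value) {
        // Assumption: accept only boxed primitives and String, as described in the post.
        boolean supported = value instanceof String || value instanceof Boolean
                || value instanceof Character || value instanceof Byte
                || value instanceof Short || value instanceof Integer
                || value instanceof Long || value instanceof Float
                || value instanceof Double;
        if (!supported) {
            throw new IllegalArgumentException("Unsupported property type: " + value);
        }
        properties.put(key, value);
    }

    Object getProperty(String key) {
        Object value = properties.get(key);
        if (value == null) {
            throw new IllegalStateException("No such property: " + key);
        }
        return value;
    }
}

// The POJO keeps no fields of its own; every getter and setter goes to the store.
class MapBackedPerson {
    enum Keys { ID, NAME }

    private final PropertyStore store;

    MapBackedPerson(PropertyStore store) {
        this.store = store;
    }

    int getId() {
        return (Integer) store.getProperty(Keys.ID.name());
    }

    void setId(int id) {
        store.setProperty(Keys.ID.name(), id);
    }

    String getName() {
        return (String) store.getProperty(Keys.NAME.name());
    }

    void setName(String name) {
        store.setProperty(Keys.NAME.name(), name);
    }
}
```

Because the object holds no state besides the store, two wrappers created around the same store always see the same data, which is the design shift the post describes.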

Now we would like to store some data. We can do so in the steps outlined below:

  • Start the Database service
  • Begin a transaction
  • Create the required objects
  • Mark the transaction as success or failure
  • Close the transaction and the db service
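
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class BasicGraph {
    // fileName is not used in this minimal example
    public void createGraph(String fileName, String dbdir) throws Exception {
        GraphDatabaseService graphDb = new EmbeddedGraphDatabase(dbdir);
        Transaction transaction = graphDb.beginTx();
        try {
            Node personOneDataNode = graphDb.createNode();
            Name personOne = new Person(personOneDataNode);
            personOne.setName("One");
            personOne.setId(0);

            Node personTwoDataNode = graphDb.createNode();
            Name personTwo = new Person(personTwoDataNode);
            personTwo.setName("Two");
            personTwo.setId(1);

            personOneDataNode.createRelationshipTo(personTwoDataNode, Connections.Friend);

            // Only reached if nothing above threw
            transaction.success();
        } finally {
            // Always finish the transaction and shut the db down, even on failure
            transaction.finish();
            graphDb.shutdown();
        }
    }
}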
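
The "mark as success, then finish" steps listed above can be looked at in isolation. This is a rough sketch, not the Neo4J implementation: SketchTransaction and TransactionPattern are hypothetical names, and the sketch assumes that finish() commits only when success() was called and failure() was not.

```java
// Hypothetical stand-in that only records how a transaction would end.
class SketchTransaction {
    private boolean successMarked = false;
    private boolean failureMarked = false;
    private String outcome = "OPEN";

    void success() {
        successMarked = true;
    }

    void failure() {
        failureMarked = true;
    }

    // Assumption: commit only if marked successful and never marked failed.
    void finish() {
        outcome = (successMarked && !failureMarked) ? "COMMITTED" : "ROLLED_BACK";
    }

    String outcome() {
        return outcome;
    }
}

class TransactionPattern {
    // Runs the work inside the transaction; success() is skipped if the work throws.
    static void run(SketchTransaction tx, Runnable work) {
        try {
            work.run();
            tx.success();
        } finally {
            // Executed on both the normal and the exceptional path
            tx.finish();
        }
    }
}
```

Placing finish() in a finally block means a failing unit of work ends the transaction without ever reaching success(), so under the assumption above it is rolled back instead of being left open.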

Now we have a graph db with data, and we would like to query it for a particular Person. How do we get the data for a simple query such as person with name = "One"? The db has no built-in query mechanism for this, but it supports Lucene, so we can create an index on selected properties of the data and then query that index. To use the LuceneIndexService available in Neo4J we have to download the neo4j-index component. The db mandates that all operations on the data happen in a transaction, so to index data we open the db, the index service and a transaction one by one, index the data, and then close everything again one by one.

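// IndexService and LuceneIndexService come from the neo4j-index component;
// dbdir is assumed to be a field holding the database directory.
public void indexGraph() {
    GraphDatabaseService databaseService = new EmbeddedGraphDatabase(dbdir);
    IndexService indexService = new LuceneIndexService(databaseService);
    Transaction transaction = databaseService.beginTx();
    try {
        Iterable<Node> allNodes = databaseService.getAllNodes();
        for (Node node : allNodes) {
            // Only nodes that take part in at least one relationship are indexed
            if (node.hasRelationship()) {
                String key = Person.NameKeys.NAME.name();
                // Nodes without a name are indexed under "Not-Found"
                Object property = node.getProperty(key, "Not-Found");
                indexService.index(node, key, property);
            }
        }
        transaction.success();
    } finally {
        transaction.finish();
        indexService.shutdown();
        databaseService.shutdown();
    }
}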

Now that the index has been built, we can query it for matches using the simple APIs available in the IndexService. Again, this has to be done in a transaction.

public void useIndexForQuerying() {
    GraphDatabaseService databaseService = new EmbeddedGraphDatabase(dbdir);
    IndexService indexService = new LuceneIndexService(databaseService);
    String arg = "One";
    Transaction transaction = databaseService.beginTx();
    try {
        // Exact-match lookup on the NAME property; null if nothing is indexed under it
        Node node = indexService.getSingleNode(Person.NameKeys.NAME.name(), arg);
        if (node != null) {
            Person person = new Person(node);
            // Print the connected persons for every relationship type
            for (Connections connections : Connections.values()) {
                Collection<Name> connectedPersons = person.getConnections(connections);
                System.out.println("Retrieving connections for type: " + connections + ", connect count: " + connectedPersons.size());
                for (Name name : connectedPersons) {
                    System.out.println(name.getName());
                }
            }
        }
        transaction.success();
    } finally {
        transaction.finish();
        indexService.shutdown();
        databaseService.shutdown();
    }
}
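
Conceptually, the index used in the two listings above maps a (key, value) pair to the nodes stored under it. The sketch below illustrates only that idea with plain collections; ExactMatchIndex is a hypothetical class, node ids stand in for Node objects, Lucene is not involved, and throwing on multiple matches in getSingleNode is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory exact-match index: property key -> property value -> node ids.
class ExactMatchIndex {
    private final Map<String, Map<Object, List<Long>>> entries =
            new HashMap<String, Map<Object, List<Long>>>();

    // Register a node id under the given key/value pair.
    void index(long nodeId, String key, Object value) {
        entries.computeIfAbsent(key, k -> new HashMap<>())
               .computeIfAbsent(value, v -> new ArrayList<>())
               .add(nodeId);
    }

    // All node ids indexed under key/value, or an empty list.
    List<Long> getNodes(String key, Object value) {
        Map<Object, List<Long>> byValue = entries.get(key);
        if (byValue == null) {
            return Collections.<Long>emptyList();
        }
        List<Long> ids = byValue.get(value);
        return ids == null ? Collections.<Long>emptyList() : ids;
    }

    // The single match, null when there is none; more than one match is treated as an error here.
    Long getSingleNode(String key, Object value) {
        List<Long> ids = getNodes(key, value);
        if (ids.size() > 1) {
            throw new IllegalStateException("More than one node for " + key + "=" + value);
        }
        return ids.isEmpty() ? null : ids.get(0);
    }
}
```

The lookup never scans the nodes themselves, which is why a property has to be indexed explicitly before it can be searched.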

The API also provides graph traversals as well as some graph algorithms such as shortest path and Dijkstra. The Traverser API for graph traversal is rather simple and works like a charm. We specify a few attributes, such as the starting node, the traversal order (BFS or DFS) and the RelationshipType we are looking for, and it traverses the whole graph starting at the specified node, giving back the matching nodes.

void traverseNode(Node node) {
    // Walk until the end of the graph and return every node except the start node
    StopEvaluator stopEvaluator = StopEvaluator.END_OF_GRAPH;
    ReturnableEvaluator returnableEvaluator = ReturnableEvaluator.ALL_BUT_START_NODE;
    Traverser traverse = node.traverse(Order.BREADTH_FIRST, stopEvaluator, returnableEvaluator, Connections.Knows, Direction.OUTGOING);
    Iterator<Node> iterator = traverse.iterator();
    // printNode is a small printing helper that is not shown in this post
    printNode(node, "-->:");
    int count = 0;
    while (iterator.hasNext()) {
        Node nextNode = iterator.next();
        // Assumes exactly one incoming Knows relationship per node
        Relationship singleRelationship = nextNode.getSingleRelationship(Connections.Knows, Direction.INCOMING);
        Node startNode = singleRelationship.getStartNode();
        Object property = startNode.getProperty(Person.NameKeys.NAME.name(), "Unknown");
        printNode(nextNode, "|-->:" + property + ":-->");
        count++;
    }
    System.out.println("total Connections :" + count);
}
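
To make the traversal settings concrete, here is a plain-Java sketch of what a breadth-first walk over outgoing edges does when it runs to the end of the graph and returns everything except the start node. It is an illustration only: BreadthFirstSketch is a hypothetical class, the graph is a simple adjacency map of names, and nothing here uses the Traverser API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BreadthFirstSketch {
    // Returns every node reachable from start over outgoing edges, in breadth-first
    // order, excluding the start node itself.
    static List<String> traverse(Map<String, List<String>> outgoing, String start) {
        List<String> visitedInOrder = new ArrayList<String>();
        Set<String> seen = new HashSet<String>();
        Deque<String> queue = new ArrayDeque<String>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (!current.equals(start)) {
                visitedInOrder.add(current);
            }
            List<String> neighbours = outgoing.getOrDefault(current, Collections.<String>emptyList());
            for (String next : neighbours) {
                // Set.add returns false for nodes already seen, so cycles end here
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return visitedInOrder;
    }
}
```

For a graph where One knows Two and Three, Two and Three both know Four, and Four knows One, starting at One yields Two, Three, Four: the nearest nodes come first, and the cycle back to One is not followed.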

After all of this I started evaluating Neo4J under some load, so I tried putting 151K names (data available at the census site) into it by executing a large number of transactions. Every time I did, I got an IOException, and after every failure I tried again with a smaller data set. What I found is that the problem is accentuated by large data sets: the larger the set, the more frequent the failure. Googling around a bit landed me on a Sun page which says that the problem is due to Windows memory mappings and describes a workaround.

java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
 at sun.nio.ch.FileChannelImpl.truncate0(Native Method)
 at sun.nio.ch.FileChannelImpl.truncate(FileChannelImpl.java:337)
 at org.neo4j.kernel.impl.transaction.TxManager.changeActiveLog(TxManager.java:206)
 at org.neo4j.kernel.impl.transaction.TxManager.getTxLog(TxManager.java:185)
 at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:652)
 at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:543)
 at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:102)
 at org.neo4j.kernel.EmbeddedGraphDbImpl$TransactionImpl.finish(EmbeddedGraphDbImpl.java:329)
 at com.nosql.neo4j.CreateGraph.createNode(CreateGraph.java:44)
 at com.nosql.neo4j.CreateGraph.createSomeNames(CreateGraph.java:33)
 at com.nosql.neo4j.CreateGraph.createGraph(CreateGraph.java:21)
 at com.nosql.neo4j.CreateGraphTest.testGraphCreation(CreateGraphTest.java:33)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59)
 at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98)
  ....................
 SEVERE: Error writing transaction log

Conclusion

Neo4J offers a simple and efficient API for managing graphs in the form of databases. If you are starting fresh, you can easily use it. If you have an existing system which you would like to refactor to fit Neo4J, expect a good amount of change to the existing source code. The API is quite good at traversals and graph-related algorithms, but it would have been nice if the library provided some kind of SQL-like or template-matching mechanism to search through the data. Under a large transactional load the system gave me exceptions, but I think this is specific to Windows, and I hope the library works fine on Linux or Unix.

3 Comments »

  1. Hi!

    Really nice that you took the time to put this blog post together! I’d like to comment on a few things.

    When shutting down the database and index services, you’re not closing a connection to the DB but the DB server itself. Thus, applications should normally only do this when shutting down the application itself, as shutting down and starting up a server are expensive operations.

    Note that you can store properties on Relationships as well, and that a property can also be an array of a basic type or String.

    I’m not sure regarding the exception you got, but if you just want to insert a lot of data, the batch inserter is a better (and faster) option. It doesn’t have transactions. The usual transactional Neo4j API isn’t meant to be used for bulk inserts, but is optimized for normal application load.

    Comment by Anders Nawroth — August 10, 2010 @ 1:56 pm

    • Thanks for all the pointers, I will give it a shot and see if it works fine. Also, I would like to know if the db provides any mechanism for searching through the POJO properties besides the indexing one?

      Comment by Rahul Sharma — August 10, 2010 @ 9:46 pm

      • You’ll need to use some kind of indexing.

        One option people tend to overlook is to use the graph itself as an index. By adding some extra structure to the graph, you can create your very own indexing functionality, tailored to your domain. There have been some intriguing discussions on this topic on the Neo4j mailing list!

        Regarding the built-in indexing, you now (since version 1.1) have the option to do indexing operations in event handlers, using the new event framework.

        Comment by Anders Nawroth — August 11, 2010 @ 4:52 pm

