Graph Databases: Gremlin Query Building

Hi guys.

I’ve been working on adding support for TinkerPop’s Rexster.

I’m only in the early stages, as I’ve had little time so far, but I already have a working proof of concept. However, I would like to go over this with everyone now, before going any further, to make sure I’m taking the right route.

[size="3"]Overview:[/size]

You can check the Rexster wiki for more information, but in a nutshell the Rexster server should support all Blueprints graph implementations. Here’s a list of the officially supported ones:

  • TinkerGraph (TinkerGraph)

  • Neo4j Implementation (Neo4jGraph)

  • Sail Implementation (SailGraph)

  • OrientDB Implementation (OrientGraph)

  • Dex Implementation (DexGraph)

  • Rexster Implementation (RexsterGraph)

  • more

There are three main parts to the integration into Yii2. The first is the connection class for Rexster (RexPro).

The second is a query builder for Gremlin. This part should (hopefully) be common to most graph databases (if someone were to implement a Neo4j/OrientDB/other specific connection class, for instance).

The third is populating and handling ActiveRecords (schema-less?).

Note that I am not all that experienced with Gremlin, so please feel free to tear this apart if I’m going about it the wrong way.

[size="3"]Current status:[/size]

So far I’ve set it up quickly to test querying: find(), findByScript(), all() and one() work and return active records, as described further down in this post.

Before pushing this online I would like to get some feedback and clear up a couple of points, so that I can clean everything up and document it a little more.

[size="3"]Rexpro Connection class:[/size]

This one is more or less finished. It uses a PHP client for RexPro that I made as a helper. I still need to look into setting up transaction support within Yii2 (the client supports it, so it’s just a matter of hooking it up).

Unit tests should cover most cases.
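Hooking transactions up could look roughly like the following minimal sketch, which mirrors the begin()/commit()/rollBack()/getIsActive() shape of yii\db\Transaction. It assumes a hypothetical client contract (`RexProClientInterface` with `beginTransaction()`, `commit()` and `rollback()`); those method names are placeholders, not the actual API of the RexPro client mentioned above.

```php
<?php
// Hypothetical client contract: the real RexPro client's method names may differ.
interface RexProClientInterface
{
    public function beginTransaction(): void;
    public function commit(): void;
    public function rollback(): void;
}

// Sketch of a Yii-style transaction object that delegates to the client.
class GraphTransaction
{
    private RexProClientInterface $client;
    private bool $active = false;

    public function __construct(RexProClientInterface $client)
    {
        $this->client = $client;
    }

    public function begin(): void
    {
        if ($this->active) {
            throw new LogicException('Transaction already started.');
        }
        $this->client->beginTransaction();
        $this->active = true;
    }

    public function commit(): void
    {
        $this->requireActive();
        $this->client->commit();
        $this->active = false;
    }

    public function rollBack(): void
    {
        $this->requireActive();
        $this->client->rollback();
        $this->active = false;
    }

    public function getIsActive(): bool
    {
        return $this->active;
    }

    private function requireActive(): void
    {
        if (!$this->active) {
            throw new LogicException('No active transaction.');
        }
    }
}

// Test double that records which client calls were made (handy for unit tests).
class RecordingClient implements RexProClientInterface
{
    public array $calls = [];

    public function beginTransaction(): void { $this->calls[] = 'begin'; }
    public function commit(): void { $this->calls[] = 'commit'; }
    public function rollback(): void { $this->calls[] = 'rollback'; }
}
```

Keeping the client behind an interface means the transaction logic can be unit tested without a running Rexster server.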

[size="3"]Gremlin builder:[/size]

This one is tricky and is what I need the most input on. Since it’s probably easier to just give usage examples of the current implementation, I’ll go ahead and do just that.

First off though here’s a bit of literature:

  • Basic graph traversal

  • Gremlin docs (unofficial)

For these examples I will use the basic TinkerGraph graph used in most Gremlin examples:
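The graph itself did not survive here. For reference, this is the classic TinkerPop 2 toy graph written out as plain PHP arrays. The `outIds()` helper is purely illustrative (not part of any library) and mimics `out()` so the first results below can be checked by hand.

```php
<?php
// Classic TinkerPop 2 toy graph: vertex id => properties.
$vertices = [
    1 => ['name' => 'marko', 'age' => 29],
    2 => ['name' => 'vadas', 'age' => 27],
    3 => ['name' => 'lop', 'lang' => 'java'],
    4 => ['name' => 'josh', 'age' => 32],
    5 => ['name' => 'ripple', 'lang' => 'java'],
    6 => ['name' => 'peter', 'age' => 35],
];

// Edge id => out vertex, label, in vertex, weight.
$edges = [
    7  => ['outV' => 1, 'label' => 'knows',   'inV' => 2, 'weight' => 0.5],
    8  => ['outV' => 1, 'label' => 'knows',   'inV' => 4, 'weight' => 1.0],
    9  => ['outV' => 1, 'label' => 'created', 'inV' => 3, 'weight' => 0.4],
    10 => ['outV' => 4, 'label' => 'created', 'inV' => 5, 'weight' => 1.0],
    11 => ['outV' => 4, 'label' => 'created', 'inV' => 3, 'weight' => 0.4],
    12 => ['outV' => 6, 'label' => 'created', 'inV' => 3, 'weight' => 0.2],
];

// Illustrative helper: ids reached by following outgoing edges from $ids,
// optionally restricted to the given labels (like out() / out('knows')).
function outIds(array $edges, array $ids, array $labels = []): array
{
    $result = [];
    foreach ($ids as $id) {
        foreach ($edges as $edge) {
            if ($edge['outV'] === $id && ($labels === [] || in_array($edge['label'], $labels, true))) {
                $result[] = $edge['inV'];
            }
        }
    }
    return $result;
}

$all = array_keys($vertices); // stands in for g.V
```

With this data, `outIds($edges, outIds($edges, $all))` yields `[5, 3]`, the same elements as `g.V.out().out()` below. TinkerGraph does not guarantee iteration order, so Gremlin may return the same elements in a different order.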

The first step is to define a base pool of elements in your active record. The idea is that this should work in a similar fashion to the Yii 1.1.x tableName() method. Here it is called startPool() (I’ll be more than glad to rename it if you have any suggestions). It’s a Gremlin script that should return a set of elements you can pipe from. It could be a root node in the case of a tree graph, or all elements in the graph.

It defaults to all:


public static function startPool()
{
    return 'g.V'; // must return a set of elements: 'g.v(1)' does not, use '[g.v(1)]' instead
}
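Overriding startPool() per record class could look like the sketch below. `GraphRecord`, `scriptFor()` and the two subclasses are hypothetical names used only to show the tableName()-style override via late static binding; the '[g.v(1)]' form follows the rule stated in the comment above.

```php
<?php
// Hypothetical base class: startPool() plays the role tableName() plays
// for RDBMS records.
abstract class GraphRecord
{
    // Must evaluate to a pipeable collection of elements, e.g. 'g.V' or '[g.v(1)]'.
    public static function startPool(): string
    {
        return 'g.V';
    }

    // Prefixes a step chain with the start pool of the calling class;
    // static:: (late static binding) picks up subclass overrides.
    public static function scriptFor(string $steps): string
    {
        return static::startPool() . $steps;
    }
}

// Keeps the default pool: every vertex in the graph.
class AnyElement extends GraphRecord
{
}

// Starts from a single root vertex, wrapped in a list so it can be piped.
class RootedElement extends GraphRecord
{
    public static function startPool(): string
    {
        return '[g.v(1)]';
    }
}
```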

Once this is done, here are some basic use cases using addStep():




/**
 * Simple case
 */
$models = AR::find()->addStep('out')->addStep('out')->all();
// builds gremlin: g.V.out().out()
// returns v[5], v[3]

/**
 * Case with params
 */
$models = AR::find()->addStep('out', 'knows')->all();
// builds gremlin: g.V.out('knows')
// returns v[2], v[4]

/**
 * Sometimes you might want to build the query first and later, depending on
 * user rights/roles/etc. or other conditions, modify a step.
 * Setting the third param to true merges into the previous step rather than
 * adding a new one (only if the previous step is of the same type).
 * This works well with pools; see further down.
 */
$models = AR::find()->addStep('out', 'knows')->addStep('out', 'created', true)->all();
// builds gremlin: g.V.out('knows','created')
// returns v[2], v[4], v[3], v[3], v[5], v[3]

$models = AR::find()->addStep('out', 'knows')->addStep('out', 'created', true)->addStep('dedup')->all();
// builds gremlin: g.V.out('knows','created').dedup()
// returns v[2], v[4], v[3], v[5]




As you can see in the third and fourth examples, you can affect previously set steps if the conditions are right. But given the complexity of some applications, you might want to add filters or scopes in some situations, and this becomes more complicated than simply filtering with AND in an RDBMS (for example).
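The merge flag boils down to a small rule. Below is a minimal sketch of it; `GremlinBuilder` is a hypothetical stand-in, not the actual ActiveQuery code. Steps are kept as arrays (as in the current implementation), label arguments are emitted as single-quoted Groovy strings, and a step is merged into the previous one only when both have the same name.

```php
<?php
// Hypothetical sketch of addStep()'s merge flag.
class GremlinBuilder
{
    private string $startPool;
    /** @var array<int, array{name: string, params: string[]}> */
    private array $steps = [];

    public function __construct(string $startPool = 'g.V')
    {
        $this->startPool = $startPool;
    }

    public function addStep(string $name, ?string $label = null, bool $merge = false): self
    {
        $params = $label === null ? [] : [$label];
        $last = count($this->steps) - 1;
        if ($merge && $last >= 0 && $this->steps[$last]['name'] === $name) {
            // Same step type as the previous one: extend its argument list.
            $this->steps[$last]['params'] = array_merge($this->steps[$last]['params'], $params);
        } else {
            $this->steps[] = ['name' => $name, 'params' => $params];
        }
        return $this;
    }

    public function build(): string
    {
        $script = $this->startPool;
        foreach ($this->steps as $step) {
            // Escape backslashes and quotes, then emit each label as a single-quoted string.
            $args = array_map(
                fn (string $p): string => "'" . str_replace(['\\', "'"], ['\\\\', "\\'"], $p) . "'",
                $step['params']
            );
            $script .= '.' . $step['name'] . '(' . implode(',', $args) . ')';
        }
        return $script;
    }
}
```

When the previous step has a different name (e.g. `out` followed by `in` with the flag set), the sketch simply appends a new step rather than failing.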

So I introduced Pools (name to be decided).

A Pool is a set of steps. You essentially add steps until you want to flag them as belonging to a pool, in case you want to come back to them later. By default, the startPool() belongs to the pool named "start".

Here are some examples:




/**
 * This example creates a poorly optimized query, but it illustrates the idea well
 */
// Let's find all developers
$query = AR::find()->addStep('in', 'created'); // theoretical gremlin: g.V.in('created')

/*
 ... code here ...

 Then we realize that the user is a Java manager, so we need to change the
 query to find all developers that made a Java application.
*/
if ($isJavaManager) { // hypothetical flag computed by your own access logic
    $query->addStep('filter', 'it.lang == "java"', false, 'start');
}
// gremlin: g.V.filter{(it.lang == "java")}.in('created')

// Given our graph this doesn't change the result, but say we now learn that
// the user is only assigned to the project "ripple":
if ($isRippleManager) { // hypothetical flag
    $query->addStep('filter', 'it.name == "ripple"', true, 'start');
}
// gremlin: g.V.filter{(it.lang == "java")&&(it.name == "ripple")}.in('created')

// In the end we have filtered down to v[4] only.

/**
 * Setting a new pool
 */
// Find all applications
$query = AR::find()->addStep('outE', 'created')->asPool('createdEdge')->addStep('inV');
// builds gremlin: g.V.outE('created').inV

// Now we realize this user can only see software created with a weight over 0.5
$query->addStep('has', '"weight", T.gt, 0.5f', false, 'createdEdge');
// query is now: g.V.outE('created').has("weight", T.gt, 0.5f).inV
// which returns v[5]
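The pool mechanics in these examples can be sketched on top of plain step arrays. `PooledQuery` is hypothetical and its rendering rules are assumptions: label steps get quoted, merged filter conditions are AND-ed inside one closure, other parameters (such as the `has` arguments) are passed through raw, and parameterless steps render with parentheses (`inV()` rather than `inV`).

```php
<?php
// Hypothetical sketch of named pools on top of array-based steps.
class PooledQuery
{
    // Steps whose params are edge labels and therefore get quoted.
    private const LABEL_STEPS = ['out', 'in', 'both', 'outE', 'inE', 'bothE'];

    private string $startPool;
    /** Each step: ['name' => string, 'params' => string[], 'pool' => ?string] */
    private array $steps = [];

    public function __construct(string $startPool = 'g.V')
    {
        $this->startPool = $startPool;
    }

    // With $pool = null the step is appended; otherwise it goes right after the
    // last step of that pool, or is merged into it when $merge allows.
    public function addStep(string $name, ?string $param = null, bool $merge = false, ?string $pool = null): self
    {
        $params = $param === null ? [] : [$param];
        $at = $pool === null ? count($this->steps) - 1 : $this->lastIndexOf($pool);
        if ($merge && $at >= 0 && $this->steps[$at]['name'] === $name) {
            $this->steps[$at]['params'] = array_merge($this->steps[$at]['params'], $params);
        } else {
            array_splice($this->steps, $at + 1, 0, [['name' => $name, 'params' => $params, 'pool' => $pool]]);
        }
        return $this;
    }

    // Flags every step that is not yet in a pool as belonging to $pool.
    public function asPool(string $pool): self
    {
        foreach ($this->steps as $i => $step) {
            if ($step['pool'] === null) {
                $this->steps[$i]['pool'] = $pool;
            }
        }
        return $this;
    }

    public function build(): string
    {
        $script = $this->startPool;
        foreach ($this->steps as $step) {
            $script .= '.' . $this->render($step['name'], $step['params']);
        }
        return $script;
    }

    private function lastIndexOf(string $pool): int
    {
        for ($i = count($this->steps) - 1; $i >= 0; $i--) {
            if ($this->steps[$i]['pool'] === $pool) {
                return $i;
            }
        }
        if ($pool === 'start') {
            return -1; // nothing added to the start pool yet: insert right after it
        }
        throw new InvalidArgumentException("Unknown pool '$pool'.");
    }

    private function render(string $name, array $params): string
    {
        if ($name === 'filter') {
            // Merged filter conditions are AND-ed inside a single closure.
            return 'filter{' . implode('&&', array_map(fn (string $c): string => '(' . $c . ')', $params)) . '}';
        }
        if (in_array($name, self::LABEL_STEPS, true)) {
            $params = array_map(fn (string $p): string => "'" . $p . "'", $params); // labels assumed quote-free
        }
        return $name . '(' . implode(',', $params) . ')';
    }
}
```

Only steps explicitly flagged with asPool() or added with a pool name belong to a pool, which is what lets a later filter slip in right after `g.V` even though `in('created')` was added first.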




That’s basic usage in a nutshell. This is implemented and works so far. It needs a little ironing out, though (to accept params as arrays, and to allow choosing between AND and OR for some steps).

I haven’t mentioned it yet, but there is also an addSteps() method that allows you to register a raw script, for example: addSteps('.out.out.filter{it.name == "marko"}.dedup()'); you will, however, not be able to merge steps into it.
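One way to get that "no merging into raw scripts" behavior is to store raw fragments as a separate kind of entry that addStep() never treats as a merge target. `ScriptQuery` below is a hypothetical sketch of that idea, not the existing implementation.

```php
<?php
// Hypothetical sketch: fragments registered with addSteps() are stored
// verbatim and are never used as merge targets by addStep().
class ScriptQuery
{
    /** Entries: ['raw' => true, 'script' => string] or ['raw' => false, 'name' => string, 'params' => string[]] */
    private array $steps = [];

    public function addStep(string $name, ?string $label = null, bool $merge = false): self
    {
        $params = $label === null ? [] : [$label];
        $lastKey = array_key_last($this->steps);
        $last = $lastKey === null ? null : $this->steps[$lastKey];
        if ($merge && $last !== null && !$last['raw'] && $last['name'] === $name) {
            $this->steps[$lastKey]['params'] = array_merge($last['params'], $params);
        } else {
            // Previous entry is raw, missing or of another type: add a new step.
            $this->steps[] = ['raw' => false, 'name' => $name, 'params' => $params];
        }
        return $this;
    }

    public function addSteps(string $script): self
    {
        $this->steps[] = ['raw' => true, 'script' => $script];
        return $this;
    }

    public function build(string $startPool = 'g.V'): string
    {
        $gremlin = $startPool;
        foreach ($this->steps as $step) {
            if ($step['raw']) {
                $gremlin .= $step['script']; // appended exactly as given
            } else {
                $args = array_map(fn (string $p): string => "'" . $p . "'", $step['params']);
                $gremlin .= '.' . $step['name'] . '(' . implode(',', $args) . ')';
            }
        }
        return $gremlin;
    }
}
```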

Points I would like cleared up / need input on:

  • The query building currently lives in activeQuery and uses arrays. I would like to move this to another class and use a series of objects rather than an array. What would be the best way of doing this as far as the Yii2 file structure goes? Should I make a Command class, or some other class?

  • Am I even going the right way about this? Does this kind of functionality cover the usual (hopefully most) use cases?

Also, work in progress for as soon as this gets some feedback:

  • support for forking and merging

  • support for and/or (similar to above)

[size="3"]Populating ActiveRecords[/size]

There’s a tricky part here, in the sense that element properties in a graph DB are schema-less. In my case, with Rexster, there is a pseudo-schema I can use, so I’ve gone the simple route and used that.

A vertex result is in the form of:


array(
    '_id' => 1,
    '_properties' => array(
        'name' => 'marko',
        'age' => 29,
    ),
)

And an Edge element result is in the form of:


array(
    '_id' => 9,
    '_inV' => 3,
    '_outV' => 1,
    '_label' => 'created',
    '_properties' => array(
        'weight' => 0.4,
    ),
)

So I set them "as is" in the AR (I use the same active record for both types of elements; I think Hansael used two in Neo4Yii).

And I use methods to get/set/unset properties (i.e. getProperty($name), setProperty($name, $value), unsetProperty($name)).
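Populating a single record class from both result shapes could look like the sketch below. `GraphElement`, `populate()`, `isEdge()` and `hasProperty()` are hypothetical names; only the get/set/unset property methods come from the post. The sketch assumes an element is an edge when the result carries a `_label` key, which holds for the two result forms shown above.

```php
<?php
// Hypothetical single record class for vertices and edges, populated "as is".
class GraphElement
{
    private $id;
    private array $properties = [];
    /** Edge-only metadata (_inV, _outV, _label); stays empty for vertices. */
    private array $edge = [];

    public static function populate(array $row): self
    {
        $element = new self();
        $element->id = $row['_id'];
        $element->properties = $row['_properties'] ?? [];
        foreach (['_inV', '_outV', '_label'] as $key) {
            if (array_key_exists($key, $row)) {
                $element->edge[$key] = $row[$key];
            }
        }
        return $element;
    }

    public function getId() { return $this->id; }

    // Assumption: only edges carry a label in these result arrays.
    public function isEdge(): bool { return isset($this->edge['_label']); }

    public function getLabel(): ?string { return $this->edge['_label'] ?? null; }

    public function getProperty(string $name) { return $this->properties[$name] ?? null; }

    public function setProperty(string $name, $value): void { $this->properties[$name] = $value; }

    public function unsetProperty(string $name): void { unset($this->properties[$name]); }

    public function hasProperty(string $name): bool { return array_key_exists($name, $this->properties); }
}
```

Keeping edge metadata in a separate array avoids clashes between system keys such as `_label` and user-defined property names.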

If you have any suggestions here I’m more than happy to hear them.

For the Yii team: would it be possible, in the process, to have an AR method named getMetaSchema() and replace calls to getTableSchema() with this new method (for RDBMS it would simply wrap getTableSchema())?
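To make the proposal concrete, here is a hypothetical sketch of such a hook. In real Yii2 code getTableSchema() returns a yii\db\TableSchema object; here a plain list of column names stands in for it, and all class names are illustrative. The graph flavor derives its pseudo-schema from the property keys present on a loaded element.

```php
<?php
// Hypothetical contract for the proposed getMetaSchema() hook.
interface MetaSchemaProvider
{
    /** @return string[] attribute names the record understands */
    public function getMetaSchema(): array;
}

// RDBMS flavor: simply wraps the table schema lookup.
class SqlRecordSketch implements MetaSchemaProvider
{
    public function getTableSchema(): array
    {
        return ['id', 'name', 'age']; // stand-in for real column metadata
    }

    public function getMetaSchema(): array
    {
        return $this->getTableSchema();
    }
}

// Graph flavor: no table, so the pseudo-schema comes from the element's properties.
class GraphRecordSketch implements MetaSchemaProvider
{
    private array $properties;

    public function __construct(array $properties)
    {
        $this->properties = $properties;
    }

    public function getMetaSchema(): array
    {
        return array_keys($this->properties);
    }
}
```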

Things I need input on:

  • Indexes. How do we go about this? Should Yii set indexes, or should we rely on the DB configuration for auto-indexing (probably not)? If Yii does it, is there any implementation you would favor?

OK, that more or less sums it up. Any comments are welcome.

Thanks in advance

Any input would be welcome. Is there any usage you would like to see?

It would be good to get the Gremlin aspect covered solidly, as this would be the cornerstone for any graph DB server (not just Rexster).

I’ve left the use of Gremlin relatively “raw” on purpose, as there can be multiple flavors of Gremlin and an added level of complexity since it’s proper scripting (with for loops etc.; unless I’m mistaken, you can even define your own steps in Java and use them if you wish). This way most uses should be covered, with enough flexibility left for any particular needs. There might be a bit of a gray line between the two, however (I might not see to what extent until people start using it on a regular basis).

I will proceed with these concepts for now.

For readability I will probably convert the pool query building from arrays to a collection of objects, which implies that I will have to add a “Pool” class at the very least (letting activeQuery handle the collection aspect). I’ll just go ahead and chuck it into yii/db/rexster/ with my other files and worry about the correct place for it later.
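An object-based version could be as small as a `Step` value object plus a `Pool` that owns an ordered list of steps, with activeQuery keeping the ordered list of pools. Both classes below are a hypothetical sketch; parameters are kept as already-rendered Gremlin fragments (quotes included) to keep it short.

```php
<?php
// Hypothetical Step value object replacing the per-step arrays.
final class Step
{
    public string $name;
    /** @var string[] already-rendered Gremlin arguments */
    public array $params;

    public function __construct(string $name, array $params = [])
    {
        $this->name = $name;
        $this->params = $params;
    }

    // A step may only be merged into a step of the same type.
    public function canMerge(Step $other): bool
    {
        return $this->name === $other->name;
    }

    public function render(): string
    {
        return $this->name . '(' . implode(',', $this->params) . ')';
    }
}

// Hypothetical Pool: a named, ordered collection of steps.
final class Pool
{
    public string $name;
    /** @var Step[] */
    private array $steps = [];

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function add(Step $step, bool $merge = false): void
    {
        $last = end($this->steps); // false when the pool is empty
        if ($merge && $last instanceof Step && $last->canMerge($step)) {
            // Objects are handles, so this updates the stored step in place.
            $last->params = array_merge($last->params, $step->params);
            return;
        }
        $this->steps[] = $step;
    }

    public function render(): string
    {
        return implode('', array_map(fn (Step $s): string => '.' . $s->render(), $this->steps));
    }
}
```

Rendering a whole query is then just the start pool script followed by each pool's render() output in order.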

As a reminder to myself (Rexster related):

  • Check with the Gremlin mailing list how isolation operates (specifically, undefining bindings).

  • On the same topic, check variable GC between requests (or the lack of it).

Hello pommeverte,

can you share your project?

Hello yegreS,

This project has changed quite a bit. It has currently been set under a proprietary license and we aren’t ready just yet to make an opensource release.

There are several reasons for this. The first is that it simply isn’t ready for a public release, as we’ve been concentrating on making it work with Titan DB. This means that a lot of features aren’t currently DB agnostic, such as the Schema class, some aspects of DB Migrations & Fixtures, and certain ActiveRecord methods.

There are also some portions of the libraries we wrote for this extension that need to be refactored so they can be properly extended by the people using them.

This is obviously in the works, but we have a few more pressing matters ATM. It is also very dependent on the release of TinkerPop 3, as it fixes various incongruities between the Blueprints API and native graph DB APIs (mostly labels and index handling). So we most likely won’t get around to it until TP3 is fully released and some bugs are ironed out.

In the meantime, I’m more than happy to provide anyone with information/help/etc. in using graph databases with yii2 (or yii1).

You can find a Rexster/gremlin-server driver we’ve open-sourced here. It is relatively easy to implement a Connection class for it and start using graph databases in Yii. A basic implementation of ActiveRecord might be a little more work, but it’s not that bad, and I can answer any questions you may have.

Also, there are other PHP ORM/OGM implementations currently in the works. I know pheonix-labs are working on one (it is run by Michael), and GitHub user Travis Black has a working implementation with his company; they are planning a public release as well.

It might be an option to get in contact with them, as it should be pretty easy to wrap those projects into a Yii2 extension.

Let me know if you need any other information.