Hi guys.
I’ve been in the process of adding support for Tinkerpop’s Rexster.
I’m only in the early stages as I’ve had little time so far, but I already have a working proof of concept. I would, however, like to go over this with everyone now before going any further, to make sure I’m taking the right route.
[size="3"]Overview:[/size]
You can check the Rexster wiki for more information, but in a nutshell the Rexster server should support all Blueprints graph implementations. Here’s a list of the officially supported ones:
- TinkerGraph (TinkerGraph)
- Neo4j Implementation (Neo4jGraph)
- Sail Implementation (SailGraph)
- OrientDB Implementation (OrientGraph)
- Dex Implementation (DexGraph)
- Rexster Implementation (RexsterGraph)
- more
There are three main parts to the integration into Yii2. The first one is the connection class for Rexster (RexPro).
The second is a query builder for Gremlin. This part should, hopefully, be common to most graph databases (if someone were to implement a Neo4j/OrientDB/other specific connection class, for instance).
The third is the populating and handling of the ActiveRecord (schema-less?).
Note that I am not all that experienced with Gremlin, so please feel free to tear this apart if I’m going the wrong way about it.
Current status:
So far I’ve set it up quickly to test querying: find(), findByScript(), all() and one() work and return active records, as detailed further down in this post.
Before pushing this online I would like to get some feedback and clear up a couple of points so that I can clean everything up and document a little more.
[size="3"]Rexpro Connection class:[/size]
This one is more or less finished. It uses a PHP client for RexPro that I made as a helper. I still need to look into setting up transaction support within Yii2 (the client supports it, so it’s just a matter of hooking it up).
Unit tests should cover most cases.
[size="3"]Gremlin builder:[/size]
This one is tricky and is what I need the most input on. Since it’s probably easier to just give usage examples of the current implementation I’ll go ahead and do just that.
First off though here’s a bit of literature:
- Basic graph traversal
- Gremlin docs (unofficial)
For these examples I will use the basic TinkerGraph toy graph used in most Gremlin examples.
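For reference, the toy graph can be written out as plain PHP data. The vertex and edge ids, labels and weights below are those of the classic TinkerGraph example; the `outNeighbours()` helper is purely hypothetical and only there to make the expected results further down easy to check by hand.

```php
<?php
// The classic TinkerGraph toy graph: 6 vertices, 6 edges.
$vertices = [
    1 => ['name' => 'marko',  'age' => 29],
    2 => ['name' => 'vadas',  'age' => 27],
    3 => ['name' => 'lop',    'lang' => 'java'],
    4 => ['name' => 'josh',   'age' => 32],
    5 => ['name' => 'ripple', 'lang' => 'java'],
    6 => ['name' => 'peter',  'age' => 35],
];

// Each edge: id => [outV, label, inV, weight]
$edges = [
    7  => [1, 'knows',   2, 0.5],
    8  => [1, 'knows',   4, 1.0],
    9  => [1, 'created', 3, 0.4],
    10 => [4, 'created', 5, 1.0],
    11 => [4, 'created', 3, 0.4],
    12 => [6, 'created', 3, 0.2],
];

// Hypothetical helper: vertex ids reached by following outgoing edges
// with the given label from each id in $from (duplicates are kept).
function outNeighbours(array $edges, array $from, string $label): array
{
    $result = [];
    foreach ($from as $id) {
        foreach ($edges as [$outV, $edgeLabel, $inV]) {
            if ($outV === $id && $edgeLabel === $label) {
                $result[] = $inV;
            }
        }
    }
    return $result;
}

// Equivalent of g.V.out('knows') on this data.
print_r(outNeighbours($edges, array_keys($vertices), 'knows'));
```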
Now the first step is to define a base pool of elements in your active record. The idea is that this should work in a ‘similar’ fashion to the current Yii 1.1.x tableName() method. In this case it is referred to as startPool() (I’ll be more than glad to rename it if you have any suggestions). It’s a Gremlin script that should return a set of elements you can pipe. It could be a root node in the case of a tree graph, or all elements in the graph.
It defaults to all:
public static function startPool()
{
    // Must return a set of elements: 'g.v(1)' does not, it must be '[g.v(1)]'
    return 'g.V';
}
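As a minimal sketch of the tree-graph case mentioned above, a model could override startPool() to start from a single root vertex. The class names `GraphRecord` and `Category` are hypothetical stand-ins (the real base class is not shown in this post); the `[g.v(1)]` wrapping follows the note in the default implementation.

```php
<?php
// Hypothetical stand-in for the graph ActiveRecord base class.
abstract class GraphRecord
{
    // Default pool: every vertex in the graph.
    public static function startPool()
    {
        return 'g.V';
    }
}

// Hypothetical model for a tree-shaped graph rooted at vertex 1.
class Category extends GraphRecord
{
    public static function startPool()
    {
        // Wrapped in [] so the script returns a set of elements.
        return '[g.v(1)]';
    }
}

echo Category::startPool(), "\n";
```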
Once this is done here are some basic use cases using ‘addStep()’:
/**
 * Simple case
 */
$models = AR::find()->addStep('out')->addStep('out')->all();
// builds gremlin: g.V.out().out()
// returns v[5],v[3]

/**
 * Case with params
 */
$models = AR::find()->addStep('out', 'knows')->all();
// builds gremlin: g.V.out('knows')
// returns v[2],v[4]

/**
 * Sometimes you might want to build the query and later, depending on
 * user rights/roles/etc. or various cases, affect a step.
 * Setting the third param to true will merge rather than add a step
 * (only if the previous step is of the same sort).
 * This works well with pools. See further down.
 */
$models = AR::find()->addStep('out', 'knows')->addStep('out', 'created', true)->all();
// builds gremlin: g.V.out('knows','created')
// returns v[2],v[4],v[3],v[3],v[5],v[3]

$models = AR::find()->addStep('out', 'knows')->addStep('out', 'created', true)->addStep('dedup')->all();
// builds gremlin: g.V.out('knows','created').dedup()
// returns v[2],v[4],v[3],v[5]
As you can see in the third and fourth examples, you can affect previously set steps if the conditions are right. But given the complexity of some applications, you might want to add filters or scopes in some situations, and this becomes more complicated than simply filtering through AND in an RDBMS (for example).
So I introduced Pools (name to be decided).
A Pool is a set of steps. You essentially add steps until you want to flag them as belonging to a pool, in case you want to backtrack. By default the startPool() belongs to the pool named "start".
Here are examples:
/**
 * This example creates a poorly optimized query but explains the idea well
 */
// let's find all developers
$query = AR::find()->addStep('in', 'created'); // theoretical gremlin: g.V.in('created')
/*
   ... code here ...
   Then we realize that the user is a java manager, so we need to change the
   query to find all developers that made a java application
*/
if ($isJavaManager) { // placeholder for your own rights check
    $query->addStep('filter', 'it.lang == "java"', false, 'start');
}
// gremlin: g.V.filter{(it.lang == "java")}.in('created')
// given our graph this doesn't change the result, but say we now see that the user is only assigned to project "ripple"
if ($isRippleManager) { // placeholder for your own rights check
    $query->addStep('filter', 'it.name == "ripple"', true, 'start');
}
// gremlin: g.V.filter{(it.lang == "java")&&(it.name == "ripple")}.in('created')
// at the end we have filtered to only return v[4]

/**
 * Setting a new pool
 */
// find all applications
$query = AR::find()->addStep('outE', 'created')->asPool('createdEdge')->addStep('inV');
// builds gremlin: g.V.outE('created').inV
// now we realize this user can only see software that was created with a weight of over 0.5
$query->addStep('has', '"weight", T.gt, 0.5f', false, 'createdEdge');
// query is now: g.V.outE('created').has("weight", T.gt, 0.5f).inV
// which returns v[5]
That’s basic usage in a nutshell. This is implemented and works so far. It needs a little ironing out though (to accept params as arrays and to make it possible to choose between AND and OR for some steps).
I did not mention it yet, but there is also an addSteps() method that allows you to register a raw script, for example: addSteps('.out.out.filter{it.name == "marko"}.dedup()'). You will, however, not be able to merge steps into it.
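To make the merge and pool semantics above concrete, here is a minimal, self-contained sketch of a builder that reproduces the Gremlin strings from the examples. It is not the actual implementation: the class name `GremlinSketch`, the `build()` method and the per-step rendering rules (quoted arguments by default, raw arguments for `has`, AND-ed closures for `filter`) are assumptions inferred from the outputs shown in this post.

```php
<?php
// Illustrative stand-in only; NOT the extension's actual query class.
class GremlinSketch
{
    private $start;                    // script returned by startPool()
    private $pools = ['start' => []];  // ordered: pool name => list of steps
    private $pending = [];             // steps not yet flagged as a pool

    public function __construct($start = 'g.V')
    {
        $this->start = $start;
    }

    public function addStep($name, $arg = null, $merge = false, $pool = null)
    {
        if ($pool !== null && !isset($this->pools[$pool])) {
            throw new InvalidArgumentException("Unknown pool: $pool");
        }
        // Target list: the named pool, or the pending steps.
        if ($pool === null) {
            $steps = &$this->pending;
        } else {
            $steps = &$this->pools[$pool];
        }
        $last = count($steps) - 1;
        if ($merge && $last >= 0 && $steps[$last]['name'] === $name) {
            $steps[$last]['args'][] = $arg; // same sort of step: merge
        } else {
            $steps[] = ['name' => $name, 'args' => $arg === null ? [] : [$arg]];
        }
        return $this;
    }

    // Flag the pending steps as a named pool so they can be extended later.
    public function asPool($name)
    {
        $this->pools[$name] = $this->pending;
        $this->pending = [];
        return $this;
    }

    // Concatenate the start script, every pool in order, then pending steps.
    public function build()
    {
        $script = $this->start;
        $all = array_merge(array_values($this->pools), [$this->pending]);
        foreach ($all as $steps) {
            foreach ($steps as $step) {
                $script .= '.' . $this->render($step['name'], $step['args']);
            }
        }
        return $script;
    }

    private function render($name, array $args)
    {
        if ($name === 'filter') { // merged closures are AND-ed together
            return 'filter{(' . implode(')&&(', $args) . ')}';
        }
        if ($name === 'has') {    // arguments passed through as raw Groovy
            return 'has(' . implode(', ', $args) . ')';
        }
        // Default: string arguments, single-quoted.
        $quoted = array_map(function ($a) { return "'" . $a . "'"; }, $args);
        return $name . '(' . implode(',', $quoted) . ')';
    }
}

$q = (new GremlinSketch())->addStep('in', 'created');
$q->addStep('filter', 'it.lang == "java"', false, 'start');
$q->addStep('filter', 'it.name == "ripple"', true, 'start');
echo $q->build(), "\n";
// g.V.filter{(it.lang == "java")&&(it.name == "ripple")}.in('created')
```

Note that this sketch always emits parentheses, so a bare step such as inV is rendered as inV(), which is equivalent in Gremlin Groovy.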
Points that I could get cleared up / need input on:
- The query building currently lives in activeQuery and uses arrays. I would like to move it to another class and use a series of objects rather than an array. What would be the best way of doing this as far as Yii2 file structure goes? Should I make a Command/other class?
- Am I even going the right way about this? Does this kind of functionality cover the usual (hopefully most) use cases?
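To illustrate what "a series of objects rather than an array" could look like, here is one possible shape for a step object. This is only a sketch for discussion: the class name `Step` and its methods are hypothetical and do not exist in the current code.

```php
<?php
// Hypothetical value object for a single traversal step.
class Step
{
    public $name;
    public $args;

    public function __construct($name, array $args = [])
    {
        $this->name = $name;
        $this->args = $args;
    }

    // Two steps can be merged when they are the same sort of step.
    public function canMergeWith(Step $other)
    {
        return $this->name === $other->name;
    }

    // Returns a new step carrying both argument lists.
    public function mergeWith(Step $other)
    {
        return new Step($this->name, array_merge($this->args, $other->args));
    }
}

$merged = (new Step('out', ['knows']))->mergeWith(new Step('out', ['created']));
print_r($merged->args);
```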
Also, WIPs for as soon as this gets some feedback:
- support for forking and merging
- support for and/or (similar to above)
[size="3"]Populating ActiveRecords[/size]
There’s a tricky part here in the sense that element properties in a graph DB are schema-less. In my case with rexster there is a pseudo-schema I can use so I’ve gone the simple route and used that.
A vertex result is in the form of:
array(
    '_id' => 1,
    '_properties' => array(
        'name' => 'marko',
        'age' => 29
    )
)
And an Edge element result is in the form of:
array(
    '_id' => 9,
    '_inV' => 3,
    '_outV' => 1,
    '_label' => 'created',
    '_properties' => array(
        'weight' => 0.4,
    )
)
So I set them "as is" in the AR (I use the same active record for both types of elements; I think Hansael used two in Neo4Yii).
And I use methods to get/set/unset properties (i.e. getProperty($name), setProperty($name, $value), unsetProperty($name)).
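As a rough sketch of how these accessors can sit on top of the raw result arrays shown above: the class name `ElementSketch` and the `isEdge()` helper are hypothetical, and the real active record class will differ. Only the three property methods are named in this post.

```php
<?php
// Hypothetical wrapper around a raw Rexster element result.
class ElementSketch
{
    private $data;

    public function __construct(array $raw)
    {
        // Keep the result "as is"; just make sure _properties exists.
        $this->data = $raw + ['_properties' => []];
    }

    public function getProperty($name)
    {
        return $this->data['_properties'][$name] ?? null;
    }

    public function setProperty($name, $value)
    {
        $this->data['_properties'][$name] = $value;
    }

    public function unsetProperty($name)
    {
        unset($this->data['_properties'][$name]);
    }

    // An element carrying both _outV and _inV is an edge, otherwise a vertex.
    public function isEdge()
    {
        return isset($this->data['_outV'], $this->data['_inV']);
    }
}

$edge = new ElementSketch([
    '_id' => 9, '_inV' => 3, '_outV' => 1, '_label' => 'created',
    '_properties' => ['weight' => 0.4],
]);
var_dump($edge->isEdge(), $edge->getProperty('weight'));
```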
If you have any suggestions here I’m more than happy to hear them.
For Yii staff: in the process, would it be possible to have an AR method named getMetaSchema() and replace calls to getTableSchema() with this new method (which would wrap getTableSchema() for RDBMS)?
Things I need input on:
- Indexes. How do we go about this? Should Yii set indexes, or should we rely on DB configuration for auto-indexing (probably not)? If Yii does it, is there any implementation you would prefer?
Ok that sums it up more or less. Any comments are welcome.
Thanks in advance