Anti-together ActiveFinder behaviour

Morning!

I was looking forward to implementing a model-level cache layer as an extension (or even a hack) within ActiveRecord. The current implementation of ActiveFinder came as a surprise to me. Let's say we have 2 models: test[id] and super_test[id, test_id].

SuperTest::model()->find() will load all fields of the first record, including test_id. I expected SuperTest::model()->find()->test to run a one-table query using the already loaded test_id field. But it didn't. It joined the 2 tables instead. I believe there were strong reasons to do it this way. Maybe difficult cases of "->with()->" or something else. Could you please share them, guys?

I trust we could just run a one-table query in most cases. That is faster and easier to combine with caching. With an explanation I could implement switches in the right places, add a cache layer and propose a diff which could help others.
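To make the difference concrete, here is a minimal sketch of the two query shapes for SuperTest::model()->find()->test. The helper names lazyLoadWithJoin() and lazyLoadSingleTable() are hypothetical, and the SQL is only illustrative; it is not a copy of what CActiveFinder actually generates.

```php
<?php
// Hypothetical helpers: they only build SQL strings to contrast the two approaches.

// Roughly what a JOIN-based lazy load asks for: the related row is located
// by joining back through the owner table.
function lazyLoadWithJoin(int $superTestId): array
{
    $sql = 'SELECT t.* FROM test t '
         . 'INNER JOIN super_test st ON st.test_id = t.id '
         . 'WHERE st.id = :id';
    return array($sql, array(':id' => $superTestId));
}

// The owner record is already loaded, so its test_id is known and a
// single-table lookup by primary key is enough.
function lazyLoadSingleTable(array $superTestRow): array
{
    $sql = 'SELECT * FROM test WHERE id = :id';
    return array($sql, array(':id' => $superTestRow['test_id']));
}

list($sql, $params) = lazyLoadSingleTable(array('id' => 7, 'test_id' => 3));
echo $sql, ' ', json_encode($params), "\n";
```

The single-table form is also the one that maps naturally onto a cache lookup, since the key is just the table name plus the primary key value.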

I also wonder if there is a good way to add database-connection switch functionality in concrete model depending on some function or anything. The second thing I really miss in the current ActiveRecord implementation is sharding support. There is probably a way to do it without heavily changing AR code, but I didn't find it. I'm going to share the result too.

We are going to base an extremely high-load project on Yii. And the list of alternatives is completely empty. You are doing great work with Yii.

Nice find! Could you please create a ticket regarding optimizing the lazy load by not using JOIN query? We will implement this in 1.1 release.

Could you explain what "sharding support" is?

Unfortunately the 1.1 release is so far away :. We have a project deadline right in December :]. I'll try to implement this feature in 1.0.6 and create a ticket right after, with a diff attached. After that, implementing the model-cache layer becomes incredibly easy.

Sharding is horizontal database scaling. It can only be implemented in application code because it strictly depends on the data we operate on. An excellent explanation and description of what sharding is can be found here: http://en.netlog.com…blogid=3071854. In fact, we are going to implement a project on Yii with the same load as in the article. That’s why sharding is a top priority feature. Before implementing anything, I’ll attach a description here of what I’m going to add to AR and how I’m going to do it, so we can discuss the details.
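As a rough illustration of the idea, a record's sharding key decides which database server holds it. The sketch below assumes a simple modulo scheme and non-negative integer IDs; the shard names and the shardFor() helper are hypothetical, and this is not necessarily the scheme described in the linked article.

```php
<?php
// Hypothetical shard map: each entry stands for a separate database server.
$shards = array('shard0', 'shard1', 'shard2');

// Map a record's sharding key (here a user ID) to one shard.
// Modulo is the simplest possible scheme and is only an assumption here.
function shardFor(int $userId, array $shards): string
{
    return $shards[$userId % count($shards)];
}

echo shardFor(34, $shards), "\n";
```

A real setup usually needs a lookup table or consistent hashing instead, because plain modulo forces most rows to move whenever a shard is added.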

That'll be great. The change for the lazy loading optimization may not be trivial. That's why I asked for 1.1 release.

I'm looking forward to discussing sharding support with you. I didn't know this name, but this is something we wanted to do and don't have much experience with. I think your experience will help a lot here. Thanks.

Quote

add database-connection switch functionality in concrete model depending on some function or anything
This will also be a very useful feature, e.g. for selecting between a master and a slave replica.
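A minimal sketch of such a switch, kept independent of Yii: the function below picks a connection ID from the SQL verb. The IDs 'dbMaster' and 'dbSlave' and the pickConnectionId() helper are hypothetical. In Yii 1.x the natural place to plug such a decision in would presumably be an override of CActiveRecord::getDbConnection() in the concrete model, but that wiring is an assumption and is not shown here.

```php
<?php
// Decide which connection a statement should use: anything that modifies
// data goes to the master, plain reads go to the slave replica.
// The default connection IDs are hypothetical names.
function pickConnectionId(string $sql, string $master = 'dbMaster', string $slave = 'dbSlave'): string
{
    // First word of the statement, upper-cased ("select", "  UPDATE", ...).
    $verb = preg_match('/^\s*(\w+)/', $sql, $m) ? strtoupper($m[1]) : '';
    $writes = array('INSERT', 'UPDATE', 'DELETE', 'REPLACE');
    return in_array($verb, $writes, true) ? $master : $slave;
}

echo pickConnectionId('SELECT * FROM test WHERE id = 1'), "\n";
echo pickConnectionId('update test set name = "x" where id = 1'), "\n";
```

Reads that must see a just-written row would still have to go to the master, since replicas can lag behind.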

That was a great reading, thanks Boris!

In the spirit of N+1 queries for several many-to-many relations, I’d like to file a feature request that has been covered in this article.

However, I don’t think this belongs here, so I’ll create a ticket instead. Here you go.

Quote

However, I don't think this belongs here, so I'll create a ticket instead. Here you go.

It does. That's what I'm doing at the moment: researching how to implement such a query plan so that I can add a cache layer on models.

I'm going to implement the N+1 method this way:

  1. CActiveRecord.getRelated()'s only task is to get the related model. It doesn't need CActiveFinder at all. It should use CActiveRecord.findByAttributes queries instead. I'll replace it.

  2. All the JOIN logic lives in CActiveFinder, which has a concrete interface. I'm going to write an abstract class describing its interface and implement an alternative way of doing the same things with lots of queries. Let's call it CCachingActiveFinder (because it only makes sense with caching).

The abstract class will define a static factory method:

if (model caching is turned on)
    return CCachingActiveFinder;
else
    return CActiveFinder;

Besides doing the same logic with more queries, the new finder will try to put something like "model-foobar-id#34" into the cache, holding the model data. And before querying by an array of IDs, it will try to get that data from the cache. It's also required to add a garbage collector to models: model changed -> clear its own cache entry. I didn't find any conventions on internal cache naming. How should I name those model cache entries?
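The lookup-then-query flow described here can be sketched as follows. Everything in the block is a hypothetical stand-in: ArrayCache replaces a real cache backend, and the $fetch callback replaces a "WHERE id IN (...)" query. Only the key format is taken from the post.

```php
<?php
// Hypothetical in-memory stand-in for a cache backend.
class ArrayCache
{
    private $data = array();
    public function get(string $key) { return $this->data[$key] ?? null; }
    public function set(string $key, $value): void { $this->data[$key] = $value; }
    public function delete(string $key): void { unset($this->data[$key]); }
}

// Key format follows the "model-foobar-id#34" idea from the post.
function cacheKey(string $model, int $id): string
{
    return 'model-' . strtolower($model) . '-id#' . $id;
}

// Load rows by ID: take what the cache has, query only the missing IDs.
// $fetch must return an array of rows keyed by ID.
function findByIds(ArrayCache $cache, string $model, array $ids, callable $fetch): array
{
    $rows = array();
    $missing = array();
    foreach ($ids as $id) {
        $row = $cache->get(cacheKey($model, $id));
        if ($row === null) {
            $missing[] = $id;
        } else {
            $rows[$id] = $row;
        }
    }
    if ($missing) {
        foreach ($fetch($missing) as $id => $row) {
            $cache->set(cacheKey($model, $id), $row);
            $rows[$id] = $row;
        }
    }
    return $rows;
}

// "Model changed -> clear its own cache entry."
function invalidate(ArrayCache $cache, string $model, int $id): void
{
    $cache->delete(cacheKey($model, $id));
}
```

With this shape, a second call for overlapping IDs only queries the IDs that were not cached yet, and invalidate() would be called from the model's save and delete paths.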

And the most important thing here is to make all of that functionality configurable. Such an implementation requires the following settings to be set by the user:

  1. Model cache on/off

  2. Lifetime of cache

  3. Exclude list: which models shouldn't be cached

  4. Maybe prefix for cache entries names
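The four settings above could be grouped into a single array, sketched below. All setting names, the example values and the isCacheable() helper are hypothetical; none of them exist in Yii itself.

```php
<?php
// Hypothetical setting names mirroring the list above.
$modelCacheSettings = array(
    'enabled'  => true,              // 1. model cache on/off
    'lifetime' => 300,               // 2. cache lifetime in seconds
    'exclude'  => array('AuditLog'), // 3. models that must not be cached
    'prefix'   => 'model-',          // 4. prefix for cache entry names
);

// A model is cached only if caching is on and the model is not excluded.
function isCacheable(string $modelClass, array $settings): bool
{
    return $settings['enabled'] && !in_array($modelClass, $settings['exclude'], true);
}

var_dump(isCacheable('SuperTest', $modelCacheSettings));
var_dump(isCacheable('AuditLog', $modelCacheSettings));
```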

I wasn't able to find anything about accessing the config from internals like AR. So if you have some time, could you please give me some links or even describe it a bit?

P.S. For sure all the caching functions will be separated into a ModelCache class, in addition to the current cache implementation classes ;]

I’m not sure I can provide an authoritative explanation, but here’s my understanding of Yii’s internals.

The N+1 method is a new approach I first saw in Yii. It is about splitting up huge join queries so that no more than one has_many or many_many relation is involved in any single query. It has nothing to do with my feature request.

I can’t see the point of adding an exclude list, as model caching could be turned off individually within each model.

(Maybe there’s even no need to implement such a switch, as parent class can decide whether model can support caching or not.)

I suppose it’d be better to contribute this caching strategy as a standalone component to keep the core balanced.

As far as I know, config entries are not free to modify. You can however use Yii::app()->setParams(array) to redefine the whole parameter array, but not one-by-one.

For caching, I think the table name is acceptable as name prefix. As far as I know, naming is absolutely up to the developer. I currently use CDbCache with SQLite, and the primary key is a field of type CHAR(128).

Feel free to correct me, I’m not very experienced regarding Yii.

NEW VERSION BELOW!

------------------- Cut here -----------------------

Ok. I studied the AR implementation and made a small first patch to eliminate the use of JOIN queries in AR->getRelated() (all but MANY_MANY: they will use JOIN anyway). I decided to try to implement the sharding feature first. You seem to be more interested in it. Also, the cache layer is supposed to be automatic while sharding is not. So it makes more sense to provide the new interface and capabilities first and add the background optimization afterwards.

Could you please check the patch to see if I took everything into account?

Patch is for branches/1.0, apply from root.

If I understood your patch correctly, there’s one thing I’d recommend changing. When defining the condition for the related table, you took for granted that all the tables have an id column, but using their primary key field would be better practice, I think.

I believe call_user_func got deprecated in recent versions, so $class=new $relation->className could be a better idea, if this is supported by php.

Please note, findXXX methods can take CDbCriteria, you don’t have to pass them as an array.

I hope I could help.

By the way, I’m more interested in that queries can run in two steps, returning the ids first, then the selected fields. See the ticket for more information.

Ye, it helped. I modified the find calls and did further investigation on PKs. I come from the Rails world, so I thought the id column was some kind of convention. It's not, and that's just great. Fixed patch attached.

P.S. call_user_func is not deprecated. It's ok. We need to handle static call here. So your syntax for replacement is not correct.

Yes, my bad, call_user_func is fine, I remembered wrongly, call_user_method is the deprecated one, sorry.

When determining foreign primary key, you cannot reach _md, as it is declared as private class property. Use getPrimaryKey() instead as you’ve done in line 24.

We’re on the right track to purge all errors. :) I hope an admin will confirm your patch.

Incorrect. I can reach _md since I’m in the same class. That’s how inheritance and private/protected visibility work in PHP. I always test code before sending a patch :P. AR.getPrimaryKey() returns the primary key field value, not its name.

However it's true we should change this line to:

$fpk = call_user_func("$class::model")->getMetaData()->tableSchema->primaryKey;

Just to make it less hacky :].
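On the visibility point: PHP checks private access per class, not per object, so a method declared in a class can read a private property on another instance, including an instance of a subclass, as long as the property is declared in that same class. A minimal standalone sketch (the Record and SuperRecord classes and primaryKeyNameOf() are hypothetical and are not Yii code):

```php
<?php
class Record
{
    // Private: only code written inside Record may touch it.
    private $_md;

    public function __construct(string $primaryKey)
    {
        $this->_md = array('primaryKey' => $primaryKey);
    }

    // Reads the private property of *another* instance. This is allowed
    // because the accessing code lives in the declaring class.
    public function primaryKeyNameOf(Record $other): string
    {
        return $other->_md['primaryKey'];
    }
}

class SuperRecord extends Record
{
}

$a = new Record('id');
$b = new SuperRecord('test_id');
echo $a->primaryKeyNameOf($b), "\n";
```

Going through a public accessor such as getMetaData() is still the less fragile choice, since it keeps working if the private property is ever renamed.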

qiang, could you please comment on this version? It seems to be mature enough, which means I finally got the AR architecture :D. I feel somewhat ready to go and describe sharding.

Ah, thanks for correcting me. Good improvements.

As far as I know, aliasToken defaults to '??.'. So doing str_replace($relation->aliasToken.'.', '', $value) might be incorrect. Can you confirm this?

It defaults to '??' in my case. I just checked the default config: '??' there too.

Then there’s a typo in the API.

I have no more false alarms, good work! :) We have to wait for dev approval, maybe Qiang’s already testing this.

Nice try. There is still some work to do, though. First, we should not write things like "$fpk={$this->$key}" because the value may need to be escaped. Second, you didn't make use of the query options specified in the relation. Some query options (such as 'with') will still require a join.

The query should be escaped, thanks. Not only the value, in fact. The field name can be broken too, in theory. I'll try to find Yii's built-in ways to escape both.
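One common way to cover both concerns is to bind the value as a parameter and to accept only column names that are known from the table schema. The sketch below is standalone; buildPkCondition() is a hypothetical helper, and the backtick quoting assumes MySQL-style identifiers.

```php
<?php
// Hypothetical helper: builds a condition with a bound parameter instead of
// interpolating the value, and rejects column names that are not in the schema.
function buildPkCondition(string $column, $value, array $knownColumns): array
{
    if (!in_array($column, $knownColumns, true)) {
        throw new InvalidArgumentException("Unknown column: $column");
    }
    // The value never becomes part of the SQL text; it travels separately.
    return array("`$column` = :pk", array(':pk' => $value));
}

list($condition, $params) = buildPkCondition('id', '1 OR 1=1', array('id', 'test_id'));
echo $condition, ' ', json_encode($params), "\n";
```

The returned pair has the same shape as a condition string plus a parameter array, which is what the find methods accept.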

I wasn't able to do Foo::model()->with('…')->associate. It doesn't make sense in fact. Are there any available modifiers for relation request? I didn't find any.

with('relation'=>'second') will query not only for the related records, but for related records of the related records. (Or by defining the 'with' param as Qiang mentioned above.)

If the secondary relation also has a ‘with’ param, other relations may also be involved. I can’t confirm this as I’ve never done anything this complex.

Seems more like a hierarchical problem. Can’t this be made so that it uses Yii internals recursively?
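A recursive approach could look roughly like the sketch below, which resolves a nested 'with' specification one single-table lookup at a time. It is a standalone illustration: the $relations map and $tables arrays are hypothetical stand-ins for AR metadata and database tables, and only belongs-to style relations are handled.

```php
<?php
// Hypothetical relation map: table => relation name => target table and FK column.
$relations = array(
    'post'    => array('author'  => array('table' => 'user',    'fk' => 'author_id')),
    'user'    => array('profile' => array('table' => 'profile', 'fk' => 'profile_id')),
    'profile' => array(),
);
// Hypothetical in-memory tables keyed by primary key.
$tables = array(
    'user'    => array(1  => array('id' => 1,  'name' => 'boris', 'profile_id' => 10)),
    'profile' => array(10 => array('id' => 10, 'bio'  => 'hello')),
);

// Attach the relations named in $with to $row, recursing into nested specs.
function loadWith(array $row, string $table, array $with, array $relations, array $tables): array
{
    foreach ($with as $key => $value) {
        // Either 'author' (no nesting) or 'author' => 'profile' (nested).
        $name   = is_int($key) ? $value : $key;
        $nested = is_int($key) ? array() : (array) $value;
        $rel    = $relations[$table][$name];
        // One single-table lookup by primary key, no JOIN.
        $related = $tables[$rel['table']][$row[$rel['fk']]] ?? null;
        if ($related !== null) {
            $related = loadWith($related, $rel['table'], $nested, $relations, $tables);
        }
        $row[$name] = $related;
    }
    return $row;
}

$post = loadWith(array('id' => 5, 'author_id' => 1), 'post', array('author' => 'profile'), $relations, $tables);
echo $post['author']['profile']['bio'], "\n";
```

Each level costs one extra query per relation, so this only pays off when the per-record lookups can be served from a cache.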