Eager Loading Disaster?

Hi,

Having used Yii 1.1 for a long time (though just in very few projects), I haven’t tried Yii 2.0 yet, but I’m reading the Guide and I am astonished to find this:

:o Please tell me I’m missing something. Please tell me, at least, that this is just the default and there’s an option to change that behavior.

If that is not the case, and if I understand this correctly, it would be a disastrous choice that destroys ActiveRecord completely and leaves us with writing literal SQL queries by hand as the only reasonable option.

Are you telling me that, in order to eager loading a bunch of, let’s say, posts with their corresponding authors (say it’s a BELONGS_TO relation called authorUser to a model whose table is user), it will perform two queries roughly similar to this:




  //first query, retrieves the primary records:

  SELECT /*...fields...*/ FROM post WHERE ......; 


  //second query, retrieves the related records:

  SELECT /*...fields...*/ FROM user WHERE post_id IN(1,13,32,.....) AND ....

  



Is that so? It can’t be. I must be missing something.

If it were so, it would be a complete disaster.

The first issue with this is that it does two queries instead of one. Multiplying by a factor of 2 the number of queries might be an insignificant issue. However, if you eager-load N relation at once, will it do 1+N queries, or will it still be 2 queries, one for the primary records and one for all the relations? If it’s the former, then the issue starts being relevant, even though it’s still a small limited multiplcation factor in the number of queries (but still totally avoidable).

The second, HUGE, issue is that this may result in insanely long queries. If, for example, you are eager-loading thousands of records, it means the second query would have an IN() condition with thousands of ids. I’m not even sure if that implies efficiency issues in retrieving the records, but what’s for sure is that a too-long query (in terms of number of characters) is an issue.

Also, if you think the issue of query length is relatively unimportant (but that’s not true) when you’re going to get a huge amount of records in the result anyway (and CPU-processing them to turn them into models), then consider this:

In Yii 1.1, you could use relations in a Criteria with with and together and use that criteria to count records.

For example (i’m writing examples without testing so I might leave something out, but the overall concept holds true and I have used and verified it several times):




  // Yii 1.1:

  // Get the total number of visible posts by visible users

  $criteria = new CDbCriteria(array(

    'with'=>array(

      'authorUser'=>array(

        'joinType'=>'INNER JOIN',

        'on'=>'authorUser.visible>0'

      )

    ),

    'together'=>true,

    'condition'=>'t.visible>0'

  ));

  $count=Post::model()->count($criteria);



This would produce in Yii 1.1 a query roughly similar (or equivalent) to this:




  SELECT COUNT(post.id) 

  FROM post AS t INNER JOIN user AS authorUser ON authorUser.id=t.id AND authorUser.visible>0

  WHERE post.visible > 0



that is, an O(1)-length query for a O(1)-size result, which would execute as effeciently as it gets provided that the tables have the indexes set up properly.

Now that I think about it, I wonder whether that criteria can even be expressed in Yii 2.

Please, prove me wrong and let me see what’s the part I’ve missed, because if I got it right it would mean I have to immediately stop investing my time in learning Yii 2.

It is.

Joins are possible, don’t worry.

I don’t see a reason why the former approach (implemented in Yii 1) is bad, and why is it replaced in the first place? My question is, is it posible to tell Yii to use join instead of IN(1,2,3…) when using eager loading with ActiveRecord clases and how?

It’s much easier implementation wise. AR code in Yii 2 is way simpler than the one in 1.1 which had too many errors popping up all the time because of its complexity. Also it’s now possible to work with multiple database types at once. For example, search model could be ElasticSearch-based while its relationships such as Post could be regular MySQL models: http://www.yiiframework.com/doc-2.0/guide-db-active-record.html#cross-database-relations

That’s how to do JOINs: http://www.yiiframework.com/doc-2.0/guide-db-active-record.html#joining-with-relations

I really like the change that makes ActiveRecord of Yii 2.0 far much simpler and more effective than CActiveRecord of Yii 1.1.

The key point is that the query for the main model and those for the related model(s) are always separated in ActiveRecord of Yii 2.0.

It may be a cause of a small (IMO, very tiny) performance hit for simple has_one (or, BELONGS_TO in Yii 1.1 term) relations. But it is a vital concept that makes Yii 2.0 ActiveRecord very much simpler and stronger than Yii 1.1 CActiveRecord.

As @samdark has said, it enables the relation between SQL-DB and NONSQL-DB.

And in the layer of the practical usages of the sql db, it makes much easier to handle has_many relations.

I once wrote like the following in a wiki article about the relational query of Yii 1.1 CActiveRecord:

The things are different now in Yii 2.0 ActiveRecord. :)

Relational Query - Lazy Loading and Eager Loading / with and together

Drills : Search by a HAS_MANY relation in Yii 1.1

Drills : Search by a HAS_MANY relation in Yii 2.0

[P.S.]

Laravel also do the same in eager loading:

http://laravel.com/docs/4.2/eloquent#eager-loading

I would love that too if it was an option, but if it is the only option, then it is a plain wrong design. The mere fact that the length of the query for the related model is proportional to the number of results is enough to make such an approach a no-go in a number of scenarios. It is not just a matter of the impact on the performance of the query execution; the length of a query may be limited, and when logging it through external services it may get discarded or truncated because of its excessive length.

I still have to check whether joinWith is a solution.

But if in order to get a single query with join (for the same semantics and for obtaining the same data) I need to write the code in a completely different way, switching from elegant and readable code to more convoluted code just to change the way the query is performed, then again, bad design. Still have to see if that is the case, though.

All that is cool, but again, if it is something you can choose. Otherwise, you may be forced to go back to writing the SQL queries directly.

And by the way (but I drift off-topic here, to something that has always bothered me), you shouldn’t have to choose between writing low-level queries directly (if you want and have a reason to, of course: you don’t need to explain me why it is better to write high-level code than to write low-level queries) and taking advantage of the framework models, relations, etc. I’d love to see a framework that allows you to have both: for example, write a low-level query while respecting some elegant conventions (for table and field aliases and the like), and have the framework scan the results and automatically convert them to models.

Perhaps Yii 2 allows that and I still have to find out :)

At first when I saw the post and consequences I was dissapointed, because there is no thing that I used so often, BUT here comes a good part :)

Active Record classes are not recommended if you are working with more than 50 rows and we should never encounter scenario above (basically we will always have a limit somewhere). This means our queries are not going to be so huuuuge, and if they are then we should use lower abstraction layer.

Apart from this benefits we got by different implementation of the ActiveRecord class, there is one benefit we should consider and that is "more control" over the queries. Here is example (please note that this example covers exactly what you expect when using eager loading: JOINS instead IN(1,2,3…) ):




// SELECT `customer`.* FROM `customer`

// LEFT JOIN `order` ON `order`.`customer_id` = `customer`.`id`

// WHERE `order`.`status` = 1

// 

// SELECT * FROM `order` WHERE `customer_id` IN (...)

$customers = Customer::find()

    ->select('customer.*')

    ->leftJoin('order', '`order`.`customer_id` = `customer`.`id`')

    ->where(['order.status' => Order::STATUS_ACTIVE])

    ->with('orders')

    ->all();

This way we can easily decide should we use leftJoin or not, and do we need eager loading (orders in this example) or not. This gives me much more control over the query itself. Also, this is much more explainable to someone who is reading the query.

Often with Yii 1.0 eager loading, you are just ignoring the query that is generated, and we are not trying to optimize it.

  1. If Active Record classes are “not recommended” for working with more than 50 rows, they are poorly implemented. What are we supposed to do when we have to handle a few hundred results (which is not that huuuge at all)? go back to associative arrays? Let’s suppose you say “yes, exactly”. What do we do with those associative arrays? Either we handle them directly (no use of model objects, ugly code, error-prone, etc), or we build model instances with them: so, the overhead of instantiating models has to be there anyway. That overhead cannot be that big, if it is, it’s an issue.

  2. a query with an IN(…50 ids…) is still ridiculously long.

  3. See my count() example: there is no high number of objects involved there, even if the number of counted records was in the thousands (and it may well be)

Note that I’m not against the possibility to have the two split queries, it’s the way to go in a lot of cases, I am against not having the possibility to avoid it.

I don’t get it: if the comments above your code show the actual resulting queries, it’s two queries the second of which has the IN(…) condition. What’s the point in having the join in the first query if it doesn’t avoid the second query with IN?

Ok I get it, you use the join to filter the results. But then, you could have used the same join (with more fields in the select) for getting the data of the orders in the same query (which is what you do in Yii 1.1 with with and together and without having to explicitly add a join), and instead, you have a second query for that. That’s not “exactly what I expect”.

I don’t get how that changes (for better) in Yii 2.0. That (ignoring the query that is generated) happens with every framework (and actually it’s part of the point of a framework: not having to worry about the query).

But when you do stop ignoring the query and you do worry about optimizing it, in Yii 1.1, with some difficulty (and with all its limits and quirks), you can. In yii 2.0 it looks like you cannot anymore.

You are totaly right about the query. I din’t pay attention on the second SELECT query, meaning it is not a result we are expecting.

Now, let’s go back to the story. How to make EAGER loading with normal JOIN instead of IN using new ActiveRecord class?

As I said, the query for the main model and those for the related model(s) are always separated.

Let’s examine it by examples.

In the following examples, I have a simple 1-N relation between the 2 models, i.e., A JobType has many Jobs.

  1. Lazy Loading



    $jobTypes = JobType::find()->all();

    foreach($jobTypes as $jobType) {

        echo $jobType->name;

        foreach($jobType->jobs as $job) {

            echo $job->name;

        }

    }



This will result in 1 + N queries like:




SELECT * FROM `job_type`

SELECT * FROM `job` WHERE `job_type_id`=1

SELECT * FROM `job` WHERE `job_type_id`=2

...

SELECT * FROM `job` WHERE `job_type_id`=11



The related Job models are lazily loaded in the foreach loop.

  1. Eager Loading



    $jobTypes = JobType::find()->with("jobs")->all();

    ...



This will result in 2 queries like:




SELECT * FROM `job_type`

SELECT * FROM `job` WHERE `job_type_id` IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)



  1. Eager Loading using joinWith



    $jobTypes = JobType::find()->joinWith("jobs")->all();

    ...



This also will result in 2 queries, although you might not like it.




SELECT `job_type`.* FROM `job_type` LEFT JOIN `job` ON `job_type`.`id` = `job`.`job_type_id`

SELECT * FROM `job` WHERE `job_type_id` IN (10, 4, 6, 3, 8, 7, 11, 9, 2, 5, 1)



Note that Yii will look for the result set of the first query just to populate the main models. Yii doesn’t try to populate the related models using it.

  1. Lazy Loading using leftJoin



    $jobTypes = JobType::find()

        ->leftJoin('job', '`job`.`job_type_id` = `job_type`.`id`')

        ->all();

    ...



Since you haven’t specify “with” or “joinWith”, it will do the lazy loading.




SELECT `job_type`.* FROM `job_type` LEFT JOIN `job` ON `job`.`job_type_id` = `job_type`.`id`

SELECT * FROM `job` WHERE `job_type_id`=1

SELECT * FROM `job` WHERE `job_type_id`=2

...

SELECT * FROM `job` WHERE `job_type_id`=11



  1. Eager Loading using with and leftJoin



    $jobTypes = JobType::find()

        ->leftJoin('job', '`job`.`job_type_id` = `job_type`.`id`')

        ->with('jobs') // required for eager loading

        ->all();

    ...



You had to specify "with" in order to do the eager loading.




SELECT `job_type`.* FROM `job_type` LEFT JOIN `job` ON `job`.`job_type_id` = `job_type`.`id`

SELECT * FROM `job` WHERE `job_type_id` IN (10, 4, 6, 3, 8, 7, 11, 9, 2, 5, 1)



  1. Eager Loading using joinWith and leftJoin … error



    $jobTypes = JobType::find()

        ->leftJoin('job', '`job`.`job_type_id` = `job_type`.`id`')

        ->joinWith("jobs")

        ->all();

    ...



But this will end up in a syntax error, since it produces 2 left joins:




SQLSTATE[42000]: Syntax error or access violation: 1066 Not unique table/alias: 'job'

The SQL being executed was: SELECT `job_type`.* FROM `job_type`

    LEFT JOIN `job` ON `job_type`.`id` = `job`.`job_type_id`

    LEFT JOIN `job` ON `job`.`job_type_id` = `job_type`.`id`



  1. Eager Loading using joinWith and leftJoin



    $jobTypes = JobType::find()

        ->leftJoin('job as j', '`j`.`job_type_id` = `job_type`.`id`')

        ->joinWith("jobs")

        ->all();

    ...



You had to use a table alias for leftJoin.

There’s no error in the resulting queries, although you may laugh out loud or get angry at the useless 2 left joins.




SELECT `job_type`.* FROM `job_type` LEFT JOIN `job` ON `job_type`.`id` = `job`.`job_type_id`

    LEFT JOIN `job` `j` ON `j`.`job_type_id` = `job_type`.`id`

SELECT * FROM `job` WHERE `job_type_id` IN (10, 4, 6, 3, 8, 7, 11, 9, 2, 5, 1)



So, I have to say it again that the query for the main model and those for the related model(s) are always separated.

You may use ‘joinWith’ or ‘leftJoin’ in order to join the related table in the query for the main model, but you must not do it expecting a performance boost, since it won’t populate the related models. The joining of the related table(s) should be done only when you need to filter or sort the main models by the fields of the related models.

Unbelievable.

Such a gigantic amount of work to develop a new version of the framework full of fantastic new features, all wasted by a core design decision that renders it all useless.

Really sad. Especially considering it’s a step backwards, loosing something that Yii 1.1 already had (with its limitations and issues).

IMO, db handling (ActiveRecord, ActiveQuery, … etc) is one of the most improved parts in Yii 2.0 compared to 1.1. I second the design decision made by the developers.

I’m very sorry that we could not understand each other.

You can take a look at the following discussion in github where it is clearly stated that this is the intended design.

Eager loading : Issue #2379 yiisoft/yii2

[P.S.]

As for "CDbCriteria", the counterpart might be "Query" or "ActiveQuery".




$query = JobType::find()->joinWith('jobs')->where(['job.active' => true]);

$count = $query->count();

$jobTypes = $query->all();



Please refer to the following doc:

Upgrading from Version 1.1 - Query Builder

I understood that from the very first reply.

The intended design is wrong.

And the argument is "the other way was too difficult to implement" or "We tried that in 1.1 [and they got almost there, btw] and we had too many errors".

Cool, so let’s just give up.

One may even have different opinions about whether it’s better to do a single joined query or multiple queries. Either can be better than the other depending on the scenarios.

But that’s precisely why you cannot impose one choice over the other. The multiple-query approach is a complete no-go in some scenarios, meaning one simply cannot afford to do the queries like that.

Luckily I found out before I digged too deep into studying yii 2.0. It’s a disappointment, because it looked promising (and I liked yii 1.1 but it is a bit obsolete already, so I hoped Yii2 would be great) but at least I didn’t waste much time.

Sorry but I can’t resist commenting on this (and the issue is closed to comments):

That reveals how little they have even analyzed the issue. Only taking into account the speed of execution of the query. Not even considering the length of the queries as an issue; the implications when they are done over network; logging; etc.

So, since there’s no hope the developers will ever change that, the only reasonable and useful question I can ask now is: can anybody recommend me a good framework to switch to?

Switch to laravel. Oh… wait… they do the same thing.

I think there is a little too much drama in here.

I got kinda tired reading all the comments here… tell me in simple words how Yii AR doesn’t work for you? As others have shown, you can do it with a join or without a join (2 queries). You choose when writing the code… so whats the problem exactly?

I have to admit that I’m also a bit dissapointed with the fact that instead of JOIN, IN is performed. I searched what is performance impact when using JOIN and when using IN, and lot depends on DB query optimizer. Unfortunatelly MySQL is not the best in that case because there are several examples (here is one example, second example) stating issues with queries using IN clause.

The other thing is this Laravel. If Laravel is doing eager loading using IN that is not excuse for us (Yii comunity and core developers) to use it in Yii. Again, Yii is still the best on the market today, we have this drowback, but apart from this, a lot of other fancy features are added in so will continue using framework.

Eager loading using multiple queries (i.e., using IN instead of JOIN) has already been used in Yii 1.1.

Relational Query - Lazy Loading and Eager Loading / with and together

Note that you have to use it when you want to apply "offset" and/or "limit" to the main model, because single query eager loading (i.e., using JOIN) can not control the number of the main model instances in the result set when the relation is HAS_MANY.

You can not go without the multiple query eager loading, while the single query eager loading can be an option for certain scenarios. In other words, it’s not a problem of “Which should we use, IN or JOIN?”, but “Can we totally go away with JOIN?”

@teo1978

Are you still there? Please spare some additional time for me.

One thing I couldn’t understand was your persistent preference to the length of the query over that of the result set.

I agree that the length of a query with “id in (1,2,3,4,…)” could be a problem when the number of the records is very large. But that is for the 2nd query or later. I mean, we have to receive the result set of the 1st query before we handle the 2nd query. When the length of the 2nd query is problematic, the chances are that we are already suffering from the huge result set of the 1st query that should be limited by ‘limit’ clause.

And what can maximize the size of the result set? In general, it’s a query with many left joins without ‘limit’.

If we are to eager load 1-N related records without ‘limit’, the sizes of the result set should be like the following:

Single Query:




TOTAL_SIZE = (SIZE_MAIN + SIZE_REL) * COUNT_REL

PEAK_SIZE  = TOTAL_SIZE



Multiple Query:




TOTAL_SIZE = SIZE_MAIN * COUNT_MAIN + SIZE_REL * COUNT_REL

PEAK_SIZE  = MAX(SIZE_MAIN * COUNT_MAIN, SIZE_REL * COUNT_REL)



The former matches the latter only in TOTAL_SIZE when the relation is has_one where COUNT_MAIN equals COUNT_REL. Elsewhere it is less efficient in memory size.

And one thing that is very important: generally you can not apply ‘offset’ and ‘limit’ to the single query eager loading when the relation is has_many. You can not optimize the page size (and hence the performance) by applying ‘offset’ and ‘limit’.

Although you never referred to the words ‘limit’ and ‘offset’, I’m sure that you knew the fact that we can not use the single query eager loading when we want to offset/limit the result of the main models when the relation is has_many. Didn’t you? Ignoring this fact in the discussion looked a bit unfair to me. In fact, I’d like to ask you to show us the practical use cases with has_many relations where the single query eager loading should be preferred.

If you have not yet read through these "Working with Databases" sections of the guide, you should read them before you decide to leave Yii 2.

  • Database Access Objects

  • Query Builder

  • Active Record

You’ll find that the transition from the raw sql to AR is more continuous and linear than in Yii 1.1.

And probably you’ll like many new features, e.g. Retrieving Data in Arrays, Retrieving Data in Batches, … etc.

Sorry for the long talk > @teo1978 and all

Sorry for the delay.

I have to post this in two (or more) parts because, apparently, there’s a stupid limitation on the number of blocks of quoted text. So I’ll split at one or more random points.

I already did. If you got too tired reading it, I don’t think repeating it would help. And you don’t have to follow the discussion if it bores you.

No you can’t, as I have shown in response. You can use a join, but you still don’t get the related models with the join query.

Totally agreed.

That’s why I find it so sad that they do such a huge mistake.

Yii is pure genius, but this one is simply a big mistake.

The dilemma (for a FRAMEWORK) is not [b]whether to use a single query or 2[b] (or N+1 for N relations): that decision is for who writes the application to be taken. Both approaches have their pros and cons and the second is definitely preferrable in a number of scenarios. It is also a complete no-go in others. So, a general-purpose framework must provide both options, it’s as simple as that. You cannot restrict that much the freedom of the developer to design their database queries.

Yes but there you had the choice: you could use together and have the

part II

Yeah but:

  • what about the cases where I don’t need limit and offset (actually, the whole discussion is about those cases, obviously)

  • what if it’s just a BELONGS_TO relations? (or does Yii2 in this case allow the eager loading without the second query?)

OF COURSE (I mean, of course you can’t, of course you are right). I never said you should “totally go away with join”, I said the two options should be available.

As I said, that should be a design decision for the application author, not for the framework authors. You are completely banning an option in database query design, you are assuming that a given kind of query strategy is never to be used, and that’s a wrong assumption. You shouldn’t even ask for a use case: the mere fact that one could want to do a single join query to obtain all the data, because that’s something that one can do with a database, should be enough.

I don’t understand what you mean by preference over: what I said and say is that the length of the query can be an issue in and of itself. It can be for a few reasons. Logging is one (do I need to elaborate?) and it could be enough to make it unviable. Another is that I don’t think DBMSs are usually designed to operate with queries whose length increases linearly with the number of expected results, so there might be performance issues and bottlenecks, but I admit this part is speculative. However, when you (meaning the application developer) are deciding how to query your database, if you face the choice between a single query (with O(1) length) and two queries, the second of which is O(N)-long, and you know that the first is perfectly sane and doesn’t pose any potential issues, and it’s easier to write, and there’s no advantage whatsoever in the second approach, you would choose the first, rather than having to investigate whether the length of the query in the second approach can induce performance issues.

That’s a good point, but as I said, that should be the application developer’s decision.

Also, another scenario I had in mind was the COUNT() example I made (where you expect a O(1)-size result, and hence it would be ridiculous to have a O(N)-long query).

But as you showed me, it appears that can be solved with joinWith, so I was probably wrong about that use case. In Yii 1.1, you could write the CDbCriteria in the exact same way you would write it in order to obtain all the data, and then use it to count instead: if used with together, it would result in one query, with joins, returning one result. In Yii 2, apparently you need to change a bit the code, but if you can use joinWith and still take advantage of relation definitions, then it’s no big deal (or no deal at all).

Yes I knew that: and those are the cases where, if one was to write the low level queries, would choose the two-queries-with-in approach, as it’s the only viable one.

Here we are talking about the cases where both approaches make sense, and Yii2 excludes the possibility to use the other approach.

That’s good news. And thanks for the advice.

I have no doubt about that, that’s why it was so disappointing to see there’s a very specific but very deep design flaw that may be an obstacle in using Yii, preventing me from enjoying all the other great features.