AR query performance
Posted 11 May 2012 - 10:18 PM
Join-statements merely for the sake of getting all the data in one round-trip actually don't make much sense, when you really think about it - you're often transfering and traversing loads of duplicate data, putting more load on the network and adding memory overhead.
That's not really why we have join-statements in the first-place - they exist for situations where you need to obtain a product of some sort - e.g. something involving sum, avg, count, min or max functions. Join queries are not really an optimization-tool, but most ORMs treat them as such.
But there is in fact a way to execute multiple queries in one round-trip, at least with the mysqli-extension:
This does not appear to be possible with PDO, which, I'm guess, is why nobody uses it.
But think about it - if you have 3 customers and 50 orders, and you join those, selecting all the fields, you're actually sending each customer's data more than 16 times repeatedly. This data has to be serialized by the database-server, transported over the network (which is worse if your database server is on a different physical server from your application-server), then unserialized by the database-client, and then subsequently you have to traverse all that data in PHP (or some of it, but all of it has to go into memory) and weed out the non-unique customer records.
Does that sound efficient?
Much for efficient and simple would be a batch of two queries that select the 50 orders and 3 customers separately.
Do you think that's possible?
Posted 12 May 2012 - 07:08 AM
Posted 12 May 2012 - 07:15 AM
As for the data duplication: I do not think this can be entirely avoided. But it sounds like you would profit from identity maps.
Posted 13 May 2012 - 10:33 AM
With the correct APIs, this could be avoided - you could have one round-trip (eager loading) but no data-duplication. Unfortunately, the multi-query facility does not seem to be the answer, as you can't predict what else you need to load without another round-trip - since the queries are independent queries with individual result-sets. At least not as far as I can figure.
This brings me back to what I've been thinking for some time now - that DBMS APIs really are designed for database administrators (or DBA tools like graphical front-ends) and not for software. Where are the APIs that were designed for software? Mainstream DMBS do not have any... odd right? being that DBMS are much more frequently used for back-end persistence for software, than for DBA front-end tools...
Posted 15 May 2012 - 12:21 PM
Unless you use the "together" flag on a with query:
So, true, it takes 2 or maybe 3 queries to get your results instead of the one you're talking about, but there is no duplication of data, and you're avoiding the n query problem.
Posted 17 May 2012 - 09:27 PM
The waiting is pointless - you already know your data-access pattern in advance; unfortunately DBMS, their APIs, and SQL at large, were not designed for software, but for database administrators and DBA tools. Actually I saw somebody else pointing that out in an article today:
SQL in its natural habitat
Things like SQL injection attacks simply should not exist. They’re a consequence of thinking of your database API as a programming language instead of a protocol, and it’s just nuts that vulnerabilities still result from this poorly thought out 1970s design today.
Although I thought this article was largely garbage, this particular observation is one I've been pointing out a lot lately. Your ORM is actually a code-generator. What's worse, your ORM generates SQL, and your DBMS then parses that code and compiles it into an "execution plan", which is really a program. But the truth of the matter is, the entire process is really pointless, because your software does not sporadically come up with new queries on it's own - your applications have predictable data-access patterns, with the possible exception of a few applications that allow arbitrary query-construction. Applications such as DBA tools. Heh.
Posted 18 May 2012 - 03:31 AM