120 million rows / Iterator needed?

Hi,

I am new to this forum. First of all I want to say that Yii is THE BEST Framework I have ever seen…

At the moment I have a little problem: I have a database table with more than 120 million rows.

I have made a model, and now I try to get all rows:


$logger = Logger::model()->findAll();

Then I use a loop to write every row to a file.

It works, but it uses a lot of RAM and is very slow.

I think I need something like a cursor/iterator to read the rows in pieces.

I know something like this will be possible in Yii 2, but is there a way to make this possible in Yii 1, too?

And if so, would it solve my problem (first of all: would it need less memory)?


foreach (Logger::model()->findAll() as $logentry)

Would be nice if someone could help me. I can’t cache 120 million rows in an array on my server with 1 GB of RAM.

I need to load the data in pieces….

queryAll() will load all the data into RAM. You should use a cursor; read CDbCommand::query() and the DAO section of the guide carefully :lol:
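Something like this, for example (the table name and the output path here are made up):

$reader = Yii::app()->db->createCommand('SELECT * FROM logger')->query();

$fh = fopen('/tmp/logger-export.csv', 'w');
foreach ($reader as $row) { // query() returns a CDbDataReader: one row at a time, as an associative array
    fputcsv($fh, $row);     // write the row out, then let it go
}
fclose($fh);

One caveat: with PDO + MySQL the driver buffers the whole result set by default, so you may have to disable buffered queries to really stream.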

In addition to yiqing95’s link, I would recommend going the SPL route to speed up your iteration.

Do you really need to access 120 million rows of uncached, unprocessed data on demand???

If you are doing some kind of statistical analysis on this information wouldn’t it be better to process stuff at 3am via cron and store results in an intermediate table for retrieval?

  1. You fetch bulk data from the database, so your system is slow.

  2. Then you use LIMIT in your query.

For better speed you should try SELECT … INTO OUTFILE (http://dev.mysql.com/doc/refman/5.1/en/select-into.html), if your DB is MySQL of course.
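A rough sketch (it needs the FILE privilege, and the file is written on the database server, not by PHP):

// the export happens entirely inside MySQL; PHP never touches the rows
Yii::app()->db->createCommand(
    "SELECT * FROM logger " .
    "INTO OUTFILE '/tmp/logger-export.csv' " .
    "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' " .
    "LINES TERMINATED BY '\\n'"
)->execute();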

It doesn’t matter how many rows you have in the database as Yii will never pull them at once.

Use paging.

Ok, Yii doesn’t pull all the data at once, but if I save the data into an array, that array will allocate all my RAM.

So is there any possibility to work with all 120 million rows in a loop without saving them into an array first…

My Yii app is started by a cronjob every day at 4am. Then it has to walk through all 120 million rows and save things… This works, but at the moment I write all rows into an array and then walk through the array. The array is the problem in my case… It’s not good to have a multidimensional array with 120 million rows…

Should I use an iterator/cursor to walk through my rows in the database? Will this solve my problem and is there something like this?

I have found this:

Iterator support

It sounds like the same problem I have… Should I give this a try, or is there a better solution?

Thanks for your help!

Is there a reason you have to pull the data into php?

Any way to do this all via sql directly instead of pulling it into php and manipulating the data?

No, I need to work with all the data from these 120 million rows in PHP, in a loop…

No WHERE clause, no GROUP BY clause, and no other way to reduce the number of rows…

Yeah, you will get a master crash if you do what you describe… ;)

Do it differently.

I am doing something similar in my app, but I process 100 rows and then exit.

The next run, I process the next 100 rows, and so on.

It works great.

I am using the mutex extension to make sure a cron job doesn’t fire while the previous job is still busy.
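Stripped down, the job looks roughly like this (the lock file, the offset file and processRow() are placeholders for illustration - I actually use the mutex extension instead of flock()):

$lock = fopen('/tmp/logger-job.lock', 'c');
if (!flock($lock, LOCK_EX | LOCK_NB)) {
    exit(0); // the previous run is still busy, try again on the next cron tick
}

$stateFile = '/tmp/logger-job.offset';
$offset = file_exists($stateFile) ? (int) file_get_contents($stateFile) : 0;

$rows = Yii::app()->db->createCommand()
    ->select('*')
    ->from('logger')
    ->order('id')         // a stable order, so batches don't overlap
    ->limit(100, $offset) // LIMIT 100 OFFSET $offset
    ->queryAll();         // only 100 rows, so queryAll() is harmless here

foreach ($rows as $row) {
    processRow($row); // whatever per-row work the job has to do
}

file_put_contents($stateFile, $offset + count($rows)); // remember where we stopped
flock($lock, LOCK_UN);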

You already got the answer to your problem in the second post, but you probably did not understand it…

You need to use a CDbDataReader for this - http://www.yiiframework.com/doc/api/1.1/CDbDataReader

And that is what queryAll() returns - http://www.yiiframework.com/doc/api/1.1/CDbCommand#queryAll-detail

return true;

Exit what? The app?

I want to process maybe about 100 rows, then select the next 100 rows and process them, and so on…

I think this should be possible with this iterator feature; the queries the query builder generates should look like this:

  1. SELECT * FROM logger LIMIT 0, 100;

  2. SELECT * FROM logger LIMIT 100, 100;

So I wouldn’t have any problem. If this feature doesn’t do this correctly, I will write my own feature that does, and release it under an open source license.
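In Yii 1 terms, the loop I have in mind would look roughly like this (the ordering column is just an example; CDbCriteria’s limit/offset become the LIMIT clause):

$batchSize = 100;
$offset = 0;
do {
    $criteria = new CDbCriteria(array(
        'limit'  => $batchSize,
        'offset' => $offset,
        'order'  => 'id', // a stable order, so the pages don't shift between queries
    ));
    $batch = Logger::model()->findAll($criteria); // at most 100 AR objects in memory
    foreach ($batch as $logentry) {
        // ... process one log entry ...
    }
    $offset += $batchSize;
} while (count($batch) === $batchSize);

(One caveat: MySQL gets slow with very large OFFSET values, so over 120 million rows a seek like WHERE id > :lastId ORDER BY id would scale much better.)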

At this point I have a question: I have read that Yii 2 will have iterator support integrated into Active Record… What does Yii 2 do in the background if I write


foreach (Logger::find() as $logentry) 

and Logger::find() will find about 120 million rows? :confused:

Does the iterator load everything into an array in the background and end in a "master crash"?
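From the discussions I have read, it looks like Yii 2 will spell this as each()/batch() on the query rather than a bare foreach over find(), something like:

foreach (Logger::find()->each(100) as $logentry) {
    // one Logger model at a time, fetched behind the scenes in batches of 100
}

foreach (Logger::find()->batch(100) as $logentries) {
    // $logentries is an array of up to 100 Logger models
}

which would fetch rows in batches behind the scenes instead of building one giant array.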

Quit being sarcastic.

It’s not my fault you don’t get it.

:lol:

We are not on the same channel!

@mdomba maybe only you understand me :lol:

Do not use queryAll() when there is too much data; it will load all the data into RAM. Use a stream/cursor (do you know the "cursor" concept in SQL? If you haven’t heard of it, please google it). CDbDataReader supports it (in a mimicking way).

Yes, I wrote it wrong above, but explained it right :D… queryAll() returns an array of all the data, while query() - http://www.yiiframework.com/doc/api/1.1/CDbCommand#query-detail - on the other hand returns a stream of data: a CDbDataReader.