SELECT and GROUP BY clause

Hello,

I have an issue with a select query in Yii. In need to select some duplicates entries in my table, and because of the GROUP BY clause I have some problems. I need to select more fields than the GROUP By uses and if I put in select I get "field_name must appear in the GROUP BY clause or be used in an aggregate function". So, I tried a lot of things and I finished with :

$criteria->select = "url, expire_datetime, start_datetime, discount, providername, COUNT(*) AS nr,

							min(id) AS id,


							min(offer_id) AS offer_id,


							max(title) AS title,


							min(categories) AS categories,


							min(couponid) AS couponid,


							min(status) AS status,


							min(link_status) AS link_status,


							min(modified_datetime) AS modified_datetime,


							min(updated_at) AS updated_at,


							min(type) AS type";


		$criteria->order = "providername ASC";


		$criteria->group = "providername, url, discount, expire_datetime, start_datetime";


		$criteria->having = "COUNT(*) > 1";

It is working , but I need it is not a good solution this. If you can give me a hint on how to transform this. Also is time consuming using an aggregate function as min.

Thank you,

Denisa

In my opinion this kind of queries are better off being written

as raw sql e.g:


$duplicateData=Yii::app()->db->createCommand('

 SELECT  url, expire_datetime, start_datetime, discount, providername, COUNT(*) AS nr,

min(id) AS id,

min(offer_id) AS offer_id,

max(title) AS title,

min(categories) AS categories,

min(couponid) AS couponid,

min(status) AS status,

min(link_status) AS link_status,

min(modified_datetime) AS modified_datetime,

min(updated_at) AS updated_at,

min(type) AS type;

FROM mytable

GROUP BY providername, url, discount, expire_datetime, start_datetime

ORDER BY providername ASC

HAVING COUNT(*) > 1

')->queryAll();



(please replace mytable with the suitable table name for real usage)

Maybe you were expecting something using the Active Record? This would also be a bit

complicated and I don’t think for this type of queries deserves the effort to transform

it to AR. I may be mistaken though.

Thank you for posting. I don’t see any difference between our queries. I wanted an idea about how to re-write my query. If I will use the same thing with min() function, it will not change anything. Also, I need to use the CDbCriteria object because I need to send the results to CGridView which uses CDbCriteria.

Maybe another idea will come :)

Thanks,

I agree with Vassilis with respects that these kinds of queries are better off written directly.

you could use a CSqlDataProvider which would let you use the CGridView.

Unless you have a massive amount of records there is nothing wrong with using aggregate functions.

If you find your queries running slow I would look into adding some indexes to your database

How can I use CSqlDataProvider in my case? Sorry, I didn’t use it until now. I have a lot of records and I already have indexes but I seems to me the query is slow.

Thanks,

You could try having the values stored as pointers in a different table and just use that instead of regenerating it every time you query.

Hi,

I ended up with the following query:

$duplicateData = Yii::app()->db->createCommand(’

		 SELECT  providername, url, discount, expire_datetime, start_datetime, COUNT(*) AS nr


				FROM {{my_table}}


				GROUP BY providername, url, discount, expire_datetime, start_datetime


				HAVING COUNT(*) > 1')->queryAll();

But I need now to for each of this results to take the duplicates :

$criteria->select = "t.*";

$criteria->condition = ‘providername = :providername AND url = :url AND discount = :discount AND start_datetime = :start_datetime AND expire_datetime=:expire_datetime’;

foreach ($duplicateData as $offer) {

$criteria_params = array( ‘:providername’ => $offer[‘providername’], ‘:url’ => $offer[‘url’], ‘:discount’ => $offer[‘discount’], ‘:start_datetime’ => $offer[‘start_datetime’], ‘:expire_datetime’ => $offer[‘expire_datetime’]);

$criteria->params = $criteria_params;

}

But the problem it is, as you can see, that the criteria is changing each time and at the end it has only the last values for the last $offer. Is there any way to add in criteria params all the conditions with “OR” between them? I am just stuck here. I tried with separated arrays but it didn’t work because it need string instead of array. I just don’t have any idea.

Please give me a point to start.

Thanks,

What are you trying to accomplish with the second query?

In the first query I take the duplicates records grouped by some fields. In the second query I want for each of the duplicate to get exactly the duplicates. I need these to displayed in my CGridView.

I also tried with CArrayDataProvider to send an array but I always get an error "Undefined offset: 0".

I tried with array_values() but no success so far.




$duplicateData = Yii::app()->db->createCommand()

		->select('*, COUNT(*) as nr')

		->from('my_table')

		->group('[group by all of the columns in your table except the id field]')

		->having('nr > 1')

		->queryAll();


$dataProvider = new CArrayDataProvider($duplicateData, array(

			'pagination' => array(

				'pageSize' => 20,

			),

		));




$this->widget('zii.widgets.grid.CGridView', array(

	//if you primary key field is not named id you need to specify it here otherwise you can exclude 'keyField'

	'keyField'	 => '[column name of your primary key if it is not named "id"]',

	'dataProvider' => $dataProvider,

	//add your columns here 

	'columns'	  => array(

		// add an array for each column you want displayed

		array(

			'name'   => '[name of column]',

			'header' => '[name you want the column header to be]'

		),

	)

));



This should work

Thanks,

But still not work…I get

CDbCommand failed to execute the SQL statement: SQLSTATE[42803]: Grouping error: 7 ERROR: column "tbl_affiliate_link.id" must appear in the GROUP BY clause or be used in an aggregate function

LINE 1: SELECT *

^. The SQL statement executed was: SELECT *

FROM "my_table"

GROUP BY "providername", "url", "discount", "end_datetime", "start_datetime", "link_status", "offer_id", "modified_datetime"

HAVING count(*) > 1

Sorry, my mistake.

Add the id column first in your group by list.

It is not what I want :) I don’t want to group by id those records. Duplicates for me means to group by exactly some specific fields and nothing more.

I have done with different queries, but it seems that is time consuming… :(


$offers_duplicate = array();

		$duplicateData = Yii::app()->db->createCommand('

			 SELECT url, end_datetime, start_datetime, discount, providername

					FROM {{affiliate_link}}

					GROUP BY providername, url, discount, end_datetime, start_datetime

					HAVING COUNT(*) > 1 LIMIT 10')->queryAll();


		foreach($duplicateData as $offer) {

			$query = "SELECT * FROM {{affiliate_link}}

					WHERE url = '".$offer['url']."'

					AND providername = '".pg_escape_string($offer['providername'])."'

					AND discount = '".$offer['discount']."'

					AND start_datetime = '".$offer['start_datetime']."'

					AND end_datetime = '".$offer['end_datetime']."'";

			$queries = Yii::app()->db->createCommand($query)->queryAll();


			foreach($queries as $do)

				// get the offers which correspond to the selected $offer

				$offers_duplicate[] = $do;

		}


		$dataProvider = new CArrayDataProvider($offers_duplicate, array(

			'id'=>'providername',

			'sort'=>array(

				'attributes'=>array(

					'id', 'type', 'link_status', 'providername', 'url', 'start_datetime', 'end_datetime'

				),

			),

			'pagination'=>array(

				'pageSize'=>30,

			),

		));

		return $dataProvider;

$duplicateData without that LIMIT 10 breaks…because i think everything uses too many resources. So…i still need a fix

Sorry again. I went and did some querying and grouping by the primary key obviously will not work as you will get no duplicates (or at least you shouldn’t). So now I understand what your doing.

This is going to be an expensive operation because your are going to have to run a query for each of the results that come back on your first query where your find the duplicate results.

Yes, right.

Do you have any idea?:)

You will probably have to increase the memory for php or increase the time out length or both.

place - set_time_limit(200); - in your controller

I meant an idea about how to change the queries. I don’t think setting a bigger value for set_time_limit is the right solution.

Thanks,

Usually if you are running queries to find duplicate data it’s for maintenance and if it takes a while that’s okay.

The real question is… why do you have so much duplicate data?

I have some crawlers in the site and it’s just happens to “come” a lot of duplicate data

In the above example the keyField needs to be defined in the dataProvider, not in the widget:




$dataProvider = new CArrayDataProvider($duplicateData, array(

      // move keyField here

      'keyField'       => '[column name of your primary key if it is not named "id"]',  

      'pagination' => array(

            'pageSize' => 20,

       ),

));




$this->widget('zii.widgets.grid.CGridView', array(

      // removed keyField

      'dataProvider' => $dataProvider,

      //add your columns here 

      'columns'         => array(

          // add an array for each column you want displayed

          array(

              'name'   => '[name of column]',

              'header' => '[name you want the column header to be]'

          ),

      )

));