Yii Framework Forum: SELECT and GROUP BY clause - Yii Framework Forum

Jump to content

Page 1 of 1
  • You cannot start a new topic
  • You cannot reply to this topic

SELECT and GROUP BY clause Rate Topic: -----

#1 User is offline   denisa84 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 19
  • Joined: 21-January 12

Posted 03 February 2012 - 09:04 AM

Hello,
I have an issue with a select query in Yii. In need to select some duplicates entries in my table, and because of the GROUP BY clause I have some problems. I need to select more fields than the GROUP By uses and if I put in select I get "field_name must appear in the GROUP BY clause or be used in an aggregate function". So, I tried a lot of things and I finished with :
$criteria->select = "url, expire_datetime, start_datetime, discount, providername, COUNT(*) AS nr,
min(id) AS id,
min(offer_id) AS offer_id,
max(title) AS title,
min(categories) AS categories,
min(couponid) AS couponid,
min(status) AS status,
min(link_status) AS link_status,
min(modified_datetime) AS modified_datetime,
min(updated_at) AS updated_at,
min(type) AS type";
$criteria->order = "providername ASC";
$criteria->group = "providername, url, discount, expire_datetime, start_datetime";
$criteria->having = "COUNT(*) > 1";
It is working , but I need it is not a good solution this. If you can give me a hint on how to transform this. Also is time consuming using an aggregate function as min.
Thank you,
Denisa
0

#2 User is offline   Vassilis 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 6
  • Joined: 30-June 11

Posted 03 February 2012 - 11:20 AM

In my opinion this kind of queries are better off being written
as raw sql e.g:

$duplicateData=Yii::app()->db->createCommand('
 SELECT  url, expire_datetime, start_datetime, discount, providername, COUNT(*) AS nr,
min(id) AS id,
min(offer_id) AS offer_id,
max(title) AS title,
min(categories) AS categories,
min(couponid) AS couponid,
min(status) AS status,
min(link_status) AS link_status,
min(modified_datetime) AS modified_datetime,
min(updated_at) AS updated_at,
min(type) AS type;
FROM mytable
GROUP BY providername, url, discount, expire_datetime, start_datetime
ORDER BY providername ASC
HAVING COUNT(*) > 1
')->queryAll();


(please replace mytable with the suitable table name for real usage)

Maybe you were expecting something using the Active Record? This would also be a bit
complicated and I don't think for this type of queries deserves the effort to transform
it to AR. I may be mistaken though.
0

#3 User is offline   denisa84 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 19
  • Joined: 21-January 12

Posted 03 February 2012 - 11:28 AM

Thank you for posting. I don't see any difference between our queries. I wanted an idea about how to re-write my query. If I will use the same thing with min() function, it will not change anything. Also, I need to use the CDbCriteria object because I need to send the results to CGridView which uses CDbCriteria.
Maybe another idea will come :)

Thanks,
0

#4 User is offline   jpm 

  • Junior Member
  • Pip
  • Yii
  • Group: Members
  • Posts: 26
  • Joined: 17-April 11
  • Location:Austin TX

Posted 03 February 2012 - 08:21 PM

View Postdenisa84, on 03 February 2012 - 11:28 AM, said:

Thank you for posting. I don't see any difference between our queries. I wanted an idea about how to re-write my query. If I will use the same thing with min() function, it will not change anything. Also, I need to use the CDbCriteria object because I need to send the results to CGridView which uses CDbCriteria.
Maybe another idea will come :)

Thanks,


I agree with Vassilis with respects that these kinds of queries are better off written directly.

you could use a CSqlDataProvider which would let you use the CGridView.

Unless you have a massive amount of records there is nothing wrong with using aggregate functions.

If you find your queries running slow I would look into adding some indexes to your database
0

#5 User is offline   denisa84 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 19
  • Joined: 21-January 12

Posted 06 February 2012 - 04:32 AM

View Postjpm, on 03 February 2012 - 08:21 PM, said:

I agree with Vassilis with respects that these kinds of queries are better off written directly.

you could use a CSqlDataProvider which would let you use the CGridView.

Unless you have a massive amount of records there is nothing wrong with using aggregate functions.

If you find your queries running slow I would look into adding some indexes to your database

How can I use CSqlDataProvider in my case? Sorry, I didn't use it until now. I have a lot of records and I already have indexes but I seems to me the query is slow.
Thanks,
0

#6 User is offline   Mike Chojnacki 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 15
  • Joined: 31-January 12

Posted 06 February 2012 - 05:33 AM

You could try having the values stored as pointers in a different table and just use that instead of regenerating it every time you query.
0

#7 User is offline   denisa84 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 19
  • Joined: 21-January 12

Posted 06 February 2012 - 08:53 AM

Hi,
I ended up with the following query:
$duplicateData = Yii::app()->db->createCommand('
SELECT providername, url, discount, expire_datetime, start_datetime, COUNT(*) AS nr
FROM {{my_table}}
GROUP BY providername, url, discount, expire_datetime, start_datetime
HAVING COUNT(*) > 1')->queryAll();
But I need now to for each of this results to take the duplicates :
$criteria->select = "t.*";
$criteria->condition = 'providername = :providername AND url = :url AND discount = :discount AND start_datetime = :start_datetime AND expire_datetime=:expire_datetime';
foreach ($duplicateData as $offer) {
$criteria_params = array( ':providername' => $offer['providername'], ':url' => $offer['url'], ':discount' => $offer['discount'], ':start_datetime' => $offer['start_datetime'], ':expire_datetime' => $offer['expire_datetime']);
$criteria->params = $criteria_params;
}

But the problem it is, as you can see, that the criteria is changing each time and at the end it has only the last values for the last $offer. Is there any way to add in criteria params all the conditions with "OR" between them? I am just stuck here. I tried with separated arrays but it didn't work because it need string instead of array. I just don't have any idea.
Please give me a point to start.
Thanks,
0

#8 User is offline   jpm 

  • Junior Member
  • Pip
  • Yii
  • Group: Members
  • Posts: 26
  • Joined: 17-April 11
  • Location:Austin TX

Posted 06 February 2012 - 11:23 PM

What are you trying to accomplish with the second query?
0

#9 User is offline   denisa84 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 19
  • Joined: 21-January 12

Posted 07 February 2012 - 04:46 AM

View Postjpm, on 06 February 2012 - 11:23 PM, said:

What are you trying to accomplish with the second query?

In the first query I take the duplicates records grouped by some fields. In the second query I want for each of the duplicate to get exactly the duplicates. I need these to displayed in my CGridView.
I also tried with CArrayDataProvider to send an array but I always get an error "Undefined offset: 0".
I tried with array_values() but no success so far.
0

#10 User is offline   jpm 

  • Junior Member
  • Pip
  • Yii
  • Group: Members
  • Posts: 26
  • Joined: 17-April 11
  • Location:Austin TX

Posted 07 February 2012 - 09:28 AM

$duplicateData = Yii::app()->db->createCommand()
		->select('*, COUNT(*) as nr')
		->from('my_table')
		->group('[group by all of the columns in your table except the id field]')
		->having('nr > 1')
		->queryAll();

$dataProvider = new CArrayDataProvider($duplicateData, array(
			'pagination' => array(
				'pageSize' => 20,
			),
		));


$this->widget('zii.widgets.grid.CGridView', array(
	//if you primary key field is not named id you need to specify it here otherwise you can exclude 'keyField'
	'keyField'	 => '[column name of your primary key if it is not named "id"]',
	'dataProvider' => $dataProvider,
	//add your columns here 
	'columns'	  => array(
		// add an array for each column you want displayed
		array(
			'name'   => '[name of column]',
			'header' => '[name you want the column header to be]'
		),
	)
));


This should work
0

#11 User is offline   denisa84 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 19
  • Joined: 21-January 12

Posted 07 February 2012 - 09:54 AM

View Postjpm, on 07 February 2012 - 09:28 AM, said:

$duplicateData = Yii::app()->db->createCommand()
		->select('*, COUNT(*) as nr')
		->from('my_table')
		->group('[group by all of the columns in your table except the id field]')
		->having('nr > 1')
		->queryAll();

$dataProvider = new CArrayDataProvider($duplicateData, array(
			'pagination' => array(
				'pageSize' => 20,
			),
		));


$this->widget('zii.widgets.grid.CGridView', array(
	//if you primary key field is not named id you need to specify it here otherwise you can exclude 'keyField'
	'keyField'	 => '[column name of your primary key if it is not named "id"]',
	'dataProvider' => $dataProvider,
	//add your columns here 
	'columns'	  => array(
		// add an array for each column you want displayed
		array(
			'name'   => '[name of column]',
			'header' => '[name you want the column header to be]'
		),
	)
));


This should work


Thanks,
But still not work..I get
CDbCommand failed to execute the SQL statement: SQLSTATE[42803]: Grouping error: 7 ERROR: column "tbl_affiliate_link.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT *
^. The SQL statement executed was: SELECT *
FROM "my_table"
GROUP BY "providername", "url", "discount", "end_datetime", "start_datetime", "link_status", "offer_id", "modified_datetime"
HAVING count(*) > 1
0

#12 User is offline   jpm 

  • Junior Member
  • Pip
  • Yii
  • Group: Members
  • Posts: 26
  • Joined: 17-April 11
  • Location:Austin TX

Posted 07 February 2012 - 10:06 AM

Sorry, my mistake.

Add the id column first in your group by list.
0

#13 User is offline   denisa84 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 19
  • Joined: 21-January 12

Posted 07 February 2012 - 10:13 AM

View Postjpm, on 07 February 2012 - 10:06 AM, said:

Sorry, my mistake.

Add the id column first in your group by list.


It is not what I want :) I don't want to group by id those records. Duplicates for me means to group by exactly some specific fields and nothing more.
I have done with different queries, but it seems that is time consuming.. :(
$offers_duplicate = array();
		$duplicateData = Yii::app()->db->createCommand('
			 SELECT url, end_datetime, start_datetime, discount, providername
					FROM {{affiliate_link}}
					GROUP BY providername, url, discount, end_datetime, start_datetime
					HAVING COUNT(*) > 1 LIMIT 10')->queryAll();

		foreach($duplicateData as $offer) {
			$query = "SELECT * FROM {{affiliate_link}}
					WHERE url = '".$offer['url']."'
					AND providername = '".pg_escape_string($offer['providername'])."'
					AND discount = '".$offer['discount']."'
					AND start_datetime = '".$offer['start_datetime']."'
					AND end_datetime = '".$offer['end_datetime']."'";
			$queries = Yii::app()->db->createCommand($query)->queryAll();

			foreach($queries as $do)
				// get the offers which correspond to the selected $offer
				$offers_duplicate[] = $do;
		}

		$dataProvider = new CArrayDataProvider($offers_duplicate, array(
			'id'=>'providername',
			'sort'=>array(
				'attributes'=>array(
					'id', 'type', 'link_status', 'providername', 'url', 'start_datetime', 'end_datetime'
				),
			),
			'pagination'=>array(
				'pageSize'=>30,
			),
		));
		return $dataProvider;


$duplicateData without that LIMIT 10 breaks...because i think everything uses too many resources. So..i still need a fix
0

#14 User is offline   jpm 

  • Junior Member
  • Pip
  • Yii
  • Group: Members
  • Posts: 26
  • Joined: 17-April 11
  • Location:Austin TX

Posted 07 February 2012 - 10:17 AM

Sorry again. I went and did some querying and grouping by the primary key obviously will not work as you will get no duplicates (or at least you shouldn't). So now I understand what your doing.

This is going to be an expensive operation because your are going to have to run a query for each of the results that come back on your first query where your find the duplicate results.
0

#15 User is offline   denisa84 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 19
  • Joined: 21-January 12

Posted 07 February 2012 - 10:18 AM

View Postjpm, on 07 February 2012 - 10:17 AM, said:

Sorry again. I went and did some querying and grouping by the primary key obviously will not work as you will get no duplicates (or at least you shouldn't). So now I understand what your doing.

This is going to be an expensive operation because your are going to have to run a query for each of the results that come back on your first query where your find the duplicate results.

Yes, right.
Do you have any idea?:)
0

#16 User is offline   jpm 

  • Junior Member
  • Pip
  • Yii
  • Group: Members
  • Posts: 26
  • Joined: 17-April 11
  • Location:Austin TX

Posted 07 February 2012 - 01:41 PM

View Postdenisa84, on 07 February 2012 - 10:18 AM, said:

Yes, right.
Do you have any idea?:)


You will probably have to increase the memory for php or increase the time out length or both.


place - set_time_limit(200); - in your controller
0

#17 User is offline   denisa84 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 19
  • Joined: 21-January 12

Posted 07 February 2012 - 01:47 PM

View Postjpm, on 07 February 2012 - 01:41 PM, said:

You will probably have to increase the memory for php or increase the time out length or both.


place - set_time_limit(200); - in your controller

I meant an idea about how to change the queries. I don't think setting a bigger value for set_time_limit is the right solution.
Thanks,
0

#18 User is offline   jpm 

  • Junior Member
  • Pip
  • Yii
  • Group: Members
  • Posts: 26
  • Joined: 17-April 11
  • Location:Austin TX

Posted 07 February 2012 - 02:40 PM

View Postdenisa84, on 07 February 2012 - 01:47 PM, said:

I meant an idea about how to change the queries. I don't think setting a bigger value for set_time_limit is the right solution.
Thanks,


Usually if you are running queries to find duplicate data it's for maintenance and if it takes a while that's okay.

The real question is... why do you have so much duplicate data?
0

#19 User is offline   denisa84 

  • Newbie
  • Yii
  • Group: Members
  • Posts: 19
  • Joined: 21-January 12

Posted 07 February 2012 - 02:43 PM

View Postjpm, on 07 February 2012 - 02:40 PM, said:

Usually if you are running queries to find duplicate data it's for maintenance and if it takes a while that's okay.

The real question is... why do you have so much duplicate data?

I have some crawlers in the site and it's just happens to "come" a lot of duplicate data
0

#20 User is offline   Amy 

  • Junior Member
  • Pip
  • Yii
  • Group: Members
  • Posts: 22
  • Joined: 04-February 13

Posted 21 March 2013 - 03:43 PM

In the above example the keyField needs to be defined in the dataProvider, not in the widget:

$dataProvider = new CArrayDataProvider($duplicateData, array(
      // move keyField here
      'keyField'       => '[column name of your primary key if it is not named "id"]',  
      'pagination' => array(
            'pageSize' => 20,
       ),
));


$this->widget('zii.widgets.grid.CGridView', array(
      // removed keyField
      'dataProvider' => $dataProvider,
      //add your columns here 
      'columns'         => array(
          // add an array for each column you want displayed
          array(
              'name'   => '[name of column]',
              'header' => '[name you want the column header to be]'
          ),
      )
));

0

Share this topic:


Page 1 of 1
  • You cannot start a new topic
  • You cannot reply to this topic

1 User(s) are reading this topic
0 members, 1 guests, 0 anonymous users