Performance Impact Of Behaviors

Not sure where this goes, so posting this as ‘esoteric behavior’.

I’ve been profiling parts of an application to identify areas worthwhile to optimize following an important load on the prod server.

Several optimisations have been done, and here I report the observation that behaviors seem to have an important impact on performance.

Below is the summary of the top consumers in time (‘self’), and most seem ‘behavior’ related:

4196

profile.png

I think a lot of it boils down to CList (add/insertAt and constructor).

So optimizing CList is likely a good contributor to speed.

Any suggestions about how can it be optimized?

That’s my next step - validate some ideas.

It doesn’t seem behavior related!

php_array_filter takes 51ms!

And the JSON call also takes a while.

Then, most of the time is spent rendering HTML, and calling components.

I see that behaviors play a very minor role with regards to performance there.

<edit>

One thing that could be worrying is the db call, but we all know that it’s slow.

Finally next step is provide more details ;-).

When I looked at why ‘getEventHandlers’ and stuff was called, this was due to adding the behaviors.

getEventHandlers is a total self of 1.3 seconds. attachEventHandler a total self of 1.1s and total accumulated of 4.1 seconds. And attachBehavior is a total accumulated of 5.1seconds. That is about 1/3 of the total execution time during this run.

Hence my conclusion that some time can be gained related to the behaviors.

Regarding the queries: I suppose that PDO timing is relevant enough:

4197

pdo.png

I think that is reasonable for +500 queriesn but possibly not the whole number;.

Optimisation 1:




    private static $_alternate=0;

	/**

 	* Appends an item at the end of the list.

 	* @param mixed $item new item

 	* @return integer the zero-based index at which the item is added

 	*/

	public function add($item)

	{

    	self::$_alternate=!self::$_alternate;

    	if(self::$_alternate) {

        	return $this->_orgAdd($item);

    	} else {

        	return $this->_otherAdd($item);

    	}

	}

	private function _orgAdd($item) {

    	$this->insertAt($this->_c,$item);

    	return $this->_c-1;

	}

	private function _otherAdd($item) {

    	array_push($this->_d,$item);

    	return $this->_c++;

    }



The above code makes sure the original ‘add’ code is called in one case, and an alternate ‘add’ code in the other.

4199

optim1.png

The original and the alternative code are both called 8030 times. The original code has 316ms for ‘self’ and ‘357ms’ for ‘total’. The alternate code has ‘176ms for’ self and ‘193ms’ for total. So 45% gain there for self and 51% for the total.

‘add’ itself counts for about 348ms of self and 910ms in total (which includes the calls to orgAdd and otherAdd).

So that is a 350ms gain on the 5 seconds total. That’s “only” 7%, but it’s a start.

Ok, I agree, that was not a lot of gain, so how about this. Step 2 for the same function:





	private static $_alternate=0;

	/**

     * Appends an item at the end of the list.

     * @param mixed $item new item

     * @return integer the zero-based index at which the item is added

     */

	public function add($item)

	{

    	self::$_alternate=!self::$_alternate;

    	if(self::$_alternate) {

        	return $this->_new2Add($item);

    	} else {

        	return $this->_otherAdd($item);

    	}

	}

	private function _new2Add($item) {

    	$this->_d[]=$item;

    	return $this->_c++;

	}

	private function _otherAdd($item) {

    	array_push($this->_d,$item);

    	return $this->_c++;

	}



4200

optim2.png

Only 37 ms for the new function! And 181ms for my previous optimisation. almost 90% of gain for ‘add’ so about 13% versus the total 5seconds for attaching behaviors.

‘add’ is optimised. Time to move on to another time-eater.

P.S.: Using the new add, the total time for add now is 77ms versus 670ms total in the initial profile.

Hi

I have been looking at CComponent::getEvent. This takes about 800ms in total.

it is difficult to improve the code for the same function (I tried several methods).

A major improvement can be achieved if we accept that in production this check is not needed. Just testing ‘YII_DEBUG’ reduces the total time to about 40ms, so a 90% gain. I do not know how you feel about this. It could also be a new global parameter indicating ‘YII_DISABLE_CHECKS’.




	/**

	 * Determines whether an event is defined.

	 * An event is defined if the class has a method named like 'onXXX'.

	 * Note, event name is case-insensitive.

	 * @param string $name the event name

	 * @return boolean whether an event is defined

	 */

	private function _hasEvent($name)

	{

		return !strncasecmp($name,'on',2) && method_exists($this,$name);

	}

	private function _hasEventAlt($name) {

	    return !YII_DEBUG || ((0===strncasecmp($name,'on',2))&&method_exists($this,$name));

	}

	private static $_alternate=0;

	public function hasEvent($name) {

	    self::$_alternate=!self::$_alternate;

	    return self::$_alternate?$this->_hasEvent($name):$this->_hasEventAlt($name);

	} [size=2]	[/size]



That would be an optimisation similar to what is done in ‘raiseEvent’:




		else if(YII_DEBUG && !$this->hasEvent($name))



Other alternatives I tried yield either the same execution time or worse.

Another way would be to avoid one of the checks through a parameter or so.

Anyway, I know where another 800ms go and that they can be optimised only by skipping the test (or by limiting calls to hasEvent).

That leaves us with about 3400ms to analyze.

Another optimisation, locally a minor one, but also impacts ‘hasEvent’.

In the trial below, I always convert the name to lowercase. In most cases this will be needed because you can expect that the event that is looked for exists.

Then I check if the handler already exists in the table. If the handler is in the table, the event exists, doesn’t it?

By doing that I avoid a call to hasEvent, and only if the event does not exist yet, the call to hasEvent is made.




	/**

	 * Returns the list of attached event handlers for an event.

	 * @param string $name the event name

	 * @return CList list of attached event handlers for the event

	 * @throws CException if the event is not defined

	 */

	public function _getEventHandlers($name)

	{

		if($this->hasEvent($name))

		{

			$name=strtolower($name);

			if(!isset($this->_e[$name]))

				$this->_e[$name]=new CList;

			return $this->_e[$name];

		}

		else

			throw new CException(Yii::t('yii','Event "{class}.{event}" is not defined.',

				array('{class}'=>get_class($this), '{event}'=>$name)));

	}


	public function getEventHandlersAlt($name)

	{

	    $name=strtolower($name);

	    if(isset($this->_e[$name])) {

	        return $this->_e[$name];

	    } elseif ($this->hasEvent($name)) {

	        return $this->_e[$name]=new CList;

	    }

	    else

	        throw new CException(Yii::t('yii','Event "{class}.{event}" is not defined.',

	                array('{class}'=>get_class($this), '{event}'=>$name)));

	}


	static $alternate=0;

	

	public function getEventHandlers($name) {

	    self::$alternate=1-self::$alternate;

	    return self::$alternate?$this->getEventHandlersAlt2($name):$this->getEventHandlersAlt($name);    

	}



What’s the gain?

4201

optim3.png

Well, there are only 8670 hasEvent calls versus 16060 calls - a gain of about 400ms.

The self of the function goes down from 412ms to 349ms. That is 15%, only 130ms.

So in all, a 530ms gain. The total cumulated for the getEventHandlers function has gone from 2.4 seconds on the first run to 1.2 seconds on the last run. That’s a nice 1.2 seconds gain on the 5.1 total or 24%. With a potential to gain 400ms more with the debug option.

That is confirmed by comparing the new profile with the original one.

4202

profile2.png

CComponent->attachBehavior has a total accumulated of 2.8 seconds in stead of 5.1 even if that is not entirely comparable because done in different runs.

php::PDO is at the head of the list too now which is preferable.

Next up is a look at attachEventHandler where 665ms are lost, and attachBehavior with a small loss of 168ms. [The arrayDumper seems to be worth a look as well as populateRecord].

Nice work. Always good to see people actually trying to help instead of only complaining. :)

(and no: I won’t self-reflect over that :P)

Did you already try to reduce the number of ActiveRecord instances that get populated? Having more than 1k of them seems a lot to me. Maybe you can use DAO for your scenario? Avoiding all those calls might actually help you more than squeezing the last bit of performance out of them. ;)

There was also an issue/ pull request about an option to allow returning arrays instead of ActiveRecord instances from ActiveRecord::findAll methods. But it seems that feature has been scheduled for v2.0 (ActiveRecord: add support to get result of findAll() and relations as arrays).

A change not critical for me and probably only rare improvement, inside CComponent::__set():




[size=2]		[/size][size=2]elseif(strncasecmp($name,'on',2)===0 && method_exists($this,$name)) {[/size]

		    $iname=strtolower($name);

		    if(isset($this->_e[$iname])) {

		        return $this->_e[$iname]->add($value);

		    } else {

		        return $this->_e[$iname]=new CList(array($name=>$value));

[size=2]		[/size][size=2]    }[/size]



Inside the getter and isset, I don’t think that testing ‘isset’ first is a performance improver (because it would probably result in an extra check in other cases).

I also compared the functions in the next code: using CComponent::__set() to do the job is worse (as we can expect):




	public function _attachEventHandler($name,$handler)

	{

		$this->getEventHandlers($name)->add($handler);

	}

	

	public function _altAttachEventHandler($name,$handler) {

	    $this->$name=$handler;

	}


	private static $a;

	public function attachEventHandler($name,$handler) {

	    self::$a=!self::$a;

	    return self::$a?$this->_attachEventHandler($name, $handler):[size=2]$this->_altAttachEventHandler($name, $handler)[/size]

	}

[size=2]

[/size]

[size=2]

[/size]

[size=2]Then I found this to be slightly faster (20-30ms on 380ms):[/size]

[size=2]


[/size]

	public function _altAttachEventHandler($name,$handler) {

	    $h=$this->getEventHandlers($name);

	    $h->add($handler);

	}

[size=2]

[/size]

[size=2]

[/size]

[size=2]Duplicating getEventHandlers did not help.[/size]

[size=2]

[/size]

[size=2]There is no proposable optimisation I can think of for ‘CBehavior::attach’[/size]

@Ben 1: Thanks for the compliment!

@Ben 2: I will consider DAO when needed - in this case it is not a single big request, but several small requests in a specific algorithm. I am already optimizing the algorithm, but that’s not an excuse not to identify why the framework is slowing down so much.

@Ben 3: Generally when I "complain", I also have the solution. Still, that does not mean that it propagates to the official release.

Probably final suggestion regarding the optimization of behaviors: do "lazy loading" of behaviors.

That is, behaviors should only be loaded when they are needed and in many cases they are not needed.

This can be done in the ‘__get’, ‘__set’, ‘__isset’, enable/disable, … methods. Lazy loading means that the behaviors should not be loaded until there is a ‘getter’ that does not exist.

This is not a minor change and might have sideeffects where behaviors impact the owner when they are attached to it.

[size="4"]Final conclusion:[/size]

I built a new yiilite.php and ran a profile session with debug inactive against that.

4203

final.png

It turns out that:

  1. The optimisations have their effect [CList->add is the best one: it’s only 68ms for >16k calls.

  2. Most of the time goes to the behaviors: setting them up and raising them. I’ve had a quick look at ‘raiseEvent’, but I have no ideas yet.

  3. Even CActiveRecord->query relies on CActiveRecord->populateRecord which implies the delay of setting up the behaviors!

  4. Out of the 9.6 seconds of execution time, still about one third (3.1seconds) goes to "CComponent->attachBehaviors" and children. RaiseEvent uses anther 533ms with only 160ms spent in the handlers.

  5. The behaviors are called upon in less than 250 calls. Compared to the 986 calls to ‘attachBehaviors’ that’s at least 75% attachBehaviors calls too many (when useful, a behavior is called more than once for the same instance). This is why lazy loading would be usefull and save close to 2.3 seconds (about 25% of total time).

  6. PDO construction is about 1 second and queries about 400ms. Actual database access counts for very little.

  7. Time spent in CDbCommand is 607ms in total (6%)

8 )Time spent in CList is 432ms, 58% of the time in the constructor.

Regarding point 3: That’s my whole point. Instead of using




$models = MyModel::findAll( ... );  // which does the query() somewhere inside

foreach ($models as $model) { ... }



which will probably instantiate lots of ActiveRecord instances (and thus creating the overhead with attaching the behaviors), it might be worth doing




$dataReader = Yii::app()->db->createCommand( $sql )->query();

foreach ($dataReader as $row) { ... }



Same about relations: Eager loading could cause lots of instances, even if they won’t be used. Again, with the overhead of all that behaviors being attached.

It’s not so much about whether it is one big request or many small ones, just try to avoid too many AR instances if you don’t need them. Of course it depends on your algorithm, so this isn’t a general solution. But often when I was profiling code, the largest gains couldn’t be made by optimizing algorithms, but by changing design.

@Ben:

I agree that it is possible to gain time "by design", however it goes with reduced support from the framework features which increase development speed and maintainability.

In this particular case I’ld have to duplicate business logic to apply to the elements of an array instead of the fields of a class.

Saying that the solution is DAO is a step towards saying to not use a framework. Using bare PHP is so much faster than using a framework.

Heck, I even know a company that has written its own framework … in C Language! As it is tailor made and compiled, it is superfast and the code is protected - the company can provide the binary on a mobile machine to a customer without worrying that the code (/know-how) gets copied.

So my argument is: I agree that optimisations are possible by changing the design/algortihm and using the framework less, but that’s no excuse to not identify how that can be avoided that by improving the framework. Here I had a clear case on my hands putting the behaviors at the forefront of slowlyness that seemed worthwhile to analyze.

I just looked at the preview of Yii 2. It looks like lazy loading of behavior is in there. Cool!

Yeah, we’ve addressed it there. Still if you can do the same with 1.1 while keeping backwards compatibility it will be just great.

Anything in the 2.0 implementation that is considered "not backwards compatible".

The main limitation I can think of is that if developers inserted actions related to owner data inside the ‘attach’ method, delaying attachment is breaking compatibility. The workaround for that is to keep an option so that the user can choose how attaching works (immediate or delayed).

[Edit: see next message - I probably tested with a wrong version here]

[s]I made some changes to CComponent and guess what?! Behaviors disappeared as a bottleneck.

[/s]

[s]Here is the profile of the top time consumers (self, Yiilite 1.1.12 modified, no debug):

4207

with_newccomponent.png

CActiveRecord::populateRecord has a total time less than PDO::__construct!

Total time dropped from 9.6 seconds in the report without this optimisation to 5.5 seconds. That is 4.1 seconds less !!

[/s]

Notice: the optimisation is “slightly” different than Yii 2.0 which is not entirely ‘lazy loading of behaviors’. When behaviors are attached in group, they are put in an internal list of behaviors waiting to be attached. They are attached only when they might be called upon.

I haven’t applied this to ‘attachBehavior’ because its contract requires that the behavior is returned, but ideally that should work the same for performance (a third parameter could indicate that the return value is required).

This is the updated code (from 1.1.12) 4211

CComponent.php
. I haven’t tested this extensively (I do not have the regression suite), but seems to work for me in my application.

Update:

I must have forgotten to update yiilite in my latest test after forcing attaching behaviors in event related methods.

By doing that, the delay gained by not attaching the behaviors early is lost by attaching them later.

getEventHandlers is still called > 16k times implying attaching the behaviors, new CList creation, etc.

I finally managed to get part of the performance improvement that my changes to CComponent were supposed to add.

I did it by adding the following code to the models that are most used:




    /** Override default method to avoid trying to trigger events (and load behaviors) for performance */

    public function afterFind() {

        

    }

    

    /** Override default method to avoid trying to trigger events (and load behaviors) for performance */

    public function beforeFind() {

        

    }



That helps because ‘afterFind’ and ‘beforeFind’ try to trigger the onBeforeFind and onAfterFind events. To do so, they first check if there are event handlers for these events using ‘hasEventHandler’. As a handler might be present in one of the behaviors for which the CComponent implementation provided in a previous post doe lazy attachment all the behaviors have to be attached. By doing so the benefit of lazy attachment is lost.

As in most models there are no afterFind or beforeFind handlers and as you can be sure of that for a particular application, one can override the afterFind and beforeFind methods in most activerecord models.

After applying that to the most used classes, the total execution time dropped to 6.1 seconds with 0.4 seconds spent in ‘ensureBehaviors’ mainly because of actual attachment of behaviors. CComponent::getEventHandlers is called 1674 times instead of the 16k times which is representative of the impact of the changes.

I haven’t checked if there is still something to learn from this for Yii 2.0. If the ‘default events’ are still there, it seems appropriate to have means to enable them in such a manner that there is very limited impact on performance. These ‘system events’ should be disabled by default in the ActiveRecords, or they should only be active in development (debug) mode and result in an exception whenever they are used without them being enabled.

Further, amend Yii 2 to really apply lazy attachment. Currently lazy attachment is only applied for ‘behaviors()’, not for behaviors attached using ‘attachBehavior*()’. Look at my CComponent implementation above. [I am not making a push request on github for that].

Also, inside behaviors, limit the number of events that the behavior tries to attach to. That might be done by introducing a behavior option which events should be attached. That may or may not be valid for Yii 2 too.