Full Page Cache split across Disk and Memory

Hi There,

I’m currently having an issue on a large site that I’ve recently switched over to page caching for most of its pages. The site is extremely busy and the pages are fairly hefty, to the extent that the disk cache for pages has reached 8.4 GB in the past. The problem is that when a spider crawls the site, disk access becomes a major bottleneck: the site is hosted on a cloud service, so the disk sits on a different machine from the web server.

What I’d like to do is cache the part of the response that varies by page (the contents of the view, plus dynamic headers such as the title and metadata) on disk, but cache the layout (header and footer) in memory (memcache/APC). Is there an easy way to do this? What are the potential pitfalls of this sort of approach? Is there a better solution?
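Roughly what I have in mind, in Yii config terms, is two cache components, one file-based and one memory-based (the component names and memcache settings below are just illustrative):

```php
// config/main.php – sketch of a split cache setup
'components' => array(
    // disk-backed cache, used by COutputCache for the per-page content
    'cache' => array(
        'class' => 'CFileCache',
    ),
    // memory-backed cache for the layout fragments
    'memCache' => array(
        'class' => 'CMemCache',
        'servers' => array(
            array('host' => '127.0.0.1', 'port' => 11211),
        ),
    ),
),
```

As far as I can tell, COutputCache and fragment caching both take a cacheID, so the page-level filter could keep using 'cache' while the layout fragments point at 'memCache', but I’d like to know whether that actually works out in practice.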

Cheers,

You might benefit from HTTP caching, which will prevent clients from requesting the same page over and over again. However, it sounds like that won’t work for some of your pages. Maybe you could fall back to fragment caching instead of full-page caching and use memcache as the backend?
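For the layout parts, something along these lines in the layout file should work (the fragment IDs, duration and the 'memCache' component name are placeholders for whatever you set up):

```php
<?php /* layout file – cache the header and footer as fragments in memcache */ ?>
<?php if ($this->beginCache('layout-header', array('cacheID' => 'memCache', 'duration' => 3600))): ?>
    <?php $this->renderPartial('//layouts/_header'); ?>
<?php $this->endCache(); endif; ?>

<?php echo $content; ?>

<?php if ($this->beginCache('layout-footer', array('cacheID' => 'memCache', 'duration' => 3600))): ?>
    <?php $this->renderPartial('//layouts/_footer'); ?>
<?php $this->endCache(); endif; ?>
```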

If possible, just push everything into memcache; 2-4 GB might already be enough in your case.

Regarding the file cache, try setting CFileCache.directoryLevel to 2 or 3. If you have tens of thousands of files in one directory, it’s better to split them across several sub-directories. Maybe it helps a little.
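In the cache component config that would look something like this (the exact level is something you’d have to tune):

```php
'cache' => array(
    'class' => 'CFileCache',
    // spread the cache files over sub-directories instead of one flat directory
    'directoryLevel' => 2,
),
```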

I don’t think partial caching is a good idea, because you would have more cache get and set requests, and you would have to assemble the page (meta, layout, content) on every request instead of just serving the complete page from cache.

Hi guys,

Thanks for the assistance. Can I ask if there’s a built-in way to disable the cache for particular requests? I have some actions that I don’t want to be cached.
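The closest thing I’ve found is the standard "+"/"-" action syntax on the filter itself, something like the below (action names and settings here are made up), but I’m not sure whether that’s the intended way to do it:

```php
// controller filters() – limit the output cache to specific actions
public function filters()
{
    return array(
        array(
            'COutputCache + index, view',   // or 'COutputCache - checkout' to exclude actions
            'duration' => 300,
            'varyByParam' => array('id'),
        ),
    );
}
```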

Cheers,

Okay, the problem I’m now having seems to be that actually rendering the page from cache is taking way too long. I’m timing how long the request takes, in milliseconds, from the entry point. When filters() runs we’re only 0.01 s in; by the time the first dynamic section of the page is rendered, that has jumped to around 0.6 s. Ideally we’d like the system to handle over 60 requests per second, so the magic number would be around 0.015 s per request.
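In case anyone wants to reproduce the measurement: Yii’s logger already tracks time since the entry point, so a line like this dropped into a filter or view gives the same kind of number (the log category is arbitrary):

```php
// elapsed seconds since the request entered Yii (YII_BEGIN_TIME)
$elapsed = Yii::getLogger()->getExecutionTime();
Yii::log(sprintf('reached dynamic section after %.4f s', $elapsed), 'trace', 'application.timing');
```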

Essentially, what processing happens after filters() is called but before the dynamic page sections are rendered? Can it be cut, and if so how would I go about doing that?

Okay, update: I tried removing the dynamic sections from the page and all of a sudden I’m up to 130-140 requests/second.

As such, I could store the info for these dynamic sections in the user’s cookie and simply serve a static page, which might make this solution good enough, but ideally Yii’s full-page cache would just reconstruct the page a little faster. Does anybody know why the reconstruction is this slow?

Our server admin, who’s been monitoring the site and running benchmarks, says the bottleneck is CPU usage rather than disk, database, or memory access. At first glance I thought it was the preg_replace_callback() call, but I timed it with microtime(true) before and after the call and found it takes a fraction of a millisecond for a 150 kB page. Is that right, or is my benchmark invalid? I was thinking about rewriting the COutputCache class to see if I could make it reconstruct the page faster.
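If anyone wants to sanity-check the regex cost in isolation, something like this should do it (the placeholder format is my approximation of what COutputCache emits, and the string sizes are arbitrary):

```php
// rough standalone timing of the placeholder regex on a page-sized string
$page = str_repeat('x', 150 * 1024) . '<###dynamic-0###>' . str_repeat('y', 10 * 1024);

$start = microtime(true);
$out = preg_replace_callback('/<###dynamic-(\d+)###>/', function ($m) {
    return 'dynamic content ' . $m[1];   // stand-in for the real dynamic section
}, $page);
printf("preg_replace_callback on ~%d kB: %.4f ms\n", strlen($page) / 1024, (microtime(true) - $start) * 1000);
```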

My theory is that either unserialize() takes much longer when there are more elements, or my initial suspicion was right and preg_replace_callback() is what’s taking the time. I’m not sure how to solve the former, but if it’s the latter then I’d be better off storing the output in a form that doesn’t require running a regular expression across the whole string to reconstruct it (probably by splitting it into chunks of text first).
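To make that latter idea concrete, what I have in mind is roughly this (a completely hypothetical structure, not how COutputCache stores things today; $cache, $key, $output and $dynamicOutput are illustrative):

```php
// At store time: split the captured output on the dynamic placeholders once,
// and cache the resulting array instead of the whole string.
$chunks = preg_split('/(<###dynamic-\d+###>)/', $output, -1, PREG_SPLIT_DELIM_CAPTURE);
$cache->set($key, $chunks);

// At serve time: reassemble by walking the array – the regex only ever runs
// against the tiny placeholder strings, never the full 150 kB page.
$page = '';
foreach ($cache->get($key) as $chunk) {
    if (preg_match('/^<###dynamic-(\d+)###>$/', $chunk, $m)) {
        $page .= $dynamicOutput[$m[1]];   // evaluate or look up the dynamic section
    } else {
        $page .= $chunk;
    }
}
echo $page;
```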

Any ideas for which path would be the best to take?