Cache Prepopulation - Getting Around Synchronization Issues with Memcache and CDNs

The challenge of working with high-volume Drupal sites lies in the relationship between the CDN and the underlying memcache implementation. Sometimes a vicious circle develops in which the CDN and Drupal just can't get their collective act together, and that is where strategies for prepopulating cached content come into play.

For sites that are frequently updated, the CDN can only offload so much traffic before it has to turn back to Drupal to produce more pages. That can mean a lot of simultaneous page requests: with the Akamai CDN, potentially 15,000 edge servers come back to the site at once every time a content item expires.

Handling this traffic is difficult because of the way Drupal caches content. Content is loaded into the cache when a page request arrives. Drupal only knows whether an item is in the cache or not; it has no notion that an item is currently being generated, and it does not know to wait for it. So when 15,000 requests arrive for a page that is not cached, Drupal may end up generating that page up to 15,000 times at once, because every request misses until the first freshly generated copy lands in the cache. Since generating a page in Drupal takes roughly 100 times the resources of serving it from cache, this cycle drags down server performance and makes scaling difficult.
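The miss-driven behavior described above is easy to model. The following is a hypothetical, simplified simulation, not Drupal code: each request checks whether a fresh copy has landed in the cache, and if not, renders the page itself; the cache only becomes warm once the first render finishes. The function name and the timing values are assumptions for illustration.

```python
def naive_generations(arrival_times, gen_time):
    """Count how many times a page is generated after its cache entry expires.

    Hypothetical model of miss-driven caching: a request that arrives before
    the first regeneration has finished sees a miss and renders the page
    itself, because nothing tells it that a render is already in flight.
    """
    ready_at = None  # time at which the first fresh copy lands in the cache
    generations = 0
    for t in sorted(arrival_times):
        if ready_at is not None and t >= ready_at:
            continue  # cache hit: served from memcache
        generations += 1  # cache miss: full page build
        if ready_at is None:
            # Arrivals are sorted and gen_time is constant, so the first
            # miss is also the first render to finish.
            ready_at = t + gen_time
    return generations


# 15,000 edge servers hitting an expired page at the same instant, 2 s render:
print(naive_generations([0.0] * 15_000, gen_time=2.0))  # 15000
# Requests spread out over time: only those inside the first render window miss.
print(naive_generations([0.0, 1.0, 2.0, 3.0], gen_time=1.5))  # 2
```

In this model, prepopulation means a separate process refreshes the entry before it expires, so `ready_at` is always in the past and user-facing requests never trigger a render.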

Trellon recently worked on a site that was experiencing exactly this problem. To solve it, we prepopulated the cache with content at regular intervals so that the latest content was always making its way up to the front page. Essentially, we took a second copy of Drupal and modified it to fail all cache checks, so that it always generated fresh pages and put them into the cache. We put this copy on a non-public server and used cron to request the most popular pages on the site every 30 seconds. Cron's finest granularity is one minute, but you can get 30-second intervals by duplicating every one-minute entry and prefixing the duplicate's page request with a sleep 30; command.
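The 30-second cron trick can be generated mechanically. This is a minimal sketch, assuming curl is used to request each page and that the warming server sits at a hypothetical internal hostname; the helper name half_minute_crontab is also hypothetical. For every URL it emits two one-minute entries, the second delayed by sleep 30.

```python
import shlex


def half_minute_crontab(urls, fetch_cmd="curl -s -o /dev/null"):
    """Build crontab lines that request each URL every 30 seconds.

    Cron fires at most once per minute, so each URL gets two entries: one
    that runs at the top of the minute and one that sleeps 30 seconds first.
    """
    lines = []
    for url in urls:
        # Quote for the shell, then escape '%', which cron turns into a newline.
        target = shlex.quote(url).replace("%", r"\%")
        cmd = f"{fetch_cmd} {target}"
        lines.append(f"* * * * * {cmd}")
        lines.append(f"* * * * * sleep 30; {cmd}")
    return "\n".join(lines)


# Hypothetical warming-server URLs for the most popular pages.
print(half_minute_crontab([
    "http://warm.example.internal/",
    "http://warm.example.internal/node/1",
]))
```

The output can be pasted into the warming server's crontab. Escaping the percent sign matters because cron treats an unescaped % in the command field as a line break.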

This forced cycling of memcached content allowed us to scale the site quite far. We were able to offload the actual page production to a low-performance server and leave the production environment configured to generate as few pages as possible. Since everyone likes benchmarks, here are a few to chew on:
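On the warming server, "fail all cache checks" boils down to a write-only loop: render every page unconditionally and push the result into the shared cache that the public servers read from. The sketch below assumes a cache object with get and set(key, value, expire) methods. DictCache is a hypothetical in-process stand-in for memcache (a real client such as pymemcache exposes a similar set(key, value, expire=...) call), and warm and render are illustrative names, not Drupal APIs.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class DictCache:
    """Hypothetical in-process stand-in for a memcache client."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[str, Optional[float]]] = {}

    def set(self, key: str, value: str, expire: int = 0) -> None:
        # As with memcache, expire=0 means "never expires".
        deadline = time.monotonic() + expire if expire else None
        self._store[key] = (value, deadline)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if deadline is not None and time.monotonic() >= deadline:
            return None  # expired
        return value


def warm(cache: DictCache, paths: List[str],
         render: Callable[[str], str], ttl: int = 60) -> int:
    """Regenerate each path and overwrite its cache entry.

    The cache is never read here: every call behaves like a forced miss,
    mirroring the modified warming copy described in the post.
    """
    for path in paths:
        cache.set(path, render(path), expire=ttl)
    return len(paths)


renders: List[str] = []


def render(path: str) -> str:
    renders.append(path)  # stands in for a full page build
    return f"<html>{path}</html>"


cache = DictCache()
warm(cache, ["/", "/news"], render)
warm(cache, ["/", "/news"], render)  # renders again even though entries exist
print(len(renders), cache.get("/"))  # 4 <html>/</html>
```

The public servers only ever call get, so as long as the warming loop runs more often than the TTL, they serve from cache and never build pages themselves.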

benchmark start: Feb 15 20:00:10 CST 2008
benchmark end: Feb 15 20:46:51 CST 2008

memcache initial
connections: 300000
concurrency: 400

percentile    time to return pages (ms)
50%           1855
66%           2383
75%           4147
80%           4557
90%           5852
95%           10880
98%           22475
99%           23359
100%          189044

peak db load: 32.01

In just over 45 minutes we were able to crank out 300,000 requests for the front page. This was on a machine with two dual-core 2.8 GHz processors and 2 GB of RAM, running against a default installation of PostgreSQL (i.e. with no performance tuning). Note that this server was also running plenty of other services, such as FTP, email, and iptables, and that the numbers could likely be improved substantially by actually tuning PostgreSQL.
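Each percentile row in the table reads as "this share of requests completed within N ms". The layout resembles ApacheBench's percentile summary, though the post does not name the load tool. The sketch below shows one common way (nearest-rank) to compute such rows from raw per-request timings; the function names and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

PCTS = (50, 66, 75, 80, 90, 95, 98, 99, 100)


def nearest_rank(samples_ms: Sequence[int], pct: int) -> int:
    """Smallest sample such that at least pct% of samples are <= it."""
    if not samples_ms or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples_ms)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct/100 * n) in integer math
    return ordered[rank - 1]


def percentile_table(samples_ms: Sequence[int]) -> List[Tuple[int, int]]:
    """One (percentile, ms) row per entry in PCTS."""
    return [(p, nearest_rank(samples_ms, p)) for p in PCTS]


# Hypothetical timings: 1..1000 ms, one request each.
for pct, ms in percentile_table(list(range(1, 1001))):
    print(f"{pct:>4}%  {ms}")
```

The 100% row is simply the slowest request, which is why a single outlier (189,044 ms in the first run) can sit far above the 99% figure.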

This profile scales. To try it out, we also put Apache and Drupal on the database server and ran a second benchmark at the same time as the first. Here's what we came away with:

benchmark start: Feb 15 19:59:54 CST 2008
benchmark end: Feb 15 21:04:26 CST 2008

memcache initial
connections: 300000
concurrency: 400

percentile    time to return pages (ms)
50%           3556
66%           4171
75%           5467
80%           6206
90%           7352
95%           12491
98%           23334
99%           24935
100%          200037

peak db load: 32.01

The difference in page generation times is mostly attributable to running PostgreSQL on the same machine, but the point stands: another 300,000 page requests in just over an hour. Because this run overlapped with the first load test, the two together gave us a combined rate of over 600,000 page requests an hour with minimal hardware.
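The timestamps in the two benchmark headers are enough to check the throughput figures. A small worked calculation, assuming (as the headers show) that both runs started and ended on the same day:

```python
from datetime import datetime

FMT = "%H:%M:%S"


def run_stats(start: str, end: str, requests: int):
    """Return (elapsed seconds, requests per second) for one benchmark run."""
    elapsed = (datetime.strptime(end, FMT)
               - datetime.strptime(start, FMT)).total_seconds()
    return elapsed, requests / elapsed


first = run_stats("20:00:10", "20:46:51", 300_000)   # first run: 2801 s
second = run_stats("19:59:54", "21:04:26", 300_000)  # Apache + Drupal on the DB server: 3872 s
# The first run (20:00:10 to 20:46:51) falls entirely inside the second,
# so for that window both rates apply at once.
combined_rps = first[1] + second[1]

print(f"first:  {first[0]:.0f} s, {first[1]:.1f} req/s")
print(f"second: {second[0]:.0f} s, {second[1]:.1f} req/s")
print(f"combined while overlapping: {combined_rps * 3600:,.0f} req/hour")
```

That works out to about 107 and 77 requests per second. While both runs were active, the combined rate was roughly 185 requests per second, a pace of over 600,000 requests an hour.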

In a former life, I once tried this with SharePoint, which did not use memcache but did have a memory-resident application scope that provided similar functionality. I wish I still had a copy of those benchmarks for comparison.

M