Caching Web Services with APC
Everybody loves web services. It seems like the perfect architecture: back-end (database) stuff is handled independently, with front-end servers communicating with the back end via HTTP. The benefits of the RESTful way of building web apps are well documented elsewhere. There is one problem with RESTful architecture, though, especially if you are using PHP: every page now depends on HTTP calls to another server. There are two major consequences of this. The first and most obvious one is that when you are making web service calls, your app cannot be any faster than the web service. If the web service is slow, your site is slow.
There is another consequence that you won’t notice unless you get a lot of traffic. If your server is getting three requests per second or more, you will notice that your CPU is suddenly pegged. When most people get this kind of traffic they just buy more hardware, but if your site depends on web services, those calls may be the culprit for all the CPU load.
It turns out that web service requests are actually very expensive for the server making them. If you use curl, you typically use it synchronously: you make a request, wait for the response, and then process the data received. This doesn’t sound difficult, but the trouble is that the PHP process handling the page is stuck waiting for the whole round trip and cannot serve anyone else in the meantime. So even though a single PHP curl operation isn’t much work for the processor, under load the waiting processes pile up and can quickly overload the server.
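As a rough sketch of the synchronous pattern described above, here is a small helper built on PHP's curl extension. The name fetch_url and the five-second timeouts are made up for illustration; nothing here is taken from a real application.

```php
<?php
// Hypothetical helper: fetch a URL synchronously with the curl extension.
// The function name and the default timeout are assumptions for this sketch.
function fetch_url($url, $timeoutSeconds = 5)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);             // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);  // give up if we cannot connect in time
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);         // cap the total time spent on the call

    // This call blocks: the PHP process does nothing else until the
    // response arrives or the timeout fires.
    $body = curl_exec($ch);

    // curl_exec() returns false on failure; normalize that to null.
    return $body === false ? null : $body;
}
```

A page would call it as $xml = fetch_url('http://example.com/feed.rss'); and then parse the result. Every uncached page view pays for that wait, which is the cost the rest of this post tries to avoid.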
There is an easy solution of course, and that is caching. On the server, most developers know about caching pages. Unfortunately, a lot of modern applications deliver a custom experience for each user that cannot be cached. But your web service calls can. Say you are making some sort of custom news page that aggregates a user’s favorite news and Flickr feeds. You cannot cache the HTML output, because every user is different. You can, however, cache the RSS feeds you are using to build the page.
This only works if the web service calls aren’t all different, which is why the RSS aggregator is a good example: I assume that many of the feeds would be popular with lots of different users. My approach is to generate an md5 hash of the feed URL and use that as the ID for the cache entry. That way, any time a user requests the same feed you know it and can serve it from cache rather than hitting the web service again. If you are on a low-traffic site or a shared hosting plan, the best way to do this is to cache in the file system, the same way you would cache a dynamic HTML page. Just make sure the unique identifier for the file is that hash, and that you expire the cache often enough that the content is fresh, but not so often that you get no benefit from caching. If you have a high-traffic site you already know that caching to disk gets expensive (every lookup means file system calls such as PHP’s file_exists()), so you will want to cache in memory. The good news is that APC (Alternative PHP Cache) exists and is awesome. It is not as awesome as memcached, which runs as a separate cache server that several web servers can share, while APC’s cache lives in the memory of a single machine. For many cases, though, it will be just as effective. When I recently used this technique, I actually did all my processing on the web service response before caching a serialized PHP object, so I could save the time of doing that on every request.
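For the low-traffic, file-based variant, a minimal sketch might look like the following. The function name cached_fetch_file, the 300-second default lifetime, and the feed_ file prefix are all assumptions made for illustration. The fetcher is passed in as a callable so the caching logic stays separate from however you actually call the web service.

```php
<?php
// Hypothetical helper: file-based cache keyed by the md5 hash of the feed URL.
// Name, default TTL, and file naming are assumptions for this sketch.
function cached_fetch_file($url, callable $fetcher, $ttlSeconds = 300, $cacheDir = null)
{
    $cacheDir = $cacheDir === null ? sys_get_temp_dir() : $cacheDir;
    $path = $cacheDir . DIRECTORY_SEPARATOR . 'feed_' . md5($url) . '.cache';

    // PHP remembers stat() results within a request; drop them for this path.
    clearstatcache(true, $path);

    // Serve from cache while the file is younger than the TTL.
    if (is_file($path) && (time() - filemtime($path)) < $ttlSeconds) {
        $cached = file_get_contents($path);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Cache miss or expired entry: hit the web service.
    $body = $fetcher($url);

    if (is_string($body)) {
        // Write to a temporary file and rename it into place, so another
        // request never reads a half-written cache file.
        $tmp = $path . '.' . uniqid('', true) . '.tmp';
        if (file_put_contents($tmp, $body) !== false) {
            rename($tmp, $path);
        }
    }

    return $body;
}
```

The TTL is the knob for the freshness trade-off mentioned above: too long and users see stale feeds, too short and almost every request goes back to the web service.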
The performance benefits can be amazing. And caching in APC is way easier to set up than memcached.
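To give a feel for how little code the in-memory version needs, here is a hedged sketch. apc_fetch() and apc_store() are the real APC functions; everything else (the names feed_cache_get, feed_cache_set, and cached_feed_apc, the 300-second lifetime, and the array fallback) is invented for this example. The fallback exists only so the sketch runs on a machine without APC, including newer PHP versions where the user cache lives in APCu.

```php
<?php
// Hypothetical wrappers: use APC if it is loaded, APCu if that is what is
// installed, and a plain per-request array as a last resort.
function feed_cache_get($key, &$hit)
{
    if (function_exists('apc_fetch')) {
        return apc_fetch($key, $hit);
    }
    if (function_exists('apcu_fetch')) {
        return apcu_fetch($key, $hit);
    }
    $hit = isset($GLOBALS['feed_cache_fallback'][$key]);
    return $hit ? $GLOBALS['feed_cache_fallback'][$key] : false;
}

function feed_cache_set($key, $value, $ttlSeconds)
{
    if (function_exists('apc_store')) {
        return apc_store($key, $value, $ttlSeconds);
    }
    if (function_exists('apcu_store')) {
        return apcu_store($key, $value, $ttlSeconds);
    }
    $GLOBALS['feed_cache_fallback'][$key] = $value; // the fallback ignores the TTL
    return true;
}

// Hypothetical helper: return the processed feed for a URL, doing the
// fetch-and-process work only on a cache miss.
function cached_feed_apc($url, callable $fetchAndProcess, $ttlSeconds = 300)
{
    $key = 'feed_' . md5($url);

    $hit = false;
    $data = feed_cache_get($key, $hit);
    if ($hit) {
        return $data;
    }

    // Miss: call the web service, do all the processing once, cache the result.
    $data = $fetchAndProcess($url);
    feed_cache_set($key, $data, $ttlSeconds);
    return $data;
}
```

Because the callable returns the already-processed data, a cache hit skips both the web service call and the parsing. One caveat: APC is normally disabled on the command line, so a CLI script will typically see a miss every time unless it is enabled there.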