In this article I’m using the term “full page caching” to broadly represent the idea of saving a fully generated page somewhere outside of PHP so it can simply be served to the browser rather than being interpreted and run first.
Within the context of Craft (or most other content management systems) this can drastically reduce the amount of time it takes a user to get the page, as well as help a site scale to handle a large number of concurrent users.
Over the past couple of years a number of different approaches have emerged that aim to make this work with a site backed by Craft. I’ll go over each of them next and then present some thoughts about where this could be taken in the future.
First up, generating static HTML files. The HTML Cache plugin by CraftAPI lets you save a complete HTML version of any non-CP GET request to a file on disk. On each request the plugin then checks whether a cached version exists and serves it if so. It’s pretty simple and effective, but has a couple of drawbacks:
- It busts everything when you save an Entry.
- PHP and Craft still have to run to determine if there is a flat file to load or not.
The issue of busting the file cache could probably be solved, though it would be tricky. The main downside that rules this approach out for me, however, is that PHP and Craft still have to run. This means a database connection is needed, along with all of the other resources required to handle a single request. It will speed up page loads, but possibly not much more than simply using the core template caching and wrapping it around everything.
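To make the serve-or-generate mechanism concrete, here’s a rough sketch in Python rather than the plugin’s PHP — the names and on-disk layout are my own illustration, not HTML Cache’s actual implementation:

```python
import hashlib
import os

def cached_path(cache_dir, uri):
    # One flat file per GET URI, named by a hash of the URI
    return os.path.join(cache_dir, hashlib.sha1(uri.encode()).hexdigest() + ".html")

def handle(uri, render, cache_dir):
    """Serve the flat file if it exists, otherwise render and store it.

    Note that this function still runs on every request -- which is
    exactly the drawback described above: the language runtime and the
    CMS bootstrap are needed just to decide whether a cached copy exists.
    """
    path = cached_path(cache_dir, uri)
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    html = render(uri)  # the expensive bit: a full Craft request
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w") as f:
        f.write(html)
    return html
```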
UPDATE: After writing this article Ryan from Lewis Communications got in touch to share their plugin, Presto, which improves on HTML Cache by only busting the entire cache for new Entries. It does this by piggybacking off the core template caching, similar to CacheMonster, which I discuss later on. Check it out if you want flat files and a happy-medium cache-busting solution.
This approach uses a native module baked into Nginx that stores the response to a request on disk and comes with some decent options for purging stale content. If you’ve not already read it, go read this article from Andrew Welch that goes over it all in detail. In fact, it covers loads more, including why you might need full page caching at all, so if you’re a bit confused at this point take a moment to digest his wisdom first — it’s worth it.
For most sites turning on FastCGI Cache will be plenty: it’ll enable your 2GB VPS to handle a lot more traffic and speed up page loads dramatically for your users. But purging that content is tricky. Andrew wrote a plugin for this that does a hard reset whenever something is saved and just dumps the lot. Again, this is fine for a lot of sites, but not if you have thousands of concurrent users constantly sat there.
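For reference, enabling it only takes a handful of directives. The sketch below is a minimal example — the paths, zone name, socket, and timings are placeholders, not a production config:

```nginx
# Cache storage: 100 MB of keys; entries idle for an hour are dropped
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=CRAFT:100m inactive=60m;
fastcgi_cache_key "$scheme$request_method$host$request_uri";

server {
    # ... the rest of your Craft server block ...

    location ~ \.php$ {
        fastcgi_cache CRAFT;
        fastcgi_cache_valid 200 60m;                # cache successful responses for an hour
        add_header X-Cache $upstream_cache_status;  # HIT / MISS / BYPASS, handy for debugging
        fastcgi_pass unix:/var/run/php-fpm.sock;    # adjust to your PHP-FPM socket
        include fastcgi_params;
    }
}
```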
I’ve thought extensively about various ways of having more fine-grained purging control with FastCGI but keep coming up against the following fundamental problem: you have to delete cached items by knowing their key, which is essentially the URL of the request plus some other bits.
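Concretely, Nginx names cache files after an MD5 of the configured `fastcgi_cache_key`, splitting the digest into subdirectories according to the `levels` setting, so any purge script has to reconstruct that path. A sketch, assuming the common `"$scheme$request_method$host$request_uri"` key format:

```python
import hashlib
import os

def fastcgi_cache_file(cache_dir, key, levels=(1, 2)):
    """Compute where Nginx stores the cached response for a given cache key.

    Nginx hashes the key with MD5 and builds subdirectory names from the
    *end* of the hex digest, one per entry in `levels`.
    """
    digest = hashlib.md5(key.encode()).hexdigest()
    parts = []
    pos = len(digest)
    for width in levels:
        parts.append(digest[pos - width:pos])
        pos -= width
    return os.path.join(cache_dir, *parts, digest)

# To purge https://myreallyfastsite.com/news/ you would delete:
path = fastcgi_cache_file("/var/cache/nginx", "httpsGETmyreallyfastsite.com/news/")
```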
Alongside that you would need to know each element ID that was used on a given URL. This can be solved (more on this later), but even if we did know that, we’d have to remove each file in turn. If you have a highly dynamic site this could bottleneck quite quickly: you might have thousands of articles, add one new one, and need to purge a lot of paginated list pages.
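To illustrate the bookkeeping involved, a reverse index from element IDs to the URLs that used them might look like this — a hypothetical Python sketch, not an existing plugin API:

```python
from collections import defaultdict

# Hypothetical reverse index: element ID -> set of URLs that rendered it
index = defaultdict(set)

def record(url, element_ids):
    """Call this as each page renders, noting which elements it used."""
    for element_id in element_ids:
        index[element_id].add(url)

def urls_to_purge(element_id):
    """Saving this element means purging every one of these URLs in turn."""
    return sorted(index[element_id])

# Element 123 appears on three pages, so one save fans out to three purges
record("/", [123, 7])
record("/news/", [123, 44])
record("/news/page/2", [123])
```

The fan-out is the problem: one saved entry can map to hundreds of paginated URLs, each needing its own file deletion.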
Even if you dealt with some of the edge cases (e.g. queuing up file deletions so they happen in batches) it’s still quite a lot of work for the server if the number of URLs per purge gets large, and Nginx doesn’t let you purge via a wildcard.
Another consideration is horizontal scaling: once the concurrency gets high enough, or if the site requires dynamic failover, I’m not sure a FastCGI cache would scale well. You’d need multiple instances running your Craft site behind Nginx, then on save have each instance remove the right file, if it exists.
From a brief google I gather that you can store the actual cache files in a shared Redis instance and have multiple Nginx instances use that, but it’s still not completely ideal given we’d have to keep a key/value store mapping each element ID to its FastCGI cache keys. This could be done, and would greatly improve things, but what if all we needed to know was the element ID? Wouldn’t that be better? Then we could simply pass that to the cache layer and have it purge everything that depends on that element … read on!
Here we go, I’m banging on about Varnish again … but really, it’s very cool. Let me convince you.
With Varnish you can install an extension called xkey that lets you set extra keys for your cached objects. That means we can send a header with our element IDs in it (!). We can then parse that in our VCL file for PURGE requests and send a request like this to purge every item in our cache that depends on element ID 123:
```
# Request
PURGE / HTTP/1.1
Host: myreallyfastsite.com
xkey-purge: el:123

# Response
HTTP/1.1 200 Invalidated 3 objects
```
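On the Varnish side, the VCL to support this is short. The sketch below follows the xkey vmod’s documented usage — your application sends a space-separated `xkey` response header, the vmod registers those keys automatically, and a small `vcl_recv` block handles the PURGE (the ACL contents are placeholders):

```vcl
vcl 4.0;

import xkey;

acl purgers {
    "127.0.0.1";   # only allow purges from trusted hosts
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (client.ip !~ purgers) {
            return (synth(403, "Forbidden"));
        }
        # Purge every cached object tagged with any of the given keys
        set req.http.n-gone = xkey.purge(req.http.xkey-purge);
        return (synth(200, "Invalidated " + req.http.n-gone + " objects"));
    }
}
```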
Now we could get really fancy with this and send other keys along with our requests; for example, on our news listing page we could send section:news as well. This would enable us to purge with that key too, which could be useful when we add a new news article and want to purge the homepage and the news index page. This, it turns out, is quite similar to what is currently possible in Drupal 8.
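Sending the purge from the application side (say, from an entry-save hook) is just an HTTP request with the right headers. A sketch in Python using only the standard library — the URL and host are placeholders:

```python
import urllib.request

def build_purge_request(varnish_url, keys, host):
    """Build a PURGE request carrying the xkey-purge header."""
    req = urllib.request.Request(varnish_url, method="PURGE")
    req.add_header("Host", host)
    req.add_header("xkey-purge", " ".join(keys))  # space-separated keys
    return req

def purge(varnish_url, keys, host):
    """Fire the purge; returns Varnish's response body, e.g. 'Invalidated 3 objects'."""
    req = build_purge_request(varnish_url, keys, host)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# e.g. purge("http://127.0.0.1:6081/", ["el:123", "section:news"], "myreallyfastsite.com")
```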
Currently in the Craft+Varnish landscape there are a few plugin options for purging, but none of them implement this alternative approach.
The one I initially wrote (CacheMonster) piggy-backed off the core template caching engine to get all the goodness of caching element queries, but this did not scale well. It does work, but it can sometimes take a while for the web server to finish purging all the URLs, much in the same way it can take a while for the template cache tasks to run. You can read about the methodology here, and it’s worth noting that it now also supports FastCGI and CloudFlare thanks to Naveed Ziarab and Solomon Hawk.
There is also Varnish Purge by André Elvan, who knows a thing or two about this topic. His plugin has got pretty fancy: it will purge all elements related to the one you’re saving, handle multiple locales, and let you keep an extra lookup map of URLs for anything that doesn’t fit. It’s probably the best option out there currently if you’re set on using Varnish. However, this approach is still URL based.
Conclusion … yet another plugin!
So yes, I’m in the process of writing another plugin … this one will dynamically collect all of the element IDs used on a given request and allow you to tag things arbitrarily in Twig. These all get added to the xkey header like so: `xkey: el:123 section:news category:44`. So far, the element IDs get purged on save.
This should cover all the gaps I’m running up against in the other implementations. I’ll let you know when I’ve got it running in production, if it ever makes it that far. For now, you can have a poke at it here: https://github.com/joshangell/Falcon. I called it Falcon because it’s a very fast bird.
Varnish configuration is hard. If you don’t need to purge your content in a granular fashion whilst under high concurrency I would steer clear: stick to FastCGI caching, it’s simpler and very reliable.
I have plans to expand Falcon to include support for Fastly, which is like a managed Varnish with a load of other benefits; it would be much simpler to get going with, though more expensive than just running on AWS. Equally it would be more than possible to tie into Varnish Plus, which is similar.
What are your thoughts? Is there a different way of achieving full page caching I’ve not covered? Tell me, I want to know!