Site Building Hotness with Edge Side Includes

We've covered caching a fair amount recently. We've discussed how front-end caching can be a powerful asset when building sites. As is often the case, though, using a powerful tool effectively takes some getting used to. It also means learning new ways of doing things. Caching is no exception.

When most developers build web apps, they have a model in mind that is so ingrained that they scarcely know it's there. What is it? It can be summed up by a phrase you've heard many times; 'The user requesting this page...' It is the assumption that the page is being built right now for one person who is waiting for their browser right now to show the page. The assumption that follows is that you are building the entire page at once.

When you add caching to your application, the first assumption immediately fails. You are no longer building the page for a single person, you are building the page for potentially every user of your site at once. This requires some re-thinking of how you build your pages, and how you use per-user information to customize your pages.

In order to cope with this change, you have to start generalizing your pages, so the content is not specific to any particular user. This is a topic I touched on at the end of last year's article. We briefly covered a couple of options for solving that problem, mostly involving JavaScript. What you may not have noticed, though, is that in doing so we were directly trying to cope with the realization that our second assumption had failed; we can no longer build the entire page at once.

So, here we sit, the fundamental pillars of our web development process shattered around us. What do we do now? Well, we do what we always do, we start building again. The JavaScript / AJAX options I mentioned for loading data into the page at view time make a lot of sense; but in some sense it feels wrong, like a blank spot in your newspaper that says 'go and look in our weekly magazine to see this article.' It works, but it doesn't feel right.

We've moved our processing to the client, where we know we have very little control over what happens, if anything happens at all. Additionally it barely solves our problems, and doesn't solve them in every case.

So what do we do? For a long time the answer was 'nothing.' We were out of luck. Use JavaScript and cookies to customize anything on the page. Trim our cache times to their absolute minimum in order to allow the dynamism that our sites and our users have become accustomed to. It worked, but again, not entirely, not necessarily every time, and it was complicated.

Things have changed, though. There is a tool which until recently was only available to those willing to pay for hosted caching systems. With the 2.0 release of the Varnish cache software, we finally get easy access to Edge Side Includes, or ESI.

What is ESI?

So what exactly is ESI and what does it do to address our shattered development process? Well, put simply, ESI allows you to include data at the edge of your web application, the cache. You can use ESI to break every one of your pages into smaller pieces and let the cache retrieve them separately and assemble the whole page at request time.

What this means for you is that each 'page' that you create doesn't have to know the details of every piece of content that will show up on the fully rendered page. Instead, it just needs to know what, on a high level, will go there. It leaves actually filling in those details to the cache, which is the point in your system that the browser is actually talking to anyway.

It also means that each piece of content can have its own cache time. Up until now, we've had to be very conservative about our cache times. Our content could only live in the cache for as long as the piece of content with the shortest time to live. If we have a 'recently published stories' block in the sidebar, we have to cache for minutes, even though our story is unlikely EVER to change once it's published. If the 'recently published stories' block can be handled separately, then the story page itself can last much longer.

A game changer

I have seen MVC catch a lot of flak lately in the web development realm. MVC itself originated in the GUI realm and has been somewhat successfully applied elsewhere. I've read blog posts and comments about how MVC doesn't translate to the web and how it is not the right pattern to use. Most often the support for this argument relates to the web's transactional nature or its lack of state. Some people are familiar enough with the origins of MVC to bring up another point. They say one of the points of MVC was to allow the view to retrieve its data as it was needed, which was necessary to really provide an interactive application, an application that responds to us the way we expect real-world objects and machines to respond to us. All of these argments are correct.

That said, for me even though there is something of a mismatch between web development and MVC in its truest sense, there are other benefits of the MVC pattern that make it worthwhile anyway. Separating the problem space into manageable pieces is important. Forcing the conceptualization of the whole problem and breaking it down properly is essential for good software. The code reuse and maintainability that comes from breaking the problem down according to the MVC pattern is more important in my experience than the slight mismatch in the presentation portion of the application.

Even so, there is still a mismatch. When you load a whole screen at once, the dynamism that the original GUI MVC pattern provided is lost. This issue is made worse when you start caching. You're not only not getting an immediately interactive and dynamic view into the data you are interested in, you are in fact getting guaranteed OLD data. That's not what MVC is about.

So what does all this MVC stuff have to do with ESI? Good question. ESI goes beyond solving our dynamic content problem. If you let it, it will change everything about how you develop your web applications. If you pause to think about it, you begin to realize that with ESI every piece of repeated content can be served up separately. Every piece of content can contain its own rules about how it is displayed, and even more interestingly, how it is cached.

Having ESI in our web-development toolbox moves us one step closer to true MVC development on the web. Each view of a piece of content is a self-contained item. It has its own view rules, its own source of data, and its own controller logic, all of which contain only the details they need to do their very specific job. For anyone who has done desktop GUI programming (especially with systems like Cocoa or QT), this will begin to feel familiar.

With ESI you get another bit of the MVC pattern that GUI developers take for granted. You get reuse. We're not talking about the 'factor out this bit of code and call it from lots of places' type of reuse that we as web developers are all familiar with. With ESI we finally get to have small, self contained bits of functionality that exist entirely on their own and can be reused easily anywhere and any time you choose! With ESI we can finally have self-contained widgets that we can plop into a site anywhere we want. I don't have to tell you that this is a big deal.

When the editor asks 'can't we put that most recent articles box on this page,' we can finally comfortably say 'Yes, no problem, give me about 10 minutes.' We can finally say 'Oh, you want a custom box that shows only stories that are of interest to the current user? Sure. On every page? alright.' I know some server admins who would have nightmares just overhearing that statement.

With ESI, we finally have the toolset that lets us do the kind of things we've wanted to do with our web applications forever. And what's more, it's easy, it's flexible and unlike the other solutions to this problem that have been foisted on us before, it actually makes our web applications faster.

What does ESI look like?

Now that I've showed you that ESI is something you want to know, it's time to start actually exploring it. ESI in its entirety is complex. There are many bits of functionality available in the complete ESI specification. Some of them are very useful, some of them are only occasionally useful. We won't explore all of the abilities of ESI in this article, but we will cover the most important ones, which happen to also be the ones that Varnish provides.

There are three bits of ESI that you need to know. They are esi:include, esi:remove, and the <!--esi ... -- > constructs.

The esi:include construct does exactly what you think it does. It allows you to include something into the current page. This is the piece of ESI you will use most often, as it allows you to load another piece of content within the current page.

The esi:remove construct also does what you think it does. Anything between the <esi:remove > tag and the </esi:remove > will be removed from the page. This is useful for providing content you want to include if ESI processing is not enabled. If ESI is not processed, the receiving browser will ignore the <esi:remove > tag, and the contents will appear in the browser.

The final construct is <!--esi ... -- > construct. When the ESI processor encounters this construct, it will remove it, but leave the contents intact. This basically allows you to hide your esi markup from the client browser, in case the ESI is not processed. If you didn't wrap your <esi:include > tags within it, and ESI was not active, the content could show up in the users browser. It also lets you have other content that is not ESI included, but is only shown if the page is being ESI processed.

Since our main concern is the include routine, this will be our focus.

An Example

We've talked about why to use ESI. We've talked briefly about how to use ESI. It's time to take some concrete examples and show how it all works.

For demonstration purposes, we will create a hypothetical site. Let's suppose we have a blog site. The site has multiple authors and people post new entries throughout the day. Under each article, we will have a block that displays links to the last 5 articles added to the site.

In order for this to work, our article view action must contain the logic to load the article being requested. It also must contain the logic to figure out what the most recent five articles are. If you were smart you probably factored out the most recent five article logic, but you still have to forward to it within your view action to fill in the 'recentarticles' stash data. If you had done this, and you are already using the caching method outlined last year your actions might look something like this:

 package BlogSite::Controller::Articles;

 [...]

 sub view : Local Args(1) {
     my ($self, $c, $articleid) = @_;

     $c->stash->{'cachecontrol'}{'articleview'} = 3600;

     $c->stash->{'blog'} = $c->model('BlogSite::Articles')->find($articleid);
     $c->forward('BlogSite::Controller::Articles','recentarticles');
 }

 sub recentarticles : Local {
     my ($self, $c) = @_;

     $c->stash->{'cachecontrol'}{'recentarticles'} = 300;

     my $article_rs = $c->model('BlogSite::Articles')->search(undef, 
                                            { rows => 5,
                                              order_by => 'post_date DESC' });

     $c->stash->{'recentarticles'} = $article_rs->all();
 }

This works. It's not too bad. It's well factored out; the logic for recent articles is self contained so you can call it from wherever. Where it starts to get a little ugly is the template. Under normal circumstances we might have a template that looks like this:

 <div id="maincontent">
   <h2>[% blog.title %]</h2>
   <span class="byline">by [% blog.author.name %]</span>

   <span class="articlebody">
       [% blog.body %]
   </span>
 </div>
 <div id="bottomblock">
   <span class="recentarticles">
        [% FOREACH article IN recentarticles %]
        <a href="/articles/view/[% article.id %]">[% article.title %]</a> by [% article.author.name %]<br/>
        [% END %]
   </span>
 </div>

Still not too bad. Pretty normal actually. But what happens when the editor (or the designer) decides that they want to show the most recent posts on another page, say in a little callout-box on the static 'about us' page? Well now we have a problem. If your static page isn't handled by Catalyst, you can't call the recent article routine at all. Let's assume, though, that your static pages are handled by Catalyst. The action might look something like this (error checking excluded for simplicity):

 package BlogSite::Controller::Page;

 [...]

 sub page : Local {
     my ($self, $c, $pagetoview) = @_;

     $c->stash->{'cachecontrol'}{'staticpage'} = 7200;

     $c->stash->{'template'} = "pages/" . $pagetoview . ".tt2";
 }

Ok... So in order to add the recent articles block, first we have to edit the template and add our recent-articles html to the page. Unfortunately, though, the html won't work if we don't populate the ' recentarticles ' stash variable. Not a big deal... we just have to forward to the recentarticles action. Our page action would now look like this:

 sub page : Local {
      my ($self, $c, $pagetoview) = @_;

      $c->stash->{'cachecontrol'}{'staticpage'} = 7200;

      $c->forward('BlogSite::Controller::Articles','recentarticles');

      $c->stash->{'template'} = "pages/" . $pagetoview . ".tt2";
 }

Have you spotted the problem yet? There are actually two. The first is that now your database is hit for every static page view, whether it displays the recent articles block or not. The second is that now your static pages are forced to cache at the very small 5 minute interval that the recent article block requires.

The particularly experienced among you are thinking of ways to avoid putting the recentarticles call into every page. Calling the routine from within the template, perhaps, or some sort of lazy-loading hackery. Wouldn't it be nice if you didn't have to hack it in?

Using ESI

Let's tackle the same problem now, only this time let's use ESI. We get to use the recentarticles action as-is, because we factored it out well in the first place. With ESI, though, we will be calling recentarticles directly, so we will need to give it its own template. Let's see what it would look like.

 <span class="recentarticles">
      [% FOREACH article IN recentarticles %]
      <a href="/articles/view/[% article.id %]">[% article.title %]</a> by [% article.author.name %]<br/>
      [% END %]
 </span>

Pretty straightforward. Now that we've done this, we can start using it via ESI. Let's start cleaning up. We'll start by revisiting our view template, this time pulling in the recentarticles block via ESI.

  <div id="maincontent">
    <h2>[% blog.title %]</h2>
    <span class="byline">by [% blog.author.name %]</span>

    <span class="articlebody">
        [% blog.body &]
    </span>
  </div>
  <div id="bottomblock">
    <!--esi <esi:include src="/articles/recentarticles" /> -->
  </div>

Next we'll drop the recentarticles forward from our article view routine.

 sub view : Local Args(1) {
     my ($self, $c, $articleid) = @_;

     $c->stash->{'cachecontrol'}{'articleview'} = 3600;

     $c->stash->{'blog'} = $c->model('BlogSite::Articles')->find($articleid);
 }

Things are getting simpler already. Let's pull out the call in our page action also.

 sub page : Local {
      my ($self, $c, $pagetoview) = @_;

      $c->stash->{'cachecontrol'}{'staticpage'} = 7200;

      $c->stash->{'template'} = "pages/" . $pagetoview . ".tt2";
 }

That's a bit simpler, isn't it? For both the the static pages and the article view routine, the page you see looks the same. What is happenning on each request has changed significantly however.

First, we are not calling our recent article logic on every request; we are also not hitting our database on every request. There's something else that's changed as well. Both our article content and our static page content are not limited to 5-minute caching anymore. They are back to caching at 1 and 2 hours respectively.

But what about the recent articles content? That can't be cached for 2 hours. Don't worry. It isn't. The recent article information is only cached for its 5 minute limit. So what's going on? How can the article be cached for an hour but the recent article list be cached for 5 minutes? They are on the same page, it's got to be one or the other.

You are absolutely correct. The full page that is delivered to the browser must have a single limited cache time. This is where Varnish + ESI begins to really shine. Each piece of content is cached according to its own rules, and Varnish assembles the page at request time from the data it has. So the article view page, which may be 30 minutes old, is pulled in from the cache, and the recentarticles block which may be only a minute old, is also pulled in. The content is combined and served up to the waiting browser.

Your site has just experienced a drastic reduction in processing required to serve up each page. The article view page for any given article is requested at most once every hour. The recentarticles block is requested more frequently, but even that has reduced your processing, as only the recent articles logic is run, instead of the combined article retrieval and recent article logic.

Per-user content

This gets more interesting when you have a block of content that is customized for each user. On a site that doesn't use ESI, any page that contains a block that is personalized for the end-user can not be cached at all. This is a major problem. It puts you in the position of choosing personalization or performance.

Again, ESI opens up new options for us. Not only can ESI serve up blocks of content that are cached for different times, it can serve up blocks of content that aren't cached at all. If you have a 'your favorite articles' block, that block can be passed through to the web server, even while the page it's included into is served from the cache.

All you have to do to make this work is what we did above for recentarticles with the minor addition of setting the cache time to zero. Your users get personalized content and you still get to have caching.

Turning it on

There's only one thing left for us to do. We need to turn ESI on in our Varnish cache. This is actually quite easy. If we use the vcl file we were looking at in our last article, we just need to add a bit of code to our vcl_fetch routine. At the very top of our vcl_fetch routine we add the following:

 if (obj.http.Content-Type ~ "html") {
         esi;
 }

This tells Varnish that if the content-type of the data being received from the web server is some form of html, we should process it with ESI. We need to put this test in place because we do not want to run esi processing on binary files. You can expand the content-type match if you want to use ESI in other files, like CSS or JSON responses.

And that's it. Enable the above in your Varnish config and you get ESI instantly.

One note I will add is that once you start using ESI, you need to make sure that you use the s-max-age header instead of max-age. Varnish is smart enough to pay attention to the various headers for all of the ESI included content. However, the browser is not aware of the sub-components, so it will obey the max-age header and cache the entire page. Chances are you don't want that.

Trying it out

One of the more challenging parts of working with ESI is that often you are doing development in a different environment than production. Not many people have a cache in their dev environment, and rightly so. There is nothing quite as frustrating as trying to track down a bug for two hours only to realize you fixed it an hour and 50 minutes ago, but have been looking at a cached copy since then. Without Varnish (or some other ESI processor) in front of your dev site processing the ESI, though, your site will look awfully strange. So what do you do?

Once again, we are developers, so we build something. In this case, to solve this problem I wrote an ESI parser in javascript that runs in the browser. You can place it in your ESI enabled pages within an <esi:remove > block, and your production site will never be the wiser, but without the Varnish cache in front you will still be able to do your development.

The code to do this is here. I am releasing this code under the GPL, so you are welcome to download it and use it for your own development. It requires jQuery to function, which for simplicity is included with the tar file. The tar file contains the ESI parser module and some example files that show its usage.

Summary

Including ESI in your our development toolkit gives us a lot of options that we've never had before. It makes it possible to break your application into pieces that are more manageable and easy to work with. It makes it possible to avoid the speed for dynamism trade-off.

Again, though, don't take my word for it. Grab the JS based ESI parser and try it out. It's an easy way to explore ESI with your own app before committing to a Varnish installation. I'm sure that once you see what it can do, though, you'll be looking at Varnish again.

AUTHOR

jayk - Jay Kuri <jayk@cpan.org>