Delivering Static Media with Catalyst

Why?

Why would you want to deliver static media with Catalyst at all?

Generally you use the Static::Simple plugin to serve assets like CSS and image files. These are almost always located in the /root/static directory. However, for production use it is recommended that your web server serve static content directly, which is much more efficient.

Using an entire application server process (i.e. a perl process) to serve static media is a massive waste of resources, especially if you're not using a front-end proxy. Your web server ends up with this heavyweight process tied up sending bytes to a user rather than doing useful work.

However, there are a number of cases where unconditionally serving the static content to the user is a bad idea - think of protected .pdf documents, time limited URIs, single use (or n-use) download URIs, etc.

What about a cache?

That's a good question - putting a cache (such as varnish) in front of your application and letting it handle most of the requests for static content is a great solution. However, in some cases (specifically for single use URIs, when you want to check ACLs and/or log all accesses to a resource), it is not possible to perform any sort of caching in front of your application, because you won't be able to perform the checks you need to at that layer.

How?

As a first implementation, using the serve_static_file method is a good way to start. If your app is small and for internal company use only, then that's probably all that you need to do. However, if you're serving large files and likely to have a significant number of users, then sending bytes in Perl just won't scale.
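
A minimal sketch of that first implementation follows. It assumes the Static::Simple plugin is loaded (it provides serve_static_file) and that the Authentication plugin supplies user_exists. The helper name send_protected_file and the 403 handling are illustrative only, not part of Catalyst.

```perl
use strict;
use warnings;

# Hypothetical helper, called from a controller action with the context
# object $c and the on-disk path of the file to send.
sub send_protected_file {
    my ($c, $path) = @_;

    # Check access before sending anything.
    unless ($c->user_exists) {
        $c->res->status(403);
        $c->res->body('Forbidden');
        return;
    }

    # Catalyst::Plugin::Static::Simple reads the file and sends it from
    # the Perl process, which is the part that will not scale.
    $c->serve_static_file($path);
    return 1;
}
```

Every byte of the file still passes through the application process here, which is fine for a small internal app and is exactly what the rest of this article tries to avoid.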

Scaling

The key to scaling is doing things with the minimum resources possible. Sending bytes across the Internet is something that web servers are very good at doing efficiently. For access-controlled static files, scalable request handling will be something like:

  • Request comes into your web server.
  • Request is sent to your Catalyst application running as FastCGI.
  • Application validates the request, processes internal logic (e.g. logging and counters), then issues a response with an X-Accel-Redirect or X-Sendfile header set.
  • Your web server takes care of serving the content back to the user as specified by that header.

The key is this X-Accel-Redirect or X-Sendfile thing - these are just headers that your application sends as the response. Which one you use depends on which web server you're running.

After sending this response (headers only, no content), your application server is freed up and able to get on with handling the next request.
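
The choice of header can be sketched as a small helper. The function name offload_header, the server labels and the example paths are hypothetical; only the header names themselves come from the web servers.

```perl
use strict;
use warnings;

# Return the (header name, header value) pair that hands delivery of a file
# back to the front-end web server. $server is a label chosen for this
# sketch; %loc holds the two ways of naming the file.
sub offload_header {
    my ($server, %loc) = @_;

    # nginx wants an internal URI that one of its location blocks can resolve.
    return ('X-Accel-Redirect' => $loc{internal_uri}) if $server eq 'nginx';

    # Apache (with mod_xsendfile) and Lighttpd want the filename on disk.
    return ('X-Sendfile' => $loc{disk_path})
        if $server eq 'apache' || $server eq 'lighttpd';

    die "No offload header known for server '$server'\n";
}

# In a Catalyst action this might be used as:
#   $c->res->header( offload_header('nginx', internal_uri => '/private/report.pdf') );
```

Keeping the decision in one place means the rest of the application does not need to know which web server sits in front of it.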

A note on web servers

This topic appears to be something that people get very excited about, but for the purposes of this discussion, your favourite web server is likely to work fine. I'll go through the common choices below.

Apache

Apache is generally thought of as big, but this is very much dependent on the modules that you have loaded, and what MPM (Multi-Processing Module) you're using. A lightweight Apache is perfectly acceptable to serve content. mod_xsendfile provides the X-Sendfile support.

Lighttpd

Lighttpd has simpler configuration compared to Apache if you're setting it up from scratch. On the other hand, your system administrators are also less likely to know their way around it. Lighttpd has X-Sendfile support as standard.

nginx

This server works differently (and uses a different header) from either of the above. X-Accel-Redirect doesn't logically send a file from disk; instead, it internally redirects the request to another path known by the web server, bypassing ACL checking.

This means that the correct header to send depends on your web server configuration, which is a disadvantage. But nginx can act as a proxy as well as a web server. This is a powerful advantage, as nginx can proxy for a backend content server.

perlbal

This is a Perl-based load balancer which supports an X-REPROXY-URL header to allow you to reproxy content to a backend content server.

X-REPROXY-URL is not covered in this article as I don't have any experience with it myself, and you still need a web server to serve your main application.

Example

Here is a working example using nginx, to reproxy a MogileFS installation, as that's what I'm using in production to deliver 150Tb of content at high speed :) In this case I'm using the MogileFS module for nginx (see below) to serve the actual content.

It should be noted that MogileFS isn't trivial to install or maintain, but if you're prepared to expend the time and expertise then it is an extremely cost-effective solution to file storage on a large scale.

However, it is possible to have a much simpler infrastructure to serve media from a directory on a local disk if the amount of content you're delivering isn't huge.

nginx config

    http {
        include     /etc/nginx/mime.types;
        access_log  /var/log/nginx/access.log;

        sendfile           on;
        keepalive_timeout  65;
        tcp_nodelay        on;

        gzip on;

        server {
            listen 80;
            root /opt/FileServer/webroot;

            location / {
                include /etc/nginx/fastcgi_params;
                fastcgi_pass unix:/tmp/fileserver.socket;
                access_log   off;
            }

            location /private/mogile/ {
                internal;
                mogilefs_tracker 1.1.1.1:7001;
                mogilefs_domain mydomain;
                mogilefs_pass {
                   proxy_pass $mogilefs_path;
                   proxy_hide_header Content-Type;
                   proxy_buffering off;
                }
                access_log   off;
            }
        }
    }

Example code snippet for generating timed URIs.

This method assumes that it's being called on a file object which stores various pieces of metadata about a file.

    # Needs: use Math::BaseCalc; use Digest::SHA1;
    method generate_timed_uri (%p) {
        my $ttl = $p{ttl} || 60 * 60 * 24 * 7; # Default to a week

        # Turn integers into strings to save space in the URIs
        # (and make them less obvious)
        my $calc = Math::BaseCalc->new(digits => ['a' .. 'z']);

        # Get the primary key of the row
        my $id = $calc->to_base( $self->id );

        # A random secret string associated with the file object.
        # This string is not related to the contents of the file
        my $secret = $self->secret;

        # Generate a time rounded off to last expiry point
        # then 2x forward of the TTL.
        my $now  = time();
        my $time = $calc->to_base( $now - ($now % $ttl) + $ttl * 2 );

        # Note this is theoretically cryptographically weak,
        # we should really use HMAC.
        my $hash = Digest::SHA1::sha1_hex($secret . '-' . $time);

        return '/file/' . join('/', $hash, $time, $id, $self->filename . '.' . $self->extension);
    }
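
The comment about HMAC can be made concrete with Digest::SHA, which ships with Perl and exports hmac_sha256_hex. This is a sketch, not the code used in production: the function names, the choice of SHA-256 and the sample secret are all assumptions. It also spells out the expiry arithmetic described in the comments above.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Round $now down to a multiple of $ttl, then move two TTLs forward, so
# every URI generated within the same TTL window shares one expiry time.
sub expiry_time {
    my ($now, $ttl) = @_;
    return $now - ($now % $ttl) + $ttl * 2;
}

# Sign the expiry time, using the per-file secret as the HMAC key.
sub sign_expiry {
    my ($secret, $expires) = @_;
    return hmac_sha256_hex($expires, $secret);
}

# Accept a URI only if the signature matches and the time has not passed.
sub verify_expiry {
    my ($secret, $expires, $signature, $now) = @_;
    return 0 unless $signature eq sign_expiry($secret, $expires);
    return $expires >= $now ? 1 : 0;
}

my $expires   = expiry_time(time(), 60 * 60 * 24 * 7);
my $signature = sign_expiry('per-file-secret', $expires);
print verify_expiry('per-file-secret', $expires, $signature, time())
    ? "valid\n" : "rejected\n";
```

Because the expiry is rounded to a TTL boundary, a URI generated this way stays valid for between one and two TTLs, depending on when in the window it was generated.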

Example code snippet for serving timed URIs.

    sub file : Local {
        # Note that we ignore the filename in the URI generated above.
        # It is just there to make the uri look 'normal'.
        my ($self, $c, $hash, $time, $id) = @_;

        $id = Math::BaseCalc->new(digits => ['a' .. 'z'])->from_base($id);

        # $schema is assumed to be your DBIx::Class schema object,
        # obtained elsewhere (e.g. from a model).
        my $file = eval { $schema->resultset('File')->find($id) }
            or return $self->error_notfound($c);

        return $self->error_notfound($c)
            unless $hash eq Digest::SHA1::sha1_hex($file->secret . '-' . $time);

        # The time embedded in the URL is the latest time this file is valid.
        my $expire_time = Math::BaseCalc->new(digits => ['a' .. 'z'])->from_base($time);

        if ($expire_time < time()) {
            return $self->error_expired($c, $file);
        }
        # NOTE: In this case as you know when the URI expires, it is possible
        #       to serve cache headers and have the hot items cached by your
        #       front end proxy. This is meant to be a simple example and so
        #       this is left as an exercise for the reader :)

        $c->res->header('Content-Type', $file->mimetype);
        $c->res->header('Accept-Ranges', 'none');

        # Note: this matches the path in the nginx config above.
        #       With X-Sendfile, it would be the filename on disk instead.
        my $redirect_to = '/private/mogile/' . $file->mogile_id;
        $c->res->header('X-Accel-Redirect' => $redirect_to);
    }
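
The caching note in the comments above could be approached as below. This is one possible reading of the exercise, not the author's implementation: the function name and the choice of a public Cache-Control value are assumptions, and whether a shared cache may hold the file at all depends on your access rules.

```perl
use strict;
use warnings;

# Build a Cache-Control value that lets a front-end cache keep the response
# only until the timed URI itself expires.
sub cache_control_for {
    my ($expire_time, $now) = @_;
    my $remaining = $expire_time - $now;
    return 'no-store' if $remaining <= 0;
    return "public, max-age=$remaining";
}

# In the action above this might be used as:
#   $c->res->header('Cache-Control' => cache_control_for($expire_time, time()));
```

Tying max-age to the expiry time means a cached copy should never outlive the URI it was served for.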

SEE ALSO

AUTHOR

  Tomas Doran (t0m) <bobtfish@bobtfish.net>