Delivering Static Media with Catalyst
Why?
Why would you want to deliver static media with Catalyst at all?
Generally you use the Static::Simple
plugin to serve assets like css and image files. These are almost
always located in the /root/static
directory. However for production
use, it is recommended that your web server serve static content
directly. This is much more efficient.
Using an entire application server process (i.e. a perl process) to serve static media is a massive waste of resources, especially if you're not using a front-end proxy. Your web server ends up with this heavyweight process tied up sending bytes to a user rather than doing useful work.
However, there are a number of cases where unconditionally serving the static content to the user is a bad idea - think of protected .pdf documents, time limited URIs, single use (or n-use) download URIs, etc.
What about a cache?
That's a good question - putting a cache (such as varnish) in front of your application and letting it handle most of the requests for static content is a great solution. However, in some cases (specifically for single use URIs, when you want to check ACLs and/or log all accesses to a resource), it is not possible to perform any sort of caching in front of your application, because you won't be able to perform the checks you need to at that layer.
How?
As a first implementation, using the serve_static_file method is a good way to start. If your app is small and for internal company use only, then that's probably all that you need to do. However, if you're serving large files and likely to have a significant number of users, then sending bytes in Perl just won't scale.
Scaling
The key to scaling is doing things with the minimum resources possible. Sending bytes across the Internet is something that web servers are very good at doing efficiently. For access-controlled static files, scalable request handling will be something like:
- Request comes into your web server.
- Request is sent to your Catalyst application running as FastCGI.
- Application validates the request, processes internal logic (e.g. logging and
counters) then issues a response with a
X-Accel-Redirect
orX-Sendfile
header set. - Your web server takes care of serving the content back to the user as specified by that header.
The key is this X-Accel-Redirect
or X-Sendfile
thing - these are just
headers that your application sends as the response. Which one you
use depends which web server you're running.
After sending this response (headers only, no content), your application server is freed up and able to get on with handling the next request.
A note on web servers
This topic appears to be something that people get very excited about, but for the purposes of this discussion, your favourite web server is likely to work fine. I'll go through the common choices below.
Apache
Apache is generally thought of as big, but this is very much
dependent on the modules that you have loaded, and what MPM
(Multi-Processing Module) you're using. A lightweight Apache is
perfectly acceptable to serve content. mod_xsendfile
provides the
X-Sendfile
support.
Lighttpd
Lighttpd has simpler configuration compared to Apache if you're setting it up from scratch. On the other hand, your system administrators are also less likely know their way around it. Lighttpd has X-SendFile support as standard.
nginx
This server works differently (and uses a different header) from either of the above. X-Accel-Redirect actually doesn't logically send a file from disk, it internally redirects the request to another path known by the web server, bypassing ACL checking.
This means that the correct header to send depends on your web server configuration, which is a disadvantage. But nginx can act as a proxy as well as a web server. This is a powerful advantage, as nginx can proxy for a backend content server.
perlbal
This is a Perl-based load balancer which supports an X-REPROXY-URL
header to allow you to reproxy content to a backend content server.
X-REPROXY-URL
is not covered in this article as I don't have any experience
with it myself, and you still need a web server to serve your main application.
Example
Here is a working example using nginx
, to reproxy a MogileFS
installation,
as that's what I'm using in production to deliver 150Tb of content at high
speed :) In this case I'm using the MogileFS module for nginx (see below) to serve the actual content.
It should be noted that MogileFS
isn't trivial to install or maintain, but
if you're prepared to expend the time and expertise then it is an extremely
cost-effective solution to file storage on a large scale.
However, it is possible to have a much simpler infrastructure to serve media from a directory on a local disk if the amount of content you're delivering isn't huge.
nginx config
http { include /etc/nginx/mime.types; access_log /var/log/nginx/access.log; sendfile on; keepalive_timeout 65; tcp_nodelay on; gzip on; server { listen 80; root /opt/FileServer/webroot; location / { include /etc/nginx/fastcgi_params; fastcgi_pass unix:/tmp/fileserver.socket; access_log off; } location /private/mogile/ { internal; mogilefs_tracker 1.1.1.1:7001; mogilefs_domain mydomain; mogilefs_pass { proxy_pass $mogilefs_path; proxy_hide_header Content-Type; proxy_buffering off; } access_log off; } } }
Example code snippet for generating timed URIs.
This method assumes that it's being called on a file object which stores various pieces of metadata about a file.
method generate_timed_uri (%p) { $p{ttl} ||= 60 * 60 * 24 * 7; # Default to a week # Turn integers into strings to save space in the URIs # (and make them less obvious) # Get the primary key of the row my $id = Math::BaseCalc->new(digits => ['a' .. 'z'])-> to_base( $self->id ); # Make a random secret string associated with the file object. # This string not related to the contents of the file my $secret = $self->secret; # Generate a time rounded off to last expiry point # then 2x forward of the TTL. my $time = Math::BaseCalc->new(digits => ['a' .. 'z'])-> to_base( (time() % $ttl) + $ttl * 2 ); # Note this is theoretically cryptographically weak, # we should really use HMAC. my $hash = Digest::SHA1::sha1_hex($self->secret . '-' . $time); return '/file/' . join('/', $hash, $time, $id, $self->filename . '.' . $self->extension); }
Example code snippet for serving timed URIs.
sub file : Local { # Note that we ignore the filename in the URI generated above. # It is just there to make the uri look 'normal'. my ($self, $c, $hash, $time, $id) = @_; $id = Math::BaseCalc->new(digits => ['a' .. 'z'])->from_base($id); my $file = eval { $schema->resultset('File')->find($id) } or return $self->error_notfound($c); return $self->error_notfound($c) unless $hash eq Digest::SHA1::sha1_hex($file->secret . '-' . $time) # The time embedded in the URL is the latest time this image is valid. $expire_time = Math::BaseCalc->new(digits => ['a' .. 'z'])->from_base($time); if ($expire_time < time()) { return $self->error_expired($c, $file); } # NOTE: In this case as you know when the URI expires, it is possible # to serve cache headers and have the hot items cached by your # front end proxy. This is meant to be a simple example and so # this is left as an exercise for the reader :) $c->res->header('Content-Type', $file->mimetype); $c->res->header('Accept-Ranges', 'none'); # Note: this matches the path in the nginx config above. # With X-Sendfile, it would be the filename on disk instead. my $redirect_to = '/private/mogile/' . $file->mogile_id; $c->res->header('X-Accel-Redirect' => $redirect_to); }
SEE ALSO
http://blog.lighttpd.net/articles/2006/07/02/x-sendfile
http://wiki.nginx.org/NginxXSendfile
http://tn123.ath.cx/mod_xsendfile/
AUTHOR
Tomas Doran (t0m) <bobtfish@bobtfish.net>