Dealing with funny IO::Handle type objects in Catalyst Response
Overview
Learn how to use IO::Handle objects in the Catalyst response body, even when the type of IO::Handle may not work how Catalyst expects.
A Funny thing happened on the way to the office
A long tradition in open source communities has been the concept of having a mailing list where users and contributors can share queries and information. As such the Catalyst community is no exception, and while there is most certainly the usual drudge of users grappling with the understanding of core concepts (Which most are happy to patiently answer), there are also a few interesting use cases that come up.
There was one such case recently which raised questions over the IO::Handle type of object handling in Catalyst::Response. In particular there was a warning present in the user's logs as Catalyst attempted to determine the size of the IO::Handle object it was passed in the response body. This was of course followed by Catalyst issuing a warning that it was serving up the content with no pre-determined Content-Length. So what was happening here?
So what does Catalyst do with a IO::Handle ?
Part of the wrap up of this event was a call from John to ammend the documentation to explain more thoroughly what exactly Catalyst does when passed and IO::Handle type of object as the response body. And the documentation patch goes like this, more or less:
'Catalyst accepts an IO::Handle type of object in the response body in the sense that it reasonably can "read" as a method ...'
So what happens actually? Well here's the code in Catalyst
# Content-Length if ( defined $response->body && length $response->body && !$response->content_length ) { # get the length from a filehandle if ( blessed( $response->body ) && $response->body->can('read') || ref( $response->body ) eq 'GLOB' ) { my $size = -s $response->body; if ( $size ) { $response->content_length( $size ); } else { $c->log->warn('Serving filehandle without a content-length'); } } else { # everything should be bytes at this point, but just in case $response->content_length( length( $response->body ) ); } }
Okay, so this is what Catalyst tries to reasonable do under this case. Which is in order:
1. I have something in the response body and I don't have a content_length defined
2. If I can "reasonably" expect this is an actual Filehandle type of thingy then go on
3. Where the above is true, then use a filestat operator to find the size of the Filehandle supplied.
4. Or otherwise just get the bytes coz that should be good enough.
So what was the problem?
Now we look at the particular use case that was presented. In fact the object supplied in $c->res-body() was actually an IO::Compress::Gunzip object. Now by all "reasonable" standards Catalyst determines this to be a IO::Handle type of object by the presence of a "read" method on the object.
The problem here is not a Catalyst problem really. It comes down to that the use of the filetest operator "-s" does not work with the type of object returned from IO::Uncompress::Gunzip. This is not uncommon and tends to be shared with other "handle" objects that present "in-memory" without providing an acutal "on storage" "filehandle".
So how do we deal with this case?
Documetnation to the rescue ?
Aha! So the filetest for size as implemented by Catalyst does not work in this case. Thinking in general IO::Compress::Gunzip does present an interface for "read" as a method, which returns the "uncompressed" bytes from the source. But how do we detertime the "uncompressed" size? And this is what is wanted afterall. Clearly the filetest just doesn't cut it. So where do we go?
Well, let's have a look at "1" above. Which is where we ask 'Do we have a "body" *or* has there been a Content_Length supplied in the headers.
This is the most important part of the Documentation patch, where we say;
'If it is possible to otherwise determine the size of the "handle" then it is advised that the Content_Length be set in the headers of the response where possible'
So essentially if you know the size or can otherwise determine that size, then do it. Thus not leaving it to that Catalyst code to work out the size for you. But now we have the problem. How do we get the uncompressed size of the content?
Eureka! More documentation. Apparently the Gzip spec requires that all content hold header information including the uncompressed size of that content. So on an IO::Uncompress::Gunzip object there is a method available:
my $gz = IO::Uncompress::Gunzip->new( \$comp ); my $header = $gz->getHeaderInfo; print Dumper( $header );
This should show one of the keys in the returned hash to be ISIZE, which contains the uncompressed size of the content. This is what we want for content length so we should be okay with something like this for a controller:
package MyApp::FunnyIO::Web::Controller::TestIO; use Moose; use namespace::autoclean; BEGIN { extends 'Catalyst::Controller'; } use IO::Compress::Gzip qw/gzip $GzipError/; use IO::Uncompress::Gunzip; use Data::Dumper; sub index :Path :GET :Args(0) { my ( $self, $c ) = @_: my $data = "1234567890ABCDEFGHIJKL"; my ( $comp, $body ); gzip( \$data, \$comp ) || die $GzipError; $c->res->content_type('text/plain'); if ( defined ($c->req->header('accept-encoding')) && ($c->req->header('accept_encoding') =~ /gzip/ )) { $c->log->debug( 'Sending Compressed' ); $c->res->content_encoding( 'gzip' ); $body = $comp; } else { $c->log->debug( 'Sending uncompressed' ); $body = IO::Uncompress::Gunzip->new( \$comp ); $c->res->content_length( $body->getHeaderInfo->{ISIZE} ); } $c->res->body( $body ); } sub end : Private {} # No views needed in this controller 1;
Now that shoud do what we want as was asked in the mailing list post. So we have some Gzipped content available in memory as a scalar, if the user agent requesting accepts gzipped content in the content encoding we can just serve that content. Otherwise we get a handle to that content that will uncompress on "read", determine the uncompressed length of the content and set the length in the headers. This satisfies the condition so that Catalyst can skip trying to work out the size itself.
The Trouble with Tribbles
There is one problem with our solution of course. While this will work with relatively small content in the compressed scalar, anything of a reasonable size will fail to return a value for ISIZE in the call to getHeaderInfo().
Why? Well it's all part of IO::Uncopress:Gunzip magic. But in brief the only way we can guarantee that the full headers will be returned is that the compressed handle is read out completely as in:
while (1) { my $len = $body->read( my $buf ); last if !defined $len; } my $size = $body->getHeaderInfo->{ISIZE};
This is not desirable as we are handing off the calls to read to Catalyst in processing the response. It is also not possible in the implementation to rewind an IO::Uncompress::Gunzip object, so to get the begining of the handle again we would have to re-initialize or work with a clone.
So implementing some of the magic to get the uncompressed length ourselves is required lets's change some of the code in our controller:
# Unpack the last 4 bytes of the compressed data my $io = IO::Scalar->new( \$comp ); $io->seek( -4, 2 ); $io->read( my $buf, 4); $io->close(); my $size = unpack( 'V', $buf ); # Explicitly setting the size $c->res->content_length( $size );
This comes from a bit of reading that says the ISIZE value is contained in the trailing 4 bytes of the gzipped content. So with the help of IO::Scalar we take our original scalar containg the gzipped content, seek to last 4 bytes from the end of the content (now a IO::Handle) and read them out. The call to unpack returns an integer that has the uncompressed size.
There is much discussion on how to determine the uncompressed size, but in short we are dealing with a relatively small ( in memory ) amount of content so we are not going to bother with dealing with content over 4GB. This should do.
Putting it all together, Cleanly
While this works for an example, what we really want is to make this work in a real application. Also while I have stated that in cases like this it is best to set the content length in the response yourself, it does create a lot of unnecessary noise in the controller which should just be getting the content and serving it. So now it's time to clean thing up more suitable to a solution. Consider the following lines in the Catalyst code:
my $size = -s $response->body; if ( $size ) { $response->content_length( $size ); }
While our previous efforts have been aimed at trying not to get here, it is a fair solution to just let it happen. Our current problem is an object produced from IO::Uncompress::Gunzip will not correctly respond to the filetest operator used to determine the size. So what we need to do is make it work that way.
How? We need a subclass of IO::Compress::Gunzip where can overload the response to the filetest -s to do what we want.
package MyApp::FunnyIO::Domain::FunnyIO; use Moose; use MooseX::NonMoose::InsideOut; extends 'IO::Uncompress::Gunzip'; use IO::Scalar; use namespace::sweep; use overload 'bool' => sub { 1 }, '-X' => \%myFileTest; has '_content' => ( is => 'ro' );
We are using MooseX::NonMoose::InsideOut in order to wrap a tricky non Moose class. We want some Moose niceties while still holding on to the behavior of IO::Uncompress::Gunzip which will act at all points like a handle.
Use of overload uses the '-X' hook. This is a special hook into all of the filetest operators. So when a filetest is performed on our created object we will delegate to our method to determine the size.
As a final note there is the usage of namespace::sweep which is a special version of namespace::autoclean. The main reason for using this instead of namespace::autoclean is that the later does not play well with overload and will clean it up as an imported method. Using namespace::sweep we avoid the problem an keep the overload while cleaning up anything else that was imported. Now for the meat of the code.
sub myFileTest { my ( $self, $arg ) = @_; if ( $arg eq "s" ) { my $io = IO::Scalar->new( $self->_content ); $io->seek( -4, 2 ); $io->read( my $buf, 4 ); return unpack( 'V', $buf ); } else { die "Only implementing a size operator at this time"; } }
So as we can see here our method that will be called takes an argument $arg. This will be passed in via the '-X' overload as the actual "letter" used in the filetest operator. In this case we only look for an invocation of "-s" as that will indicate the test for size. The inner work does what we did before. And now for a sensible contructor:
around BUILDARGS => sub { my ( $orig, $class, $ref ) = @_; return $class->$orig({ '_content' => $ref }); }; sub FOREIGNBUILDARGS { my ( $class, $args ) = @_: return $args; } no Moose; __PACKAGE__->meta->make_immutable; 1;
This allows us to set up the constructor to behave more like the underlying IO::Uncompress::Gunzip object with a single scalar ref as it's input argument. The Moose magic uses the special $class and $orig to initialize the object with a more Mooselike constructor, and we keep a copy of the original scalar refernce we are passing in so we can get the size information. Also there is the hook for FORIEGNBUILDARGS which is called by MooseX::NonMoose::InsideOut. This is simply used to pass in the correct arguments to the constructor of the class we are extending
my $z = MyApp::FunnyIO::Domain::FunnyIO->new( \$comp ); my $size = -s $z; print $z->getline();
That code would should that we can use our class in place of IO::Uncompress::Gunzip and get a result we want for the $size. Also our exposed IO::Handle type methods work as expected and return the uncompressed content.
We probably want another component to get the content, so we'll do something quick like this:
package MyApp::FunnyIO::Domain::GzipData; use Moose; use MooseX::Singleton; use IO::Compress::Gzip qw/gzip $GzipError/; use Path::Class::File; use namespace::sweep; has content => ( is => 'ro', isa => 'Str', lazy_build => 1 ); has gzipped => ( is => 'ro', reader => 'getData', lazy_build => 1 ); sub _build_content { return Path::Class::File->new( 'data', 'product.json' )->slurp; } sub _build_gzipped { my $self = shift; my $comp; gzip( \$self->content, \$comp ) || die $GzipError; return $comp; } no Moose; __PACKAGE__->meta->make_immutable; 1;
This is a trivial source that will take either some plain scalar content or already gzipped content and allow us to get the data back. At the very least, for brevity a call to getData on an object with no arguments provided in the constructor will read in a file and compress it's contents before returning it.
These components are as yet just plain classes outside of the Catalyst context. We could just use the classes in our controller code but it is going to be cleaner and more extensible to implement these as Catalyst models. So two trivial model classes:
package MyApp::FunnyIO::Web::Model::FunnyIO; use base 'Catalyst::Model::Factory'; __PACKAGE__->config( class => 'MyApp::FunnyIO::Domain::FunnyIO' ); sub mangle_arguments { my ( $self, $args ) = @_; return $args->{data}; } 1; package MyApp::FunnyIO::Web::Model::GzipData; use base 'Catalyst::Model::Adaptor'; __PACKAGE__->config( class => 'MyApp::FunnyIO::Domain::GzipData' ); 1;
Okay. So now we can just call the model components rather than hard code and import classes into our controller logic. The classes can also be overridden in config which is useful as well.
In any case we can now implement our controller code like this:
package MyApp::FunnyIO::Web::Controller::FunnyIO; use Moose; use namespace::autoclean; BEGIN { extends 'Catalyst::Controller'; } sub index :Path :GET :Args(0) { my ( $self, $c ) = @_; my ( $body, $comp ); $comp = $c->model('GzipData')->getData(); $c->res->content_type('text/plain'); if ( defined($c->req->header('accept-encoding')) && ($c->req->header('accept_encoding') =~ /gzip/ )) { $c->res->content_encoding('gzip'); $body = $comp; } else { $body = $c->model('FunnyIO', data => \$comp); } $c->res->body( $body ); } sub end : Private {} # No views in this controller; __PACKAGE__->meta->make_immutable; 1;
This gives us much cleaner and clearer approach by moving any working logic out of the controller and allowing the controller to fetch and negotiate how content is served. So basically, get some compressed data from the model, if the agent can accept gzip encoding then set the body as is, otherwise set the body to the uncompressor handle obtained from model.
The resulting call where Catalyst will set the content_length in the headers will correctly call our overloaded method returning the correct uncompressed size. Trying it out with curl:
$ curl -I --header 'accept-encoding: gzip' http://localhost:5000/funnyio HTTP/1.1 200 OK Content-Encoding: gzip Content-Length: 766 Content-Type: text/plain X-Catalyst: 5.90051 Date: Tue, 03 Dec 2013 10:50:02 GMT Connection: keep-alive $ curl -I http://localhost:5000/funnyio HTTP/1.1 200 OK Content-Length: 5877 Content-Type: text/plain X-Catalyst: 5.90051 Date: Tue, 03 Dec 2013 10:50:56 GMT Connection: keep-alive
Just as another exercise, we can see the original warnings produced by overriding the default class of the model so that we are using IO::Uncompress::Gunzip instead. We can do this as we kept the same basic interface in our class, making them interchangeable. So in our application config file:
--- Model::FunnyIO: class: IO::Uncompress::Gunzip
This should show the warning from the filetest operator and the Catalyst produced warning that content_length has not been set
-s on unopened filehandle
A Full Example
See the example application:
https://github.com/snakierten96/2013-Advent-Staging/tree/master/MyApp-FunnyIO-Web
Summary
So by now we should have covered a little bit more on how Catalyst handles IO::Handle type objects in the request body and how we can alter the behaviour of any such class to play nicely with what Catalyst expects. I also hope this shows an approach of seperating any domain logic from the controller and use the model system instead to call in your implementation classes.
Author
Neil Lunn email:neil@mylunn.id.au