UTF8 Encoding in Body Response
All about stuff that is changing in the Holland (current development) release around content encoding and unicode support (part three, encoding the body response.
Summary
use utf8; use warnings; use strict; package MyApp::Controller::Root; use base 'Catalyst::Controller'; use File::Spec; sub scalar_body :Local { my ($self, $c) = @_; $c->response->content_type('text/html'); $c->response->body("<p>This is scalar_body action ♥</p>"); } sub stream_write :Local { my ($self, $c) = @_; $c->response->content_type('text/html'); $c->response->write("<p>This is stream_write action ♥</p>"); } sub stream_write_fh :Local { my ($self, $c) = @_; $c->response->content_type('text/html'); my $writer = $c->res->write_fh; $writer->write_encoded('<p>This is stream_write_fh action ♥</p>'); $writer->close; } sub stream_body_fh :Local { my ($self, $c) = @_; my $path = File::Spec->catfile('t', 'utf8.txt'); open(my $fh, '<', $path) || die "trouble: $!"; $c->response->content_type('text/html'); $c->response->body($fh); }
Discussion
Beginning with the current development release (Holland, dev003 currently on CPAN as of this writing) Catalyst enables UTF8 body response encoding by default. You no longer need to set the encoding configuration (although doing so won't hurt anything).
Currently we only encode if the content type is one of the types which generally expects a UTF8 encoding. This is determined by the following regular expression:
$c->response->content_type =~ /^text|xml$|javascript$/
You should set your content type prior to header finalization if you want Catalyst to encode.
Encoding with Scalar Body
Catalyst supports several methods of supplying your response with body content. The first and currently most common is to set the Catalyst::Response ->body with a scalar string ( as in the example):
sub scalar_body :Local { my ($self, $c) = @_; $c->response->content_type('text/html'); $c->response->body("<p>This is scalar_body action ♥</p>"); }
In general you should need to do nothing else since Catalyst will automatically encode this string during body finalization. The only matter to watch out for is to make sure the string has not already been encoded, as this will result in double encoding errors.
Encoding with streaming type responses
Catalyst offers two approaches to streaming your body response. Again, you must remember to set your content type prior to streaming, since invoking a streaming response will automatically finalize and send your HTTP headers (and your content type MUST be one that matches the regular expression given above.)
The first streaming method is to use the write
method on the response object. This method
allows 'inlined' streaming and is generally used with blocking style servers.
sub stream_write :Local { my ($self, $c) = @_; $c->response->content_type('text/html'); $c->response->write("<p>This is stream_write action ♥</p>"); }
You may call the write
method as often as you need to finish streaming all your content.
Catalyst will encode each line in turn.
The second way to stream a response is to get the response writer object and invoke methods on that directly:
sub stream_write_fh :Local { my ($self, $c) = @_; $c->response->content_type('text/html'); my $writer = $c->res->write_fh; $writer->write_encoded('<p>This is stream_write_fh action ♥</p>'); $writer->close; }
This can be used just like the write
method, but typicallty you request this object when
you want to do a nonblocking style response since the writer object can be closed over or
sent to a model that will invoke it in a non blocking manner. For more on using the writer
object for non blocking responses you should review the Catalyst
documentation and also
you can look at several articles from last years advent, in particular:
http://www.catalystframework.org/calendar/2013/10, http://www.catalystframework.org/calendar/2013/11, http://www.catalystframework.org/calendar/2013/12, http://www.catalystframework.org/calendar/2013/13, http://www.catalystframework.org/calendar/2013/14.
The main difference this year is that previously calling ->write_fh would return the actual
plack writer object that was supplied by your plack application handler, whereas now we wrap
that object in a lightweight decorator object that proxies the write
and close
methods
and supplies an additional write_encoded
method. write_encoded
does the exact same thing
as write
except that it will first encode the string when necessary. In general if you are
streaming encodable content such as HTML this is the method to use. If you are streaming
binary content, you should just use the write
method (although if the content type is set
correctly we would skip encoding anyway, but you may as well avoid the extra noop overhead).
The last style of content response that Catalyst supports is setting the body to a filehandle like object. In this case the object is passed down to the Plack application handler directly and currently we do nothing to set encoding.
sub stream_body_fh :Local { my ($self, $c) = @_; my $path = File::Spec->catfile('t', 'utf8.txt'); open(my $fh, '<', $path) || die "trouble: $!"; $c->response->content_type('text/html'); $c->response->body($fh); }
In this example we create a filehandle to a text file that contains UTF8 encoded characters. We pass this down without modification, which I think is correct since we don't want to double encode. However this may change in a future development release so please be sure to double check the current docs and changelog. Its possible a future release will require you to to set a encoding on the IO layer level so that we can be sure to properly encode at body finalization. So this is still an edge case we are writing test examples for.
Disabling default UTF8 encoding
You may encounter issues with your legacy code running under default UTF8 body encoding. If so you can disable this with the following configurations setting:
MyApp->config(encoding=>undef);
Where MyApp
is your Catalyst subclass.
If you believe you have discovered a bug in UTF8 body encoding, I strongly encourage you to report it (and not try to hack a workaround in your local code).
Conclusion
Getting UTF8 characters from form POSTs and in your URL query should mostly 'do the right thing'. Of course there's a bit of an art to this and we expect that over time we'll need to build up a cookbook of practices and workarounds to help even more.
In the final article we we look at how Catalyst does response body encoding, including streaming, delayed and filehandle responses.
Author
John Napiorkowski jjnapiork@cpan.org