How To Hack On Catalyst Core - Adding $response->from_psgi_response

Overview

A lot of people want to contribute to the Catalyst project, but get scared off due to an assumption that the code is highly complex and requires an experts touch. This article will review how we added the feature that allows a Catalyst::Response to be filled from the response of a PSGI appliction.

See from_psgi_response in Catalyst::Response for more.

Introduction

Catalyst has been PSGI native for a while, but to date has not really taken much advantage of that fact beyond the ability to use Plack based web servers in replacement of the Catalyst::Engine namespace. This is not a small think to be sure, but it would be great if Catalyst could not flex its PSGI abilities even more. One thing that would be possible is for Catalyst to allow one to populate a reponse via a Plack::App such as Plack::App::File (which would make a possible replacement for Catalyst::Plugin::Static::Simple). Or you could use Catalyst to mount other frameworks, such as Web::Machine which excel in a particular domain such as building RESTful APIs.

So, how can one do this? Let's look into it!

Prequisite Tasks

Two features added to Catalyst in previous releases where necessary to perform this task properly. The first was to allow one to have more control over how we write output to the server and out to the requesting client. This was the feature to expose the writer filehandle, and it was primarily added to make it easier to write asynchronous and/or streaming output, although as we will see, that feature enabled us to do stuff we'd not thought of at that point.

See res-write_fh in Catalyst::Response for more on using $Response->write_fh.

This was needed as we will see later to make sure we could properly deal with a streaming or delayed PSGI response. Without it we'd be forced to fully buffer the response, which would be a shame since Catalyst can support these types of responses.

For more information see Delayed-Response-and-Streaming-Body in PSGI.

The second feature was added as a development version of this summer past, and it supports adding an input buffer object should one not exist in the case where the underlying Plack engine does not provide one. This is needed because Catalyst slurps up any POST or PUT content that might be part of the request and that means that any PSGI application that gets called later on won't get that (in other words it won't have access to POSTed form parameters or file uploads). Some PSGI webservers like Starman provide a readable, buffered filehandle of this request content, others (like ModPerl and FastCGI) do not. This leads to issues where some code works fine in development (when you are using Starman) but then break in a FastCGI (or *shudder* mod_perl) setup. So now in newer versions of Catalyst when it is slurping up the request content, if we notice there is no input buffer, we create one so that anyone that needs it down the line will get access to it.

Between these two features we have everything in Catalyst we need to complete this task.

The Test Cases

Prior to working on this task, I wrote out the minimal test cases in the form of a new controller test. Here's a version of those test cases:

    package TestFromPSGI::Controller::Root;

    use base 'Catalyst::Controller';

    sub from_psgi_array : Local {
      my ($self, $c) = @_;
      my $res = sub {
        my ($env) = @_;
        return [200, ['Content-Type'=>'text/plain'],
          [qw/hello world today/]];
      }->($c->req->env);

      $c->res->from_psgi_response($res);
    }

    sub from_psgi_code : Local {
      my ($self, $c) = @_;

      my $res = sub {
        my ($env) = @_;
        return sub {
          my $responder = shift;
          return $responder->([200, ['Content-Type'=>'text/plain'],
            [qw/hello world today2/]]);
        };
      }->($c->req->env);

      $c->res->from_psgi_response($res);
    }

    sub from_psgi_code_itr : Local {
      my ($self, $c) = @_;
      my $res = sub {
        my ($env) = @_;
        return sub {
          my $responder = shift;
          my $writer = $responder->([200, ['Content-Type'=>'text/plain']]);
          $writer->write('hello');
          $writer->write('world');
          $writer->write('today3');
          $writer->close;
        };
      }->($c->req->env);

      $c->res->from_psgi_response($res);
    }

So if you go and read the PSGI documentation, you know there's three types of responses we need to deal with. The first is the classic PSGI example of a tuple (ArrayRef of Status, Headers and BodyArray/Body Filehandle). The second and third are variations of delayed response. These types of responses return a coderef instead of a tuple and can including streaming types of responses (see the third action from_psgi_code_itr for example of this).

Ok, so we created what are now failing test cases. Lets write the method to make them pass!

The Code

Let's take a look at the full method added to Catalyst::Response and then well do a walkthrough:

    sub from_psgi_response {
        my ($self, $psgi_res) = @_;
        if(ref $psgi_res eq 'ARRAY') {
            my ($status, $headers, $body) = @$psgi_res;
            $self->status($status);
            $self->headers(HTTP::Headers->new(@$headers));
            if(ref $body eq 'ARRAY') {
              $self->body(join '', grep defined, @$body);
            } else {
              $self->body($body);
            }
        } elsif(ref $psgi_res eq 'CODE') {
            $psgi_res->(sub {
                my $response = shift;
                my ($status, $headers, $maybe_body) = @$response;
                $self->status($status);
                $self->headers(HTTP::Headers->new(@$headers));
                if($maybe_body) {
                    if(ref $maybe_body eq 'ARRAY') {
                      $self->body(join '', grep defined, @$maybe_body);
                    } else {
                      $self->body($maybe_body);
                    }
                } else {
                    return $self->write_fh;
                }
            });  
         } else {
            die "You can't set a Catalyst response from that, expect a valid PSGI response";
        }
    }

The example usage is as follows:

    package MyApp::Web::Controller::Test;

    use base 'Catalyst::Controller';
    use Plack::App::Directory;

    my $app = Plack::App::Directory
      ->new({ root => "/path/to/htdocs" })
        ->to_app;

    sub myaction :Local {
      my ($self, $c) = @_;
      $c->response->from_psgi_response(
        $app->($c->request->env));
    }

So basically given a PSGI response, as you would get from running it against the current $env (part of the PSGI specification), let that response be the response that Catalyst returns. We could use a bit of sugar here to make it easier to mount the PSGI application under the current action namespace (for example) but this really is the minimal viable useful feature upon which all that coould later be built, should we find it useful to do so.

Alll things being equal, I'd prefer to see this method refactored a bit, rather than one big method with so many conditionals. However I was concerned about adding to the Catalyst::Response namespace, particularly since the request object is something Catalyst explicitly makes public and replacable. We have no idea whats going on in the darkpan, so its best to err on the side of making the smallest footprint in the code that we can.

Code Walkthrough

Starting from the top:

    sub from_psgi_response {
        my ($self, $psgi_res) = @_;

Declare the new method and slurp up the expected arguments. In this case we expect just a single argument which is the PSGI compliant response. This is sure to be a reference of some type, but we'll need to inspect it a bit to figure out the correct handling:

        if(ref $psgi_res eq 'ARRAY') {
            my ($status, $headers, $body) = @$psgi_res;
            $self->status($status);
            $self->headers(HTTP::Headers->new(@$headers));

So if you recall from the test case, we know that the PSGI response is going to be an ArrayRef or a CodeRef. Let's handle the easy case first, since if the response is an ArrayRef that means its all complete and we can just map it to the Catalyst::Response directly. As you can see mapping the status and headers is straightforward. Let's see the code to map the body:

            if(ref $body eq 'ARRAY') {
              $self->body(join '', grep defined, @$body);
            } else {
              $self->body($body);
            }

This is a bit more tricky since the body can be a filehandle like object or an ArrayRef. Catalyst allows for string bodies or filehandles, which means in the case of it being an ArrayRef we need to flatten it to a string.

I seriously considered changing Catalyst to allow for a ArrayRef body, but in the end I felt it was too much risk for what seemed like small reward at this time. Perhaps in a future release?

So, what about the second case, when the PSGI response is a coderef?

        } elsif(ref $psgi_res eq 'CODE') {

The specification says when the response is a coderef (this is considered a delayed response) the server should obtain the actual response by calling that coderef with coderef of its own. In this case since Catalyst is hosting the PSGI application, we can consider it the server for now. Lets build the coderef we want to pass to the delayed response (if you look at PSGI and related Plack examples, this second coderef is often call the responder, and conventionally called $responder or $respond in code examples).

          $psgi_res->(sub {
                my $response = shift;
                my ($status, $headers, $maybe_body) = @$response;
                $self->status($status);
                $self->headers(HTTP::Headers->new(@$headers));

The PSGI application that is returning a delayed response has two options but the both start the same way. It must call the $responder coderef with at least the first two parts of the classic PSGI tuple, the status and headers. It then may or may not return the body arrayref or filehandle at this point. If you refer back to the test case we wrote for Catalyst you can see what such an application would look like:

    my $psgi_app = sub {
      my $responder = shift;
      return $responder->([200, ['Content-Type'=>'text/plain'],
        [qw/hello world today2/]]);
    };

In this example the PSGI application is returning the full tuple. This is the case we handle first:

                if($maybe_body) {
                    if(ref $maybe_body eq 'ARRAY') {
                      $self->body(join '', grep defined, @$maybe_body);
                    } else {
                      $self->body($maybe_body);
                    }
                }

Here we find that the body is ready to go, so we do the "is it an arrayref or a filehandle" dance again and then we are done.

However there's yet one final option. The PSGI application may choose to not return the body immediately. You would do this in cases where the body response is streamed, or if you are running the application in an event loop and the body is being created in response to events (such as the result of a database query). In this case, the $responder coderef is expected to return a PSGI writer object, which does at least two methods, write and close. This way your application can continue to stream output via the write method and when its done you call close.

Its actually a bit more complicated, since ideally you'd also monitor for the case where you lose the connection between the client and server, but for ease of illustration, lets assume the streaming response is just like in the test case above:

        my $psgi_app =  sub {
          my $responder = shift;
          my $writer = $responder->([200, ['Content-Type'=>'text/plain']]);
          $writer->write('hello');
          $writer->write('world');
          $writer->write('today3');
          $writer->close;
        };

This is a very trivial case, but it shows the basics of the interface. How do we handle this?

                else {
                    return $self->write_fh;
                }

The write_fh accessor of Catalyst::Response gives you the raw writer object which has been passed down to Catalyst from the server under which it runs. All such objects must do the write and close methods but the actual clase of the object will be specific to the server you are running Catalyst on. The writer object gets returned up to the delayed PSGI application via the responder for that case when the application does not provide the full body.

And that's the walkthrough!

Prior Art

Much of the inspiration and incentive to do this work came from Catalyst::Action::FromPSGI

Summary

Diving into Catalyst codebase can be a bit daunting, due to its age and how its changed over the years. By detailing the steps involved in extending Catalyst its our hope you will have a better understanding of the process and have an easier time should you try yourself.

Author

John Napiorkowski jjnapiork@cpan.org