A New Project, Part 13




[Part 13 in a series of articles on a new project on the transparent development of a distributed wiki-style note taking system.]

A dispatcher, in the design pattern sense, is an entity that can interpret an incoming request and relay it to an appropriate handler. For a command line application, a dispatcher might parse the command line arguments and recognize the various switches and parameters to invoke the proper code. For example, take this command:

$ svn status -u -v --username=john


The first thing the svn binary might do is use getopt(3) to parse out the -u, -v, and --username parameters. Then a dispatcher might be used to understand that the status command means that a "status" method or module needs to be executed in order to handle the user's request. (Note that I haven't looked at the Subversion code; this is just a hypothetical example.)
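
To make that two-step flow concrete, here is a minimal sketch in Python (chosen only for brevity). It is not how svn actually works: the handler functions do_status and do_update and the COMMANDS table are hypothetical names invented for illustration. It looks up the command name in a hardcoded table, then hands the parsed switches to the handler.

```python
import getopt

# Hypothetical handlers; these only echo what they were given.
def do_status(opts, args):
    return ("status", opts, args)

def do_update(opts, args):
    return ("update", opts, args)

# Hardcoded dispatch table: command name -> handler function.
COMMANDS = {"status": do_status, "update": do_update}

def dispatch(argv):
    """Interpret argv (without the program name) and relay it to a handler."""
    if not argv or argv[0] not in COMMANDS:
        raise SystemExit("unknown command")
    command, rest = argv[0], argv[1:]
    # Parse the switches that follow the command name.
    pairs, args = getopt.getopt(rest, "uv", ["username="])
    return COMMANDS[command](dict(pairs), args)
```

Calling dispatch(["status", "-u", "-v", "--username=john"]) routes to do_status with the three switches collected into a dictionary.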

The mapping of arguments to handlers might be hardcoded into the application, as is almost always the case with compiled C and C++ binaries, or it may be determined at run-time, as is sometimes the case in Perl, Python, and PHP web applications or in the web server itself.

For instance, consider the mapping of URLs to a particular handler within the web server. How does the web server know that a path of /webstats/ means to run a log analysis CGI script, vs. /images/logo.png, which means to serve up an image? In Apache httpd's case, the mapping takes place in the httpd.conf file using <Location> sections (or, in more convenient but limited form, the ScriptAlias directive, which does a similar thing). For example, the AWS OpenSearch application is dynamically dispatched via the configuration:

   <Location /aws>
      SetHandler perl-script
      PerlResponseHandler  AWS::OpenSearch::ModPerlApplication
   </Location>


This <Location> section tells the web server to dispatch all incoming requests that start with /aws to the handler called perl-script (part of mod_perl), which in turn loads the module AWS::OpenSearch::ModPerlApplication.

But what if the web application itself wants to invoke different code depending on the URL? For example, what if the URL /unto/category/work/ needed to execute a different module to return archived posts than the URL /unto/gallery/album/remainders does to return a photogallery? In the case of WordPress the mapping is done entirely with a combination of placing .php files at specific places in the filesystem hierarchy and using mod_rewrite rules in .htaccess files.

An alternative approach is to use a single handler for all requests to a specific location (such as /orchard/) and execute one application to handle them all. This application can then use a built-in dispatcher to interpret each incoming request to determine which module within that application should be invoked to handle the request. This approach has a few advantages (and a few disadvantages). First, if you allow the web application to dispatch its own subrequests, then you greatly simplify the responsibilities of the server that invokes it. In fact, the web server itself may only have to do as much as in the AWS example above before passing along further duties. Second, you are then free to write any path parsing logic you feel your application requires, even going beyond the considerable power of mod_rewrite when necessary. You are also unbound by any constraints about where on the filesystem the handlers need to live. And if the dispatcher is aware of its own execution context, you can even write dispatchers that can be used in many contexts, from web application to command line, to invoke the same physical handlers.
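
The last point, one set of handlers reachable from several contexts, can be sketched as a context-free core with a thin adapter per context. This is a Python illustration under stated assumptions, not the Essex design: list_groups, HANDLERS, from_command_line, and from_url are all hypothetical names, and the URL shape is assumed to be /orchard/users/<name>/<action>.

```python
def list_groups(params):
    """Hypothetical handler shared by both contexts."""
    return "groups for " + params["username"]

HANDLERS = {"groups": list_groups}

def dispatch(action, params):
    """Context-free core: look up the handler and invoke it."""
    return HANDLERS[action](params)

def from_command_line(argv):
    """Adapter for a command line such as ['groups', 'username=april']."""
    action, pairs = argv[0], argv[1:]
    return dispatch(action, dict(p.split("=", 1) for p in pairs))

def from_url(path):
    """Adapter for a URL path such as '/orchard/users/april/groups'."""
    _, _, username, action = path.strip("/").split("/")
    return dispatch(action, {"username": username})
```

Both adapters reduce their input to the same (action, params) pair, so the handler never needs to know which context invoked it.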

More than anything else, implementing REST properly is an exercise in effective dispatching. The dispatcher, whether it be in the web server itself, an Apache module, a Perl script, or whatever, needs to be able to correctly interpret both the path of the URL and the method by which it was invoked. Following the lead of Java Servlets, an easy way to manage the differences between request methods (such as HEAD, GET, POST, etc.) is to supply individual methods for each (such as do_head(), do_get(), do_post(), etc.). But dispatching according to path presents its own special challenge.
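
The per-method half of the problem is small enough to sketch directly. The Python below is an illustration only: NoteHandler and dispatch_method are hypothetical names, and the choice of 501 for a missing routine is an assumption (405 Method Not Allowed is another common choice).

```python
class NoteHandler:
    """Hypothetical handler that supports only GET and PUT."""

    def do_get(self, request):
        return 200, "note body"

    def do_put(self, request):
        return 201, "stored"

def dispatch_method(handler, method, request=None):
    """Call handler.do_<method>() if it exists, else report an error status."""
    routine = getattr(handler, "do_" + method.lower(), None)
    if not callable(routine):
        # Assumption: 501 signals an unsupported method here.
        return 501, "Not Implemented"
    return routine(request)
```

A handler then opts in to a request method simply by defining the matching do_ routine; nothing else needs to be registered.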

Consider the differences between the following REST-like URL paths:

GET /orchard/users/april/groups/friends
PUT /orchard/users/becky/notes/personal/todo/shopping-list
POST /orchard/users/cara/invite/deborah@example.com


In the first case, the server needs to execute the Orchard application and return the list of friends for April. In the second case, the server needs to store a new note under Becky's "personal" and "todo" namespace. And the third request says that Cara wants to invite her friend Deborah to join the network. If everything were stored on a filesystem, then it actually wouldn't be that hard. The paths are independent and hierarchical -- the default Apache filesystem handler could map each to a unique location on disk. But, as we discussed at length in part 6, the filesystem alone is insufficient for managing the richness of the API.

So how can we dynamically map each of those URLs to a specific and appropriate handler? If we are careful (more on this in a moment), we can define certain patterns that interpret each path and invoke the proper code. Ideally, this mapping can be done via a simple configuration file. And if we are clever, the dispatcher can even manage to parse out the individual parameters for the handler in a generic and predictable way.

Imagine a configuration file that contains:

dispatchers:
   - path: /users/[username]
     handler: Unto::Orchard::UserHandler
   - path: /users/[username]/groups/[group]
     handler: Unto::Orchard::GroupHandler
   - path: /users/[username]/notes/[note]
     handler: Unto::Orchard::NoteHandler
   - path: /users/[username]/invite/[email]
     handler: Unto::Orchard::InvitationHandler


If the dispatcher is smart enough, it should be able to use that configuration file to build up a mapping table that can examine each incoming request path and invoke the appropriate handler. (In the case of Orchard, those handlers are simply Essex Services, and they are instantiated via the standard ServiceManager.) The dispatcher can then execute the appropriate do_[method] routine within each or fail with a reasonable HTTP error code (501 Not Implemented, perhaps?). Even more, the dispatcher ought to take the time to build a map of the supplied [key] patterns and provide them to the handler along with the request.
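
As a rough illustration of such a mapping table, the Python sketch below compiles each configured path into a regular expression with one named group per [key]. It is not the Essex implementation. ROUTES simply restates the configuration above as a list, compile_pattern and resolve are hypothetical names, and two things are assumed: the /orchard prefix has already been stripped, and each [key] matches exactly one path element.

```python
import re

# Restates the configuration above; handler names are kept as plain strings.
ROUTES = [
    ("/users/[username]", "Unto::Orchard::UserHandler"),
    ("/users/[username]/groups/[group]", "Unto::Orchard::GroupHandler"),
    ("/users/[username]/notes/[note]", "Unto::Orchard::NoteHandler"),
    ("/users/[username]/invite/[email]", "Unto::Orchard::InvitationHandler"),
]

def compile_pattern(pattern):
    """Turn '/users/[username]' into a regex with a named group per [key]."""
    parts = []
    for element in pattern.strip("/").split("/"):
        if element.startswith("[") and element.endswith("]"):
            # A [key] matches exactly one path element.
            parts.append("(?P<%s>[^/]+)" % element[1:-1])
        else:
            parts.append(re.escape(element))
    return re.compile("^/" + "/".join(parts) + "/?$")

TABLE = [(compile_pattern(path), handler) for path, handler in ROUTES]

def resolve(path):
    """Return (handler_name, params) for the first full match, else None."""
    for regex, handler in TABLE:
        match = regex.match(path)
        if match:
            return handler, match.groupdict()
    return None
```

Under these assumptions, resolve("/users/april/groups/friends") yields the GroupHandler along with a username and group map. A nested note path such as /users/becky/notes/personal/todo/shopping-list matches nothing, because it would need a key that spans several path elements.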

Care still needs to be taken, though. In the example above, we do not want the UserHandler to intercept all calls under /orchard/users, as that would prevent the GroupHandler or the InvitationHandler from ever seeing a request. In general we want the longest matching pattern (not in terms of characters, but in terms of path elements) to process the request. Our biggest worry is ambiguity -- attention needs to be given to how we construct our REST paths. If we allow a "wildcard" pattern to live at the same place in a path as a predefined element (e.g., if we allowed both /users/[username] and /users/search), then we will have a difficult or impossible time dynamically resolving the proper handler.
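
Both concerns, preferring the longest match and spotting ambiguous patterns ahead of time, can be sketched in a few lines. This Python illustration assumes patterns are matched as prefixes of the request path; PATTERNS, best_match, and ambiguous are hypothetical names, not part of Essex.

```python
PATTERNS = [
    "/users/[username]",
    "/users/[username]/groups/[group]",
    "/users/[username]/notes/[note]",
    "/users/[username]/invite/[email]",
]

def split(path):
    """Break a path or pattern into its non-empty elements."""
    return [element for element in path.split("/") if element]

def is_key(element):
    return element.startswith("[") and element.endswith("]")

def matches_prefix(pattern, path):
    """True if each pattern element matches the corresponding path element."""
    p, q = split(pattern), split(path)
    if len(p) > len(q):
        return False
    return all(is_key(a) or a == b for a, b in zip(p, q))

def best_match(patterns, path):
    """Pick the matching pattern with the most path elements, or None."""
    candidates = [p for p in patterns if matches_prefix(p, path)]
    return max(candidates, key=lambda p: len(split(p)), default=None)

def ambiguous(a, b):
    """True if two patterns of equal length could match the same path."""
    x, y = split(a), split(b)
    return len(x) == len(y) and all(
        is_key(i) or is_key(j) or i == j for i, j in zip(x, y)
    )
```

Running ambiguous() over every pair of configured patterns at startup would let the dispatcher reject a configuration that mixes /users/[username] with /users/search before any request arrives.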

I'm working right now on adding a generic dispatcher to the Essex library. The code itself is surprisingly straightforward, but I am running up against the problem of making it flexible enough that I can supply both a CommandLineDispatcher and a ModPerlDispatcher that can read the same configuration file and execute the same services, but do it from completely different contexts. And I am still stubbornly hoping to continue the development on OS X, which means that I need to get mod_perl working on my PowerBook. (Which itself might not be that hard, but OS X 10.4 comes with httpd 1.3, whereas I am planning to use httpd 2.0 and mod_perl 2.) I'll keep people updated on the progress -- this is one of the last requirements before work on Orchard can really begin.