A New Project, Part 17





Introducing Spring 0.1, the first mini-application built on top of the Essex toolkit as part of the new project.

If you recall, Essex is a component-based toolkit designed to simplify the development of the new project. (The "new project" itself is a never-ending series of exercises in the transparent development of a community-oriented set of web services for sharing data.) The Spring project was started partly as a way to put Essex to the test, and partly as a way to provide a more general framework for working with OpenSearch results.

Currently Spring doesn't offer much that you can see. It is simply a wrapper around Amazon Web Services that translates product search results into OpenSearch RSS. However, the goal is for Spring to do much more. I'm already planning a generic OpenSearch aggregator as well as a few original columns (which I'll of course publish openly). I also plan to provide an OpenSearch wrapper around both the Google web search API and the Yahoo! web search API. Unfortunately, both of those services require an API key and limit the total number of queries per key per day to 1000 and 5000, respectively. So I will be asking people to use their own API keys (they're free and easy to get, and I'll post instructions) if they want to save a large number of web searches in a blog reader.
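
To make "translates product search results into OpenSearch RSS" a bit more concrete, here is a minimal sketch in Python (not Spring's actual Perl). It assumes the OpenSearch 1.0 RSS namespace and a hypothetical list of result dicts with title, link, and description keys; the function name and channel text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the OpenSearch 1.0 RSS extension namespace. The post does not
# say which OpenSearch version Spring emits.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearchrss/1.0/"


def to_opensearch_rss(query, channel_link, results, total,
                      start_index=1, per_page=10):
    """Wrap result dicts ('title', 'link', 'description') in OpenSearch RSS."""
    ET.register_namespace("openSearch", OPENSEARCH_NS)
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Search results for " + query
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Results for " + query

    # The three paging elements that OpenSearch adds on top of plain RSS.
    paging = (("totalResults", total),
              ("startIndex", start_index),
              ("itemsPerPage", per_page))
    for name, value in paging:
        tag = "{%s}%s" % (OPENSEARCH_NS, name)
        ET.SubElement(channel, tag).text = str(value)

    # One ordinary RSS <item> per search result.
    for result in results:
        item = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            ET.SubElement(item, field).text = result[field]

    return ET.tostring(rss, encoding="unicode")
```

Because the output is still ordinary RSS with a few extra elements, a blog reader that ignores the openSearch elements should still be able to display the items.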

Even more fun will be the SpringBoard, which will be a collection of XHTML files and JavaScript code that will run on the client side (i.e., in the browser) and use XSLT to render OpenSearch results as HTML. This would have been a challenge if not for the JavaScript XSLT parser that Google recently released as open source code. I hope to innovate a little on the search UI side here, but mostly it is an excuse to play with some AJAX / REST techniques and do it with technologies I am already familiar with (i.e., search).
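
The rendering step itself is simple: walk the RSS items and emit HTML. The sketch below shows that idea in Python with the standard library's ElementTree, which is a swapped-in technique and not the browser-side XSLT that SpringBoard is meant to use. The function name and the list markup are assumptions for illustration.

```python
import html
import xml.etree.ElementTree as ET


def render_results(rss_text):
    """Turn the <item> entries of an RSS document into an HTML list."""
    channel = ET.fromstring(rss_text).find("channel")
    rows = []
    for item in channel.findall("item"):
        # Escape everything taken from the feed before it goes into markup.
        title = html.escape(item.findtext("title", default=""))
        link = html.escape(item.findtext("link", default=""), quote=True)
        rows.append('<li><a href="%s">%s</a></li>' % (link, title))
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

An XSLT stylesheet would express the same mapping declaratively, with one template matching each item.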

You can take a look at the existing Spring code at:

http://svn.unto.net/svn/public/spring/trunk/


It's all Perl, but it's clean Perl. If you want to install it you'll need to download a recent version of Essex. I recommend pulling the version from the trunk at Essex. I'll also keep making nightly builds of the code available at spring-nightly and essex-nightly in the nightlies directory.

I'm putting some effort into making sure the code runs on multiple machines, but I have no doubt that there will be problems if/when someone else tries to install either project on their own box. At the very least it requires httpd and mod_perl, which are getting easier to install, but not trivial. (And for Spring it also requires LWP, XML::LibXSLT, Cache::Cache, and other Perl modules.) If you find something amiss, please let me know. Documentation is scarce right now -- I am intentionally waiting until things stabilize before spending too much time on it.

For those that like pretty demonstrations, you may have to wait until Spring 0.2 when I get the AJAX-ified interface up and running. In the meantime, here is a REST-ful list of Amazon Product OpenSearch Description Documents being served by this machine:

http://www.unto.net/spring/amazon/product/opensearch_description/


Here's the one for Music:

http://www.unto.net/spring/amazon/product/opensearch_description/Music


And here's a search for "Beck" in Music, suitable for saving in a blog reader (as it is built on RSS):

http://www.unto.net/spring/amazon/product/opensearch/Music/Beck
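
The three URLs above follow a regular pattern, so a client could build them with a few lines of code. This Python sketch assumes the path layout shown above; the helper names are hypothetical, and encoding spaces as "+" in the Amazon query is an assumption (the post only shows that form for the Yahoo endpoint further down).

```python
from urllib.parse import quote_plus

# Base URL as shown in this post; it is expected to move, so it stays a
# parameter rather than being hard-coded into each helper.
DEFAULT_BASE = "http://www.unto.net/spring"


def amazon_description_url(category="", base=DEFAULT_BASE):
    """URL of the description document list, or of a single category."""
    return "%s/amazon/product/opensearch_description/%s" % (
        base, quote_plus(category))


def amazon_search_url(category, query, base=DEFAULT_BASE):
    """URL of an OpenSearch RSS feed for a query within a category."""
    return "%s/amazon/product/opensearch/%s/%s" % (
        base, quote_plus(category), quote_plus(query))
```

For example, amazon_search_url("Music", "Beck") reproduces the saved search above.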


Not that exciting, right? That's just because there is no pretty user interface. But just wait -- the interface that I have in mind is even better than the existing one at http://www.unto.net/aws/.

One important thing I'm asking you to remember: Everything I'm doing here is open source and available for free. Even the license I'm releasing it under encourages people to take the code and re-use it. In return I'm asking a few simple things. First, if you use it, please honor the license and be sure to give proper attribution. This can be as simple as linking back to unto.net from a visible place on your site. Besides, the more you do that the more likely I'll be to help you out when you need a bug fixed, so it not only feels good, it is good.

Second, I'm effectively releasing the source code for the AWS search engine that has been paying the hosting costs for unto.net. It pays the bills because each product link that people follow and subsequently purchase from sends me a small referral fee. Enough people have been doing that each month to cover the hosting costs. So if you are going to build something with this code, that's great. Maybe you'd like to keep the referral ID intact for me? Maybe you're generating enough traffic that you'd like the referral fees yourself. That's great too, but perhaps you'd consider sending me a small donation or something. Ultimately it's not a big deal in a financial sense, but I am planning on keeping unto.net advertisement-free if I can, so the AWS product referrals are a way of keeping this from being an expensive hobby. Thanks!

A few more things:

The URL of the Spring APIs will be changing to spring.unto.net in the near future, so please don't get too attached.

You'll need to provide your own AWS Subscriber API key if you want to run Spring locally. But since I intend to keep this a public web service, I don't expect you'd need to run it locally other than to make modifications.

You can basically do everything that Spring does today with a 100-line script, rather than the thousands of lines of code shared between Spring and Essex. I'm okay with this -- the returns increase over time.

Essex is getting pretty powerful, but I continue to see room for improvement. I can do more to recycle service instances in most cases, and the next version will do a lot of this automatically.

Spring requests are relatively fast, though not lightning so. I have some ideas about where to optimize and feel I can get the average request down around 40 ms, fully loaded. Currently the overhead keeps the requests up over 100 ms. Obviously these numbers are much higher when you have to go out to the network or make a DB query. But I am keeping performance in mind, and the Essex services infrastructure will go a long way in pooling expensive resources. Besides, being fully REST-based will enable many simple queries to be cached on the client, on remote proxy servers at the ISP level, or on server-side proxies, such as Squid.
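
The reason REST helps here is that a plain GET on a stable URL can be answered from a cache keyed on nothing but that URL. As a rough illustration of the same idea inside a single process, here is a minimal time-to-live cache in Python. It is a sketch only, not how Spring, Cache::Cache, or Squid actually work; the class name, the fetch callback, and the injectable clock are all assumptions.

```python
import time


class TTLCache:
    """Cache fetched values by key for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._store = {}    # key -> (expiry time, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch(key) on a miss."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        # Miss or expired entry: do the expensive call and remember it.
        value = fetch(key)
        self._store[key] = (now + self.ttl, value)
        return value
```

With the request URL as the key, repeated searches for the same query inside the TTL window never reach the upstream service.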

And on a totally unrelated note that I'll write a lot more about once I get back to Orchard, I think that OpenID is the way to go for my user API. At the very least I will expose Orchard users via OpenID so the systems can interact.

I almost ditched the Perl version of Essex and Spring today. I almost did this because I downloaded Eclipse 3.1 and wanted to play around with Java 1.5, which is sexy for a programming language. Long term I think I'll probably try an implementation of the Orchard APIs in C#, as I'd like to see if I can run them on the client itself. But who knows, that could end up being any of C#, C, C++ or Java.

Anyway, stay tuned. Much more to come.

Update 1: I changed the URLs above to accurately reflect the new API.

Update 2: Yahoo web search as OpenSearch is live. See the OpenSearch Description Document at:

http://www.unto.net/spring/yahoo/web/opensearch_description/


And make queries in the form:

http://www.unto.net/spring/yahoo/web/opensearch/open+source+rest+ajax


Update 3: Lest the point be lost -- OpenSearch is RSS. In other words, you can save and view the OpenSearch URLs in any blog reader. For example, this URL is a "saved" search for "A9":

http://www.unto.net/spring/yahoo/web/opensearch/A9


You can add this to Bloglines, for example, with:

http://www.bloglines.com/sub/http://www.unto.net/spring/yahoo/web/opensearch/A9


Or add it to My Yahoo:

http://add.my.yahoo.com/rss?url=http://www.unto.net/spring/yahoo/web/opensearch/A9
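
Both subscription links are just a fixed prefix followed by the feed URL, so generating them for any saved search is a one-liner each. A small Python sketch, with hypothetical helper names; it appends the feed URL unencoded, exactly as in the examples above, which is probably only safe for feed URLs without a query string of their own.

```python
def bloglines_subscribe_url(feed_url):
    """Bloglines subscription link, in the form shown in this post."""
    return "http://www.bloglines.com/sub/" + feed_url


def my_yahoo_subscribe_url(feed_url):
    """My Yahoo subscription link, in the form shown in this post."""
    return "http://add.my.yahoo.com/rss?url=" + feed_url
```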


Yahoo limits each client of their web services API to 5000 searches a day. That is generous, but if I see you subscribing to more than a few searches I'll probably have to block your IP address so other people can have fun, too.
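
A gentler alternative to blocking would be to give each client a share of the daily budget. The Python sketch below shows one way that could look; it is purely hypothetical and not something Spring does. The class name, the per-client limit, and keying on IP address are all assumptions.

```python
import datetime


class DailyQuota:
    """Allow each client a fixed number of requests per calendar day."""

    def __init__(self, limit, today=datetime.date.today):
        self.limit = limit
        self.today = today  # injectable so tests can control the date
        self._day = None
        self._used = {}     # client -> requests made on the current day

    def allow(self, client):
        """Record one request for client; return False once over the limit."""
        day = self.today()
        if day != self._day:
            # A new day has started: reset every client's counter.
            self._day = day
            self._used = {}
        used = self._used.get(client, 0)
        if used >= self.limit:
            return False
        self._used[client] = used + 1
        return True
```

A handler would call allow() with the client's IP address before forwarding a search, and answer with an error instead of spending one of the 5000 upstream queries when it returns False.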