OpenSearch and microformats

DeWitt Clinton
June 2006

As has been observed before, microformats work particularly well in the context of syndicated search.

With that in mind, here is something that I have been working on:

A draft document on OpenSearch and microformats.

But first, some background:

Microformats exhibit a profound and subtle characteristic in that they can be used to present structured and semi-structured data in a format that degrades gracefully.

One can layer microformat markup into human-readable content such that a rich client can extract semantically meaningful data while at the same time a simple client can still parse the basic text.

Thus microformats can, in conjunction with formats and protocols such as Atom 1.0, Atom Publishing Protocol, and OpenSearch, become part of a powerful new paradigm for web APIs.

Examples speak louder than words:

Consider a trivial "white pages" phone directory search engine. This directory would publish an OpenSearch Description document to describe the search API. This document would contain URL templates that declare how to conduct a search both by free-form search terms and explicitly by first and/or last name:

<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"
                       xmlns:hcard="http://www.w3.org/2006/03/hcard">
   <Url type="application/atom+xml"
        template="http://example.com?q={searchTerms}"/>
   <Url type="application/atom+xml"
        template="http://example.com?first={hcard:given-name?}&amp;last={hcard:family-name?}"/>
   <!-- ... -->
</OpenSearchDescription>

A simple search client (i.e., one that does not know about microformats) would use the standard "searchTerms" based URL template to make an unstructured search request, whereas a rich search client (i.e., one that knows about the hCard microformat) would use the structured template to explicitly search by first and/or last name.

Thus an example HTTP GET request from a rich search client would look like this:

  GET http://example.com?first=Jane&last=Smith
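
To make the template step concrete, here is a minimal Python sketch of how a client might fill in the URL templates from the description document above. It assumes the usual OpenSearch 1.1 behavior that an optional parameter (one ending in "?") is replaced with the empty string when the client has no value for it. The function name expand_template is hypothetical, and the sketch matches parameters by their literal prefixed name rather than resolving the XML namespace, which a real client would need to do. Note that the &amp; in the description document becomes a plain & once the XML has been parsed.

```python
import re
from urllib.parse import quote

# Matches template parameters such as {searchTerms} or {hcard:given-name?}.
# A trailing "?" marks the parameter as optional.
PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")

def expand_template(template, values):
    """Fill an OpenSearch URL template from a dict of parameter values.

    Hypothetical helper: parameters are looked up by their literal,
    prefixed name (an assumption; no namespace resolution is done).
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(values[name], safe="")
        if optional:
            return ""  # unknown optional parameters become the empty string
        raise KeyError(f"missing required parameter: {name}")
    return PARAM.sub(substitute, template)

simple = "http://example.com?q={searchTerms}"
structured = ("http://example.com?first={hcard:given-name?}"
              "&last={hcard:family-name?}")

# A rich client that knows both names:
print(expand_template(structured, {"hcard:given-name": "Jane",
                                   "hcard:family-name": "Smith"}))
# prints http://example.com?first=Jane&last=Smith

# A simple client falls back to free-form search terms:
print(expand_template(simple, {"searchTerms": "jane smith"}))
# prints http://example.com?q=jane%20smith
```

A client that cannot supply a required parameter should skip that template entirely, which is why the sketch raises an error rather than guessing.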

Upon receiving such a request the search server would return an Atom-based OpenSearch Response. This search response would include an atom:entry element for each distinct matching result. Each entry would include an atom:summary element of type="xhtml". The data itself would be presented as simple XHTML marked up with the hCard microformat syntax.

An example microformat-enhanced search response:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Example.com Phonebook Search</title>
  <opensearch:totalResults>62</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>
  <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
  <entry>
    <title>Jane Smith</title>
    <link href="http://example.com/people/jsmith"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml" class="vcard">
        <div class="n">
          <span class="honorific-prefix">Ms.</span>
          <span class="given-name">Jane</span>
          <span class="family-name">Smith</span>
        </div>
        <div class="tel">
          <span class="type">Work</span>:
          <span class="value">(212) 555-0101</span>
        </div>
      </div>
    </summary>
  </entry>
  <!-- ... -->
</feed>

A "dumb" client, such as a web browser or a normal blog reader, would simply display the entries to the user and render them as plain XHTML like this:

Ms. Jane Smith
Work: (212) 555-0101

A standard OpenSearch client would even let the user page through the results. Perfectly useful, though nothing that takes advantage of the underlying meaning of the data.
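
As a rough illustration of the paging step, the three opensearch:* elements in the response are enough to work out where the user is and where the next page starts. The sketch below assumes startIndex is 1-based, as in the example response, and the helper name paging_info is hypothetical. Actually requesting the next page would also need a URL template that accepts a start index, which the example description document above does not declare.

```python
import math

def paging_info(total_results, start_index, items_per_page):
    """Derive simple paging facts from the OpenSearch response elements.

    Assumes a 1-based start_index (an assumption matching the example).
    """
    pages = math.ceil(total_results / items_per_page)
    current = (start_index - 1) // items_per_page + 1
    next_start = start_index + items_per_page
    return {
        "pages": pages,
        "current": current,
        # None when the current page is the last one
        "next_start_index": next_start if next_start <= total_results else None,
    }

# Values taken from the example response: 62 results, 10 per page.
print(paging_info(62, 1, 10))
# prints {'pages': 7, 'current': 1, 'next_start_index': 11}
```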

But a smart client, one that knows about hCard, could offer to automatically dial the phone number, store the results in an address book, pull up matching emails, display a picture, or something else that requires knowledge of the semantic meaning of the data itself.
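
The difference between the two kinds of client can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a full hCard parser: it reads an abbreviated copy of the response above, looks only at the class names used in that example, and the function names (plain_summaries, extract_hcards) are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Abbreviated copy of the example response, with a single entry.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Jane Smith</title>
    <summary type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml" class="vcard">
        <div class="n">
          <span class="honorific-prefix">Ms.</span>
          <span class="given-name">Jane</span>
          <span class="family-name">Smith</span>
        </div>
        <div class="tel">
          <span class="type">Work</span>:
          <span class="value">(212) 555-0101</span>
        </div>
      </div>
    </summary>
  </entry>
</feed>"""

def has_class(element, name):
    """True if the space-separated class attribute includes name."""
    return name in element.get("class", "").split()

def first_text(root, name):
    """Text of the first element under root with the given class, or None."""
    for element in root.iter():
        if has_class(element, name):
            return "".join(element.itertext()).strip()
    return None

def plain_summaries(feed_xml):
    """What a simple client sees: each summary flattened to plain text."""
    feed = ET.fromstring(feed_xml)
    return [" ".join("".join(summary.itertext()).split())
            for summary in feed.iter(ATOM + "summary")]

def extract_hcards(feed_xml):
    """What a rich client sees: one dict of hCard fields per vcard."""
    cards = []
    feed = ET.fromstring(feed_xml)
    for summary in feed.iter(ATOM + "summary"):
        for element in summary.iter():
            if has_class(element, "vcard"):
                cards.append({
                    "given-name": first_text(element, "given-name"),
                    "family-name": first_text(element, "family-name"),
                    "tel-type": first_text(element, "type"),
                    "tel-value": first_text(element, "value"),
                })
    return cards

print(plain_summaries(FEED))
# prints ['Ms. Jane Smith Work: (212) 555-0101']
print(extract_hcards(FEED)[0]["tel-value"])
# prints (212) 555-0101
```

Both functions read the same bytes. The simple client ignores the class attributes and still gets readable text, while the rich client uses them to pull out fields it can act on.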

Quite a range of capabilities, especially for something that is built entirely on existing standards and formats.

You can even start using this model today. Because it degrades gracefully, one does not need to wait for widespread adoption.

And this is just the "read" side of things. To support "read/write" you will want to look at the Atom Publishing Protocol. This technique gets exponentially more powerful, but not more complicated, when you incorporate the full richness of REST with the power of search and the expressiveness of microformats.

With this simple and orthogonally layered model one can design web service APIs that work across a wide range of clients. The "degrade gracefully" issue has traditionally been a hard problem for structured XML data and WS-* to solve; this could be one alternative way around it.

But to be clear: this approach will not, and is not intended to, replace structured web APIs.

As good as it is, OpenSearch + Atom 1.0 + microformats will probably bump into a ceiling at some point. While it works great for humans and great for some rich application scenarios, it isn't designed to do everything a fully structured API can.

(That said, you may be able to raise that ceiling by incorporating bits of XQuery or CQL on the request side, and bits of structured XML, such as GData, in the response.)

The question is: can this hit the 80%-case sweet spot of being easy to write, easy to read, and, frankly, just good enough?

Keep in mind that this is all still a work in progress. I very much solicit your input and feedback. I will move the actual document over to a permanent home soon. For now I just wanted to get it out to the public to hear what people think.

A large group deserves mention for helping with this, including Jon Gold, Hugh Ha, Joel Tesler, Joe Gregorio, James Snell, Michael Fagan, Kyle Marvin, Niall Kennedy and others. Please send them your thanks if you like these ideas. If you don't, then please blame me.

And Tantek, I would have bugged you about this too, but you've clearly been busy showing us how cool microformats and search can be. Any interest in adding an OpenSearch interface so we can prove just how powerful syndicated search and microformats really are?

Again, here it is:

A draft document on OpenSearch and microformats.