OpenSearch 1.1, Internet Explorer 7, Web 2.0, and the point of it all



Microsoft's Chris Wilson was the first to publicly reveal one of the projects that I have been working on. During his presentation at the Professional Developers Conference, Chris announced that Internet Explorer 7 will use A9's OpenSearch to integrate third-party search engines directly into the browser. You can read the details over on Microsoft's blog and on my post on the A9 Developer Blog. Working with these guys is a pleasure -- they have definitely impressed me with their attitudes and insights.

And that's not the only news regarding OpenSearch. The new specifications are syndication-format agnostic: the OpenSearch extensions can now be used with any XML-based syndication feed, from RSS 2.0 to Atom 1.0. This is a huge win, as syndication is not now, nor will it ever be, a completely solved problem. And while I wondered recently whether Atom would add enough value to entice people to switch, I now believe that it does add more than enough value, and that people will adopt Atom as the syndication format of choice for advanced applications.
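To make "format agnostic" a little more concrete, here is a minimal sketch of how a client might read the OpenSearch paging elements out of a feed without caring whether the feed is RSS or Atom. It assumes the OpenSearch 1.1 response namespace and its totalResults, startIndex, and itemsPerPage elements; the sample feed and the read_paging helper are made up for illustration and are not part of any specification.

```python
import xml.etree.ElementTree as ET

# Namespace used by the OpenSearch 1.1 response elements.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical Atom search-results feed, invented for this example.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Example search: cats</title>
  <opensearch:totalResults>4230</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>
  <opensearch:itemsPerPage>2</opensearch:itemsPerPage>
  <entry><title>First result</title></entry>
  <entry><title>Second result</title></entry>
</feed>"""


def read_paging(feed_xml):
    """Return the OpenSearch paging values found anywhere in the feed.

    Searching the whole tree (rather than a fixed path) means the same code
    works for Atom, where the elements sit under <feed>, and for RSS 2.0,
    where they sit under <rss><channel>.
    """
    root = ET.fromstring(feed_xml)

    def number(name):
        node = next(root.iter(f"{{{OPENSEARCH_NS}}}{name}"), None)
        if node is None or not (node.text or "").strip():
            return None  # element missing or empty: nothing to report
        return int(node.text.strip())

    return {
        "totalResults": number("totalResults"),
        "startIndex": number("startIndex"),
        "itemsPerPage": number("itemsPerPage"),
    }


paging = read_paging(SAMPLE_FEED)
print(paging)
```

The only OpenSearch-specific knowledge in the sketch is the namespace and three element names; everything else about the feed belongs to the host format.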

Additionally, OpenSearch 1.1 is designed to be extensible, and to restrict the core specification to only those critical bits of data that are essential to nearly all search applications. Over the past six months I've heard from literally hundreds of people with ideas on where they see opportunities for OpenSearch. Nearly all are great ideas, and some ended up in the core specifications (the built-in support for I18N, for example). But other ideas are quite specialized -- ideas that offer a tremendous return for a particular niche audience. With this in mind, OpenSearch 1.1 added a formal extensibility mechanism for all three areas -- the request, the response, and the description all incorporate ways in which OpenSearch can be applied in specialized applications.

But to step back from the technical for a moment, what does OpenSearch do in a larger sense?

Perhaps it all goes back to the idea behind Web 2.0 -- decentralizing the control of information. (If you have a moment to spare, please re-read that post for context.) The Internet itself (as exemplified in Web 1.0) has already done a tremendous job in decentralizing the sources of information. In fact, this trend has been so pronounced that within a few short years the amount of information outpaced our ability to find it. The first generation of search engines eventually caught up and started indexing the content, but it took years before the second generation did an adequate job of helping the user find information.

The problem is that an "adequate job of helping the user find information" is subjective. Even if one could imagine a scenario in which all information is indexed, there are still inherent algorithmic biases, and explicit editorial and commercial ones, that shape which results are returned to the user and in what order. These biases are the crux of the political aspects of the search problem, the limitations that must be overcome to truly "give up control" and set the information free.

Bill Gates recently said that Google "has a slogan that they are going to organize the world's information," whereas Microsoft's "slogan is that we are going to give people tools to let them organize the world's information." While it is true that Google has said that their mission is "to organize the world's information and make it universally accessible and useful," one would be forgiven for feeling a healthy skepticism regarding the motives behind Gates' words. That said, I couldn't agree more with the sentiment. For if search is centralized, then information itself may as well be centralized.

A search engine monopoly would be as bad as a monopoly on the information itself. I'm not saying that a search engine monopoly is probable, nor even necessarily possible, but it is certainly worthy of more than just academic consideration. Just as free press and free speech are requirements for true freedom, an open market for search is required for freedom of information. In fact, the potential for bias in centralized search is so great that there are inherent dangers in even a relative dominance in the space. For example, imagine if one search engine had 75% of the market, and that search engine algorithmically determined that "happy" or "positive" search results directly correlated with a greater number of revenue-generating ad impressions, and thus began suppressing "bad" or "negative" results. How different would the world appear if dissent were eliminated from our primary conduit to information? Algorithmically biased or editorially biased, it is all the same -- censorship is tremendously dangerous if it is allowed to influence a sufficiently large percentage of the population. Like it or not, search engines increasingly have the potential to create the truth, not just discover it.

OpenSearch -- or if not OpenSearch itself, then a technology with similar goals -- may offer part of the solution to the problem of the centralization of search. If we recognize search as the primary entry point into information, then we can also agree that decentralizing the search furthers the goal of decentralizing the information. If the search interfaces can be universally exposed, and the search results universally syndicated, then the issue of centralized control is partially mitigated. To that end, OpenSearch offers a common format that enables search engines to publish their interfaces and syndicate their search results.
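As a rough illustration of what "publishing an interface" means in practice: an OpenSearch description advertises a URL template with placeholders such as {searchTerms}, and a client fills the template in to build a query. The sketch below assumes the OpenSearch 1.1 template syntax, in which a trailing "?" marks a parameter as optional; the example.com URL and the fill_template helper are hypothetical.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; a colon is allowed so that prefixed
# extension parameters (e.g. {ext:thing?}) pass through the same path.
_PARAM = re.compile(r"\{([A-Za-z][A-Za-z0-9:]*)(\?)?\}")


def fill_template(template, values):
    """Substitute percent-encoded values into an OpenSearch-style URL template.

    Optional parameters with no value become the empty string; a missing
    required parameter raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required template parameter: {name}")

    return _PARAM.sub(substitute, template)


# Hypothetical template, as a search engine might publish it.
template = "http://example.com/search?q={searchTerms}&page={startPage?}&format=atom"
url = fill_template(template, {"searchTerms": "open search"})
print(url)  # http://example.com/search?q=open%20search&page=&format=atom
```

Because the template is just data, a browser or aggregator can add a new search engine without shipping any engine-specific code.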

One can then envision a scenario in which all the world's information is indexed, not just by one search engine, but by many, all syndicating their results. (And search engines include not only general-purpose ones that index the surface web, but also engines that index specialized data repositories, and personalized search engines that index your own private data, such as email and other documents.) If one can build a smart enough client, then a given search can be farmed out to the appropriate search engines, the results aggregated and collated, and the diverse set of underlying information presented to the user.
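A toy version of such a client might look like the sketch below. It is only an illustration under simplifying assumptions: each "engine" is a plain Python function returning a ranked list of URLs (a real client would fetch and parse syndicated results over HTTP), the engine names are invented, and the results are merged by simple round-robin interleaving with duplicates dropped, which is one naive choice among many.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest


def federated_search(query, engines):
    """Send one query to every engine in parallel and interleave the results.

    engines maps a name to a callable(query) -> ranked list of result URLs.
    Returns a list of (url, engine_name) pairs; the first engine to supply
    a URL gets the credit for it.
    """
    names = list(engines)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        # pool.map preserves the order of names, so lists line up with names.
        result_lists = list(pool.map(lambda name: engines[name](query), names))

    merged, seen = [], set()
    # Take every engine's 1st result, then every engine's 2nd result, etc.
    for tier in zip_longest(*result_lists):
        for name, url in zip(names, tier):
            if url is not None and url not in seen:
                seen.add(url)
                merged.append((url, name))
    return merged


# Hypothetical stand-ins for a web engine and a personal mail index.
engines = {
    "web": lambda q: ["http://a.example/", "http://b.example/", "http://c.example/"],
    "mail": lambda q: ["mail://msg/17", "http://b.example/"],
}
results = federated_search("opensearch", engines)
for url, source in results:
    print(source, url)
```

Round-robin treats every engine's ranking as equally trustworthy, which keeps the client simple but sidesteps the question of relevancy across engines.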

The difference between a galaxy of independent but interoperable search engines and a handful of all-encompassing über search engines is that the galaxy model distributes control outward, whereas the traditional model folds it inward. Since the crux of Web 2.0 is the distribution of control, clearly the galaxy model is more in tune with the current vogue of web theory. With the inclusion of OpenSearch inside Internet Explorer, we are one step closer to the realization of that paradigm.

Granted, this is hardly a solved problem. OpenSearch is just one small part of the solution. In fact, OpenSearch itself is limited -- for while it is designed to be extensible and flexible, no one technology can ever encompass the richness of all types of search (for example, query languages such as XQuery or SQL address complex search problems themselves). But the problem goes far beyond simple search syndication. If you are seeking a perfect solution, then even aggregation is an intractable challenge. It is impractical, if not impossible, to query every available search engine in parallel, and thus the aggregator needs to make smart decisions about what should be queried, either algorithmically or at the user's behest. And collation (or in this case, global relevancy) is equally unsolved -- the top result from one search engine may be ranked poorly or missing altogether in another. So while search syndication is a necessary part of the solution, it is hardly the silver bullet that fells the beast of centralized search. (Fortunately, these problems will keep many of us occupied for years to come.)
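To give a feel for the collation problem, here is a small sketch of one well-known heuristic for merging ranked lists, reciprocal rank fusion. This is a technique swapped in purely for illustration, not something OpenSearch specifies: each result earns 1/(k + rank) from every engine that returns it, so a result ranked highly by several engines rises, while one that is missing from an engine simply earns nothing there. The constant k=60 is a conventional default, and the single-letter document names are placeholders.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists into one using reciprocal rank fusion.

    rankings is a list of lists, each ordered best-first. Ties in the fused
    score are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Three hypothetical engines that disagree about the top result.
rankings = [
    ["a", "b", "c"],
    ["b", "a"],
    ["b", "d"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)  # ['b', 'a', 'd', 'c']
```

Even this tidy formula only papers over the real difficulty: it uses ranks alone, knows nothing about why each engine ranked a result where it did, and has no way to tell a biased ranking from an honest one.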

Ultimately, search syndication and the like are all just small stones played one at a time across a massive board. For every move that shifts power outward (peer-to-peer technologies, HTTP, OpenSearch), there is another that pulls it back in (Digital Rights Management, the DMCA, the Patriot Act). For all of the frustration that one could rightly feel about the state of affairs today, it is a source of comfort that there are still some moves yet to be played.