Feed URIs considered harmful

DeWitt Clinton
October 2005

You may have started seeing links in the form:

feed://example.com/content.rss

These are called "feed" URIs, and they are probably a bad idea.

Same goes for this:

rss://example.com/content.rss

And:

pcast://example.com/podcast.xml

The first part of each URI ("feed," "pcast," etc.) is called a scheme. Schemes are formally defined in Uniform Resource Identifier (URI): Generic Syntax, and in the IETF Network Working Group's Guidelines for new URL Schemes. Schemes are registered with the IANA in the official registry of URI schemes. There you will find familiar ones, such as http, https, ftp, and mailto, and some less commonly used ones, such as tel, nfs, sip, and a handful more.

The one thing that (almost) all schemes have in common is that they refer to network protocols. The scheme informs the client application -- perhaps a web browser, perhaps an email reader, perhaps a bit of Java code -- what underlying transport mechanism must be used to parse and handle the remainder of the URI. In other words, the http in http://www.unto.net/ tells your browser to use the Hypertext Transfer Protocol to access www.unto.net. If it had been https then a different protocol would be used -- in this case, HTTP with SSL. The web browser knows how to speak HTTP, and HTTP+SSL, and FTP, etc., so it can perform the request itself.

But if, for example, the protocol was mailto, then the browser, not knowing how to handle that protocol, would typically ask the operating system (or some other registry) what application should be launched to handle that particular scheme. Then an email reader would be opened and the email reader would take it from there.
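As a rough sketch of that routing decision, here is how a client might dispatch on the scheme alone. The set of built-in schemes and the handler registry are hypothetical stand-ins for what a real browser and operating system would provide; only urlsplit comes from Python's standard library.

```python
from urllib.parse import urlsplit

# Schemes this hypothetical client can fetch by itself.
BUILT_IN = {"http", "https", "ftp"}

# Hypothetical stand-in for the operating system's scheme registry.
EXTERNAL_HANDLERS = {"mailto": "email reader"}


def dispatch(uri: str) -> str:
    """Describe how a client might route a URI based only on its scheme."""
    scheme = urlsplit(uri).scheme  # urlsplit lowercases the scheme
    if scheme in BUILT_IN:
        return f"fetch internally over {scheme}"
    if scheme in EXTERNAL_HANDLERS:
        return f"hand off to {EXTERNAL_HANDLERS[scheme]}"
    return "invalid URL: no handler registered"


print(dispatch("http://www.unto.net/"))
print(dispatch("mailto:someone@example.com"))
print(dispatch("feed://example.com/content.rss"))
```

Note that nothing in this decision looks at what kind of content sits behind the URI. The scheme only selects who does the fetching.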

To get a better understanding of what is going on, think about what happens when you open a file over HTTP. If the URL is "http://example.com/index.html", then the browser makes an HTTP request, and sees in the response header that the Content-Type is text/html. The browser knows how to display HTML, so it renders the page on the screen. This works for other types the browser understands how to render internally as well, such as plain text, XML, and more.

However, sometimes the browser doesn't get a type it recognizes. If the content comes back with a header Content-Type: application/msword, the browser will try to hand it off to someone else to handle (or fall back on saving it to disk). It is up to the particular browser and operating system to decide what to do with each different type of content.
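The content-type decision described above can be sketched the same way. The two tables below are assumptions made for illustration, not the behavior of any particular browser; the point is only that this choice is driven by the Content-Type header, after the fetch, and not by the scheme.

```python
# Types this hypothetical browser renders by itself (an assumed list).
RENDER_INTERNALLY = {"text/html", "text/plain", "text/xml", "application/xml"}

# Hypothetical helper applications registered with the operating system.
HELPER_APPS = {"application/msword": "word processor"}


def media_type(content_type_header: str) -> str:
    """Strip parameters such as charset and normalize case."""
    return content_type_header.split(";")[0].strip().lower()


def handle_response(content_type_header: str) -> str:
    """Decide what to do with a response based on its Content-Type."""
    kind = media_type(content_type_header)
    if kind in RENDER_INTERNALLY:
        return f"render {kind} in the browser"
    if kind in HELPER_APPS:
        return f"hand off to {HELPER_APPS[kind]}"
    return "save to disk"


print(handle_response("text/html; charset=utf-8"))
print(handle_response("application/msword"))
print(handle_response("application/octet-stream"))
```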

One nice thing about the URI is that you can have many different schemes pointing to the same content. For example, http://example.com/recipes.html, https://example.com/recipes.html, ftp://example.com/recipes.html, or even gopher://example.com/recipes.html, could all retrieve the same HTML file, independent of protocol. The content-type didn't change, just the means by which the file was accessed.

So all the scheme does is communicate to the client what type of protocol must be understood to retrieve the content. It doesn't say how to process that content -- the scheme simply informs the client what protocol must be used to fetch the file or decipher the URI.

Unfortunately, there are some misuses of URI schemes, often for perfectly understandable reasons. Because operating systems have become relatively smart about opening the right application to handle a given protocol, people are misusing the URI scheme to automatically open certain applications.

A few of the big problems with feed and similar URI schemes:

If a client isn't familiar with your scheme (and most won't be), they are completely out of luck. Users will see "invalid URL" and will not be able to retrieve the content at all.
These aren't network protocols; they are content-types. Thus they are redundant (with the real content-type) and obscure the real protocol.
Content already has a universally accepted, well known, and extensible registry of valid types.
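To illustrate the first problem with one concrete client: Python's standard urllib has handlers for schemes such as http, https, and ftp, but none for feed. The helper name try_fetch is made up for this sketch. Other clients may fail differently, but a client that has never heard of the scheme has no way to guess that HTTP is what is really wanted.

```python
from urllib.error import URLError
from urllib.request import urlopen


def try_fetch(uri: str) -> str:
    """Return the content type of a URI, or a description of the failure."""
    try:
        with urlopen(uri, timeout=5) as response:
            return response.headers.get_content_type()
    except URLError as err:
        # Raised before any network traffic when no handler knows the scheme.
        return f"failed: {err.reason}"


print(try_fetch("feed://example.com/content.rss"))
```

The same function given an http URL would perform a real request and report whatever Content-Type the server sends back.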

Some sites recommend the use of the feed URI scheme. While I am certain that these recommendations don't have bad intentions -- overloading the scheme is tempting, after all -- it isn't necessarily a good idea.

Fortunately, alternatives exist, and they are built right into the technologies that need them. The first is to simply use any existing protocol, such as http, and return the appropriate Content-Type with the response. (Not all protocols support response headers, of course, but the ones used in this scenario do.) The client can then use the document content type to handle the data accordingly.
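On the server side this takes very little. Below is a minimal sketch of a WSGI application that serves a feed over plain HTTP with a feed content type. The feed body is a placeholder, and application/rss+xml is the type commonly used for RSS, assumed here for the example (Atom feeds would use application/atom+xml).

```python
# Placeholder feed body; a real application would generate this.
FEED_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<rss version="2.0"><channel><title>Example</title>'
    b"<link>http://example.com/</link>"
    b"<description>Example feed</description></channel></rss>\n"
)


def feed_app(environ, start_response):
    """WSGI application that labels its response as an RSS feed."""
    headers = [
        ("Content-Type", "application/rss+xml; charset=utf-8"),
        ("Content-Length", str(len(FEED_BODY))),
    ]
    start_response("200 OK", headers)
    return [FEED_BODY]


# Call the application directly with a stub, as a WSGI server would.
captured = {}


def record(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = b"".join(feed_app({"REQUEST_METHOD": "GET"}, record))
print(captured["status"], captured["headers"]["Content-Type"])
```

Any WSGI server, including the one in Python's wsgiref module, could serve feed_app at an ordinary http URL. The client learns what the content is from the response header, and the URI keeps describing only how to fetch it.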

And if clients want to launch the correct application before the URL is fetched, consider using the existing standards for declaring the content type in the hyperlinks. HTML 4.01 states that the A element can contain a type attribute. The client (typically a web browser) can use this content type to pass along a URL to another application before ever trying to handle it internally.
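For example, a client could scan a page for hyperlinks whose type attribute names a feed format and pass those URLs to a feed reader without fetching them first. The sketch below uses Python's standard html.parser. The sample page, the class name, and the set of feed types are assumptions for illustration.

```python
from html.parser import HTMLParser

# Content types this hypothetical client treats as feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

PAGE = (
    '<p><a href="http://example.com/content.rss" '
    'type="application/rss+xml">Subscribe</a> '
    '<a href="http://example.com/index.html">Home</a></p>'
)


class TypedLinkParser(HTMLParser):
    """Collect the href of every A element whose type names a feed format."""

    def __init__(self):
        super().__init__()
        self.feed_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        declared = (attributes.get("type") or "").lower()
        if declared in FEED_TYPES and attributes.get("href"):
            self.feed_links.append(attributes["href"])


parser = TypedLinkParser()
parser.feed(PAGE)
print(parser.feed_links)
```

The link itself stays an ordinary http URL, so a client that ignores the type attribute can still follow it.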

Some people are also using feed and pcast URIs in syndicated feeds. If those feeds are in the Atom syndication format, then the link tag already has a type attribute. If you are using RSS, then either send an email to Dave (maybe he'll help add it to the spec), or simply mix Atom link elements into the RSS feed.
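Mixing an Atom link element into RSS could look roughly like the sketch below, which builds a small RSS 2.0 channel with Python's xml.etree.ElementTree. The channel contents, the function name, and the choice of rel="related" are assumptions for illustration. The namespace URI is the one defined for Atom 1.0.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)  # serialize with the atom: prefix


def build_channel(podcast_url: str) -> str:
    """Return an RSS 2.0 document that links to a podcast by http URL and type."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example"
    ET.SubElement(channel, "link").text = "http://example.com/"
    ET.SubElement(channel, "description").text = "Example feed"
    # The type attribute, not the URI scheme, says what the target is.
    ET.SubElement(
        channel,
        f"{{{ATOM_NS}}}link",
        {"rel": "related", "type": "application/rss+xml", "href": podcast_url},
    )
    return ET.tostring(rss, encoding="unicode")


document = build_channel("http://example.com/podcast.xml")
print(document)
```

A reader that understands the atom:link element can use its type to pick an application, and one that doesn't can simply ignore the element.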

While it is impossible to cover all the scenarios here, hopefully people will remember that what may initially seem like an expedient or clever solution to one problem doesn't always take into account the whole picture. Perhaps application authors should consider backing out support for bad URI schemes in favor of more established Internet standards? I don't think it is too late to fix this problem, but it will require a bit of education as to the reasons URIs work the way they do.

[Further reading: A thread on the W3C URI Mailing list discussed the problems with the pcast scheme after it was ferreted out in a post on macosxhints.com. And Mark Nottingham wrote a longer post about the problems with "feed:". In fact, there is even a Wiki set up to help with all the undocumented (and misused) URI schemes.]