Feed URIs considered harmful
October 13th, 2005 by DeWitt Clinton

You may have started seeing links in the form:

feed://example.com/content.rss

These are called “feed” URIs, and they are probably a bad idea.

Same goes for this:

rss://example.com/content.rss

And:

pcast://example.com/podcast.xml

The first part of each URI (“feed,” “pcast,” etc.) is called a scheme. Schemes are formally defined in Uniform Resource Identifier (URI): Generic Syntax, and in the IETF Network Working Group’s Guidelines for new URL Schemes. Schemes are registered with the IANA in the official registry of URI schemes. There you will find familiar ones, such as http, https, ftp, and mailto, and some less commonly used ones, such as tel, nfs, sip, and a handful more.

The one thing that (almost) all schemes have in common is that they refer to network protocols. The scheme informs the client application — perhaps a web browser, perhaps an email reader, perhaps a bit of Java code — what underlying transport mechanism must be used to parse and handle the remainder of the URI. In other words, the http in http://www.unto.net/ tells your browser to use the Hypertext Transfer Protocol to access www.unto.net. If it had been https then a different protocol would be used — in this case, HTTP with SSL. The web browser knows how to speak HTTP, and HTTP+SSL, and FTP, etc., so it can perform the request itself.
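As a rough illustration of how a client sees the scheme, here is a minimal Python sketch that pulls the scheme off a few of the URIs mentioned in this post. The helper name scheme_of is made up for this example; urlsplit is the standard library parser.

```python
from urllib.parse import urlsplit


def scheme_of(uri: str) -> str:
    """Return the lowercased scheme of a URI, or '' if it has none."""
    return urlsplit(uri).scheme


# The scheme is the only part of the URI that says how to fetch it.
for uri in (
    "http://www.unto.net/",
    "https://www.unto.net/",
    "feed://example.com/content.rss",
):
    print(scheme_of(uri), "->", uri)
```

Note that the parser will happily report "feed" as the scheme. Whether the client knows a protocol by that name is a separate question.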

But if, for example, the protocol was mailto, then the browser, not knowing how to handle that protocol, would typically ask the operating system (or some other registry) what application should be launched to handle that particular scheme. Then an email reader would be opened and the email reader would take it from there.

To get a better understanding of what is going on, think about what happens when you open a file over HTTP. If the URL is "http://example.com/index.html", then the browser makes an HTTP request, and sees in the response header that the Content-Type is text/html. The browser knows how to display HTML, so it renders the page on the screen. This works for other types the browser understands how to render internally as well, such as plain text, XML, and more.

However, sometimes the browser doesn’t get a type it recognizes. If the content comes back with a header Content-Type: application/msword, the browser will try to hand it off to another application (or fall back on saving it to disk). It is up to the particular browser and operating system to decide what to do with each different type of content.
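The decision described in the last two paragraphs can be sketched roughly as follows. This is only an illustration, not how any particular browser works: the set of internally rendered types, the external_handlers mapping, and the function names are all assumptions made up for the example.

```python
# Types this hypothetical client renders itself (an assumption for the sketch).
INTERNAL_TYPES = {"text/html", "text/plain", "application/xml"}


def media_type(content_type_header: str) -> str:
    """Strip parameters such as charset and normalize case."""
    return content_type_header.split(";")[0].strip().lower()


def decide(content_type_header: str, external_handlers: dict) -> str:
    """Return what the client would do with a response of this type."""
    kind = media_type(content_type_header)
    if kind in INTERNAL_TYPES:
        return "render"
    if kind in external_handlers:
        # Hand the content off to whatever application is registered.
        return "launch " + external_handlers[kind]
    return "save to disk"


handlers = {"application/msword": "word-processor"}  # hypothetical registry
print(decide("text/html; charset=utf-8", handlers))
print(decide("application/msword", handlers))
print(decide("application/x-unknown", handlers))
```

The point is that the dispatch key is the content type from the response, not anything in the URI.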

One nice thing about the URI is that you can have many different schemes pointing to the same content. For example, http://example.com/recipes.html, https://example.com/recipes.html, ftp://example.com/recipes.html, or even gopher://example.com/recipes.html, could all retrieve the same HTML file, independent of protocol. The content-type didn’t change, just the means by which the file was accessed.

So all the scheme does is communicate to the client what type of protocol must be understood to retrieve the content. It doesn’t say how to process that content — the scheme simply informs the client what protocol must be used to fetch the file or decipher the URI.

Unfortunately, there are some misuses of URI schemes, often for perfectly understandable reasons. Because operating systems have become relatively smart about opening the right application to handle a given protocol, people are misusing the URI scheme to automatically open certain applications.

A few of the big problems with feed and similar URI schemes:

  • If a client isn’t familiar with your scheme (and most won’t be), they are completely out of luck. Users will see “invalid URL” and will not be able to retrieve the content at all.
  • These aren’t network protocols, they are content-types. Thus they are redundant (with the real content-type), and they obscure the real protocol.
  • Content already has a universally accepted, well known, and extensible registry of valid types.

Some sites recommend the use of the feed URI scheme. While I am certain that these recommendations don’t have bad intentions — overloading the scheme is tempting, after all — it isn’t necessarily a good idea.

Fortunately, alternatives exist, and they are built right into the technologies that need them. The first is to simply use any existing protocol, such as http, and return the appropriate Content-Type with the response. (Not all protocols support response headers, of course, but the ones used in this scenario do.) The client can then use the document content type to handle the data accordingly.
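On the server side, this first alternative amounts to little more than setting one header. Here is a minimal sketch as a Python WSGI application. The feed body and the name feed_app are placeholders, and application/atom+xml is used on the assumption that the feed is in the Atom format; substitute whatever type matches your content.

```python
# Hypothetical feed body; a real application would generate this.
FEED_XML = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
</feed>"""


def feed_app(environ, start_response):
    """WSGI application that serves the feed over plain HTTP."""
    body = FEED_XML.encode("utf-8")
    headers = [
        # The content type, not the URI scheme, says this is a feed.
        ("Content-Type", "application/atom+xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the application can be passed to wsgiref.simple_server.make_server from the standard library. The URL stays an ordinary http URL, so any client can at least retrieve it.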

And if clients want to launch the correct application before the URL is fetched, consider using the existing standards for declaring the content type in the hyperlinks. HTML 4.01 states that the A element can contain a type attribute. The client (typically a web browser) can use this content type to pass along a URL to another application before ever trying to handle it internally.
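To make that concrete, here is a small sketch of how a client might read the type attribute off a link before fetching anything. The sample markup and the class name TypedLinkCollector are invented for this example, and application/rss+xml is simply a commonly used value, assumed here for illustration.

```python
from html.parser import HTMLParser


class TypedLinkCollector(HTMLParser):
    """Collect (href, type) pairs from A elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" in attributes:
            # type is None when the author did not declare one.
            self.links.append((attributes["href"], attributes.get("type")))


def typed_links(html: str) -> list:
    parser = TypedLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


SAMPLE = (
    '<a href="http://example.com/content.rss" '
    'type="application/rss+xml">Subscribe</a> '
    '<a href="http://example.com/">Home</a>'
)
print(typed_links(SAMPLE))
```

A client that finds a declared type it wants to delegate can pass the plain http URL to the registered application, with no special scheme required.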

And some people are using feed and pcast URIs in syndicated feeds. If those feeds are in the Atom syndication format, then the link tag already has a type attribute. If you are using RSS, then either send an email to Dave (maybe he’ll help add it to the spec) or simply mix Atom link elements into your RSS feeds.
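Here is a sketch of what mixing Atom link elements into RSS might look like, along with how a client could read the declared types. The sample feed, its URLs, and the function name declared_links are all made up for illustration; this is one possible arrangement, not something either specification prescribes.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical RSS document carrying namespaced Atom link elements.
SAMPLE_RSS = """\
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example</title>
    <atom:link rel="self" href="http://example.com/content.rss"
               type="application/rss+xml"/>
    <item>
      <title>Episode 1</title>
      <atom:link rel="enclosure" href="http://example.com/ep1.mp3"
                 type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def declared_links(feed_xml: str) -> list:
    """Return (rel, href, type) for every Atom link, in document order."""
    root = ET.fromstring(feed_xml)
    return [
        (link.get("rel"), link.get("href"), link.get("type"))
        for link in root.iter(ATOM_NS + "link")
    ]


for rel, href, kind in declared_links(SAMPLE_RSS):
    print(rel, href, kind)
```

Every href is an ordinary http URL, and the type attribute carries the information that feed and pcast were being used to smuggle into the scheme.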

While it is impossible to cover all the scenarios here, hopefully people will remember that what may initially seem like an expedient or clever solution to one problem doesn’t always take into account the whole picture. Perhaps application authors should consider backing out support for bad URI schemes in favor of more established Internet standards? I don’t think it is too late to fix this problem, but it will require a bit of education as to the reasons URIs work the way they do.

[Further reading: A thread on the W3C URI mailing list discussed the problems with the pcast scheme after it was ferreted out in a post on macosxhints.com. And Mark Nottingham wrote a longer post about the problems with "feed:". In fact, there is even a wiki set up to help with all the undocumented (and misused) URI schemes.]

2 Responses to “Feed URIs considered harmful”

  1. Joe Says:

    Here’s a little how-to I wrote up a while back on how to use media-types to dispatch to your application under Windows:

    http://bitworking.org/news/Atom_Auto_Sub_How_To

    -joe

  2. DeWitt Clinton Says:

    Joe — that’s fantastic, and it is just what needs to be incorporated in a standard, cross-platform way.

    And great link to the W3C Architecture of the World Wide Web, Volume One, Section 2.4 URI Schemes, too. The particularly choice quote there is:

    While Web architecture allows the definition of new schemes, introducing a new scheme is costly. Many aspects of URI processing are scheme-dependent, and a large amount of deployed software already processes URIs of well-known schemes. Introducing a new URI scheme requires the development and deployment not only of client software to handle the scheme, but also of ancillary agents such as gateways, proxies, and caches. See [RFC2718] for other considerations and costs related to URI scheme design.

    Because of these costs, if a URI scheme exists that meets the needs of an application, designers should use it rather than invent one.

    And since I’m quoting, the following section 2.4.1 on URI Scheme Registration reads:

    Unregistered URI schemes SHOULD NOT be used for a number of reasons:

    • There is no generally accepted way to locate the scheme specification.
    • Someone else may be using the scheme for other purposes.
    • One should not expect that general-purpose software will do anything useful with URIs of this scheme beyond URI comparison.

    One misguided motivation for registering a new URI scheme is to allow a software agent to launch a particular application when retrieving a representation. The same thing can be accomplished at lower expense by dispatching instead on the type of the representation, thereby allowing use of existing transfer protocols and implementations.

    Even if an agent cannot process representation data in an unknown format, it can at least retrieve it. The data may contain enough information to allow a user or user agent to make some use of it. When an agent does not handle a new URI scheme, it cannot retrieve a representation.

    You can’t say it any more clearly than that.

    But I do have to agree with the person who commented regarding RSS. I think we need to find workarounds and fixes for RSS — the format may be less than correct, but it does carry the bulk of the syndicated traffic on the net.

    I’d personally like to see people consider using Atom extensions within RSS namespaces. (Similar to what we’re doing inside the OpenSearch Response with <link> elements.) A clear proposal needs to be made here.