Microblogging syndication formats


I've been thinking a little bit about the syndication formats for microblogs lately, and I'm wondering if we might all agree upon some very simple conventions to integrate the various providers. Nothing profound here, just some simple ideas about how to get an easy win with existing technologies.

Taking FriendFeed as the best current example of an aggregation site, I notice that the FriendFeed team has hand-coded various content providers, but they don't have the same richness enabled for generic feeds.

For example, take Dave Winer's stream on FriendFeed. His FriendFeed page looks pretty darn good, doesn't it? (I pick Dave because he does more to syndicate various content sources than just about anyone.)

But then notice that his native FriendFeed bookmarks, his Amazon wishlist, and his Flickr photos all have rich content syndicated and look stunning, while his basic blog feed -- itself a very rich content source -- is comparatively lacking in how it is represented in the aggregated stream.

It's the difference between this:

Dave Winer's FriendFeed stream, full of rich information and images from the bookmarklet

And this:

Dave Winer's current FriendFeed stream, very plain, without rich information from his RSS feed

Why is this? Dave's blog, syndicated over RSS, has plenty of great data to display. Only the blog post title is displayed -- missing are the images, an icon, a summary, and more. Couldn't those be captured as well?

As a strawman proposal, what if content aggregation sites agree to display the following standard elements from RSS feeds:

<?xml version="1.0"?>
<rss version="2.0">
 <channel>
  <title>Scripting News</title>
  <link>http://www.scripting.com/</link>
  <description>
    Dave Winer&apos;s weblog, started in April 1997, 
    bootstrapped the blogging revolution.
  </description>
  <language>en-us</language>
  <copyright>Copyright 1997-2008 Dave Winer</copyright>
  <pubDate>Fri, 15 Aug 2008 07:00:00 GMT</pubDate>
  <lastBuildDate>Fri, 15 Aug 2008 20:35:00 GMT</lastBuildDate>
  <docs>http://cyber.law.harvard.edu/rss/rss.html</docs>
  <generator>OPML Editor v0.73</generator>
  <managingEditor>scriptingnewsmail@gmail.com</managingEditor>
  <webMaster>scriptingnewsmail@gmail.com</webMaster>
  <item>
   <title>Perfect timing!</title>
   <link>http://www.scripting.com/stories/2008/08/15/perfectTiming.html</link>
   <guid>http://www.scripting.com/stories/2008/08/15/perfectTiming.html</guid>
   <comments>http://www.scripting.com/stories/2008/08/15/perfectTiming.html#disqus_thread</comments>
   <description>
      I just read this [...]
   </description>
   <enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3"
     length="12216320" type="audio/mpeg" />
   <pubDate>Fri, 15 Aug 2008 18:23:10 GMT</pubDate>
  </item>
 </channel>
</rss>


From this feed we'd want to consider the following RSS elements:



And for the Atom case the corresponding elements would be:



For bonus points in both cases, parse the HTML content of the rss/channel/item/description or feed/entry/summary elements for <img/> elements, extract them, and display them as images along with the post.
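To make the image idea a bit more concrete, here is a rough sketch of the extraction step in Python, using only the standard library. It assumes the description or summary text has already been pulled out of the feed and unescaped into an HTML string; the names ImageCollector and extract_image_urls are made up for this example, and a real aggregator would probably also want to resolve relative URLs and skip tiny tracking images.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us,
        # and self-closing tags like <img ... /> are routed here as well.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_urls(html_fragment):
    """Return the image URLs found in an HTML fragment, in document order."""
    collector = ImageCollector()
    collector.feed(html_fragment)
    collector.close()
    return collector.sources
```

The result is just a list of URLs; how many of them to show, and at what size, is left to the aggregator.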

And for a custom icon, use the host's /favicon.ico.
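Deriving that icon URL is about as simple as it sounds. A minimal sketch, assuming we start from the channel or item link and that the site actually serves an icon at the conventional path (the function name favicon_url is hypothetical, and nothing here checks that the file exists):

```python
from urllib.parse import urlsplit, urlunsplit


def favicon_url(page_url):
    """Guess the favicon location for the host that serves page_url.

    Returns None when the input is not an absolute URL.
    """
    # Feeds sometimes wrap URLs in whitespace, so trim before parsing.
    parts = urlsplit(page_url.strip())
    if not parts.scheme or not parts.netloc:
        return None
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/favicon.ico", "", ""))
```

An aggregator would likely fetch this once per host and fall back to a generic icon when the request fails.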

Put all that together and we'd have the following, all without any manual intervention on the aggregator's side:

Dave Winer's FriendFeed stream, enhanced with rich data from his RSS feed

This is just the beginning -- I feel I'm only scratching the surface of what can be extracted from existing syndication formats. For example, comment stream aggregation (via the comments element or RFC 4685 autodiscovery) is a great next step after this. I only call out FriendFeed because they're the best at aggregating multiple content sources; these concepts apply to any content aggregator. Finding a way to reuse existing formats like RSS and Atom to create rich presentations automatically will let us do more with less manual work between aggregators and publishers.
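As a small illustration of the comments idea, here is a sketch that reads the comments element from each item of an RSS 2.0 feed with Python's standard library. It only finds where the comment thread lives; it does not fetch or aggregate the comments themselves, and it does not cover the RFC 4685 route. The function name comment_links is made up for this example.

```python
import xml.etree.ElementTree as ET


def comment_links(rss_text):
    """Return (item title, comments URL) pairs for items that declare one."""
    root = ET.fromstring(rss_text)
    links = []
    for item in root.iterfind("./channel/item"):
        # Element text may be padded with whitespace, so strip it.
        title = (item.findtext("title") or "").strip()
        comments = (item.findtext("comments") or "").strip()
        if comments:
            links.append((title, comments))
    return links
```

Items without a comments element are simply skipped, so the same code can run over feeds that never declare one.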

Also note that this isn't intended to replace a standard format for activity streams. Those are something slightly different, and I know efforts are underway to define conventions there. Rather, this is simply an effort to get more out of the syndication formats we already publish, and in doing so, allow aggregators like FriendFeed to present a richer experience to their users without requiring as much manual tuning per content source.

Thoughts?

Update:

Bret notes:

We do pull out images and videos for entries that use Media RSS, but those tend to be video and photo sites (the spirit of the Media RSS format is that the post *is* a photo, as opposed to the use case highlighted here, where the picture is simply a thumbnail or accompanying a larger post).


Good point, and sounds like a great thing to do.