A New Project, Part 4

DeWitt Clinton
May 2005

sticky, part 4

Thanks to some very insightful comments on earlier articles in this series, things are definitely beginning to coalesce regarding the API. In particular, the questions about REST and the friends/groups model really helped nail down some important details early in the development process. I encourage other people to feel comfortable adding their own opinions, questions, and comments -- they are all welcome here.

While there are still some unresolved issues regarding the user model, I'd like to move on for a bit and talk about the model for the notes themselves. I basically picture a "note" as a simple piece of text. As in a Wiki, it should be very easy to link one piece of text to another. In a general sense, I wouldn't mind if that text could be formatted in various ways -- bold, italics, lists, etc. But I don't think I want to expose the entire expressive power of arbitrary HTML/CSS -- I worry that embedding full HTML documents in each note may take this application beyond a reasonable scope.

On the other hand, if we do not allow arbitrary HTML, then we are forced to decide what is allowed. Moreover, we would have to invent our own markup syntax (the approach taken by every Wiki I've seen to date), parse and restrict the HTML markup that each note contains, or both.

Perhaps we are better off looking at the issue of markup from the perspective of the client. If we consider the hypothetical client UI that I originally had in mind, then we are relatively flexible in our choices. Since the notes themselves will be displayed as a simple <div> of text, we can include almost any valid HTML we'd like. However, we should not forget that the text we get back from the server will probably need to be modified before being presented to the browser: hyperlinks to other notes will need to be rewritten, images checked for size, etc.

How about this for a compromise? The note can contain arbitrary, but valid, XHTML 1.0 elements that can be contained within a <div>. This is an intended use of XHTML, and is covered in section 3.1.2, "Using XHTML with other namespaces," of the XHTML 1.0 specification.

This will allow the client to manage presentation using standard tools (such as CSS and the DOM), and move much of the burden of processing the display to the client side where it belongs. The biggest downside is that as the user writes a note they will be forced to adhere to strict XHTML constraints.
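To get a feel for what "strict XHTML constraints" would mean in practice, here is a minimal sketch of a server-side check, assuming notes arrive as a fragment that gets wrapped in the <div> it will be displayed in. The function name check_note_fragment and the ALLOWED whitelist are hypothetical, and the check covers only well-formedness plus an element whitelist, not full validation against the XHTML 1.0 DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical whitelist of elements a note may use; the real set is undecided.
ALLOWED = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "a", "br", "code", "pre"}


def check_note_fragment(text):
    """Return a list of problems with a note's text; [] means it passed.

    Only well-formedness and the element whitelist are checked here,
    not full validity against the XHTML 1.0 DTD.
    """
    # Wrap the note in the <div> it will be displayed in, declaring the
    # XHTML namespace so unprefixed elements belong to it.
    wrapped = '<div xmlns="%s">%s</div>' % (XHTML_NS, text)
    try:
        root = ET.fromstring(wrapped)
    except ET.ParseError as err:
        return ["not well-formed: %s" % err]

    problems = []
    prefix = "{%s}" % XHTML_NS
    for elem in root.iter():
        if elem is root:
            continue  # skip the wrapper <div> itself
        if not elem.tag.startswith(prefix):
            problems.append("not an XHTML element: %s" % elem.tag)
        elif elem.tag[len(prefix):] not in ALLOWED:
            problems.append("element not allowed: %s" % elem.tag[len(prefix):])
    return problems
```

For example, check_note_fragment("<p>Hello <b>world</b></p>") returns an empty list, while "<p>unclosed" or "<script>...</script>" are rejected. One concrete constraint this surfaces: named entities such as &nbsp; are rejected by a plain XML parser (no DTD is loaded), so a user would have to write &#160; instead.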

Alternatively, we could allow the user to author notes in a predetermined alternative markup -- e.g., [b]this text is bold[/b] -- and perform the transformation to XHTML before the note is stored. Or we could even store the original source text and return both it and the transformed XHTML with each note.
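As a rough illustration of the "transform before storing" option, here is a minimal sketch assuming a tiny bracket markup with only [b], [i], and [code]; the tag set and the to_xhtml name are hypothetical. It escapes the raw text first, then replaces innermost bracket pairs repeatedly, so properly nested markup converts cleanly and badly nested markup is simply left as literal text, keeping the output well-formed.

```python
import html
import re

# Hypothetical bracket tags and the XHTML elements they become.
TAGS = {"b": "b", "i": "i", "code": "code"}

# Matches an innermost pair: the content may not contain another "[".
_INNERMOST = re.compile(r"\[(b|i|code)\]([^\[]*)\[/\1\]")


def _to_element(match):
    tag = TAGS[match.group(1)]
    return "<%s>%s</%s>" % (tag, match.group(2), tag)


def to_xhtml(source):
    """Convert [b]...[/b]-style markup into an XHTML fragment."""
    # Escape first, so any "<" or "&" the user typed becomes literal text.
    text = html.escape(source, quote=False)
    # Replace innermost pairs until none remain, so [b][i]x[/i][/b] works
    # while badly nested pairs like [b][i]x[/b][/i] stay as plain text.
    while True:
        text, count = _INNERMOST.subn(_to_element, text)
        if count == 0:
            return text
```

With this approach, to_xhtml("[b]bold[/b] & more") yields "<b>bold</b> &amp; more". Storing both the source and this output would let an editing client show the user their original markup while a display client uses the XHTML directly. A limitation of this particular sketch is that a literal "[" inside a pair prevents that pair from converting.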

I'd be curious as to what people think about this. Would it be too much to ask users to write their notes using XHTML tags? Or are they better served by a markup language such as Wikipedia's custom markup syntax? I know that as a TWiki user I've often felt that I'd be better off just writing my own <ul>s and <li>s and whatnot. But then, if all the other Wikis chose to use custom markup, perhaps they were on to something.

In any case, a few things are clear about what each note needs to contain:

ID -- or some way of referring back to it, no matter what else changes
Name -- or really, namespace and name.
Text -- the data itself
Tags -- in the manner of del.icio.us and similar
Read Groups -- as per the discussion in part 3
Write Groups -- see above
Status -- active, deleted, etc.
Original author
Creation date
Last author
Last modified date -- necessary for managing concurrent edits
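The field list above maps naturally onto a simple record. Here is a minimal sketch in Python, assuming groups and tags are plain strings and that "active" and "deleted" are the only statuses so far; all field and class names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class Status(Enum):
    ACTIVE = "active"
    DELETED = "deleted"  # the full set of states is still open


@dataclass
class Note:
    id: int                      # stable ID that never changes
    namespace: str
    name: str
    text: str                    # the note's markup
    original_author: str
    tags: List[str] = field(default_factory=list)
    read_groups: List[str] = field(default_factory=list)
    write_groups: List[str] = field(default_factory=list)
    status: Status = Status.ACTIVE
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    last_author: Optional[str] = None
    last_modified: Optional[datetime] = None  # used to detect concurrent edits

    def __post_init__(self):
        # A brand-new note was last touched by its creator, at creation time.
        if self.last_author is None:
            self.last_author = self.original_author
        if self.last_modified is None:
            self.last_modified = self.created
```

Keeping id separate from namespace and name means a note can be renamed or moved without breaking references to it.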

This opens up a few new questions. For one, how do we actually manage concurrent access? One way is to require that every time a note is POST-ed, the request must contain the timestamp of the version it was modifying. This at least gives the client enough information to choose to force an overwrite if there was a concurrent modification.
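The timestamp scheme is a form of optimistic concurrency control. Here is a minimal server-side sketch, assuming notes live in a plain dict keyed by a hypothetical "namespace/name" string; save_note and ConflictError are hypothetical names.

```python
from datetime import datetime, timezone


class ConflictError(Exception):
    """The note changed after the client fetched the version it edited."""


def save_note(store, key, new_text, based_on, force=False):
    """Store new_text for key if the client edited the current version.

    store maps key -> {"text": ..., "last_modified": datetime}.
    based_on is the last_modified timestamp the client's POST carried.
    Returns the new last_modified timestamp.
    """
    current = store[key]
    if current["last_modified"] != based_on and not force:
        # Someone else saved in between; the client can merge, discard,
        # or resend with force=True to overwrite.
        raise ConflictError(
            "note %r was modified at %s" % (key, current["last_modified"]))
    now = datetime.now(timezone.utc)
    store[key] = {"text": new_text, "last_modified": now}
    return now
```

The comparison only needs the value to change on every save, so a per-note version counter would work the same way and would not depend on clock resolution; the timestamp has the advantage that it is already in the field list.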

And another question -- since I want to preserve all historic versions for each note, should this be done by storing diffs between versions? Or should we just assume that space is cheap, and that we can always move to diffs in the future? Should the client ever send diffs itself, or should it always POST the entire text?
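If we did move to diffs, the delta does not need to be elaborate. Here is a minimal sketch using Python's difflib.SequenceMatcher on lists of lines; the ("copy", i1, i2) / ("insert", lines) delta format and the function names are hypothetical, not a standard diff format.

```python
from difflib import SequenceMatcher


def make_delta(old_lines, new_lines):
    """Describe new_lines as copies from old_lines plus literal inserted lines."""
    delta = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))              # reuse old_lines[i1:i2]
        elif j2 > j1:
            delta.append(("insert", new_lines[j1:j2]))  # "replace" or "insert"
        # "delete" opcodes contribute nothing to the new version
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the newer version from the older one and a stored delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old_lines[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result
```

One way to use this is to keep the latest text in full and store older versions as reverse deltas, i.e. make_delta(newer, older), so the common case of reading the current note never has to replay any history. Since the full text can always be reconstructed, starting with full copies and switching later remains an option.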

And should the note's unique ID be the same as its name? MediaWiki solves this one by giving each entry a unique ID, plus a namespace, plus a name (title). Since it's hard to argue with the success of MediaWiki, I'll probably follow their lead. We will still need to figure out a way to know when an individual word or phrase in a note's text can be linked to another note without having to round-trip to the server.
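One possible approach, offered only as an assumption: the client fetches the list of note names once, caches it, and scans text locally. Here is a minimal sketch of the scanning step; find_links is a hypothetical name, matching is case-sensitive, and names are assumed to begin and end with word characters so the \b boundaries apply.

```python
import re


def find_links(text, known_names):
    """Return (start, end, name) for each known note name found in text."""
    if not known_names:
        return []
    # Try longer names first so "New York City" wins over "New York":
    # regex alternation takes the first alternative that matches.
    names = sorted(known_names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(re.escape(n) for n in names))
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]
```

For example, find_links("I moved to New York City last year", {"New York", "New York City"}) returns [(11, 24, "New York City")]. The trade-off is that the cached name list can go stale, and it grows with the number of notes.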

Initially, here's what I thought the note API could look like:

/note/[namespace]/[name]?username=[username]&password=[password]
/note/[namespace]/[name]/[version]/?username=[username]&password=[password]

Additionally, I would like to add various paths under each note that would allow REST access to each of the individual fields. For example, calling /note/[namespace]/[name]/text/ should return just that piece of data.
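To check that these paths can be parsed unambiguously, here is a minimal sketch of a router for the /note/ form. It assumes versions are always numeric (so "3" is a version and "text" is a field), that a version and a field may be combined, and that field sub-paths use the hypothetical names in FIELDS; parse_note_url is also a hypothetical name.

```python
import re
from urllib.parse import unquote, urlsplit

# Hypothetical sub-path names for the individual note fields.
FIELDS = {"text", "tags", "status", "read-groups", "write-groups"}

# /note/[namespace]/[name], optionally /[version] (assumed numeric),
# optionally /[field], with an optional trailing slash.
_NOTE_PATH = re.compile(r"^/note/([^/]+)/([^/]+)(?:/(\d+))?(?:/([a-z-]+))?/?$")


def parse_note_url(url):
    """Split a note URL into its parts, or return None if it doesn't match."""
    path = urlsplit(url).path  # drop ?username=...&password=...
    m = _NOTE_PATH.match(path)
    if m is None:
        return None
    namespace, name, version, fld = m.groups()
    if fld is not None and fld not in FIELDS:
        return None
    return {
        "namespace": unquote(namespace),
        "name": unquote(name),
        "version": int(version) if version is not None else None,
        "field": fld,
    }
```

Under these assumptions, /note/Main/Home/3/text/ would mean "the text field of version 3". If version identifiers were ever allowed to be non-numeric, the version and field segments would collide and the paths would need a distinguishing prefix.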

And looking back at the comments on part 3, it looks like the API might actually be better if every note lived under an individual user. So instead of prefixing each note with /note/, it would be prefixed with /user/[username]/. This would mean that each note would always expose its owner, but that's not such a bad thing.

So a revised version would be:

/user/[username]/notes/[namespace]/[name]?username=[username]&password=[password]
/user/[username]/notes/[namespace]/[name]/[version]/?username=[username]&password=[password]

Clearly the paths are getting rather long, but they are consistent, which is good.
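Long paths are easy enough to generate consistently. Here is a minimal sketch of a client-side helper for the revised form; note_url is a hypothetical name, and separating the note's owner (in the path) from the requesting user's credentials (in the query string) is an assumption on my part, since the proposal writes [username] in both places.

```python
from urllib.parse import quote, urlencode


def note_url(owner, namespace, name, username, password, version=None):
    """Build a URL in the revised /user/[username]/notes/... form.

    owner is the user the note lives under; username/password are the
    credentials of whoever is asking (assumed to possibly differ).
    """
    segments = ["user", owner, "notes", namespace, name]
    if version is not None:
        segments.append(str(version))
    # safe="" also escapes "/" so a name can't add extra path segments.
    path = "/" + "/".join(quote(s, safe="") for s in segments)
    if version is not None:
        path += "/"  # matches the trailing slash in the versioned form
    return path + "?" + urlencode({"username": username, "password": password})
```

For example, note_url("alice", "Main", "Front Page", "bob", "pw", version=2) produces /user/alice/notes/Main/Front%20Page/2/?username=bob&password=pw.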

To keep everything in one place, I've created a static homepage for the new project. This contains pointers to all the articles and to the latest API. (Of course, as soon as I made the static pages I noticed that I can't really defer the decision about a name for this project much longer.)

Off to work. More tonight...