A New Project, Part 4
May 16th, 2005 by DeWitt Clinton

sticky, part 4

Thanks to some very insightful comments on earlier articles in this series, things are definitely beginning to coalesce regarding the API. In particular, the questions about REST and the friends/groups model really help nail down some important details early in the development process. And I encourage other people to feel comfortable adding their own opinions, questions, and comments; they are all welcome here.

While there are still some unresolved issues regarding the user model, I’d like to move on for a bit and talk about the model for the notes themselves. I basically picture a “note” as a simple piece of text. Like a Wiki, I would like it to be very easy to link one piece of text to another. In a general sense, I wouldn’t mind if that text could be formatted in various ways — bold, italics, lists, etc. I don’t think I want to expose the entire expressive power of arbitrary HTML/CSS — I worry that by embedding full HTML documents in each note we may take this application beyond a reasonable scope.

On the other hand, if we do not allow arbitrary HTML then we are forced to make a choice about what is allowed. Moreover, we would be forced to either invent our own markup syntax (the approach taken by all wikis I’ve seen to date), and/or to parse and restrict the HTML markup that each note contains.

Perhaps we are better off looking at the issue of markup from the perspective of the client. If we consider the hypothetical client UI that I originally had in mind, then we are relatively flexible in our choices. Since the notes themselves will be displayed as a simple <div> of text, we can include almost any valid HTML that we’d like. However, we should not forget that the text we get back from the server will probably need to be modified before being presented to the browser: hyperlinks to other notes will need to be rewritten, images checked for sizes, etc.

How about this for a compromise? The note can contain arbitrary, but valid, XHTML 1.0 elements that can be contained within a <div>. This is an intended use of XHTML, and is covered under section 3.1.2, “Using XHTML with other namespaces,” of the 1.0 specification.

This will allow the client to manage presentation using standard tools (such as CSS and the DOM), and move much of the burden of processing the display to the client side where it belongs. The biggest downside is that as the user writes a note they will be forced to adhere to strict XHTML constraints.
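As a rough sketch of what "valid XHTML inside a <div>" could mean in practice, the snippet below wraps a note body in a namespaced <div> and checks that it parses as XML. Python is used only for illustration (the server language is still undecided), the function name is hypothetical, and well-formedness is a weaker test than full XHTML 1.0 validity.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set

XHTML_NS = "http://www.w3.org/1999/xhtml"


def inspect_note_body(fragment: str) -> Optional[Set[str]]:
    """Return the element names used in a note body, or None if it is not well-formed.

    Hypothetical helper: this only checks XML well-formedness, not validity
    against the XHTML 1.0 DTD.
    """
    wrapped = f'<div xmlns="{XHTML_NS}">{fragment}</div>'
    try:
        root = ET.fromstring(wrapped)
    except ET.ParseError:
        return None
    names = set()
    for element in root.iter():
        if element is root:
            continue  # skip the wrapper <div> itself
        # ElementTree reports namespaced tags as "{namespace}name".
        names.add(element.tag.split("}", 1)[-1])
    return names


print(inspect_note_body("<p>Hello <b>world</b></p>"))
print(inspect_note_body("<ul><li>one</ul>"))  # unclosed <li>, so not well-formed
```

Returning the set of element names leaves room for a later decision about restricting which elements a note may use.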

Alternatively, it is possible that we could allow the user to author notes in a predetermined alternative markup (e.g., [b]this text is bold[/b]) and perform the transformation before the note is stored. Or we could even store the original source text and return both that and the transformed XHTML with each note.
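A minimal sketch of that transformation, assuming a tiny bracket syntax in which only [b] and [i] exist ([i] is my own addition for illustration; only [b] appears above). The function names are hypothetical. Only matched pairs are converted, and overlapping pairs such as [b]a[i]b[/b]c[/i] would still produce ill-formed output, so the result would still need a well-formedness check.

```python
import html
import re

# Matches one [b]...[/b] or [i]...[/i] pair; the tag set is an assumption.
_PAIR_RE = re.compile(r"\[(b|i)\](.*?)\[/\1\]", re.DOTALL)


def to_xhtml(source: str) -> str:
    """Convert the bracket markup to XHTML elements, escaping everything else."""
    text = html.escape(source, quote=False)
    while True:
        converted = _PAIR_RE.sub(r"<\1>\2</\1>", text)
        if converted == text:
            return text  # nothing left to convert
        text = converted  # run again to pick up nested pairs


def prepare_note_text(source: str) -> dict:
    """Keep both forms, so the original source can be returned alongside the XHTML."""
    return {"source": source, "xhtml": to_xhtml(source)}


print(prepare_note_text("[b]this text is bold[/b]"))
```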

I’d be curious as to what people think about this. Would it be too much to ask the user to write their notes using XHTML tags? Or are they better served by using a markup language such as Wikipedia’s custom markup syntax? I know that as a TWiki user I’ve often felt that I’d be better off just writing my own <ul>’s and <li>’s and whatnot. But then, if all other wikis chose to use custom markup, then perhaps they were on to something.

In any case, a few things are clear about what each note needs to contain:

  • ID — or some way of referring back to it, no matter what else changes
  • Name — or really, namespace and name.
  • Text — the data itself
  • Tags — in the manner of del.icio.us and similar
  • Read Groups — as per the discussion in part 3
  • Write Groups — see above
  • Status — active, deleted, etc.
  • Original author
  • Creation date
  • Last author
  • Last modified date — necessary for managing concurrent edits
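The list above could be written down as a record type along these lines. This is only a sketch: the field types, the integer ID, and the two status values are my assumptions, and nothing here is tied to a particular storage layer.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Set


class Status(Enum):
    # Only "active" and "deleted" are named in the list; others may follow.
    ACTIVE = "active"
    DELETED = "deleted"


@dataclass
class Note:
    id: int                  # stable identifier, independent of the name
    namespace: str
    name: str
    text: str                # the note body itself
    original_author: str
    created: datetime
    last_author: str
    last_modified: datetime  # used to detect concurrent edits
    tags: Set[str] = field(default_factory=set)
    read_groups: Set[str] = field(default_factory=set)
    write_groups: Set[str] = field(default_factory=set)
    status: Status = Status.ACTIVE


now = datetime(2005, 5, 16, 9, 0)
example = Note(id=1, namespace="main", name="Groceries", text="<p>milk</p>",
               original_author="dewitt", created=now,
               last_author="dewitt", last_modified=now)
print(example.status, example.tags)
```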

This opens up a few new questions. For one, how do we really manage concurrent access? One way is to require that every time a note is POST-ed it must contain the timestamp of the version it was modifying. This at least lets the server detect a concurrent modification, and gives the client enough information to choose whether to force an overwrite.
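A minimal sketch of that timestamp check, assuming an in-memory record and a hypothetical force flag for the overwrite case. The names StoredNote, save_note, and ConflictError are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


class ConflictError(Exception):
    """Raised when the note changed after the client fetched it."""


@dataclass
class StoredNote:
    text: str
    last_modified: datetime


def save_note(stored: StoredNote, new_text: str, base_modified: datetime,
              now: datetime, force: bool = False) -> StoredNote:
    """Apply an edit only if it was based on the current version, unless forced."""
    if stored.last_modified != base_modified and not force:
        raise ConflictError(
            f"note was modified at {stored.last_modified}, "
            f"but the edit was based on {base_modified}"
        )
    stored.text = new_text
    stored.last_modified = now
    return stored


t0, t1, t2 = (datetime(2005, 5, 16, 9, m) for m in (0, 1, 2))
note = StoredNote("first draft", t0)
save_note(note, "edit from client A", base_modified=t0, now=t1)
try:
    save_note(note, "edit from client B", base_modified=t0, now=t2)
except ConflictError as exc:
    print("conflict:", exc)
```

On a conflict the client could show both versions and, if the user insists, resubmit with force=True.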

And another question — since I want to preserve all historic versions for each note, should this be done by storing diffs between versions? Or should we just assume that space is cheap, and that we can always move to diffs in the future? Should the client ever send diffs itself, or should it always POST the entire text?
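To make the diff option concrete, here is a sketch of a line-based delta built on Python's difflib.SequenceMatcher. The copy/insert operation format is my own invention for illustration; it stores only the new lines plus ranges to copy from the previous version, and the next version can be rebuilt from those.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def make_delta(old_lines: List[str], new_lines: List[str]) -> List[Tuple]:
    """Describe new_lines as copies from old_lines plus inserted lines."""
    ops = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new_lines[j1:j2]))
        # "delete" needs no operation: those old lines are simply not copied.
    return ops


def apply_delta(old_lines: List[str], ops: List[Tuple]) -> List[str]:
    """Rebuild the newer version from the older one and a delta."""
    result = []
    for op in ops:
        if op[0] == "copy":
            result.extend(old_lines[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


v1 = ["milk", "eggs", "bread"]
v2 = ["milk", "butter", "bread", "coffee"]
delta = make_delta(v1, v2)
print(delta)
print(apply_delta(v1, delta) == v2)
```

The same delta format would work in either direction (stored on the server, or sent by the client), which is why the two questions above can be decided separately.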

And should the note’s unique ID be the same as its name? MediaWiki solves this one by giving each entry a unique ID, plus a namespace, plus a name (title). Since it’s hard to argue with the success of MediaWiki, I’ll probably follow their lead. We will still need to figure out a way to know when an individual word or phrase in a note’s text can be linked to another note without having to round-trip to the server.
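One possible approach to the linking question, offered purely as a sketch: if the client already holds a list of known note names (an assumption, since nothing above says it does), it can scan the text locally. The function name is hypothetical, and the word-boundary matching assumes names that begin and end with letters or digits.

```python
import re
from typing import Iterable, List, Tuple


def find_linkable_phrases(text: str, known_names: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (offset, name) for each known note name that appears in the text."""
    names = sorted(known_names, key=len, reverse=True)
    if not names:
        return []
    # Longest names come first so "Shopping List" wins over "Shopping".
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(rf"\b(?:{alternatives})\b")
    return [(match.start(), match.group(0)) for match in pattern.finditer(text)]


print(find_linkable_phrases("Add milk to the Shopping List today",
                            {"Shopping", "Shopping List"}))
```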

Initially, here’s what I thought the note API could look like:

  • /note/[namespace]/[name]?username=[username]&password=[password]
  • /note/[namespace]/[name]/[version]/?username=[username]&password=[password]

Additionally, I would like to add various paths under each note that would allow REST access to each of the individual fields. For example, calling /note/[namespace]/[name]/text/ should give you just the appropriate bit of data.
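A sketch of how a server might take these paths apart. Two assumptions are mine: versions are plain integers (which is how a trailing segment is told apart from a field name), and the set of field names beyond "text" is made up.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Only "text" is named above; the other field names are hypothetical.
FIELDS = {"text", "tags", "status"}


def parse_note_path(url: str) -> Dict[str, Optional[object]]:
    """Split /note/[namespace]/[name] with an optional version or field segment."""
    parts = [segment for segment in urlsplit(url).path.split("/") if segment]
    if len(parts) not in (3, 4) or parts[0] != "note":
        raise ValueError(f"not a note path: {url}")
    result = {"namespace": parts[1], "name": parts[2], "version": None, "field": None}
    if len(parts) == 4:
        extra = parts[3]
        if extra.isdigit():
            result["version"] = int(extra)
        elif extra in FIELDS:
            result["field"] = extra
        else:
            raise ValueError(f"unknown version or field: {extra}")
    return result


print(parse_note_path("/note/main/Groceries/3/?username=dewitt&password=secret"))
print(parse_note_path("/note/main/Groceries/text/"))
```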

And looking back at the comments on part 3, it looks like the API might actually be better if every note lived under an individual user. So instead of prefixing each note with /note/, it would be prefixed with /user/[username]/. This would mean that each note would always expose its owner, but that’s not such a bad thing.

So a revised version would be:

  • /user/[username]/notes/[namespace]/[name]?username=[username]&password=[password]
  • /user/[username]/notes/[namespace]/[name]/[version]/?username=[username]&password=[password]

Clearly the paths are getting rather long, but they are consistent, which is good.
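For a client, building the revised paths might look like the sketch below. The function and parameter names are hypothetical; "owner" is the user in the path, which may differ from the username making the request.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def note_url(owner: str, namespace: str, name: str,
             username: str, password: str,
             version: Optional[int] = None) -> str:
    """Build /user/[owner]/notes/[namespace]/[name] with optional /[version]/."""
    segments = ["user", owner, "notes", namespace, name]
    # Percent-encode each segment so names with spaces or slashes stay intact.
    path = "/" + "/".join(quote(segment, safe="") for segment in segments)
    if version is not None:
        path += f"/{version}/"
    return path + "?" + urlencode({"username": username, "password": password})


print(note_url("dewitt", "main", "Shopping List", "jessica", "secret"))
print(note_url("dewitt", "main", "Shopping List", "jessica", "secret", version=3))
```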

To keep everything in one place, I’ve created a static homepage for the new project. This contains pointers to all the articles and to the latest API. (Of course, as soon as I made the static pages I noticed that I can’t really defer the decision about a name for this project much longer.)

Off to work. More tonight…

9 Responses to “A New Project, Part 4”

  1. sam Says:

    A simple markup syntax that I really like (cause it’s clean, intuitive, and suggests valid xhtml) is Textile - unfortunately it doesn’t appear to be open source, but you could probably throw together something similar without too much trouble…?

  2. DeWitt Says:

    Cool — it seems like half the people I know are following along here.

    I like your suggestion. Textile’s syntax looks just about right if I want to do an alternative markup. But it occurred to me on the drive to work today that I don’t necessarily need to worry about the markup now. If I want, I can do everything with XHTML now, and rely on the client side (or an intermediate layer) to insert a markup translator later. So the backend can be fully standard, and it is up to the client to figure out the best way to present that data to the user.

    I’m really hoping that once I get a working API up and running then more people will get involved on building clients. I’m going to post more about that very thought later tonight.

  3. Jessica Says:

    DeWitt asks, “Would it be too much to ask the user to write their notes using XHTML tags? Or are they better served by using a markup language such as Wikipedia’s custom markup syntax?”

    It depends on who you want to use it…but since I stopped learning how to make Web sites in 2000, I vote yes — too much to ask :)

  4. Jessica Says:

    Also - this may be a totally dumb question, but do you really need concurrent write-access? We do live in the stone ages here at the newspaper, but our content management system lets us read but not write while someone else is editing. It also tells us who the annoying person is who forgot to put the document down…and lets us use some arcane instant-messaging tool to bug them if they’re not in hollering distance.

    Re: diffs — I’m not involved enough in a Wiki to have much stake in revisions, other than to casually see how much revision has happened — but I find them way less useful than the (oft-forgotten) comments users should provide on what changed when they check in a new version. I find the diffs hard to read, esp. when changes are mostly formatting.

  5. Brett Says:

    I realize in your above response you mention an interest in moving past the discussion of markup syntax, at least for now, and possibly forever. But, from an end-user perspective, I think you may be a bit hasty in pushing onto the client something that’s so intrinsic to my interaction with your project. The fact that every wiki seems to make up its own syntax has kept me from embracing any wiki.

    While I think Textile is very good, and I use Dean Allen’s Textpattern (CMS) and Textdrive (hosting), I find Textile flawed w/r/t its opaque development (single developer working alone, releasing neither early nor often), and its functionality. I have no idea if Dean ever plans to update it. Which, perhaps, is why Brad Choate felt compelled to release Textile2.

    I very much prefer Markdown, developed by John Gruber, one of the few people as qualified to create a markup syntax as Dean Allen. From the beginning, Gruber listened to others, especially Aaron Swartz, who released html2text (which turns HTML into Markdown-formatted plain text) at the same time Gruber released Markdown (something no one has yet done for Textile). In addition, Gruber has documented Markdown more completely, thoughtfully, and eloquently, than any developer I’ve ever seen. Even the mailing list is one of the better reads out there - actually, I think it shares quite a lot with unto.net’s A New Project in terms of tone and thoughtfulness. Oh, and Markdown outputs XHTML, has a BSD-style open source license, and it’s been ported to multiple languages.

  6. DeWitt Says:

    Wow — Markdown is impressive. I think I even tried it out at some point back when unto.net was running on Blosxom.

    I like everything you say about it — particularly the transparency aspect. And you’re right — I imagine that it was that transparency that led other people to port Markdown to other languages. Since I am still (intentionally) undecided about what language the server-side of this project will be using, it is nice to know that I could drop a library like Markdown in no matter what direction we go.

    Thanks a million for the pointer, Brett. If this project needs a markup language — and given your and Jessica’s comments, it seems that it does — I am completely sold on Markdown. Not so much because I personally would use it (apparently I could use it today while editing unto.net thanks to the PHP port), but because other people seem to want it.

    And if that’s not a testimonial to the immediate benefits of an open development process, I don’t know what is. Thanks again!

  7. DeWitt Says:

    Jess — good point about concurrent edits. I think you’re right, if this were a true CMS (and who knows, perhaps it is), we might be better off with a locking model for documents. But I’m worried that it would be too hard to do locking correctly. Or at least, it feels easier to simply warn the user when they are about to step on someone else’s toes. Getting locks working well in a web services application is hard to do, and something I’m inclined to punt on for this version. (Hey, at least I’m honest.)

    And vis-a-vis diffs, I’m hoping that they wouldn’t often be exposed to the end user. At least not without being explicitly requested. The scenario I was thinking about was whether or not the data should be stored using diffs (a la CVS) and/or sent to and from the server via diffs (a la rsync). Other than that I think the user should pretty much deal only with full documents and have the option of pulling out historical versions when they need to.

  8. Jessica Says:

    Ohh, I didn’t realize that writing the locking part would be *more* difficult than dealing with concurrent editing. Gotcha.

    In spite of the fact that I don’t understand 99% of this discussion, I’m still going to give you my end-user 2 cents. Because I’m definitely going to be one of your end users.

  9. DeWitt Says:

    Well, to be fair, it’s not the locking that’s difficult. (That’s as easy as having a “lock” API that sets a flag.) It’s the unlocking part that’s a little trickier, along with knowing when a lock has been abandoned. And it’s almost a holy war among CMS aficionados (er, maybe “users” is a better word) about which approach is better. Either way, always good to have the feedback!