A New Project, Part 10
May 28th, 2005 by DeWitt Clinton

I just wanted to cover a handful of miscellaneous details regarding the new project. Each of these has the possibility to be a bit contentious, so I’ll try to outline my reasoning the best I can.

Source Control: Having a source control system implies that there will indeed be source code, so the mere fact that I’m ready to discuss it is a good sign. In fact, the discussion will be brief: I’ve decided to go with Subversion. While the full list of reasons is long, the major benefits of Subversion are that (a) it is free and open source, (b) it has a large, active community, (c) it bears enough similarity to CVS to be familiar to most people, (d) it is well documented, (e) it is feature rich, and (f) it is stable, having just reached version 1.2. I actually wanted to go with BitKeeper, as the distributed repository model is pure genius, but the licensing issues scared me away. If this were a commercial endeavor I would seriously consider that route.

I’ve now installed a public instance of the Subversion server over at svn.unto.net. I’m using the mod_dav_svn plugin for Apache 2, and I’m thus far very impressed with how it works. This repository will be open for anyone to read, and I will put the majority of code for the new project in there. Write access will be restricted to me for now, but if there is a reason, I’ll open that up as well. In the meantime I will be happy to review and apply patches for people who would like to suggest a change. In other words, I’ll be opting for the benevolent dictator model, whereby I fill the role until such time as it makes sense to hand it off. Which is all very pretentious for a project that may or may not ever get off the ground. Just planning ahead.

For those that do not want to install Subversion — it’s easy however, and there are pre-built clients for all major systems — I will also be providing a nightly tar.gz and zip file for your enjoyment.
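
For the curious, a nightly snapshot job can be quite small. The following Python sketch is an illustration only, not the script that will run on unto.net: the function name `build_nightly`, the `snapshot` prefix, and the date-stamped file naming are all assumptions, and it presumes a clean export of the source tree already exists in `source_dir`.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import List


def build_nightly(source_dir: str, out_dir: str, name: str = "snapshot") -> List[str]:
    """Write <name>-YYYYMMDD.tar.gz and .zip of source_dir into out_dir.

    Hypothetical helper: the naming scheme is an assumption for illustration.
    out_dir should live outside source_dir so the archives do not include
    themselves.
    """
    stamp = date.today().strftime("%Y%m%d")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    base = Path(out_dir) / f"{name}-{stamp}"
    # shutil.make_archive appends the right extension and returns the full path.
    return [
        shutil.make_archive(str(base), fmt, root_dir=source_dir)
        for fmt in ("gztar", "zip")
    ]
```

Calling `build_nightly("/path/to/export", "/path/to/downloads")` from a nightly cron job would leave one tar.gz and one zip per day in the download directory.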

Naming: I still don’t know what to officially call the project. However, as soon as you start adding code you need to at least have something to refer to it by. Rather than try and pick just the right name today, I’m going to just pick a naming convention for codenames, and use those codenames for first versions of the various components (the backend, the frontend, etc.). They will probably be something boring like the moons of Jupiter, neighborhoods in Manhattan and Brooklyn, or towns in Northern California or similar. So don’t freak if you are checking out the source code for “ganymede” or “redhook” — it’s only for development purposes.

Versioning: Everything will be strongly versioned. The source will be versioned and tagged, each release will be versioned and publicly archived. But more importantly, the API itself will be versioned. This is something I think Amazon Web Services has done a fantastic job with, and I intend to support versioning as much as I can.
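
To make the idea of a versioned API concrete, here is a minimal sketch in Python. It is not the project's code (the backend is planned in Perl), and everything in it is hypothetical: the handler names, the response fields, and the date-style version strings are invented for illustration. The point is only that each published version keeps its own handler, so existing clients never see a response change underneath them.

```python
from typing import Callable, Dict


def get_user_v1(user_id: int) -> dict:
    # Original response shape (hypothetical).
    return {"id": user_id}


def get_user_v2(user_id: int) -> dict:
    # A later version adds a field; v1 callers are unaffected.
    return {"id": user_id, "links": {"self": f"/users/{user_id}"}}


# Map each published API version to the handler that implements it.
# The version strings below are made up for this example.
HANDLERS: Dict[str, Callable[[int], dict]] = {
    "2005-05-01": get_user_v1,
    "2005-06-01": get_user_v2,
}


def dispatch(version: str, user_id: int) -> dict:
    """Call the handler registered for the requested API version."""
    try:
        handler = HANDLERS[version]
    except KeyError:
        raise ValueError(f"unsupported API version: {version}")
    return handler(user_id)
```

A client that pins itself to the older version string keeps getting the older response shape for as long as that entry stays in the table.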

Database: Since no one objected during the discussion of the data model, I am going to use a relational database as the backend for this system for the time being. I’m keeping partitioning in mind, and trust me, I’ll be looking closely at more scalable and efficient ways to handle data. I’m going to go with MySQL 3.23, rather than the more advanced MySQL 4.1 or PostgreSQL. The reason for this is that I don’t feel right now that I actually need the real database functionality of transactions and triggers and views, and I may as well just choose the least common denominator. That said, all data access and the actual data model will be hidden behind the API, and the backend supporting the API will try to keep the specifics of the database interaction well abstracted.
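
As a rough illustration of what "well abstracted" could mean here, the Python sketch below hides storage behind a small interface with two interchangeable implementations. It is a sketch under loud assumptions, not the project's design: the `UserStore` interface and its method names are hypothetical, and the standard library's `sqlite3` module stands in for MySQL purely so the example runs without a database server.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class UserStore(ABC):
    """Hypothetical storage interface; callers never see SQL."""

    @abstractmethod
    def add_user(self, name: str) -> int: ...

    @abstractmethod
    def get_user(self, user_id: int) -> Optional[str]: ...


class SqlUserStore(UserStore):
    """Relational implementation (sqlite3 used as a stand-in for MySQL)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add_user(self, name: str) -> int:
        cur = self._conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self._conn.commit()
        return cur.lastrowid

    def get_user(self, user_id: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class MemoryUserStore(UserStore):
    """Non-relational implementation with the same behavior."""

    def __init__(self) -> None:
        self._users: Dict[int, str] = {}

    def add_user(self, name: str) -> int:
        user_id = len(self._users) + 1
        self._users[user_id] = name
        return user_id

    def get_user(self, user_id: int) -> Optional[str]:
        return self._users.get(user_id)
```

Code written against `UserStore` works with either class, which is the property that would let the database be swapped out later without touching the API layer.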

License: The content, such as these articles and the documentation, will be released under a Creative Commons license. The source code will be released under some open source license, though I don’t know which one. The GPL is a good first choice, as it does do some of the things I want when other people (especially corporations) use the code in their own products. If I expected to be the only author, I’d probably choose that, as I could always re-license the code under something more relaxed later on if need be. However, if other people contribute to the project then I will not have the option of changing the license, and I need to pick better from the outset. I doubt people would be willing to assign their copyright to me when they commit (nor should they have to), so it will take a little more thinking. No matter what, it will be a recognized Open Source license; I’m not going to make up my own license just for this project.

Language: In what is potentially the most contentious decision, I’ve decided to write the backend layer in Perl. It is never an easy choice; I wrote about some of these considerations at length in On Choosing An Enterprise Language, and I recommend at least reading through that article for some background. More than that, I’ve decided to use mod_perl rather than CGI, FastCGI, or other integration tools. I also wrote on rediscovering mod_perl last August, and since then have only been more impressed by how good the project continues to be, especially now that it has officially reached version 2. I ported the AWS OpenSearch project over to mod_perl in an hour, for example, and immediately realized huge gains.

Both PHP and Java were strong alternatives, but the balance tipped toward Perl because (a) mod_perl and Apache 2 are a very, very powerful platform, particularly when one wants to support the REST HTTP operations, (b) I already know it inside and out, (c) the unto.net infrastructure is already well suited for it, (d) I ran a company for a while that had some excellent open source enterprise Perl libraries that I am familiar with and want to reuse, (e) just about everything, from MySQL to Berkeley DB to Markdown, has mature Perl bindings, (f) XS support will allow for native compiled code and library support if need be, (g) I can write it in such a way that it can be ported to other languages, and (h) I had to pick something.
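
For readers wondering what "supporting the REST HTTP operations" involves, the Python sketch below shows the basic shape: the HTTP method, not the URL, selects the operation, and the status code reports the outcome. It is illustrative only and says nothing about how the mod_perl handlers will actually be written; the `handle` function, the dictionary used as a store, and the resource paths are all hypothetical.

```python
from typing import Dict, Tuple


def handle(store: Dict[str, str], method: str, path: str, body: str = "") -> Tuple[int, str]:
    """Return (status code, response body) for one request against `store`."""
    if method == "GET":
        if path in store:
            return 200, store[path]
        return 404, ""
    if method == "PUT":
        # PUT creates the resource if missing, otherwise replaces it.
        created = path not in store
        store[path] = body
        return (201 if created else 200), body
    if method == "DELETE":
        if path in store:
            del store[path]
            return 204, ""
        return 404, ""
    # Any other method is not supported by this toy handler.
    return 405, ""
```

Getting this right inside a real web server also means handling headers, content negotiation, and caching, which is where having full access to the request cycle matters.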

Releases: Early and often. Or continuously if I can. I will likely start with some of the behind the scenes work that you can’t see (like request dispatchers and database connection management), but hope to move quickly toward getting working API calls out there. I am going to start with the user model, as that has the most general applicability, even long before support for notes is complete. I will try to work out a rich enough backend before even starting on the real client-side application. I will however write a server-side “client” application to demo the features as they come online.

Community: I will probably add a forum system to unto.net, such as bbPress, once this gets underway. I’ll add a mailing list as well if people are interested. I’d like to use a good group system like Google Groups or Yahoo Groups, but frankly, I’m not sure how much I want to host on other people’s servers, particularly if either of those companies might end up being a good future partner to support this project.

How you can help: I love the help I’ve received so far in the form of comments on the blog posts. For now, that’s a great way to contribute. In the future you could help by:

  • Emailing me. Suggestions, criticism, vegetarian recipes, or just good old moral support is always appreciated.
  • Comment on the blog. Like I said, it’s the best way to share. Other people read those posts and it builds community.
  • Spread the word. If you have a blog or a technical website or write for a tech journal, please let people know what’s going on here. The more the word gets out, the more likely we are to see this project finished.
  • Download the code. The more people that try downloading it and building it the better.
  • Review the code. Code reviews rule. If you ever see buggy or unclear code, please let me know.
  • Write code. Submit patches, start your own repository, use svn.unto.net, anything goes.
  • Use the backend. As soon as there is an API to call, write an application that uses it.
  • Volunteer. Two areas that I know I’m going to need help with are system administration (this could get very hot if the project takes off) and graphic design.
  • Donations. I’m not looking for them at this time. If you are an individual, please consider one of the above ways to contribute instead.
  • Sponsorship. I’m in a funny position in that I love my full-time job. I’d work there even if I didn’t want a paycheck. But if a corporation wanted to sponsor the hosting or whatnot and really believed in the transparent philosophy of the project, I’d be open to a conversation.
  • Patience and understanding. I’m asking for a lot of it — thank you for everything so far.

As always, thanks for reading. I’ll keep posting as much as I can as the development continues.

7 Responses to “A New Project, Part 10”

  1. Michael Chanin Says:

    Haven’t used it personally, but Trac looks pretty good —

    http://projects.edgewall.com/trac/

  2. Todd Says:

    If you want something more heavyweight than TRAC, JIRA and Confluence are both free for open source projects, and both are excellent. The author of the blog software I use keeps a public JIRA server so users can submit bugs and feature requests (http://www.simongbrown.com/jira).

    http://atlassian.com/software/jira/

    http://atlassian.com/software/confluence/

  3. Brett Says:

    As interesting as the application promises to be – and this series wouldn’t be worth reading if The Monster at the End of this Book didn’t promise to be Groverlicious – I think reading the reasons behind your choices is what makes all the transparency worthwhile. So, at the risk of asking you to travel down roads already too well trodden:

    In Part 6 you made a well reasoned case for using a relational database, but I’d like to read more on why you’ve chosen MySQL 3.23 over MySQL 4+ or PostgreSQL.

    I think it would be interesting to see you compare Perl to Python, Ruby (especially Rails), and PHP for this project. Yes, this could be contentious, but then you’ve said many times that you want comments… by stating your reasons for not choosing Python/Ruby/PHP, I suspect you’ll elicit comments that will be useful for you and many other developers deciding between the dominant scripting languages.

    When you get to the point where you need to choose a software license, I hope you’ll write something similar to Please stand by in which you lay out your criteria and inclinations, and then solicit feedback. As with hosting, source control, choosing a programming language, and naming the project, licensing is an issue every developer faces… yet clear, open discussions on licensing aren’t easy to find.

    Your comments on Community trotted out a big pet peeve of mine. I believe one of the major failings of many open source projects is their insistence on having a couple of forums, a few wikis, four mailing lists, five blogs, three IRC channels, two groups, six email addresses, three getting started guides, a manual, a knowledge base, a FAQ, etc. none of which cross-reference each other in any systematic way (let alone share a common archive), all with important contradictions, and all growing stale at different rates. I have no idea how anyone finds anything definitive – which only encourages people to create more shadow documentation once they find something that works.

    I strongly prefer:

    a single manual and a single community archive for all questions/comments/additional documentation that is easily searchable with an interface that’s welcoming to new input and a built-in distribution system that allows me to receive any updates I wish to receive in the form I want to receive them.

    In practice, this seems to mean a semi- or fully-moderated forum/mailing list hybrid that’s capable of producing a syndicated feed. Something like phpBB with phpBB Mail2Forum and RSS Feed. I’m not endorsing phpBB – it’s not my favorite software project and it has its own documentation issues – but the elements are there for it to make community development really easy for a project’s developer(s) and users.

  4. DeWitt Clinton Says:

    Thanks, Brett. As always the questions and comments are spot on.

    I’m guessing that both Michael and Todd would agree with you — and suggestions like Trac and Jira could go a long way in unifying the community/documentation side of things. But more than that, it sounds like you are strongly recommending that more important than anything else is presenting to the new user/developer/whatever a single place in which they can find the answer to any question they are looking for regarding the new project.

    I couldn’t agree more. For example, as I was installing Subversion, a project that actually has very good documentation, I kept getting sidetracked away from the right documentation by some extra (and obsolete) stuff that was linked to on the site. Same with the perl.apache.org site: the very good docs are mixed in with the old ones, though that site is getting a lot better.

    I suspect the reason is simple: the documentation, like the code, was written by dozens of different people. And it’s easy to add code, but in the open source world, it isn’t very acceptable to erase somebody else’s work. So the files just pile up and pile up and get out of date. Maybe I’ll pledge to delete at least one thing each week from this project. Of course, at this rate I’ll be moving backward…

    Back to the questions — I am picking the 3.x series of MySQL because it has been ported to everything under the sun and is available for every distro I’ve seen. While the 4.x series (particularly 4.1) and the upcoming 5.x series are both fantastic, I believe that they are also back-compatible with 3.23. Since I’m largely trying to avoid using the database that much or tying it too much to any particular DB implementation, MySQL 3.23 seems like a good baseline database. PostgreSQL is obviously a perfectly good alternative — to Oracle : ) — but it makes more sense to go with what most people already have installed unless I need something specific in a newer version.

    Why not PHP or Ruby? For starters, I probably code best in Java and Perl, so I was no doubt biased there (even though C is usually what I’d like to be writing). And the HTTP protocol side of implementing REST right (i.e., not just as a buzzword simply because the project uses URLs) makes me think I’ll need to get in dirty with handlers inside the web server (à la mod_dav). So my first choice would actually have been C as an Apache module if not for the strength of mod_perl. I have no idea about Ruby, but I do know that libphp5 isn’t trying to be what mod_perl is. That is to say, the Apache implementation of PHP is designed to execute PHP (the language) efficiently like a “CGI” without forking httpd. Whereas mod_perl is designed to hook the Perl interpreter into the httpd daemon itself and give Perl code the same access that any C module would have. That’s powerful stuff. I’m choosing Perl over C because I don’t think performance is going to be bottlenecked by the interpreter, so I may as well leverage the high-level language features. If that doesn’t work, I’ll port it all to C.

    Oh, and I’m one of those people that doesn’t feel comfortable around semantically significant whitespace, so Python is hard for me to write even if it is a great language. For the record, I do have this one interview question that I ask candidates and I give them their choice of languages to answer it with. So I ended up coding it up myself in a number of languages (I’d share the question and the code, but I still occasionally ask it during interviews), and the two clearest solutions for me were Python and C#. I don’t know, maybe I need to write my next small bit of code in Python. I’ve known good Python programmers before (we have a few at work), and I respect them and their work. In fact, I’m not sure I’ve ever known a bad Python programmer, which may say something about the language, but I’m not sure exactly what that is.

  5. DeWitt Clinton Says:

    BTW, how do people feel about Plone?

  6. Todd Says:

    I haven’t used it, but I had to decide between Plone and Confluence for the grad student federation Wiki here. I went with Confluence because I had more experience with J2EE than with Zope (both for deploying and for writing custom extensions), and because the support for the former on the University app servers was mature while the support for Zope apps was in the beta stages.

    I think Plone probably has the cleanest layout of all the wikis I’ve looked at, and it seems better suited to hosting intranet and extranet on the same site than anything else I’ve seen. If the support for python webapps here had been better I would have been really tempted to go with it.

  7. Todd Says:

    Forgot to add — one of the groups we collaborate with is using Plone for their project. See http://vgrads.rice.edu/.