Well, perhaps it wasn't exactly an epiphany, but something just clicked about the new project while I was driving to work the other day.
The whole project has been going well so far. I like the strict separation of the client from the server. I like that the web services approach will allow other people to reuse the backend technology in ways I haven't anticipated. I like the transparent development process and the feedback I've been hearing.
Yet somehow it all still feels a bit "web 1.0". There is still a central server that you need to trust. There may be many clients, but those clients aren't talking to each other -- they are all dependent on a single point of failure where the data resides. To be honest, I actually wanted to run that central server -- it seems like there is power in having that control.
But giving up the control is what "web 2.0" is all about. It's all about distributed services, and putting the power out there with the users. Of course there are still benefits in consolidation of resources -- there may be requirements in terms of availability and reliability that are best met by being able to host your data remotely. Central servers offer an economy of scale and the benefit of convenience. However, there is nothing inherent in this model that would prevent an arbitrarily large number of servers from serving each client.
So the new vision is this -- what if everything remained the same as before regarding users and groups, but instead of limiting each group's membership to users on one server, you allowed members to include users from any server? Imagine if you could make a request like:
GET http://www.unto.net/users/abigail/groups/classmates
And receive a response of:
http://www.unto.net/users/alicia
http://www.unto.net/users/alison
http://www.thefacebookutster.com/users/amy
http://www.mytribesterspace.org/users/andrea
This means that you could then opt to share notes (or anything, really) with users on another network just by adding them to one of your local groups. From the client's perspective all of those users are just URIs, just arbitrary strings -- it doesn't matter where they are as long as they are unique identifiers and respond to the appropriate web service queries.
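To make the client's side of this concrete, here's a minimal sketch in Python -- assuming, hypothetically, that the service returns the membership list as plain text with one URI per line, as in the example above:

import urllib.request

def fetch_group_members(group_uri):
    """Fetch a group's members, assuming one URI per line of plain text."""
    with urllib.request.urlopen(group_uri) as response:
        body = response.read().decode("utf-8")
    return [line.strip() for line in body.splitlines() if line.strip()]

members = fetch_group_members(
    "http://www.unto.net/users/abigail/groups/classmates")

# Members are opaque identifiers; the client doesn't care which
# server they live on, only that each URI is unique.
if "http://www.mytribesterspace.org/users/andrea" in members:
    print("andrea can see this")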
So I could run a public server here on unto.net, you could run a server for you and your friends on your machine, and any corporation could run one internally. This was always going to be possible, but now you can selectively open up access to people on any server, just by adding them to a group as you would any user on your own server.
(Of course, I'm not the only person to think of distributed social networks. The FOAF Vocabulary Specification already has a lot of this worked out for us. At the very least, our web service here should be able to generate and read a compatible format to allow the new project to interact with existing FOAF software. In the best case, the FOAF specification will be good enough to use verbatim. We're going with a slightly different approach because we want to use a group model rather than a pure FOAF network, but their spec may already support that.)
Here's a specific example. Say you have a user named Alice, and she has an account on server A. Alice's unique user URI is:
http://a.com/users/alice
She's friends with Bob, and Bob has an account on server B. His unique user URI is:
http://b.com/users/bob
Alice wants to share some of her notes with Bob, so she creates a new group called "g". That group's unique URI is:
http://a.com/users/alice/groups/g
If you view that group, it contains a list of members consisting of:
http://a.com/users/alice
http://b.com/users/bob
Alice creates a note "n" and gives read permissions to group g on server A. If you had permission, you could read the note at:
http://a.com/users/alice/notes/n
So how can server A know that Bob, sitting at computer C, has permission to read the note? How does server A know that Bob is Bob? Server A can't ask Bob for his password because server A doesn't even know Bob's password. Moreover, Bob has no reason to trust server A in the first place. And server A can't ask server B for Bob's password, as server B has no reason to trust server A either.
There are a few ways we can solve this, each with a varying degree of security and complexity.
In one scenario, server A says to Bob -- "okay, I know that Alice says she trusts someone named Bob on server B, prove to me that that's you." So Bob says to server A, "how about this -- I'm going to tell you two things: a secret codeword and a long string of letters. You are going to give the codeword and the string to server B and server B will tell you that I am who I say I am. Then you'll know I'm me according to server B."
The technical details of this approach: Bob generates a truly random string as the codeword, then concatenates that codeword with his password. He then uses a one-way hashing function, such as SHA-1, to generate the long string of characters. When server A gives server B the codeword and the string of characters, server B, knowing Bob's password, repeats the hashing function and compares the result with what server A provided. If they match, then server B confirms for server A that Bob has an account there.
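As a concrete sketch (Python, standard library only; the variable names are mine, and I'm keeping SHA-1 only because it's the example above):

import hashlib
import secrets

password = "bobs-password"  # known only to Bob and to server B

# Bob generates a truly random codeword...
codeword = secrets.token_hex(16)

# ...and hashes the codeword concatenated with his password.
proof = hashlib.sha1((codeword + password).encode("utf-8")).hexdigest()

# Bob hands (codeword, proof) to server A, which forwards both to
# server B. Server B repeats the computation with its stored copy
# of Bob's password and compares.
def server_b_verify(codeword, proof, stored_password):
    expected = hashlib.sha1(
        (codeword + stored_password).encode("utf-8")).hexdigest()
    return secrets.compare_digest(expected, proof)

assert server_b_verify(codeword, proof, password)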
There are a few caveats, however. One, anyone intercepting the message from Bob to server A or from server A to server B could use that same message to convince any server at any time that they are Bob. You can mitigate this risk if server B only says "yes" a single time for any given codeword, or by including a timestamp in both the public exchange and the hashed value and responding "yes" only for a limited amount of time. Or, you could use SSL to establish a shared key exchange between C and A, and between A and B, to ensure that no one can overhear.
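Server B's side of those first two mitigations might look like this sketch (Python again; Bob now includes the timestamp in the hashed value as well, and the one-hour window and in-memory set are illustrative assumptions):

import hashlib
import hmac
import time

seen_codewords = set()      # codewords server B has already accepted
MAX_AGE_SECONDS = 60 * 60   # illustrative one-hour freshness window

def verify_once(codeword, timestamp, proof, stored_password):
    """Accept a proof only once per codeword, and only while fresh."""
    if codeword in seen_codewords:
        return False  # say "yes" only a single time per codeword
    if abs(time.time() - float(timestamp)) > MAX_AGE_SECONDS:
        return False  # the public timestamp is too old
    expected = hashlib.sha1(
        (codeword + timestamp + stored_password).encode("utf-8")).hexdigest()
    if not hmac.compare_digest(expected, proof):
        return False  # a tampered timestamp won't match the hashed value
    seen_codewords.add(codeword)
    return True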
This approach also requires that computer C be able to generate a truly random string (far, far more difficult than it sounds) and compute a SHA-1 hash of a string efficiently. Since we know nothing about the capabilities of the client, this may be an unacceptable requirement.
Another caveat is that this technique requires servers A and B to communicate each and every time permissions are needed. This can be mitigated somewhat if server A says to Bob, "okay, now that I trust you, just use our own secret codeword to identify yourself from here on," effectively creating a local version of Bob on server A for a specified amount of time (a sketch of such a session table follows below). But that still means that servers need to communicate with each other every time a user they've never seen before wants access. And is it fair to demand that the network topology allow for reliable communication between A and B at all times? Shouldn't it be enough that computer C can talk to A and can talk to B? After all, it's Bob who wants to establish the trust, so shouldn't he do the work?
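That local, time-limited version of Bob could be as simple as this sketch (the one-day lifetime and the in-memory table are arbitrary assumptions):

import secrets
import time

sessions = {}                    # token -> (user_uri, expiry)
SESSION_LIFETIME = 24 * 60 * 60  # arbitrary: one day

def create_session(user_uri):
    """After verifying Bob once, server A issues a local shared secret."""
    token = secrets.token_urlsafe(32)
    sessions[token] = (user_uri, time.time() + SESSION_LIFETIME)
    return token

def lookup_session(token):
    """Return the user URI for a live session, or None."""
    entry = sessions.get(token)
    if entry is None:
        return None
    user_uri, expiry = entry
    if time.time() > expiry:
        del sessions[token]
        return None
    return user_uri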
As it turns out, A doesn't need to talk directly to B more than once, ever. In fact, as I'll get to in a minute, if you happen to have one central trusted server then the other servers never need to talk to each other at all. And if you are really clever (as fortunately some people already have been), you don't need to even trust a central server.
So what if instead Bob said to server A "I'm going to get you a note from server B that tells you I am who I say I am." Server A says "great, tell server B to address that note to me at this name." Bob then turns to server B and says "please write a note to server A with this name telling it that I'm Bob." Server B then hands Bob the note and says, "please deliver this note to server A, and while you're at it, tell server A that the note will include this codeword." Bob then carries that note back to server A who reads it and checks the codeword. If they match, then server A knows that server B thinks Bob is who he says he is.
The trick here is that (a) server B's note to server A can only be read by server A, (b) the note cannot be modified by anyone along the way, and (c) the note is signed by server B. Fortunately, we can do all of this with well-known public key encryption systems such as PGP. Server A publishes its public key in a well-known location, which server B needs to read only once. (Alternatively, server B could get server A's public key, for this request only, from Bob -- that is safe because the only person Bob could hurt by spoofing that key is himself.) Server B encrypts the note to server A's public key and signs it with its own private key. Bob cannot read the note unless he spoofed server A's key, but the note isn't saying anything he doesn't already know anyway. Server A decrypts the note with its own private key and verifies the signature against server B's public key to ensure that it is really from server B. Lastly, server A checks that the codeword Bob presented matches the one sealed inside the note, to make sure that it is really Bob delivering it. From there server A can establish a shared secret with Bob to save the trouble of repeating the process again anytime soon.
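Here is how that exchange might look in code. This is only a sketch: I'm standing in for PGP with raw RSA sign-and-encrypt from the Python cryptography package (real PGP uses hybrid encryption and isn't limited to short messages the way raw RSA is), and the keys, note text, and codeword are all made up:

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Stand-ins for the servers' long-lived key pairs.
a_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
b_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
a_public = a_private.public_key()  # published in a well-known location
b_public = b_private.public_key()  # e.g. fetched once from a keyserver

# The note server B writes for server A (short enough for raw RSA).
note = b"http://b.com/users/bob is one of ours; codeword=swordfish"

# (c) Server B signs the note with its own private key...
signature = b_private.sign(
    note,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)

# (a) ...and encrypts it to server A's public key, so only A can read it.
ciphertext = a_public.encrypt(
    note,
    padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                 algorithm=hashes.SHA256(), label=None),
)

# Bob carries (ciphertext, signature) to server A. Server A decrypts
# with its own private key...
plaintext = a_private.decrypt(
    ciphertext,
    padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                 algorithm=hashes.SHA256(), label=None),
)

# (b) ...and verifies the signature against server B's public key.
# This raises InvalidSignature if the note was modified along the way.
b_public.verify(
    signature,
    plaintext,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)

In practice the signature would travel inside the encrypted message, the way PGP does it; I've kept the two separate here only to keep the RSA payload small.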
This has a huge advantage over the previous solution. In this scenario, computer C doesn't have to do anything other than deliver notes. The client doesn't need to know anything about generating unique keys or public and private key encryption. Server B just needs to get server A's public key once, and never needs to communicate directly with server A at all. In fact, as I alluded to before, PGP keyservers could be used to publish public keys for every server. Even better, if the servers establish their own trust network by signing the public keys of the servers they do know, then the central keyserver does not even need to be trusted.
There is still a weak link here, of course -- if client C is truly limited and can neither generate random strings nor perform a cryptographic hashing function, then there is nothing that can be done to create a secure line of communication between the client and server. In that case interception is always a real risk. Steps can be taken to minimize the impact of that risk, such as time-limited sessions, but nothing will ever be perfectly secure between client and server. But if any server, perhaps one inside a corporation, decides that there always needs to be airtight security, then it could require SSL for all transactions. (In the real world almost all clients do support SSL; it is just too cumbersome and slow for thousands of little messages, but it absolutely could be done.)
However these techniques will give us something extremely valuable -- each server in a distributed network could be confident that any risk is limited to a single user account, and only for the length of one compromised session. In other words, an intercepted session could only impact the user that was intercepted, and even that user would be protected from long-term impact. Of course, certain sensitive operations, such as changing passwords, do need to be performed over secure channels, but this allows day-to-day communications to be done unencrypted with manageable risk. And importantly, it makes no assumptions about the capabilities of the client, which is essential for the web service to be a success. And as far as I can tell this is no worse -- and honestly, it is likely to be significantly better -- than any of the other web services in existence today.
I realize that I'm basically just covering Applied Cryptography 101 here. I don't mean to be reinventing the wheel, I'm simply doing it because (a) I never studied Applied Cryptography 101, so I am pretty much just making it up as I go along and don't know any better, (b) I think that it is essential that this process is transparent, and that you all know what I'm doing so you can trust me, (c) I think that it's good to lay all of this out in simple language so everyone can understand, rather than just citing jargon that may or may not be appropriate, and (d) I hope to be corrected by you when I make a mistake. Thanks!
[Also, I'd like to thank someone for spending an hour with me this week discussing this exact problem. I'm not sure if I'm at liberty to share your name here, but please feel free to comment below -- your insights were very valuable and I'd love to discuss it further.]