A New Project, Part 8


sticky, part 8



Based on some questions that were raised, I realized that there is still some confusion regarding the user authentication model. This is good, as it got me to thinking more about the problem, and I want to reopen it here in case anyone has a better suggestion.

Starting from the top, one thing that almost every API call has in common is a user. This user is not necessarily the same person that is making the API call (though it could be), but it is always the subject of whatever action is being taken. I.e., the user in the URL is always the person we're doing something with. Or more succinctly, the user is the subject of the API call.

For example, if we want to get the information for Emily (the most popular female baby name in the US in 2004 [1]), we call:

GET /users/emily


If we want Emily's friends, we call:

GET /users/emily/friends


And so on. And if we want to have Emily send an invitation to Jacob (the top male name), we call:

POST /users/emily/invite/jacob


(Remember, we use POST here instead of GET because there will be a side-effect.) So again, we have the familiar root of /users/emily for each of these actions that apply to Emily. It's her information, her friends, her invitation. The API calls are consistent and predictable, which will help new developers of the system orient themselves easily. [2]

This is great -- but it doesn't answer an important question: who is getting Emily's information or telling Emily to send an invitation to Jacob? Is it Emily? Well, sometimes it is. But sometimes other people will want to get information about Emily as well. Should those other people be required to call a different API? No, the consistency is what makes this a good model.

So we clearly need to pass in more information about who is making the call. We can refer to this additional information as the authentication model. (We could call it a lot of things, such as "credentials" or "login data" but I'll pick "authentication" just so we have a common term.) The authentication model may take a number of different forms, but no matter what form it takes, it is always orthogonal to the specific API call itself. That is to say, nearly all API calls can use an authentication model and that model is consistent across all calls, no matter what they are.

In the simplest form, the authentication model could just be a username (or user ID). If you are Emma (the second most popular name), you could just tell us you are Emma and we could simply trust you. If Emma has permission to view Emily's information, then she can call GET /users/emily and somehow pass along an authentication of "Emma" and we'll return the data. Of course, that's not particularly secure, so we'd probably want some way of verifying that Emma is who she says she is. So we also ask her to supply some piece of information that only Emma knows -- such as her password.
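To make the distinction between the subject of a call and the caller concrete, here is a minimal sketch of the simple username-plus-password model. Everything in it is an assumption for illustration: the in-memory tables, the function names `authenticate` and `get_user`, the passwords, and the choice of PBKDF2 for storing them. None of it is part of the project's actual API.

```python
import hashlib
import hmac

# Illustrative only: a fixed salt and in-memory tables stand in for a real
# user database with per-user random salts.
_SALT = b"demo-salt"


def _hash(password: str) -> bytes:
    """Derive a stored form of the password so we never keep it in clear text."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), _SALT, 10_000)


# Hypothetical data: stored password hashes, and who may view whom.
PASSWORDS = {"emma": _hash("emmas-secret"), "emily": _hash("emilys-secret")}
CAN_VIEW = {"emma": {"emma", "emily"}, "emily": {"emily"}}


def authenticate(caller: str, password: str) -> bool:
    """Return True if the password matches what we have stored for the caller."""
    stored = PASSWORDS.get(caller)
    if stored is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored, _hash(password))


def get_user(subject: str, caller: str, password: str) -> dict:
    """Handle GET /users/<subject> on behalf of an authenticated caller."""
    if not authenticate(caller, password):
        raise PermissionError("unknown caller or bad password")
    if subject not in CAN_VIEW.get(caller, set()):
        raise PermissionError(f"{caller} may not view {subject}")
    return {"username": subject}
```

Note that the subject ("emily") comes from the URL while the caller ("emma") and her password travel separately, which is what keeps the authentication orthogonal to the call itself.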

As we covered in part 3, we have a few different ways of passing in that authentication data. We could rely on the HTTP standard (HTTP AUTH). We could use cookies. We could use additional parameters in the URL. Each has advantages and disadvantages, but one thing they all have in common is that the client is responsible for supplying the server with the authentication information on each request.
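As a rough illustration of two of these transports, the sketch below builds the same credentials as an HTTP Basic `Authorization` header and as URL parameters. It assumes the Basic scheme for HTTP AUTH, and the parameter names `auth_user` and `auth_pass` are hypothetical placeholders, not names this API has settled on.

```python
import base64
from urllib.parse import urlencode


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header: base64 of "username:password"."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def url_with_credentials(path: str, username: str, password: str) -> str:
    """Append the credentials as query parameters (hypothetical parameter names)."""
    query = urlencode({"auth_user": username, "auth_pass": password})
    return f"{path}?{query}"


# Emma asking for Emily's information, two ways:
headers = basic_auth_header("emma", "secret")
url = url_with_credentials("/users/emily", "emma", "secret")
```

In both cases the password is only encoded, not encrypted, so anyone who can read the request can recover it.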

But there is something inherently ugly about that. For one, if we're using HTTP rather than HTTPS, then the "thing that only Emma knows", her password, is flying around the Internet in clear text on every request. Moreover, if we use URL parameters then the URLs themselves are not easy to bookmark or share -- someone needs to remember to go and strip out the username and password from each URL before passing it along. (Granted, this is a backend web-service API, so these aren't usually going to be the links that are bookmarked, but you don't want to design yourself into a corner from the beginning.)

So while an explicit username and password in the request is useful in certain scenarios, what we really want is some other, more secret way of identifying the caller. The problem is, if it can be passed over the network in clear text, then it's not much of a secret. It doesn't matter if it's a cookie or part of the URL -- if it crosses the wire in plain text, then it shouldn't be fully trusted. So what options remain?

Most good websites today use some variation of a two-token authentication model (my write-up of it, circa April 2000). This model uses one secure token that is only passed over secure channels alongside a less secure token that can be passed anywhere. The less secure token identifies the user for the purposes of restoring state (such as a shopping cart), but is insufficient on its own to make privileged calls (such as making a purchase, seeing sensitive customer data, or modifying the account).

This works well for e-commerce sites that can incur the overhead of SSL at checkout time, but it doesn't make a whole lot of sense for a web service API where there is no state to restore -- hence obviating the need for a two token model.

Web services such as AWS, Flickr, and del.icio.us all use relatively insecure clear-text auth models. AWS passes around API keys and shopping cart tokens in the open (though of course not usernames or passwords), Flickr uses URL parameters containing the username and password, and del.icio.us uses HTTP AUTH. Each of those models is secure enough for its application -- but they all fall victim to replay attacks, whereby someone with malicious intent can intercept communications and reuse the original credentials to perform arbitrary actions.

My initial thought was to improve on the simple models slightly. For example, I could have a login API that issued a transient token that could be used only until it timed out or was invalidated, and that token would be submitted by the client on each request. While the token would still be susceptible to interception, the damage would be slightly mitigated due to the short lifespan of the token. However, this isn't real security, and it requires a central authentication server or database to manage each session.
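A minimal sketch of that transient-token idea follows, assuming the password has already been checked before `login` is called. The class name `SessionStore`, its methods, and the 15-minute default lifetime are all made up for illustration; the in-memory dictionary stands in for the central server or database mentioned above.

```python
import secrets
import time
from typing import Optional


class SessionStore:
    """Central store of transient tokens: token -> (username, expiry time)."""

    def __init__(self, ttl_seconds: float = 900.0):
        self.ttl = ttl_seconds
        self._sessions = {}

    def login(self, username: str, now: Optional[float] = None) -> str:
        """Issue a fresh random token (assumes the password was already verified)."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._sessions[token] = (username, now + self.ttl)
        return token

    def whoami(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Return the caller's username, or None if the token is unknown or expired."""
        now = time.time() if now is None else now
        entry = self._sessions.get(token)
        if entry is None:
            return None
        username, expires_at = entry
        if now >= expires_at:
            del self._sessions[token]  # timed out
            return None
        return username

    def logout(self, token: str) -> None:
        """Invalidate a token immediately, before it times out."""
        self._sessions.pop(token, None)
```

Every request has to consult this one store, which is exactly the central dependency the paragraph above complains about.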

I also considered a strategy to alleviate the need for a central server to store the session tokens. You could conceivably do this by including the timeout information in the token and cryptographically signing it before handing it to the client. While this would allow you to trust the client enough to avoid having to make an expensive call to check whether or not the session has expired, you would be unable to allow any arbitrary client to invalidate a particular token across the whole system. I.e., you could time out, but not log out, at least not system-wide. So while this helps you scale laterally, it comes at a cost. [3]
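One way the signed-token variant might look is sketched below, using an HMAC-SHA256 signature over the username and expiry time. The token layout, the `SECRET_KEY` value, and the function names are assumptions for illustration; the only requirement taken from the text is that any server holding the key can check the token without a lookup.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret shared by every API server; never sent to clients.
SECRET_KEY = b"shared-by-all-api-servers"


def issue_token(username: str, ttl_seconds: int = 900, now: Optional[float] = None) -> str:
    """Return "<payload>.<signature>", where the payload is "username|expiry"."""
    now = time.time() if now is None else now
    payload = f"{username}|{int(now + ttl_seconds)}".encode("utf-8")
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii")
            + "." + base64.urlsafe_b64encode(sig).decode("ascii"))


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the username if the signature is valid and the token has not expired."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong shape or bad base64
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None  # forged or tampered with
    username, _, expires = payload.decode("utf-8").rpartition("|")
    if now >= int(expires):
        return None  # timed out
    return username
```

There is no `logout` here, and that is the point: with no shared state, nothing short of changing the key or keeping a revocation list can invalidate a token before its expiry.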

My second thought was to use the Web Services Security (WSS) UsernameToken Profile, similar to what Atom (perhaps?) does. WSS has the advantage that it is not vulnerable to replay attacks, insofar as the client passes along a unique identifier and timestamp with each request that are both checked on the server side to prevent reuse of the token. This is a secure model, but it requires a lot of work on the client side. The client needs to be able to generate a unique string (harder than it sounds), make a SHA-1 or similar digest, and perform a Base64 encoding. That's all trivial work for something running on the server, but this is client-side code potentially running in JavaScript or similar.
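To show how much work that puts on each side, here is a sketch of the WSS username token digest, Base64(SHA-1(nonce + created + password)), together with a server-side check for stale timestamps and reused nonces. The dictionary layout, the `TokenVerifier` class, and the five-minute window are assumptions for illustration, and the set of seen nonces is never pruned in this sketch.

```python
import base64
import hashlib
import hmac
import os
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def _digest(nonce: bytes, created: str, password: str) -> str:
    raw = nonce + created.encode("utf-8") + password.encode("utf-8")
    return base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")


def make_username_token(username: str, password: str, now: datetime = None) -> dict:
    """Client side: a fresh nonce and timestamp go into every request."""
    now = now or datetime.now(timezone.utc)
    nonce = os.urandom(16)
    created = now.strftime(TIME_FORMAT)
    return {
        "Username": username,
        "PasswordDigest": _digest(nonce, created, password),
        "Nonce": base64.b64encode(nonce).decode("ascii"),
        "Created": created,
    }


class TokenVerifier:
    """Server side. Note the server must know the password to recompute the digest."""

    def __init__(self, passwords: dict, max_age: timedelta = timedelta(minutes=5)):
        self.passwords = passwords
        self.max_age = max_age
        self.seen = set()  # (username, nonce) pairs already used

    def verify(self, token: dict, now: datetime = None) -> bool:
        now = now or datetime.now(timezone.utc)
        password = self.passwords.get(token["Username"])
        if password is None:
            return False
        try:
            created = datetime.strptime(token["Created"], TIME_FORMAT)
            nonce = base64.b64decode(token["Nonce"])
        except ValueError:
            return False
        created = created.replace(tzinfo=timezone.utc)
        if abs(now - created) > self.max_age:
            return False  # too old (or too far in the future)
        key = (token["Username"], token["Nonce"])
        if key in self.seen:
            return False  # replayed request
        expected = _digest(nonce, token["Created"], password)
        if not hmac.compare_digest(expected, token["PasswordDigest"]):
            return False
        self.seen.add(key)
        return True
```

The password itself never crosses the wire, and an intercepted token is useless once the server has seen its nonce.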

As it stands I haven't found a perfect solution. And based on what I've seen on other sites, no one else really has either. What I will probably do is make it such that you can pass in the username and password on any request and that will always work. However, there will also be an API to request a secure token (that times out), which can be sent in lieu of a password. Both options will be available via URL parameters and, where possible, cookies and HTTP AUTH. Some API calls will be possible over SSL for additional security. I will keep the model as open-ended as possible and give the client as many choices as are reasonable. (I'm still trying to figure out what Gmail does -- the Google cookie has an SID that is set when you log in, but it doesn't change between requests, so it's not WSS-based.)

While this is obviously a hard topic, it seems to be a critical one if these web services are going to possibly form the backbone of all sorts of different applications. And as the best web service APIs require little from their clients, I want to provide options ranging from the easy to implement and admittedly less secure, to the more difficult but highly secure.

As with everywhere -- perhaps even more so here -- I really value your insight and feedback. Thanks!

1: As found on the Social Security Administration's baby name page. I'm using these names on the assumption that this project will be completed roughly around the same time that young Emily and Jacob are old enough to use it.

2: This consistency in the URL of each API call actually has another even more important advantage -- partitioning. Scaling a system like this to millions of users will be impossible unless we can find some way to split the data into manageable sizes. Ideally, we'll be able to partition that data into arbitrarily many pieces and be able to determine which partition a request falls in before any hard work is done. With each API call beginning with /users/username we can simply use a URL-aware load balancer to distribute the request to the right host or cluster of hosts. This will come up again, but it's important to keep it in the back of your mind from the beginning, otherwise it can be extremely difficult to add later.
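As a rough sketch of how a URL-aware balancer might make that decision, the functions below pull the subject username out of the path and map it to a partition number. Hashing the username is my assumption here (the text does not pick a scheme), and the names `subject_of` and `partition_for` are hypothetical.

```python
import hashlib
from typing import Optional


def subject_of(path: str) -> Optional[str]:
    """Extract the subject username from a path such as /users/emily/friends."""
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "users" and parts[1]:
        return parts[1]
    return None


def partition_for(path: str, num_partitions: int) -> Optional[int]:
    """Pick a partition from the URL alone, before any real work is done."""
    username = subject_of(path)
    if username is None:
        return None
    # A stable hash keeps every call about the same user on the same partition.
    digest = hashlib.sha256(username.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# All of Emily's calls land on the same partition.
same = {
    partition_for("/users/emily", 8),
    partition_for("/users/emily/friends", 8),
    partition_for("/users/emily/invite/jacob", 8),
}
```

A plain modulo like this reshuffles most users whenever the partition count changes, so a real deployment would likely want a lookup table or consistent hashing instead.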

3: It's even more complicated than that. For although the root of each URL is easy to partition on, recall that the authentication model is orthogonal to the URL. So while one server may handle all users starting with the letter "A", there is no guarantee that the individual whose credentials we are using will end up in the same partition. Perhaps the authentication can be done and distributed at a layer above the API call itself, but that's a little beyond the scope of this project at the moment.