I recently got an email from a developer about my post Thoughts on Amazon's Internal Storage System (Dynamo) which claimed that I seem to be romanticizing distributed systems that favor availability over consistency. He pointed out that although this sounds nice on paper, it places a significant burden on application developers. He is 100% right. This has been my experience in Windows Live and I’ve heard enough second hand to assume it is the experience at Amazon as well when it comes to Dynamo.
I thought an example of how this trade-off affects developers would be a useful excercise and may be of interest to my readers. The following example is hypothetical and should not be construed as describing the internal architectures of any production systems I am aware of.
Scenario: Torsten Rendelmann, a Facebook user in Germany, accepts a friend request from Dare Obasanjo who is a Facebook user in the United States.
The Distributed System: To improve the response times for users in Europe, imagine Facebook has a data center in London while American users a serviced from a Data Center in Palo Alto. To achieve this, the user database is broken up in a process commonly described as sharding. The question of if and how data is replicated across both data centers isn’t relevant to this example.
The application developer who owns the confirm_friend_request() method, will ideally want to write code that took the following form
public void confirm_friend_request(user1, user2){ begin_transaction(); update_friend_list(user1, user2, status.confirmed); //palo alto update_friend_list(user2, user1, status.confirmed); //london end_transaction(); }
public void confirm_friend_request(user1, user2){
begin_transaction(); update_friend_list(user1, user2, status.confirmed); //palo alto update_friend_list(user2, user1, status.confirmed); //london end_transaction();
}
Yay, distributed transactions. You have to love a feature that every vendor advises you not to use if you care about performance. So obviously this doesn’t work for a large scale distributed system where performance and availabilty are important.
Things get particularly ugly when you realize that either data center or the specific server a user’s data is stored on could be unreachable for a variety of reasons (e.g. DoS attack, high seasonal load, drunk sys admin tripped over a power cord, hard drive failure due to cheap components, etc).
There are a number of options one can consider when availability and high performance are considered to be more important than data consistency in the above example. Below are three potential implementations of the code above each with it’s own set of trade offs.
OPTION I: Application developer performs manual rollback on error
public void confirm_friend_request_A(user1, user2){ try{ update_friend_list(user1, user2, status.confirmed); //palo alto }catch(exception e){ report_error(e); return; } try{ update_friend_list(user2, user1, status.confirmed); //london }catch(exception e) { revert_friend_list(user1, user2); report_error(e); return; }}
public void confirm_friend_request_A(user1, user2){
try{ update_friend_list(user1, user2, status.confirmed); //palo alto }catch(exception e){ report_error(e); return; }
try{ update_friend_list(user2, user1, status.confirmed); //london }catch(exception e) { revert_friend_list(user1, user2); report_error(e); return; }}
The problem here is that we don’t handle the case where revert_friend_list() fails. This means that Dare (user1) may end up having Torsten (user2) on his friend list but Torsten won’t have Dare on his friend list. The database has lied.
OPTION II: Failed events are placed in a message queue to be retried until they succeed.
public void confirm_friend_request_B(user1, user2){ try{ update_friend_list(user1, user2, status.confirmed); //palo alto }catch(exception e){ report_error(e); add_to_retry_queue(operation.updatefriendlist, user1, user2, current_time()); } try{ update_friend_list(user2, user1, status.confirmed); //london }catch(exception e) { report_error(e); add_to_retry_queue(operation.updatefriendlist, user2, user1, current_time()); }}
public void confirm_friend_request_B(user1, user2){
try{ update_friend_list(user1, user2, status.confirmed); //palo alto }catch(exception e){ report_error(e); add_to_retry_queue(operation.updatefriendlist, user1, user2, current_time()); }
try{ update_friend_list(user2, user1, status.confirmed); //london }catch(exception e) { report_error(e); add_to_retry_queue(operation.updatefriendlist, user2, user1, current_time()); }}
Depending on how long the error exists and how long it takes an item to sit in the message queue, there will be times when the Dare (user1) may end up having Torsten (user2) on his friend list but Torsten won’t have Dare on his friend list. The database has lied, again.
OPTION III: System always accepts updates but application developers may have to resolve data conflicts later. (The Dynamo approach)
/* update_friend_list always succeeds but may enqueue an item in message queue to try again later in the event of failure. This failure is not propagated to callers of the method. */ public void confirm_friend_request_C(user1, user2){ update_friend_list(user1, user2, status.confirmed); // palo alto update_friend_list(user2, user1, status.confirmed); //london } /* get_friends() method has to reconcile results returned by get_friends() because there may be data inconsistency due to a conflict because a change that was applied from the message queue is contradictory to a subsequent change by the user. In this case, status is a bitflag where all conflicts are merged and it is up to app developer to figure out what to do. */ public list get_friends(user1){ list actual_friends = new list(); list friends = get_friends(); foreach (friend in friends){ if(friend.status == friendstatus.confirmed){ //no conflict actual_friends.add(friend); }else if((friend.status &= friendstatus.confirmed) and !(friend.status &= friendstatus.deleted)){ // assume friend is confirmed as long as it wasn’t also deleted friend.status = friendstatus.confirmed; actual_friends.add(friend); update_friends_list(user1, friend, status.confirmed); }else{ //assume deleted if there is a conflict with a delete update_friends_list( user1, friend, status.deleted) } }//foreach return actual_friends;}
/* update_friend_list always succeeds but may enqueue an item in message queue to try again later in the event of failure. This failure is not propagated to callers of the method. */
public void confirm_friend_request_C(user1, user2){ update_friend_list(user1, user2, status.confirmed); // palo alto update_friend_list(user2, user1, status.confirmed); //london
/* get_friends() method has to reconcile results returned by get_friends() because there may be data inconsistency due to a conflict because a change that was applied from the message queue is contradictory to a subsequent change by the user. In this case, status is a bitflag where all conflicts are merged and it is up to app developer to figure out what to do. */
public list get_friends(user1){ list actual_friends = new list(); list friends = get_friends();
foreach (friend in friends){
if(friend.status == friendstatus.confirmed){ //no conflict actual_friends.add(friend);
}else if((friend.status &= friendstatus.confirmed) and !(friend.status &= friendstatus.deleted)){
// assume friend is confirmed as long as it wasn’t also deleted friend.status = friendstatus.confirmed; actual_friends.add(friend); update_friends_list(user1, friend, status.confirmed);
}else{ //assume deleted if there is a conflict with a delete update_friends_list( user1, friend, status.deleted) }
}//foreach
return actual_friends;}
These are just a few of the many approaches that can be implemented in such a distributed system to get around the performance and availability implications of using distributed transactions. The main problem with them is that in every single case, the application developer has an extra burden placed on his shoulders due to inherent fragility of distributed systems. For a lot of developers, the shock of this realization is akin to the shock of programming in C++ after using C# or Ruby for a couple of years. Manual memory management? Actually having to perform bounds checking arrays? Being unable to use decades old technology like database transactions?
The challenge in building a distributed storage system like BigTable or Dynamo is in balancing the need for high availability and performance while not building a system that encourages all sorts of insidious bugs to exist in the system by design. Some might argue that Dynamo goes to far in the burden that it places on developers while there are others that would argue that it doesn’t go far enough.
In what camp do you fall?
Now playing: R. Kelly - Rock Star (feat. Ludacris & Kid Rock)