In my previous post, I mentioned that I'm in the early stages of building an
application on the Facebook
platform. I haven't yet decided on an application but for now, let's
assume that it is a Favorite Comic Books application which allows me
to store my favorite comic books and shows me the most popular comic
books among my friends.
After investigating using Amazon's
EC2 +
S3 to build my application I've decided
that I'm better off using a traditional hosting solution running on either
the LAMP or
WISC platform. One of the
things I've been looking at is which platform has better support for providing
an in-memory caching solution that works well in the context of a Web farm
(i.e. multiple Web servers) out of the box. While working on the platforms
behind several high traffic Windows Live services, I've learned that you
should be prepared to deal with scalability issues, and that
caching is one of the best ways to get bang for the buck when improving
the scalability of your service.
I recently discovered memcached
which is a distributed, object caching system originally developed by
Brad Fitzpatrick of
LiveJournal fame. You can think of
memcached as a giant hash table
spread across multiple servers. It automatically maintains the
balance of objects hashed to each server and transparently fetches or removes
objects over the network when they aren't on the same machine that is
accessing the hash table. Although this sounds fairly simple,
there is a lot of grunt work in building a distributed object cache which
handles data partitioning across multiple servers and hides the distributed
nature of the application from the developer. memcached is well integrated
into the typical LAMP
stack and is used by a surprising number of high traffic websites including
Slashdot,
Facebook,
Digg,
Flickr
and Wikipedia. Below is what C# code that utilizes memcached would look like, sans exception handling code:
public ArrayList GetFriends(int user_id){
    // Try the cache first
    ArrayList friends = (ArrayList) myCache.Get("friendslist:" + user_id);
    if(friends == null){
        // Cache miss: build the list from the database
        friends = new ArrayList();
        // Open the connection
        dbConnection.Open();
        SqlCommand cmd = new SqlCommand("select friend_id from friends_list where owner_id=" + user_id, dbConnection);
        SqlDataReader reader = cmd.ExecuteReader();
        // Add each friend ID to the list
        while (reader.Read()){
            friends.Add(reader[0]);
        }
        reader.Close();
        dbConnection.Close();
        // Store the list in the cache for subsequent requests
        myCache.Set("friendslist:" + user_id, friends);
    }
    return friends;
}

public void AddFriend(int user_id, int new_friend_id){
    // Open the connection
    dbConnection.Open();
    SqlCommand cmd = new SqlCommand("insert into friends_list (owner_id, friend_id) values (" + user_id + "," + new_friend_id + ")", dbConnection);
    cmd.ExecuteNonQuery();
    // Remove key from cache since friends list has been updated
    myCache.Delete("friendslist:" + user_id);
    dbConnection.Close();
}
The benefits of using the cache should be pretty obvious. I no longer need to hit the database after the first request to retrieve the user's friend list, which means faster performance in servicing the request and less I/O. memcached automatically handles purging items out of the cache when it hits the size limit and also decides which cache servers should hold individual key<->value pairs.
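To make the idea of deciding which cache server holds a given key more concrete, here is a minimal sketch of one simple way it could be done: hash the key and take the result modulo the number of servers. This is an illustration only, not how any particular memcached client is implemented (real clients may use other schemes such as consistent hashing). The ServerPicker class, the server names and the choice of an FNV-1a hash are all assumptions made for the example.

```csharp
using System;
using System.Text;

public static class ServerPicker
{
    // Hypothetical server list; a real deployment would read this from configuration.
    public static readonly string[] Servers = { "cache1:11211", "cache2:11211", "cache3:11211" };

    // 32-bit FNV-1a hash over the UTF-8 bytes of the key.
    // A hand-rolled hash is used because string.GetHashCode() is not
    // guaranteed to return the same value on every machine or process.
    public static uint Fnv1a(string key)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes(key))
            {
                hash ^= b;
                hash *= 16777619;
            }
            return hash;
        }
    }

    // Every Web server that runs this code maps the same key to the same cache server.
    public static string PickServer(string key)
    {
        return Servers[(int)(Fnv1a(key) % (uint)Servers.Length)];
    }
}
```

Because the mapping depends only on the key and the server list, every Web server in the farm would look for "friendslist:42" on the same cache machine without any coordination. The downside of plain modulo hashing is that adding or removing a server remaps most keys.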
I hang with a number of Web developers on the
WISC platform and I don't think
I've ever heard anyone mention
memcached or anything like it. In fact, I couldn't find a mention of it on Microsoft employee blogs, ASP.NET developer blogs or on
MSDN. So I wondered what the average
WISC developer uses as their
in-memory caching solution.
After looking around a bit, I came to the conclusion that most
WISC developers use the
built-in
ASP.NET
caching features. ASP.NET provides a number of in-memory caching
features including a Cache class which provides a similar API to
memcached, page directives for caching portions of the page or the entire page and the ability to create
dependencies between cached objects and the files or database tables/rows
that they were populated from via the
CacheDependency and SqlCacheDependency classes.
Although some of these features are also available in various Open Source
web development frameworks such as Ruby on Rails + memcached, none give as much functionality
out of the box as ASP.NET, or so it seems.
Below is what the code for the GetFriends
and
AddFriend
methods would look like using the built-in ASP.NET
caching features:
public ArrayList GetFriends(int user_id){
    // Try the cache first
    ArrayList friends = (ArrayList) Cache.Get("friendslist:" + user_id);
    if(friends == null){
        // Cache miss: build the list from the database
        friends = new ArrayList();
        // Open the connection
        dbConnection.Open();
        SqlCommand cmd = new SqlCommand("select friend_id from friends_list where owner_id=" + user_id, dbConnection);
        // The dependency must be created before the command is executed
        SqlCacheDependency dependency = new SqlCacheDependency(cmd);
        SqlDataReader reader = cmd.ExecuteReader();
        // Add each friend ID to the list
        while (reader.Read()){
            friends.Add(reader[0]);
        }
        reader.Close();
        dbConnection.Close();
        // Insert friends list into cache with associated dependency
        Cache.Insert("friendslist:" + user_id, friends, dependency);
    }
    return friends;
}

public void AddFriend(int user_id, int new_friend_id){
    // Open the connection
    dbConnection.Open();
    SqlCommand cmd = new SqlCommand("insert into friends_list (owner_id, friend_id) values (" + user_id + "," + new_friend_id + ")", dbConnection);
    cmd.ExecuteNonQuery();
    /* no need to remove from cache because SqlCacheDependency takes care of that automatically */
    // Cache.Remove("friendslist:" + user_id);
    dbConnection.Close();
}
Using the SqlCacheDependency class gets around a
significant limitation of the ASP.NET Cache class. Specifically, the cache is
not distributed. This means that if you have multiple Web front ends,
you'd have to write your own code to handle partitioning data and
invalidating caches across your various Web server instances. In fact,
there are numerous articles showing how to implement such a solution including
Synchronizing the ASP.NET Cache across AppDomains and Web Farms by Peter Bromberg and
Use Data Caching Techniques to Boost Performance and Ensure Synchronization by David Burgett.
However, let's consider how SqlCacheDependency is implemented. If you are
using SQL Server 7 or SQL Server 2000, then your ASP.NET process polls the
database at regular intervals to determine whether the target(s) of the
original query have changed. For SQL Server 2005, the database can be
configured to send change notifications to the Web servers if the target(s)
of the original query change. Either way, the database is doing work to
determine if the data has changed. Compared to memcached, this still doesn't seem as efficient as we can get
if we want to eke out every last ounce of performance from the system, although
it does lead to simpler code.
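The polling approach can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions and not how SqlCacheDependency is actually implemented: FakeFriendsTable stands in for the database and keeps a change counter that is bumped on every write, and PollingCache is a hypothetical cache that compares the counter against the last value it saw whenever Poll() is called.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a database table plus a change counter bumped on every write.
public class FakeFriendsTable
{
    private readonly Dictionary<int, List<int>> rows = new Dictionary<int, List<int>>();
    public int Version { get; private set; }

    public void AddFriend(int ownerId, int friendId)
    {
        if (!rows.ContainsKey(ownerId)) rows[ownerId] = new List<int>();
        rows[ownerId].Add(friendId);
        Version++; // record that the table changed
    }

    // Returns a copy so cached results are not affected by later writes.
    public List<int> ReadFriends(int ownerId)
    {
        return rows.ContainsKey(ownerId) ? new List<int>(rows[ownerId]) : new List<int>();
    }
}

// Hypothetical cache that is invalidated by polling the table's change counter.
public class PollingCache
{
    private readonly FakeFriendsTable table;
    private readonly Dictionary<int, List<int>> cache = new Dictionary<int, List<int>>();
    private int lastSeenVersion;

    public PollingCache(FakeFriendsTable table)
    {
        this.table = table;
        lastSeenVersion = table.Version;
    }

    // A real system would call this on a timer; each call is a round trip to the database.
    public void Poll()
    {
        if (table.Version != lastSeenVersion)
        {
            cache.Clear(); // the table changed, so drop everything cached from it
            lastSeenVersion = table.Version;
        }
    }

    public List<int> GetFriends(int ownerId)
    {
        List<int> friends;
        if (!cache.TryGetValue(ownerId, out friends))
        {
            friends = table.ReadFriends(ownerId);
            cache[ownerId] = friends;
        }
        return friends;
    }
}
```

The sketch shows both costs of polling: every Poll() call is extra work against the database even when nothing has changed, and between polls the cache can serve stale data. With the explicit delete in the memcached version of AddFriend, the application invalidates the entry itself and the database is not asked anything.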
If you are a developer on the WISC platform and are concerned about getting the best performance out of your
Web site, you should take a look at memcached for Win32. The most highly trafficked site on the
WISC platform is probably MySpace, and in articles about how their platform works, such as Inside MySpace.com, they extol the virtues of moving work out of the
database and relying on cache servers.