Yesterday Amazon's S3 service had an outage that lasted about six hours. Unsurprisingly, this has led to a bunch of wailing and gnashing of teeth from the very same pundits who were hyping the service a year ago. The first to proclaim that the sky is falling is Richard MacManus in his post More Amazon S3 Downtime: How Much is Too Much?, in which he writes

Today's big news is that Amazon's S3 online storage service has experienced significant downtime. Allen Stern, who hosts his blog's images on S3, reported that the downtime lasted over 6 hours. Startups that use S3 for their storage, such as SmugMug, have also reported problems. Back in February this same thing happened. At the time RWW feature writer Alex Iskold defended Amazon, in a must-read analysis entitled Reaching for the Sky Through The Compute Clouds. But it does make us ask questions such as: why can't we get 99% uptime? Or: isn't this what an SLA is for?

Om Malik joins in on the fun with his post S3 Outage Highlights Fragility of Web Services, which contains the following:

Amazon’s S3 cloud storage service went offline this morning for an extended period of time — the second big outage at the service this year. In February, Amazon suffered a major outage that knocked many of its customers offline.

It was no different this time around. I first learned about today’s outage when avatars and photos (stored on S3) used by Twinkle, a Twitter-client for iPhone, vanished.

That said, the outage shows that cloud computing still has a long road ahead when it comes to reliability. NASDAQ, Activision, Business Objects and Hasbro are some of the large companies using Amazon’s S3 Web Services. But even as cloud computing starts to gain traction with companies like these and most of our business and communication activities are shifting online, web services are still fragile, in part because we are still using technologies built for a much less strenuous web.

Even though the pundits are trying to raise a stink, the people who should be most concerned about this are Amazon S3's customers. Contrary to what Richard MacManus suggests, not only is there a Service Level Agreement (SLA) for Amazon S3, but it promises 99.9% uptime or you get a partial refund. Six hours of downtime sounds like a lot until you realize that 99% uptime allows over seven hours of downtime a month and over three and a half days of downtime a year. Amazon S3 is definitely doing a lot better than that.
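The arithmetic behind these numbers is simple enough to check. Here is a minimal sketch that converts an uptime percentage into the downtime it permits. It assumes a 365-day year (8,760 hours) and an average month of one twelfth of that (730 hours); real SLAs define their own measurement periods. By this measure, 99% uptime allows about 7.3 hours a month and 87.6 hours (about 3.65 days) a year, while 99.9% allows 8.76 hours a year.

```python
# Minimal sketch: convert an uptime percentage into the downtime it permits.
# Assumption: a 365-day year (8,760 hours) and an "average" month of 1/12 of
# that (730 hours). Real SLAs define their own measurement periods.

HOURS_PER_YEAR = 365 * 24               # 8,760 hours
HOURS_PER_MONTH = HOURS_PER_YEAR / 12   # 730 hours on average


def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Return the hours of downtime an uptime target permits over a period."""
    return period_hours * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for target in (99.0, 99.9):
        per_month = allowed_downtime_hours(target, HOURS_PER_MONTH)
        per_year = allowed_downtime_hours(target, HOURS_PER_YEAR)
        print(f"{target}% uptime: {per_month:.2f} h/month, "
              f"{per_year:.2f} h/year ({per_year / 24:.2f} days)")
```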

The only question that matters is whether Amazon's customers can get better service elsewhere at the prices Amazon charges. If they can't, then this is an acceptable loss that is already covered by their SLA; after all, even 99.9% uptime still allows over eight hours of downtime a year. If they can, then competition will pressure Amazon to either do a better job of managing its network or lower its prices.
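To make "covered by their SLA" concrete, here is a minimal sketch of a tiered service credit. It rests on two assumptions, not on the SLA's actual wording. First, monthly uptime is modelled simply as 1 minus outage hours divided by hours in the month; the real S3 SLA defines its own measurement method. Second, the tiers (a 10% credit below 99.9% uptime and a 25% credit below 99%) are taken from the figures discussed in this post and its comments. The names `CREDIT_TIERS`, `monthly_uptime_percent` and `service_credit_percent` are hypothetical.

```python
# Minimal sketch of a tiered SLA service credit, under stated assumptions:
#  - uptime is modelled as 1 - outage_hours / hours_in_month
#    (the real S3 SLA defines its own measurement method);
#  - the tiers below are ASSUMED from figures discussed in the post and
#    comments, not copied from the SLA text.

# Hypothetical tiers: (uptime threshold %, credit %), strictest tier first.
CREDIT_TIERS = [
    (99.0, 25),  # below 99% uptime -> 25% credit (assumed)
    (99.9, 10),  # below 99.9% uptime -> 10% credit (assumed)
]


def monthly_uptime_percent(outage_hours: float, days_in_month: int = 30) -> float:
    """Return uptime for a month with the given total outage, as a percentage."""
    hours_in_month = days_in_month * 24
    return 100.0 * (1 - outage_hours / hours_in_month)


def service_credit_percent(uptime_percent: float) -> int:
    """Return the credit owed, checking the lowest-uptime tier first."""
    for threshold, credit in CREDIT_TIERS:
        if uptime_percent < threshold:
            return credit
    return 0


if __name__ == "__main__":
    uptime = monthly_uptime_percent(6)  # one six-hour outage in a 30-day month
    print(f"uptime {uptime:.2f}% -> credit {service_credit_percent(uptime)}%")
```

Under these assumptions, a single six-hour outage in a 30-day month works out to roughly 99.17% uptime, which falls in the 10% credit tier.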

This is one place where market forces will either rectify things or lead us to a healthy equilibrium. Network computing is inherently unreliable, and no amount of outraged posts by pundits will ever change that. Amazon is doing a better job than most of its customers could do on their own, and more cheaply than they ever could. Let's not forget that in the rush to gloat about Amazon's downtime.

Now Playing: 2Pac - Life Goes On


 

Monday, 21 July 2008 14:59:13 (GMT Daylight Time, UTC+01:00)
Refunds mean nothing to mid-sized companies that decommissioned licensed ERP solutions for a SaaS experiment. S3 often hides behind another actor.
Monday, 21 July 2008 19:20:35 (GMT Daylight Time, UTC+01:00)
A little confused about something. The SLA claims 99.9% uptime but you calculate the threshold at 99% uptime. By my calculations, 99.9% uptime per year works out to just shy of nine hours per year or 45 minutes/month. Were you calculating based on the 25% refund threshold?
Tuesday, 22 July 2008 00:00:23 (GMT Daylight Time, UTC+01:00)
Amen. I had a job where a server went down for 2 hours, and my manager came to me complaining that I was supposed to guarantee 95% uptime under my contract and that I should get it back online stat (as if I wasn't already working on it). At the time, I didn't think pointing out that it could be offline for another 18 days a year was going to help my case, though.

It's a pretty ridiculous thing to offer any percentage figure; it would be preferable if it also came with a guarantee of never being inaccessible for more than 30 minutes in any single outage, etc.

But again, any provider would struggle to guarantee no downtime at all; you'd have to wonder how much it would cost to keep a backup of their service available at any given time.
Andrew Tobin
Tuesday, 22 July 2008 04:17:20 (GMT Daylight Time, UTC+01:00)
99.999% != 8 hours of downtime a month .... fuzzy math there .... naija boy
anonymous
Tuesday, 22 July 2008 05:10:42 (GMT Daylight Time, UTC+01:00)
Kyle,
99% uptime comes from the statement by Richard MacManus on ReadWriteWeb which is excerpted above.

Anonymous,
I'm confused, where do you see 99.999% uptime anywhere in my post?
Tuesday, 22 July 2008 20:05:40 (GMT Daylight Time, UTC+01:00)
My mistake. I see the issue now. But still, according to the SLA, customers should be able to get at least a 10% refund for this, yes/no?
Wednesday, 23 July 2008 10:20:35 (GMT Daylight Time, UTC+01:00)
I hope our server never goes down...
Wednesday, 23 July 2008 16:13:16 (GMT Daylight Time, UTC+01:00)
I agree about the overdone proclamations of doom and said as much here:

http://news.cnet.com/8301-1001_3-9995937-92.html

For one thing, the reliability of cloud computing services must be compared not just to 0% downtime but also to the downtime of running your own IT.