Disclaimer: What follows are my personal impressions from using the beta version of Windows Azure. It is not meant to be an official description of the project from Microsoft; you can find that here.
Earlier this week I scored an invite to try out the beta version of Windows Azure, a new hosted services (aka cloud computing) platform from Microsoft. Since there's been a ridiculous amount of press about the project, I was interested in actually trying it out by developing and deploying some code on the platform and sharing my experiences with others.
What is it?
Before talking about a cloud computing platform, it is useful to agree on a definition of the term cloud computing. Tim O'Reilly has an excellent post entitled Web 2.0 and Cloud Computing where he breaks the technologies typically described as cloud computing into three broad categories:
- Utility Computing: In this approach, a vendor provides access to virtual server instances where each instance runs a traditional server operating system such as Linux or Windows Server. Computation and storage resources are metered and the customer can "scale infinitely" by simply creating new server instances. The most popular example of this approach is Amazon EC2.
- Platform as a Service: In this approach, a vendor hides the traditional LAMP or WISC stack from its customers and instead provides an environment for running programs written against a particular platform. In addition, data storage is provided via a custom storage layer and API instead of traditional relational database access. The most popular example of this approach is Google App Engine.
- Cloud-based end user applications: This typically refers to Web-based applications that have previously been provided as desktop or server based applications. Examples include Google Docs, Salesforce and Hotmail. Technically every Web application falls under this category; however, the term often isn't used that inclusively.
With these definitions clearly stated, it is easier to talk about what Windows Azure is and is not. Windows Azure is currently #2, a Platform as a Service offering. Although there have been numerous references to Amazon's offerings by both Microsoft and the bloggers covering the Azure announcements, Windows Azure is not a utility computing offering [as defined above].
There has definitely been some confusion about this, as evidenced by Dave Winer's post "Microsoft's cloud strategy?" and commentary from other sources.
Getting Started
To try out Azure you need to be running Windows Server 2008 or Windows Vista along with a number of prerequisites, which you can get by running the Microsoft Web Platform installer. Once you have the various prerequisites installed (SQL Server, IIS 7, .NET Framework 3.5, etc.), you should then grab the Windows Azure SDK. Users of Visual Studio will also benefit from grabbing the Windows Azure Tools for Visual Studio.
After this process, you should be able to fire up Visual Studio and see the option to create a Cloud Service if you go to File->New->Project.
Building Cloud-based Applications with Azure
The diagram below taken from the Windows Azure SDK shows the key participants in a typical Windows Azure service
The work units that make up a Windows Azure hosted service can have one of two roles. A Web role is an application that listens for and responds to Web requests, while a Worker role is a background processing task which acts autonomously but cannot be accessed over the Web. A Windows Azure application can have multiple instances of Web and Worker roles that make up the service. For example, if I were developing a Web-based RSS reader, I would need a Worker role for polling feeds and a Web role for displaying the UI that the user interacts with. Both Web and Worker roles are .NET applications that can be developed locally and then deployed on Microsoft's servers when they are ready to go.
Azure applications have access to a storage layer that provides the following three storage services:
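The division of labor in the RSS reader example can be sketched as below. This is only an illustration under stated assumptions: Python's in-process `queue.Queue` stands in for the Azure queue service that would connect the two roles in a real deployment, and `web_role_subscribe`, `worker_role_step`, `feed_queue` and `polled_feeds` are hypothetical names, not part of the Azure SDK.

```python
import queue

# In-process stand-in for the Azure queue service that connects the two roles.
# The function names below are hypothetical; this is not the Azure SDK API.
feed_queue = queue.Queue()
polled_feeds = []  # stands in for wherever the worker stores its results

def web_role_subscribe(feed_url):
    """Web role: handle the user's request quickly and hand the slow work off."""
    feed_queue.put(feed_url)
    return "Subscribed to " + feed_url

def worker_role_step():
    """Worker role: one pass of its background loop; returns False when idle."""
    try:
        feed_url = feed_queue.get_nowait()
    except queue.Empty:
        return False
    # A real worker would download and parse the feed here.
    polled_feeds.append(feed_url)
    return True

print(web_role_subscribe("http://example.com/rss"))  # Subscribed to http://example.com/rss
while worker_role_step():
    pass
print(polled_feeds)  # ['http://example.com/rss']
```

The point of the split is that the Web role never blocks on fetching a feed; it only records the request, and the Worker role drains the backlog at its own pace.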
-
Blob Storage: This is used for storing binary data. A user account can have one or more containers, which in turn can contain one or more blobs of binary data. Containers cannot be nested, so one cannot create hierarchical folder structures. However, Azure lets applications work around this by (i) allowing them to query a container for blobs whose names begin with a given prefix and (ii) allowing delimiters such as '\' and other path characters in blob names. So I can create blobs with names like 'mypics\wife.jpg' and 'mypics\son.jpg' in the media container and then query for blobs beginning with 'mypics\', thus simulating a folder hierarchy somewhat.
-
Queue Service: This is a straightforward message queuing service. A user account can have one or more queues; items are added to the end of a queue and removed from the front. Items have a maximum time-to-live of 7 days within the queue. When an item is retrieved from the queue, an associated 'pop receipt' is provided. The item is then hidden from other client applications for an interval (30 seconds by default), after which it becomes visible again. The item can be deleted from the queue during that interval if the pop receipt from when it was retrieved is provided as part of the DELETE operation. The queue service is valuable as a way for Web roles to talk to Worker roles and vice versa.
-
Table Storage: This exposes a subset of the capabilities of the ADO.NET Data Services Framework (aka Astoria). In general, this is a schema-less, table-based model similar to Google's BigTable and Amazon's SimpleDB. The data model consists of tables and entities (aka rows). Each entity has a primary key made of two parts {PartitionKey, RowKey}, a last modified timestamp and an arbitrary number of user-defined properties. Properties can be one of several primitive types including integers, strings, doubles, long integers, GUIDs, booleans and binary. Like Astoria, the Table Storage service supports performing LINQ queries on rows, but it only supports the FROM, WHERE and TAKE operators. Other differences from Astoria are that it doesn't support batch operations and that it isn't possible to retrieve individual properties from an entity without retrieving the entire entity.
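The pop receipt and visibility timeout behavior of the queue service described above can be modeled with a small in-memory class. This is a toy sketch, not the Azure API: the class name, its method names and the explicit `now` argument (used so the example is deterministic) are all assumptions; only the 30 second default interval and the 7 day time-to-live come from the description above.

```python
import uuid
from collections import deque

class SimulatedQueue:
    """Toy in-memory model of pop receipts and visibility timeouts.

    Hypothetical names throughout; `now` is passed explicitly (in seconds)
    instead of reading a clock so the behavior is deterministic.
    """

    def __init__(self, visibility_timeout=30, ttl=7 * 24 * 3600):
        self.visibility_timeout = visibility_timeout
        self.ttl = ttl
        # Each entry: [message_id, body, inserted_at, visible_at, pop_receipt]
        self._items = deque()

    def put(self, body, now):
        """Add an item to the end of the queue; it is visible immediately."""
        self._items.append([uuid.uuid4().hex, body, now, now, None])

    def get(self, now):
        """Return (message_id, body, pop_receipt) for the first visible item, or None."""
        # Discard items that have outlived the time-to-live.
        self._items = deque(i for i in self._items if now - i[2] < self.ttl)
        for item in self._items:
            if item[3] <= now:
                item[3] = now + self.visibility_timeout  # hide it from other clients
                item[4] = uuid.uuid4().hex               # fresh receipt for this retrieval
                return item[0], item[1], item[4]
        return None

    def delete(self, message_id, pop_receipt):
        """Delete succeeds only if the receipt matches the most recent retrieval."""
        for item in self._items:
            if item[0] == message_id and item[4] == pop_receipt:
                self._items.remove(item)
                return True
        return False

q = SimulatedQueue()
q.put("poll http://example.com/rss", now=0)
msg_id, body, receipt = q.get(now=1)
print(q.get(now=10))                  # None: the item is hidden for 30 seconds
msg_id2, _, receipt2 = q.get(now=31)  # visible again, so a new receipt is issued
print(q.delete(msg_id, receipt))      # False: the first receipt is now stale
print(q.delete(msg_id2, receipt2))    # True
```

The design consequence is that a Worker role which crashes mid-task never deletes its item, so the item reappears after the interval and another instance can pick it up.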
These storage services are accessible to any HTTP client and not just Azure applications.
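The blob naming trick described above, flat containers plus prefix queries, can be sketched with a dictionary standing in for the 'media' container. The container contents and the `list_blobs` helper are hypothetical. Prefix matching is the capability described above; the optional roll-up of deeper names into a single "folder" entry is done client-side here purely to show how a one-level directory view could be faked, and is not a claim about what the storage service returns.

```python
# Hypothetical contents of the 'media' container: blob name -> data.
media = {
    "mypics\\wife.jpg": b"",
    "mypics\\son.jpg": b"",
    "mypics\\2008\\beach.jpg": b"",
    "music\\money.mp3": b"",
}

def list_blobs(container, prefix, delimiter=None):
    """Return (blob_names, folder_names) for blobs whose names start with prefix.

    With a delimiter, any name that contains the delimiter after the prefix
    is collapsed (client-side) into a single 'folder' entry.
    """
    blobs, folders = [], set()
    for name in sorted(container):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            blobs.append(name)
    return blobs, sorted(folders)

# Everything under 'mypics\' as a flat list, then as a one-level "folder" view.
print(list_blobs(media, "mypics\\"))
print(list_blobs(media, "mypics\\", delimiter="\\"))
```

Because the hierarchy lives entirely in the names, "moving" a blob between folders means writing it under a new name rather than updating a directory entry.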
Deploying Cloud-based Applications with Azure
The following diagram, taken from the Windows Azure SDK, shows the development lifecycle of a Windows Azure application:
The SDK ships with a development fabric, which lets you deploy an Azure application locally via IIS 7.0, and development storage, which uses SQL Server Express to mimic the Windows Azure storage services.
As the diagram above shows, once the application is tested locally it can be deployed entirely or in part on Microsoft's storage and cloud computation services.
The Azure Services Platform: Windows Azure + Microsoft's Family of REST Web Services
In addition to Windows Azure, Microsoft also announced the Azure Services Platform, a collection of Web APIs and Web services that can be used in combination with Windows Azure (or by themselves) to build cloud-based applications. Each of these Web services is worthy of its own post (or whitepaper and O'Reilly animal book), but I'll limit myself to one-sentence descriptions for now.
-
Live Services: A set of REST APIs for consumer-centric data types (e.g. calendar, profile, etc.) and scenarios (communication, presence, sync, etc.). You can see the set of APIs in the Live Framework poster and keep up with the goings-on by following the Live Services blogs.
-
Microsoft SQL Services: A relational database in the cloud, accessible via REST APIs. You can learn more from the SSDS developer center and keep up with the goings-on by following the SQL Server Data Services team blog.
-
Microsoft .NET Services: Three fairly different services for now; hosted access control, hosted workflow engine and a service bus in the cloud. Boring enterprise stuff. :)
-
Microsoft Sharepoint Services: I couldn't figure out whether anything concrete was announced here or whether stuff was pre-announced (i.e. the actual announcement is to come at a later date).
-
Microsoft Dynamics CRM Services: Ditto.
From the above list, I find the Live Services piece (access to user data in a uniform way) and the SQL Services (hosted storage) most interesting. I will likely revisit them in more depth at a later date.
The Bottom Line
From my perspective, Windows Azure is easiest viewed as a competitor to Google App Engine. As comparisons go, Azure already brings a number of features to the table that aren't even on the Google App Engine road map. The most important feature is the ability to run background tasks instead of being limited to writing applications that respond to Web requests. Because of this limitation, you can't build an application on Google App Engine that does any serious background computation, such as a search engine, email service or RSS reader. So Azure can run an entire class of applications that are simply not possible on Google App Engine.
The second key feature is that by supporting the .NET Framework, developers theoretically get a plethora of languages to choose from, including Ruby (IronRuby), Python (IronPython), F#, VB.NET and C#. In practice, the Azure SDK only supports creating cloud applications using C# and VB.NET out of the box. However, I can't think of any reason why it shouldn't be able to support development with other .NET-enabled languages like IronPython. On the flip side, App Engine only supports Python, and the timeline for it supporting other languages [and exactly which other languages] is still To Be Determined.
Finally, App Engine has a number of scalability limitations from both a data storage and a query performance perspective. Azure definitely does better than App Engine on a number of these axes. For example, App Engine has a 1MB limit per file while Azure has a 64MB limit on individual blobs and also allows you to split a blob into blocks of 4MB each. Similarly, I've been watching SQL Server Data Services (SSDS) for a while and I haven't seen or heard complaints about query performance.
Azure makes it possible for me to reuse my existing skills as a .NET developer who is comfortable with RESTful APIs to build cloud-based applications without having to worry about scalability concerns (e.g. database sharding, replication strategies, server failover, etc.). In addition, it puts pressure on competitors to step up to the plate and deliver. However you look at it, this is a massive WIN for Web developers.
The two small things I'd love to see addressed are first-class support for IronPython and some clarity on the difference between SSDS and the Windows Azure Storage services. Hopefully we can avoid a LINQ to Entities vs. LINQ to SQL-style situation in the future.
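To make the block arithmetic concrete, here is a sketch of how a client might cut a payload into 4MB pieces before uploading them. The `split_into_blocks` helper is hypothetical and does no network I/O; it only shows the chunking. The 4MB figure is the one quoted above, and how the blocks are then uploaded and reassembled by the service is not modeled here.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # 4MB per block, the figure quoted above

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split a byte string into consecutive chunks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

payload = bytes(10 * 1024 * 1024)  # a 10MB stand-in payload
blocks = split_into_blocks(payload)
print([len(b) // (1024 * 1024) for b in blocks])  # [4, 4, 2]

# A 64MB blob works out to 64 / 4 = 16 full blocks.
print(len(split_into_blocks(bytes(64 * 1024 * 1024))))  # 16
```

Working in blocks means a failed upload only has to resend the affected 4MB piece rather than the whole blob.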
Postscript: Food for Thought
It would be interesting to read [or write] further thoughts on the pros and cons of Platform as a Service offerings compared to Utility Computing offerings. In a previous discussion on my blog there was some consensus that utility computing approaches are more resistant to vendor lock-in than platform as a service approaches, since it is easier to find multiple vendors providing virtual servers with LAMP/WISC hosting than to find multiple vendors providing the exact same proprietary cloud APIs as Google, Amazon or Microsoft. However, it would be informative to look at the topic from more angles; for instance, what is the cost/benefit tradeoff of using SimpleDB/BigTable/SSDS for data access instead of MySQL running on multiple virtual hosts? With my paternity leave ending today, I doubt I'll have time to go over these topics in depth, but I'd appreciate reading any such analysis.
Now Playing: The Game - Money