These are my notes on the Scaling Fast and Cheap - How We Built Flickr session by Cal Henderson.
This was an 8-hour tutorial session which I didn't attend. However, I
did get a summary of the slide deck in my swag bag. Below are my
summaries of the slides Cal presented at the tutorial.
Overview and Environments
Flickr is a photo sharing
application that started off as a massively multiplayer online game
called Game Never Ending (GNE). It has 2 million users and over 100
million photos. The site was acquired by Yahoo! in May 2005.
A key lesson they learned is that premature optimization is the root of all evil. Some general rules they've stuck to are
- buy commodity hardware
- use off-the-shelf software instead of building custom code
When trying to scale the site there were a number of factors that
needed to be considered. When buying hardware these factors included
availability, lead times, reliability of vendors, and shipping times.
Other factors that affected purchase decisions included rack space,
power usage, bandwidth, and available network ports in the data center.
Load balancing
adds another decision point to the mix. One can purchase an expensive
dedicated device such as a Cisco 'Director' or a NetScaler device, or go
with a cheap software solution such as Zebra. However, the software
solution will still require hardware to run on. One can apply several
load balancing strategies, both at layer 4 (round robin, least
connections, least load) and at layer 7 (URL hashing). Sites also have
to investigate using GSLB, AkaDNS and LB trees for dealing with load
balancing at a large scale. Finally, there are non-Web load balancing
issues that need to be managed as well, such as database or mail server
load balancing.
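To make the layer-4 vs. layer-7 distinction concrete, here is a minimal sketch in Python; the backend names and pool size are made up for illustration:

```python
import hashlib

# Hypothetical backend pool; names are illustrative, not Flickr's.
BACKENDS = ["web1", "web2", "web3", "web4"]

def pick_backend_round_robin(counter):
    """Layer-4 style round robin: rotate through the pool,
    ignoring the request content entirely."""
    return BACKENDS[counter % len(BACKENDS)]

def pick_backend_url_hash(url):
    """Layer-7 style URL hashing: the same URL always maps to the
    same backend, which keeps any per-URL caches on that box warm."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]
```

Round robin needs no request parsing, so it works at the connection level; URL hashing requires the balancer to read the HTTP request, which is why it lives at layer 7.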
The Flickr team made the following software choices
- PHP 4 (specifically PHP 4.3.10)
- Linux (2.4 kernel on x86_64 and 2.6 kernel on i386)
- MySQL 4/4.1 with InnoDB and character set support
- Apache 2 with mod_php and prefork MPM
There were 3 rules in their software process
- Use source control
- Have a one step build process
- Use a bug tracker
Everything goes into source control from code and documentation to configuration files and build tools.
For development platforms they chose an approach that supports rapid iteration but enforces some rigor. They suggest having a minimum of 3 platforms
- Development: Working copy of the site which is currently being worked on
- Staging: Almost live version of the site where changes to the live site are tested before deployment
- Production: The customer facing site
Release management consists of staging the application, testing the
app on the staging site, then deploying the application to the
production servers after the tests pass.
Everything should be tracked using the bug tracker including bugs,
feature requests, support cases and ops related work items. The main
metadata for the bug should be title, notes, status, owner and
assigning party. Bug tracking software ranges from simple and non-free
applications like FogBugz to complex, open source applications like
Bugzilla.
Consistent coding standards are more valuable than choosing the right
coding standards. Set standards for file names, DB table names,
function names, variable names, comments, indentation, etc. Consistency
is good.
Testing web applications is hard. They use unit testing
for discrete/complex functions and automate as much as they can such as
the public APIs. The WWW::Mechanize library has been useful in testing Flickr.
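As an illustration of unit testing a discrete function, here is a sketch using Python's unittest module; the normalize_tags helper is hypothetical, not Flickr code:

```python
import unittest

def normalize_tags(raw):
    """Hypothetical helper: split a space-separated tag string,
    lowercase each tag, and drop duplicates while keeping order."""
    seen, out = set(), []
    for tag in raw.lower().split():
        if tag not in seen:
            seen.add(tag)
            out.append(tag)
    return out

class NormalizeTagsTest(unittest.TestCase):
    def test_lowercases_and_dedupes(self):
        self.assertEqual(normalize_tags("Sunset sunset Beach"),
                         ["sunset", "beach"])

    def test_empty_input(self):
        self.assertEqual(normalize_tags(""), [])
```

Small, deterministic functions like this are the easy wins; end-to-end browser-driving tests (what WWW::Mechanize does) cover the rest.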
Data and Protocols
Unicode is important for internationalization of a site. UTF-8 is an
encoding [not a character set] which is compatible with ASCII. Making a
UTF-8 web application is tricky due to inconsistent support in various
layers of a web application; HTML, XML, JavaScript, PHP, MySQL, and
Email all have to be made to support Unicode. For the most part this
was straightforward, except for PHP, which needed custom functions
added, and XML files, which needed characters below 0x20 filtered out
[except for normalizing carriage returns]. A data integrity policy is needed as
well as processes for filtering out garbage input from the various
layers of the system.
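The XML filtering step described above can be sketched in a few lines of Python (the function name is mine; XML 1.0 forbids control characters below 0x20 other than tab, LF and CR):

```python
import re

# XML 1.0 disallows control characters below 0x20 except
# tab (0x09), newline (0x0A), and carriage return (0x0D).
_XML_BAD = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def clean_for_xml(text):
    """Normalize CR/CRLF variants to '\n', then strip the remaining
    control characters that would make the XML ill-formed."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return _XML_BAD.sub("", text)
```

Filtering at the point where data enters the system, rather than at output time, is what the data integrity policy mentioned above is about.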
Filtering bad input doesn't just refer to unicode. One also has to
filter user input to prevent SQL injection and Cross Site Scripting
(XSS) attacks.
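Both defenses can be sketched in a few lines of Python, here using sqlite3 and html.escape as stand-ins for whatever database layer and templating a site actually uses:

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

def find_user(conn, name):
    """Parameterized query: the driver quotes 'name', so a value like
    "x' OR '1'='1" is treated as data, not as SQL."""
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (name,))
    return cur.fetchone()

def render_comment(text):
    """Escape user input before embedding it in HTML to block XSS."""
    return "<p>%s</p>" % html.escape(text)
```

The common thread is never to splice raw user input into another language (SQL, HTML) by string concatenation.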
The ability to receive email has been very useful to Flickr
in a number of scenarios such as enabling mobile 'blogging' and support
tracking. Their advice for supporting email is to leverage existing
technology and not write an SMTP server from scratch. However you may
need to handle parsing MIME yourself because support is weak in some
platforms. For Flickr,
PEAR's Mail::mimeDecode was satisfactory, although not without deficiencies. You will
also have to worry about uuencoded text and Transport Neutral
Encapsulation Format (TNEF) which is only used by Microsoft Outlook.
Finally, you may also have to special case mail sent from mobile phones
due to idiosyncrasies of wireless carriers.
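Python's standard email package can stand in for Mail::mimeDecode to show the idea; the message below is a made-up example of a phone upload:

```python
from email import message_from_string

# A fabricated multipart message, like one a camera phone might send.
RAW = """\
From: phone@example.com
Subject: beach
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain

uploaded from my phone
--XX
Content-Type: image/jpeg
Content-Transfer-Encoding: base64

/9j/4AAQ
--XX--
"""

def first_text_part(msg):
    """Walk the MIME tree and return the first text/plain body."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            return part.get_payload(decode=True).decode("utf-8")
    return None

msg = message_from_string(RAW)
```

Real carrier mail is messier than this; the special-casing mentioned above happens after parsing, once you can see which parts a given carrier actually sends.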
When communicating with other services, XML is a good format to use to
ensure interoperability. It is fairly simple unless namespaces are
involved. The Flickr
team had to hack on PEAR's XML::Parser to make it meet their needs. In
situations when XML is not performant enough they use UNIX sockets.
When building services one should always assume the service call will
fail. Defensive programming is key. As a consequence, one should
endeavor to make service calls asynchronous since they may take a long
time to process and it makes callers more resilient to failure.
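A minimal sketch of that defensive posture in Python, assuming a fetch callable and a cached fallback value; truly asynchronous handling would hand the work to a queue, which is beyond this sketch:

```python
import socket

def call_service(fetch, fallback, timeout=2.0):
    """Assume the call will fail: bound it with a socket timeout and
    fall back to a default instead of taking the page down with it."""
    old = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        return fetch()
    except Exception:
        return fallback
    finally:
        socket.setdefaulttimeout(old)
```

The caller degrades gracefully (a stale cached value, a hidden widget) rather than propagating the remote failure.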
Developing and Fixing
Discovering bottlenecks is an important aspect of development for web applications. Approaches include
- CPU usage - rarely happens unless processing images/video, usually fixed by adding RAM
- Code profiling - rarely causes problems unless doing crypto
- Query profiling - usually fixed by denormalizing DB tables, adding indexes and DB caching
- Disk IO - usually fixed by adding more spindles
- Memory/Swap - usually fixed by adding RAM
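Code profiling in particular can start from a simple harness; here is a sketch using Python's cProfile, with a deliberately naive function standing in for a real hot spot:

```python
import cProfile
import io
import pstats

def slow_join(n):
    """Deliberately naive string building, a stand-in for a hot spot."""
    s = ""
    for i in range(n):
        s += str(i)
    return s

def profile(func, *args):
    """Run func under cProfile and return the top of the stats report
    as text, sorted by cumulative time."""
    pr = cProfile.Profile()
    pr.enable()
    func(*args)
    pr.disable()
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
    return buf.getvalue()
```

The same measure-first discipline applies to query profiling, where the database's own tools (slow query logs, EXPLAIN) play the role of the profiler.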
Scaling
Scalability is about handling platform growth, dataset growth and
maintainability. There are two broad approaches to scaling: vertical
scaling and horizontal scaling. Vertical scaling means buying a big
server and, to scale further, buying an even bigger server. Horizontal
scaling means buying one server and, to scale further, buying more of
the same kind of server. In today's world, Web applications have
embraced horizontal scaling. The issues facing services that adopt
horizontal scaling are
- increased setup/admin cost
- complexity
- datacenter issues - power / space / available network ports
- underutilized hardware - CPU/Disks/Mem may not be used to full capacity
Services need to scale once they hit performance issues. When scaling MySQL one has to worry about
- Choosing the right backend - MyISAM, BDB, InnoDB, etc
- Replication
- Partitioning/Clustering
- Federation
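Replication usually implies splitting reads from writes in the application: writes go to the master, reads are spread across replicas. A toy sketch of that routing (the connection objects here are just labels):

```python
import random

class SplitDB:
    """Minimal sketch of read/write splitting under replication.
    Real code would hold actual DB connections; strings stand in."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def connection_for(self, sql):
        """Route SELECTs to a random replica, everything else to
        the master."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return random.choice(self.replicas)
        return self.master
```

This keeps the write path simple at the cost of replication lag: a read immediately after a write may not see it, which the application has to tolerate or route around.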
One big lesson learned about database scalability is that third normal form tends to cause performance problems in large databases. Denormalizing data can give huge performance wins.
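A classic denormalization is a counter cache. This sketch, using sqlite3 and made-up table names, trades an extra write for cheap reads:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (id INTEGER PRIMARY KEY,
                         comment_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE comments (id INTEGER PRIMARY KEY,
                           photo_id INTEGER NOT NULL);
    INSERT INTO photos (id) VALUES (1);
""")

def add_comment(photo_id):
    """Write twice so reads never have to COUNT(*) over comments:
    the photo row carries its own comment count."""
    conn.execute("INSERT INTO comments (photo_id) VALUES (?)",
                 (photo_id,))
    conn.execute("UPDATE photos SET comment_count = comment_count + 1 "
                 "WHERE id = ?", (photo_id,))
```

Displaying a photo page now reads one row instead of aggregating over a potentially huge comments table; the cost is keeping the counter consistent on every write.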
The rest of the slides go on about case studies specific to Flickr
which are interesting but I don't feel like summarising here. :)