I just spent an hour doing some research in response to Sam Ruby's post Sousveillance where he wonders whether some of the descriptions of Facebook as a social graph roach motel (i.e. information about your relationships goes in, nothing comes out) are accurate. Sam writes

Dare seems to think that the root problem is oppression by the “man”.  In this case, a 23 year old.  Brad seems to view this as a technical problem.

I wonder what I wrote that gave that impression, especially in the linked post. In that post, I was simply giving some advice about the kind of social problems you will face when you treat unifying social graphs across different contexts and applications as a technical problem. If anyone is whining about oppression by Facebook, it would be Brad’s original manifesto which mentions the site by name over a dozen times.

Data point 1: one day when logging onto Facebook, I saw an offer to scan my AIM contacts and invite the ones that had Facebooks to be friends.  I unselected a few, and then clicked on submit.  Within hours, my network expanded greatly.  IM ids serve as useful foreign keys.

Like lots of popular social networking services, but not Windows Live Spaces, Facebook is fond of violating the terms of use of various email providers by screen scraping user address books and contact lists after collecting their log-in credentials.

However Facebook prevents this from being done to them by only showing email addresses as images which expire after a couple of minutes due to use of session keys. I once considered writing an application to import my Facebook contacts into Outlook but gave up once I realized I couldn’t find any free, off-the-shelf OCR APIs that I could use.

I did find an article on CodeProject about rolling your own OCR via neural networks which seems promising but I don't have the free time to mess with that right now. Maybe later in the year. Sam also writes

Data point 2: Facebook is a platform with an API.  If there is a need, it seems to me that one could develop an application using FQL to pull one’s friend list out of Facebook and share it externally.  The fact that I don’t know of such an application means one of four things is happening: (1) it exists, but I don’t know about it, (2) despite the alleged overwhelming demand for this feature, and obvious commercial opportunities it opens up, it hadn’t occurred to anyone, (3) I’m reading the documentation wrong, and it isn’t possible for applications to obtain access to one’s own Facebook ID for use as a foreign key, or (4) the demand simply isn’t there.

Or (5) the information returned by FQL about a user contains no contact information (no email address, no IM screen names, no telephone numbers, no street address)  so it is pretty useless as a way to utilize one’s friends list with applications besides Facebook since there is no way to cross-reference your friends using any personally identifiable association that would exist in another service.
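To make the foreign key point concrete, here is a minimal sketch of what cross-referencing a friends list against another service looks like. Everything in it is hypothetical: the record shapes, the `cross_reference` helper and the sample data are made up for illustration and are not what FQL actually returns.

```python
def cross_reference(exported_friends, local_contacts, key):
    """Return (friend, contact) pairs that share a non-empty value for `key`."""
    index = {c[key]: c for c in local_contacts if c.get(key)}
    return [(f, index[f[key]]) for f in exported_friends if f.get(key) in index]


# Hypothetical export that carries only a service-specific id and a display name.
facebook_export = [
    {"uid": 1001, "name": "Alice Example"},
    {"uid": 1002, "name": "Bob Example"},
]

# Hypothetical address book keyed on email, the way most mail clients are.
address_book = [
    {"name": "Alice E.", "email": "alice@example.com"},
    {"name": "Robert Example", "email": "bob@example.com"},
]

# No shared identifier, so nothing can be matched.
no_matches = cross_reference(facebook_export, address_book, "email")

# The same export with an email column joins cleanly.
richer_export = [
    dict(friend, email=email)
    for friend, email in zip(facebook_export, ["alice@example.com", "bob@example.com"])
]
matches = cross_reference(richer_export, address_book, "email")
```

Display names are no substitute for a real key here: "Bob Example" and "Robert Example" are the same person but would never match on name alone.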

When it comes to contact lists (i.e. the social graph), Facebook is a roach motel. Lots of information about user relationships goes in but there’s no way for users or applications to get it out easily. Whenever an application like FacebookSync comes along which helps users do this, it is quickly shut down for violating their Terms of Use. Hypocrisy? Indeed.

Now playing: Lil Boosie & Webbie - Wipe Me Down (remix) (feat. Jim Jones, Fat Joe, Jadakiss & Foxx)


 

Categories: Social Software

I just read the post on the Skype weblog entitled What happened on August 16 about the cause of their outage which states

On Thursday, 16th August 2007, the Skype peer-to-peer network became unstable and suffered a critical disruption. The disruption was triggered by a massive restart of our users’ computers across the globe within a very short time frame as they re-booted after receiving a routine set of patches through Windows Update.

The high number of restarts affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.

Normally Skype’s peer-to-peer network has an inbuilt ability to self-heal, however, this event revealed a previously unseen software bug within the network resource allocation algorithm which prevented the self-healing function from working quickly.

This problem affects all networks that handle massive numbers of concurrent user connections whether they are peer-to-peer or centralized. When you deal with tens of millions of users logged in concurrently and something causes a huge chunk of them to log-in at once (e.g. after an outage or a synchronized computer reboot due to operating system patches) then your system will be flooded with log-in requests. All the major IM networks (including Windows Live) have all sorts of safeguards in place within the system to prevent this from taking down their networks although how many short outages are due to this specific issue is anybody’s guess.
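One common client-side safeguard against this kind of log-in flood is exponential backoff with random jitter, so that clients that went offline together don't all come back in the same second. The sketch below is a generic illustration, not a description of what Skype or Windows Live actually do; the function names and the `base` and `cap` defaults are assumptions picked for the example.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Full-jitter backoff: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def schedule_logins(client_count, attempt, rng=random):
    """Return the sorted delays that `client_count` clients would wait before retrying."""
    return sorted(reconnect_delay(attempt, rng=rng) for _ in range(client_count))
```

With `attempt=5` and the defaults above, retries are spread over a window of up to 32 seconds instead of arriving at once, and the window keeps doubling on each failed attempt until it reaches the cap.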

However Skype has an additional problem when such events happen due to its peer-to-peer model, which is described in the blog post All Peer-to-Peer Models Are NOT Created Equal -- Skype's Outage Does Not Impugn All Peer-to-Peer Models

According to Aron, like its predecessor Kazaa, Skype uses a different type of Peer-To-Peer network than most companies. Skype uses a system called SuperNodes. A SuperNode Peer-to-Peer system is one in which you rely on your customers rather than your own servers to handle the majority of your traffic. SuperNodes are just normal computers which get promoted by the Skype software to serve as the traffic cops for their entire network. In theory this is a good idea, but the problem happens if your network starts to destabilize. Skype, as a company, has no physical or programmatic control over the most vital piece of its product. Skype instead is at the mercy of and vulnerable to the people who unknowingly run the SuperNodes.

This of course exposes vulnerabilities to any business based on such a system -- systems that, in effect, are not within the company's control.

According to Aron, another flaw with SuperNode models concerns system recovery after a crash. Because Skype lost its SuperNodes in the initial crash, its network can only recover as fast as new SuperNodes can be identified.

This design leads to a vicious cycle when it comes to recovering from an outage. With most of the computers on the network being rebooted, the network lost a bunch of SuperNodes, so when the computers came back online they flooded the remaining SuperNodes, which in turn went down, and so on…
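The feedback loop can be shown with a toy model. This is only a sketch under made-up assumptions (load is spread evenly, and half of the SuperNodes drop out whenever the per-node load exceeds capacity); it says nothing about how Skype's network actually behaves.

```python
def simulate_cascade(clients, supernodes, capacity, max_rounds=50):
    """Return the number of surviving SuperNodes after each round of a toy model.

    Load is spread evenly across SuperNodes. While the per-node load exceeds
    `capacity`, half of the remaining SuperNodes drop out (a hypothetical
    failure rule) and their load shifts to the survivors.
    """
    history = [supernodes]
    for _ in range(max_rounds):
        if supernodes == 0:
            break
        if clients / supernodes <= capacity:
            break  # the network is stable at this size
        supernodes //= 2
        history.append(supernodes)
    return history


stable = simulate_cascade(clients=1000, supernodes=20, capacity=100)
collapse = simulate_cascade(clients=1000, supernodes=8, capacity=100)
```

In this model a network with enough SuperNodes never loses one, while a network that starts slightly under capacity loses everything, because every failure raises the load on the nodes that are left.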

All of this is pretty understandable. What I don’t understand is why this problem is just surfacing. After all, this isn’t the first patch Tuesday. Was the bug in their network resource allocation process introduced in a recent version of Skype? Has the service been straining for months and last week was just the tipping point? Is this only half the story and there is more they aren’t telling us?

Hmmm… 

Now playing: Shop Boyz - Party Like A Rockstar (remix) (feat. Lil' Wayne, Jim Jones & Chamillionaire)


 

Categories: Technology

My job at Microsoft is working on the contacts platform that is utilized by a number of Windows Live services. The contacts platform is a unified graph of the relationships our users have created across Windows Live. It includes a user's Windows Live Hotmail contacts, their Windows Live Spaces friends, their Windows Live Messenger buddies and anyone they've added to an access control list (e.g. people who can access their shared folders in Windows Live Skydrive or the events in their calendar). Basically, a while ago one of our execs thought it didn't make sense to build a bunch of social software applications each acting as a silo of user relationships and that instead we should have a unified graph of the user to user relationships within Windows Live. Fast forward a couple of years and we now have a clearer idea of the pros and cons of building a unified social graph.

Given the above, it should be no surprise that I read Brad Fitzpatrick's Thoughts on the Social Graph with keen interest since it overlaps significantly with my day job. I was particularly interested in the goals outlined for the developer API, which are included below

For developers who don't want to do their own graph analysis from the raw data, the following high-level APIs should be provided: 

  1. Node Equivalence, given a single node, say "brad on LiveJournal", return all equivalent nodes: "brad" on LiveJournal, "bradfitz" on Vox, and 4caa1d6f6203d21705a00a7aca86203e82a9cf7a (my FOAF mbox_sha1sum). See the slides for more info.
  2. Edges out and in, by node. Find all outgoing edges (where edges are equivalence claims, equivalence truths, friends, recommendations, etc). Also find all incoming edges.
  3. Find all of a node's aggregate friends from all equivalent nodes, expand all those friends' equivalent nodes, and then filter on destination node type. This combines steps 1 and 2 and 1 in one call. For instance, Given 'brad' on LJ, return me all of Brad's friends, from all of his equivalent nodes, if those [friend] nodes are either 'mbox_sha1sum' or 'Twitter' nodes.
  4. Find missing friends of a node. Given a node, expand all equivalent nodes, find aggregate friends, expand them, and then report any missing edges. This is the "let the user sync their social networking sites" API. It lets them know if they were friends with somebody on Friendster and they didn't know they were both friends on MySpace, they might want to be.
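A rough way to picture these four calls is a graph where equivalence claims are merged with a union-find structure and friend edges are kept per node. This is a minimal sketch, not Brad's design: the `SocialGraph` class, its method names and the `(site, name)` tuple representation of a node are all assumptions made for illustration.

```python
class SocialGraph:
    """Toy graph: nodes are (site, name) tuples, equivalences use union-find."""

    def __init__(self):
        self._parent = {}
        self._friends = {}

    def _find(self, node):
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            self._parent[node] = self._parent[self._parent[node]]  # path halving
            node = self._parent[node]
        return node

    def claim_equivalent(self, a, b):
        """Record that nodes a and b are the same person."""
        root_a, root_b = self._find(a), self._find(b)
        self._parent[root_a] = root_b

    def add_friend(self, src, dst):
        """Record an outgoing friend edge from src to dst."""
        self._find(src)
        self._find(dst)
        self._friends.setdefault(src, set()).add(dst)

    def equivalent_nodes(self, node):
        """API 1: all nodes known to be equivalent to `node`."""
        root = self._find(node)
        return {n for n in list(self._parent) if self._find(n) == root}

    def aggregate_friends(self, node, site=None):
        """API 3: friends of every equivalent node, expanded, optionally filtered by site."""
        result = set()
        for me in self.equivalent_nodes(node):
            for friend in self._friends.get(me, set()):
                result |= self.equivalent_nodes(friend)
        if site is not None:
            result = {n for n in result if n[0] == site}
        return result

    def missing_friends(self, node, site):
        """API 4: aggregate friends on `site` that this person has not friended there."""
        mine = {n for n in self.equivalent_nodes(node) if n[0] == site}
        already = set()
        for me in mine:
            already |= self._friends.get(me, set())
        return self.aggregate_friends(node, site) - already - mine
```

For example, if ("livejournal", "brad") and ("vox", "bradfitz") are claimed equivalent and Brad has a LiveJournal friend who also has a Vox account, `missing_friends(("livejournal", "brad"), "vox")` returns that friend's Vox node until the edge is added on Vox as well.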

Here are the top three problems Brad and the rest of the Google folks working on this project will have to factor in as they chase the utopia that is a unified social graph.
  1. Some Parts of the Graph are Private: Although social networking sites with publicly articulated social networks are quite popular (e.g. MySpace), there is a larger number of private or semi-private social networks that can be viewed either only by the owner of the list (e.g. IM buddy lists) or by some subset of the graph (e.g. private profiles on social networking sites such as MySpace, Facebook, Windows Live Spaces, etc). The latter is especially tricky to deal with. In addition, people often have more non-public articulated social networks (i.e. friends lists) than public ones despite the popularity of social networking sites with public profiles.

  2. Inadvertent Information Disclosure caused by Linking Nodes Across Social Networks: The "find missing friends of a node" feature in Brad's list sounds nice in theory but it includes a number of issues that users often consider to be privacy violations or just plain creepy. Let's say I have Batman on my friends list in MySpace because I think the caped crusader is cool. Then I join LiveJournal and it calls the find_missing_friends() API to identify which of my friends from other sites are using LiveJournal and it finds Bruce Wayne's LiveJournal? Oops, an API call just revealed Batman's secret identity. A less theoretical version of this problem occurred when we first integrated Windows Live Spaces with Windows Live Messenger, and some of our Japanese beta users were stunned to find that their supposedly anonymous blog postings were now a click away for their IM buddies to see. I described this situation briefly in my submission to the 2005 Social Computing Symposium.

  3. All "Friends" aren't Created Equal: Another problem is that most users don't want all their "friends" available in all their applications. One capability we were quite proud of at one time is that if you had Pocket MSN Messenger then we merged the contacts on your cell phone with your IM and email contacts. A lot of people were less than impressed by this behavior. Someone you have on your IM buddy list isn't necessarily someone you want in your cell phone address book. Over the years, I've seen more examples of this than I can count. Being "friends" in one application does not automatically mean that two users want to be "friends" in a completely different context.
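One way to think about all three problems is that an edge in the graph needs to carry more than two endpoints. The sketch below tags each edge with the context it was created in and whether it is public, then filters on both before handing friends to an application. The `Edge` shape, the `friends_for` helper and the visibility rule are hypothetical simplifications, not how the Windows Live contacts platform works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    owner: str
    friend: str
    context: str          # where the relationship was declared, e.g. "im" or "phone"
    public: bool = False  # whether anyone other than the owner may see it


def friends_for(edges, owner, context, viewer):
    """Return owner's friends in one context that `viewer` is allowed to see."""
    visible = []
    for edge in edges:
        if edge.owner != owner or edge.context != context:
            continue  # never leak edges across contexts
        if edge.public or viewer == owner:
            visible.append(edge.friend)
    return visible


edges = [
    Edge("dare", "batman", "myspace", public=True),
    Edge("dare", "bruce_wayne", "im"),
    Edge("dare", "mom", "phone"),
]
```

Under these rules a stranger browsing my MySpace friends sees Batman, nobody but me sees my IM buddy list, and my phone contacts never show up in IM unless I add them there myself.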

These are the kinds of problems we've had to deal with on my team while also trying to make this scale to being accessed by services utilized by hundreds of millions of users. I've seen what it takes to build a system like this firsthand and Brad & company have their work cut out for them. This is without considering the fact that they may have to deal with ticked off users or ticked off social networking sites depending on how exactly they plan to build this giant database of user friend lists.

PS: In case any of this sounds interesting to you, we're always hiring. :)


 

Categories: Platforms | Social Software

Brad Fitzpatrick, the founder of LiveJournal who recently left Six Apart for Google, has published notes on what he's going to be working on moving forward. It is an interesting read entitled Brad's Thoughts on the Social Graph which contains the following excerpts

Currently if you're a new site that needs the social graph (e.g. dopplr.com) to provide one fun & useful feature (e.g. where are your friends traveling and when?), then you face a much bigger problem then just implementing your main feature. You also have to have usernames, passwords (or hopefully you use OpenID instead), a way to invite friends, add/remove friends, and the list goes on. So generally you have to ask for email addresses too, requiring you to send out address verification emails, etc. Then lost username/password emails. etc, etc. If I had to declare the problem statement succinctly, it'd be: People are getting sick of registering and re-declaring their friends on every site., but also: Developing "Social Applications" is too much work.

Facebook's answer seems to be that the world should just all be Facebook apps.
...
Goals:
1. Ultimately make the social graph a community asset, utilizing the data from all the different sites, but not depending on any company or organization as "the" central graph owner. 
  1. Establish a non-profit and open source software (with copyrights held by the non-profit) which collects, merges, and redistributes the graphs from all other social network sites into one global aggregated graph. This is then made available to other sites (or users) via both public APIs (for small/casual users) and downloadable data dumps, with an update stream / APIs, to get iterative updates to the graph (for larger users)
...
Non-Goals:
  1. The goal is not to replace Facebook. In fact, most people I've talked to love Facebook, just want a bit more of their already-public data to be more easily accessible, and want to mitigate site owners' fears about any single data/platform lock-in. Early talks with Facebook about participating in this project have been incredibly promising. 

It seems to me that Facebook is the new Microsoft in that there are now a significant number of people who are either upset at the level of "lock-in" they have created or are just plain jealous of their "wealth" and who have created dedicated efforts to break their hegemony. It'll be interesting watching this play out.

From my perspective, I'm skeptical of a lot of the talk about social network portability because the conversation rarely seems to be user centric. Usually it's creators of competing services who are angry about "lock-in" because they can't get a new user's contacts from another service and spam them to gain "viral growth" for their service. As for the various claims of social network overload, only the power users and geeks who join a new social network service a month (WTF is Dopplr?) have this problem.

A real social network is a community and users don't change communities at the drop of a hat. What I find more interesting is being able to bridge these communities instead of worrying about the 1% of users who hop from community to community like crack-addled hummingbirds skipping from flower to flower.

I'll put it this way, when it comes to email which is more important? The ability to send emails to people regardless of what email service or mail client they use or the ability to import your contact list from one free email service into another when you switch service providers?


 

I learned about the Facebook Data Store API yesterday from a post by Marc Canter. The API is intended to meet the storage needs of developers building widget applications on the Facebook widget platform. Before we decide if the API meets the needs of developers, we need to list what these needs are in the first place. A developer building a widget or application for a social network’s widget platform, such as a gadget for Windows Live Spaces or an application for the Facebook platform, needs to store

  1. Static resources that will be consumed or executed on the client such as images, stylesheets and script files. Microsoft provides this kind of hosting for gadget developers via Windows Live Gallery. This is all the storage needed for a gadget such as GMT clock.
  2. User preferences and settings related to the gadget. In many cases, a gadget may provide a personalized view of data (e.g. my Netflix queue or the local weather) or may simply have configuration options specific to the user which need to be saved. Microsoft provides APIs for getting, setting and deleting preferences as part of its Web gadgets framework. My Flickr badge gadget is an example of the kind of gadget that requires this level of storage.
  3. The application’s server-side code and application specific databases. This is the equivalent of the LAMP or WISC hosting you get from a typical Web hosting provider. No social networking site provides this for widget/gadget developers today. The iLike Facebook application is an example of the kind of application that requires this level of “storage” or at this level it should probably be called app hosting.

Now that we have an idea of the data storage needs of Web widget/gadget developers, we can discuss how the Facebook Data Store API measures up. The API consists of three broad classes of methods: User Preferences, Persistent Objects and Associations. All methods can return results as XML, JSON or JSONP.

It is currently unclear if the API is intended to be RESTful or not since there is scant documentation of the wire format of requests or responses. 

User Preferences

User Preference methods
* data.setUserPreference update one preference
* data.setUserPreferences update multiple preferences in batch
* data.getUserPreference get one preference of a user
* data.getUserPreferences get all preferences of a user

These methods are used to store key value pairs which may represent user preferences or settings for an application. There is a limit of 201 key<->value pairs which can be stored per user. The keys are numeric values from 0 – 200 and the maximum length of a preference value is 128 characters. 
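The limits are easy to lose track of, so here is a small local stand-in that enforces them. It is a sketch that assumes only the constraints stated above (numeric keys 0 to 200, values of at most 128 characters); the `PreferenceStore` class is hypothetical, keeps everything in memory, and does not call Facebook.

```python
MAX_PREF_ID = 200        # keys are numeric values from 0 to 200
MAX_VALUE_LENGTH = 128   # maximum length of a preference value


class PreferenceStore:
    """In-memory stand-in for one user's preferences, with the stated limits enforced."""

    def __init__(self):
        self._prefs = {}

    def set_preference(self, pref_id, value):
        if not isinstance(pref_id, int) or not 0 <= pref_id <= MAX_PREF_ID:
            raise ValueError(f"preference id must be an integer in 0..{MAX_PREF_ID}")
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"value longer than {MAX_VALUE_LENGTH} characters")
        self._prefs[pref_id] = value

    def set_preferences(self, values):
        for pref_id, value in values.items():
            self.set_preference(pref_id, value)

    def get_preference(self, pref_id):
        return self._prefs.get(pref_id, "")

    def get_preferences(self):
        return dict(self._prefs)
```

Because the keys are bare numbers, an application would likely keep its own mapping from meaningful names (say, "zip_code") to slot numbers.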

Persistent Objects

Object Definition methods
* data.createObjectType create a new object type
* data.dropObjectType delete an object type and all objects of this type
* data.renameObjectType rename an object type
* data.defineObjectProperty add a new property
* data.undefineObjectProperty remove a previously defined property
* data.renameObjectProperty rename a previously defined property
* data.getObjectTypes get a list of all defined object types
* data.getObjectType get detailed definition of an object type

Developers can create new types which are analogous to SQL tables. The analogy is especially apt when you consider terminology like “drop” for deleting an object type, the ability to add new properties/columns to a type, and the ability to retrieve a type’s schema, all of which are more common in the relational database world than in object oriented programming.

 

Object Manipulation methods
* data.createObject create a new object
* data.updateObject update an object's properties
* data.deleteObject delete an object by its id
* data.deleteObjects delete multiple objects by ids
* data.getObject get an object's properties by its id
* data.getObjects get properties of a list of objects by ids
* data.getObjectProperty get an object's one property
* data.setObjectProperty set an object's one property
* data.getHashValue get a property value by a hash key
* data.setHashValue set a property value by a hash key
* data.incHashValue increment/decrement a property value by a hash key
* data.removeHashKey delete an object by its hash key
* data.removeHashKeys delete multiple objects by their hash keys

This aspect of the API is almost self explanatory: you create an object type (e.g. a movie) then manipulate instances of this object using the above APIs. Each object can be accessed via a numeric ID or a string hash value. The object’s numeric ID is obtained when you first create the object although it isn’t clear how you obtain an object’s hash key. It also seems like there is no generic query mechanism so you need to store the numeric IDs or hash keys of the objects you are interested in somewhere so you don’t have to enumerate all objects looking for them later. Perhaps with the preferences API?
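Here is roughly what parking object IDs in a preference slot would look like. This is a sketch under stated assumptions: `ObjectStore`, `remember_ids` and `recall_ids` are hypothetical in-memory helpers, the comma-separated encoding is my own choice, and the 128 character limit is the preference value limit mentioned earlier.

```python
import itertools


class ObjectStore:
    """In-memory stand-in for typed objects addressed by a numeric id."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._objects = {}

    def create_object(self, obj_type, properties):
        object_id = next(self._next_id)
        self._objects[object_id] = (obj_type, dict(properties))
        return object_id

    def get_object(self, object_id):
        return dict(self._objects[object_id][1])


def remember_ids(prefs, slot, object_ids, max_len=128):
    """Store object ids as a comma-separated string in one preference slot."""
    encoded = ",".join(str(object_id) for object_id in object_ids)
    if len(encoded) > max_len:
        raise ValueError("too many ids to fit in a single preference value")
    prefs[slot] = encoded


def recall_ids(prefs, slot):
    """Read back the ids stored by remember_ids."""
    return [int(part) for part in prefs.get(slot, "").split(",") if part]
```

The obvious weakness is capacity: a 128 character value only holds a few dozen IDs, so this works for "my five favorite movies" but not for anything resembling a real index.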

Associations

Association Definition methods
* data.defineAssociation create a new object association
* data.undefineAssociation remove a previously defined association and all its data
* data.renameAssociation rename a previously defined association
* data.getAssociationDefinition get definition of a previously defined association
* data.getAssociationDefinitions get definitions of all previously defined associations

An association is a named relationship between two objects. For example, "works_with" could be an association between two user objects. Associations don't have to be between the same types (e.g. a "works_at" could be an association between a user object and a company object). Associations take me back to WinFS and the son of WinFS, the Entity Data Model, which has a notion of a RelationshipType that is very similar to the above notion of an association. It is also similar to the notion of an RDF triple but not quite.
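The idea is small enough to sketch in a few lines. The `AssociationStore` class below is a hypothetical in-memory toy whose method names loosely echo the ones in this post; it is not the Facebook API and it treats every association as one-directional, which is an assumption on my part.

```python
from collections import defaultdict


class AssociationStore:
    """Toy store of named, directed relationships between object ids."""

    def __init__(self):
        self._edges = defaultdict(set)  # (association name, obj1) -> {obj2, ...}

    def set_association(self, name, obj1, obj2):
        self._edges[(name, obj1)].add(obj2)

    def remove_association(self, name, obj1, obj2):
        self._edges[(name, obj1)].discard(obj2)

    def get_associated_objects(self, name, obj1):
        return sorted(self._edges.get((name, obj1), ()))

    def get_associated_object_count(self, name, obj1):
        return len(self._edges.get((name, obj1), ()))


store = AssociationStore()
store.set_association("works_with", "user:dare", "user:mike")
store.set_association("works_at", "user:dare", "company:microsoft")
```

Read as (subject, predicate, object), each entry looks a lot like an RDF triple, which is probably why the two feel so similar.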

Association Manipulation methods
* data.setAssociation create an association between two objects
* data.setAssociations create a list of associations between pairs of objects
* data.removeAssociation remove an association between two objects
* data.removeAssociations remove associations between pairs of objects
* data.removeAssociatedObjects remove all associations of an object
* data.getAssociatedObjects get ids of an object's associated objects
* data.getAssociatedObjectCount get count of an object's associated objects
* data.getAssociatedObjectCounts get counts of associated objects of a list of objects.
* data.getAssociations get all associations between two objects

All of these methods should be self explanatory. Although I think this association stuff is pretty sweet, I’m unclear as to where all of this is expected to fall in the hierarchy of needs of a Facebook application. The preferences stuff is a no brainer. The persistent object and association APIs could be treated as a very rich preferences API by developers, but that doesn’t seem to live up to their potential. On the other hand, without providing something closer to an app hosting platform like Amazon has done with EC2 + S3, I’m not sure there is any other use for them by Web developers using the Facebook platform.

Have I missed something here?

Now playing: UGK - International Players Anthem (feat. Outkast)


 

Categories: Platforms

If you go to http://dev.live.com/liveid you’ll see links to Windows Live ID for Web Authentication and Client Authentication which enable developers to build Web or desktop applications that can be used to authenticate users via Windows Live ID. The desktop SDK is still in alpha but the Web APIs have hit v1. You can get the details from the Windows Live ID team blog post entitled Windows Live ID Web Authentication SDK for Developers Is Released which states

Windows Live ID Web Authentication allows sites who want to integrate with the Windows Live services and platform. We are releasing a set of tools that make this integration easier than ever.  

Web Authentication works by sending your users to the Windows Live ID sign-in page by means of a specially formatted link. The service then directs them back to your Web site along with a unique, site-specific identifier that you can use to manage personalized content, assign user rights, and perform other tasks for the authenticated user. Sign-in and account management is performed by Windows Live ID, so you don't have to worry about implementing these details.

Included with the Web Authentication software development kit (SDK) are QuickStart sample applications in the ASP.NET, Java, Perl, PHP, Python, and Ruby programming languages. You can get the sample applications for this SDK from the Web Authentication download page on Microsoft.com.

As one of the folks who's been championing opening up our authentication platform to Web developers, this is good news. I'm not particularly sold on using Windows Live ID as a single sign-on instead of sites managing their own identities but I do think that now that we allow non-Microsoft applications (e.g. mashups, widgets, etc) to act on behalf of Windows Live users via this SDK, there'll be a burst of new APIs coming out of Windows Live that will allow developers to build applications that manipulate a user's data stored within Windows Live services.

Opening up our platform will definitely be good for users and will be good for the Web as well. Kudos to the Windows Live ID folks for getting this out.

Now playing: Nappy Roots - Po' Folks


 

Categories: Web Development | Windows Live

How social networks handle multiple social contexts (e.g. my work life versus my personal life) has been on my mind this week. Today I was in a meeting where someone mentioned that most of the people he knows have profiles on both MySpace and Facebook because their real friends are on MySpace while their work friends are on Facebook. This reminded me that my wall currently has a mix of posts from Robert Scoble about random geek crap and posts by friends from Nigeria who I haven’t talked to in years catching up with me.

For some reason I find this interleaving of my personal relationships and my public work-related persona somewhat unsettling. Then there’s this post by danah boyd, loss of context for me on Facebook which contains the following excerpt

Anyhow, I know folks are still going wheeeeee about Facebook. And I know people generally believe that growth is nothing but candy-coated goodness. And while I hate using myself as an example (cuz I ain't representative), I do feel the need to point out that context management is still unfun, especially for early adopters, just as it has been on every other social network site. It sucks for teens trying to balance mom and friends. It sucks for college students trying to have a social life and not piss off their profs. It sucks for 20-somethings trying to date and balance their boss's presence. And it sucks for me.

I can't help but wonder if Facebook will have the same passionate college user base next school year now that it's the hip adult thing. I don't honestly know. But so far, American social network sites haven't supported multiple social contexts tremendously well. Maybe the limited profile and privacy settings help, but I'm not so sure. Especially when profs are there to hang out with their friends, not just spy on their students. I'm wondering how prepared students are to see their profs' Walls filled with notes from their friends. Hmmm...

As usual, danah hits the nail on the head. There are a number of ways I can imagine social network sites doing a better job at supporting multiple social contexts, but they all require some work from the user to set up those contexts, which matters especially if the sites plan to become a permanent fixture in their users’ lives. However most social network sites seem more interested in being the equivalent of popular nightclubs (e.g. MySpace) than in becoming a social utility in the same way that email and instant messaging have become. Facebook is the first widely popular social networking site I suspect will buck this trend. If there is one place there is still major room for improvement in their user experience [besides the inability to opt out of all the annoying application requests] it’s here. This is the one area where the site is weak, and if my experience and danah’s observations are anything to go by, the site will eventually become less of a social software utility and more of a place to hang out, and we know what eventually happens to sites like that.

Now playing: Gym Class Heroes - New Friend Request


 

Categories: Social Software

Matt Cutts has a blog post entitled Closing the loop on malware where he writes

Suppose you worked at a search engine and someone dropped a high-accuracy way to detect malware on the web in your lap (see this USENIX paper [PDF] for some of the details)? Is it better to start protecting users immediately, or to wait until your solution is perfectly polished for both users and site owners? Remember that the longer you delay, the more users potentially visit malware-laden web pages and get infected themselves.

Google chose to protect users first and then quickly iterate to improve things for site owners. I think that’s the right choice, but it’s still a tough question. Google started flagging sites where we detected malware in August of last year.

When I got home yesterday, my fiancée informed me that her laptop was infected with spyware. I asked how it happened and she mentioned that she’d been searching for sites to pimp her MySpace profile. Since we’d talked in the past about visiting suspicious websites, I wondered why she had chosen to ignore my advice. Her response? “Google didn’t put the This Site May Harm Your Computer warning on the link so I thought the site was safe. Google failed me.”

I find this interesting on several levels. There’s the fact that this feature is really useful and engenders a sense of trust in Google’s users. Then there’s the palpable sense of betrayal on the user’s part when Google’s “not yet perfectly polished” algorithms for detecting malicious software fail to indicate a bad site. Finally, there’s the observation that instead of blaming Microsoft, who produces the operating system and the Web browser which were both infected by the spyware, she chose to blame Google, who produced the search engine that led to the malicious site. Why do you think this is? I have my theories…

Now playing: Hurricane Chris - Ay Bay Bay


 

Categories: Technology

August 14, 2007
@ 03:19 AM

Recently I've seen a bunch of people I consider to be really smart sing the praises of Hadoop, such as Sam Ruby in his post Long Bets, Tim O’Reilly in his post Yahoo!’s Bet on Hadoop, and Bill de hÓra in his post Phat Data. I haven’t dug too deeply into Hadoop because the legal folks at work would chew out my butt if I did, but there are a number of little niggling doubts that make me wonder if this is the savior of the world that all these geeks claim it will be. Here are some random thoughts that have made me skeptical

  1. Code Quality: Hadoop was started by Doug Cutting who created Lucene and Nutch. I don’t know much about Nutch but I am quite familiar with Lucene because we adopted it for use in RSS Bandit. This is probably the worst decision we’ve made in the entire history of RSS Bandit. Not only are the APIs a usability nightmare because they were poorly hacked out then never refactored, the code is also notoriously flaky when it comes to dealing with concurrency so common advice is to never use multiple threads to do anything with Lucene.

  2. Incomplete Specifications: Hadoop’s MapReduce and HDFS are a re-implementation of Google’s MapReduce and Google File System (GFS) technologies. However it seems unwise to base a project on research papers that, for competitive reasons, may not reveal all the details needed to implement the service. For example, the Hadoop documentation is silent on how it plans to deal with the election of a primary/master server among peers, especially in the face of machine failure, which Google solves using the Chubby lock service. It just so happens that there is a research paper that describes Chubby, but how many other services within Google’s data centers do MapReduce and Google File System (GFS) depend on which are yet to have their own public research paper? Speaking of which, where are the Google research papers on their message queueing infrastructure? You know they have to have one, right? How about their caching layer? Where are the papers on Google’s version of memcached? Secondly, what is the likelihood that Google will be as forthcoming with these papers now that they know competitors like Yahoo! are knocking off their internal architecture?

  3. A Search Optimized Architecture isn’t for Everyone: One of the features of MapReduce is that one can move the computation close to the data because “Moving Computation is Cheaper than Moving Data”. This is especially important when you are doing lots of processing-intensive operations such as the kind of data analysis that goes into creating the Google search index. However, if you’re a site whose main tasks are reading and writing lots of data (e.g. MySpace) or sending lots of transient messages back and forth while ensuring that they always arrive in the right order (e.g. Google Talk), then these optimizations and capabilities aren’t much use to you and a different set of tools would serve you better.
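For anyone who hasn’t seen the programming model being discussed, here is a minimal single-process sketch of the map, shuffle and reduce steps in Python, using the usual word count example. This is not Hadoop’s API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the map step would run on the machines that already hold each chunk of input, which is the “moving computation” part.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all the values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # Assumption: everything runs in one process here; a real cluster
    # would run map_phase on the nodes that store each document.
    intermediate = []
    for document in documents:
        intermediate.extend(map_phase(document))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling word_count on a list of strings returns a dictionary of word totals. Note how the whole thing is batch analysis over data at rest, which is why it says little about ordered message delivery or heavy read/write workloads.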

I believe there are a lot of lessons that can be learned from how the distributed systems that power the services behind Google, Amazon and the like are built. However I think it is waaaay too early to be crowning some knock off of one particular vendor’s internal infrastructure as the future of distributed computing as we know it.

Seriously.

PS: Yes, I realize that Sam and Bill are primarily pointing out the increasing importance of parallel programming as it relates to the dual trends of (i) almost every major website that ends up dealing with lots of data and lots of traffic eventually eschewing relational database features like joins, normalization, triggers and transactions because they are not cost effective and (ii) the increasingly large amounts of data that we generate and now have to process due to falling storage costs. Even though their mentions of Hadoop are incidental, it still seems to me that it’s almost become a meme, one which deserves more scrutiny before we jump on that particular bandwagon.

Now playing: N.W.A. - Appetite For Destruction


 

Categories: Platforms

It seems like I was just blogging about Windows Live Hotmail coming out of beta and it looks like there is already a substantial update to the service being rolled out. From the Windows Live Hotmail team’s blog post entitled August: Hotmail will soon bring you more of your requests, better performance we learn

We went out of beta in May, and we’re already releasing something new. Today, these new features will begin to roll out gradually to all our customers over the next few weeks, so if you don’t immediately see them, be patient, they’re coming!

More storage! Just when you were wondering how you’d ever fill up 2 or 4 GB of mail, we’ve given you more storage. Free users will get 5 GB and paid users will get 10 GB of Hotmail storage.

Contacts de-duplication: Do you have five different entries for the same person in your Contacts? Yeah, me too, but not anymore. We’re the first webmail service to roll out “contacts de-duplication”. If you get a message from “Steve Kafka” and click “add contact” but there’s already a Steve Kafka, we’ll let you know and let you add Steve’s other e-mail address to your existing “Steve Kafka” contact entry. We’re just trying to be smarter to make your life easier and faster. There’s also a wizard you can run to clean up your existing duplicate contacts.

Accepting meeting requests: If you receive a meeting request, such as one sent from Outlook, you can now click “accept” and have it added to your Calendar. This had existed for years in MSN Hotmail, and we’re adding it to Windows Live Hotmail now.

You can turn off the Today page (if you want to). If you’d rather see your inbox immediately upon login, you have the option to turn off the page of MSN news (called the Today page). The choice is yours. 

A nice combination of new features and pet peeves fixed with this release. The contacts duplication issue is particularly annoying and one I’ve wanted to see fixed for quite a while.

So far we’ve seen updates to Spaces, SkyDrive, and now Mail within the past month. The summer of Windows Live is here and so far it’s looking pretty good. I wonder what else Windows Live has up its sleeve?

Now playing: P. Diddy - That's Crazy (remix) (feat. Black Rob, Missy Elliott, Snoop Dogg & G-Dep)


 

Categories: Windows Live