Greg Linden has a blog post entitled Yahoo building a Google FS clone? where he writes
The Hadoop open source project is building a clone of the powerful Google cluster tools Google File System and MapReduce.I was curious to see how much Yahoo appears to be involved in Hadoop. Doug Cutting, the primary developer of Lucene, Nutch, and Hadoop, is now working for Yahoo but, at the time, that hiring was described as supporting an independent open source project.Digging further, it seems Yahoo's role is more complicated. Browsing through the Hadoop developers mailing list, I can see that more than a dozen people from Yahoo appear to be involved in Hadoop. In some cases, the involvement is deep. One of the Yahoo developers, Konstantin Shvachko, produced a detailed requirement document for Hadoop. The document appears to lay out what Yahoo needs from Hadoop, including such tidbits as handling 10k+ nodes, 100k simultaneous clients, and 10 petabytes in a cluster.Also noteworthy is Eric Baldeschwieler, a director of software development at Yahoo, who recently talked about direct support from Yahoo for Hadoop. Eric said, "How we are going to establish a testing / validation regime that will support innovation ... We'll be happy to help staff / fund such a testing policy."
I find this effort by Yahoo! to be rather interesting given that platform pieces like GFS, BigTable, MapReduce and Sawzall give Google quite the edge in building mega-scale services and in Greg Linden's words are 'major force multipliers' that enable them to pump out new online services at a rapid pace. I'd expect Google's competitors to build similar systems and keep them close to their chest not give them away. I suspect that the reason Yahoo! is going this route is that they don't have enough folks to build this in-house and have thus collaborated with Hadoop project to get some help. This could potentially backfire since there is nothing stopping small or large competitors from reusing their efforts especially if it uses a traditional Open Source license.
On a related note, Greg also posted a link to an article by David F. Carr entitled How Google Works which has the following interesting quote
Google has a split personality when it comes to questions about its back-end systems. To the media, its answer is, "Sorry, we don't talk about our infrastructure." Yet, Google engineers crack the door open wider when addressing computer science audiences, such as rooms full of graduate students whom it is interested in recruiting. As a result, sources for this story included technical presentations available from the University of Washington Web site, as well as other technical conference presentations, and papers published by Google's research arm, Google Labs.
I do think it is cool that Google developers publish so much about the stuff they are working on. One of the things I miss from being on the XML team at Microsoft is being around people with a culture of publishing research like Erik Meijer and Michael Rys. I even got a research paper on XML query languages published while on the team. I'd definitely would like to publish research quality papers on some of the stuff I'm working on now. I've done MSDN articles and a ThinkWeek paper in the past few years, it's probably about time I start thinking about writing a research paper again.
PS: If you work on online services and you don't read Greg Linden's blog, you are missing out. Subscribed.