November 11, 2009

Twitter Plucks Data Management Guru from Yahoo!

To say that Twitter is dealing with massive amounts of data flowing through its servers these days would be an understatement, as the service sees strong growth and significant mindshare. Having apparently put its rockiest struggles of the last twelve months behind it, Twitter can now focus on rolling out significant new features, from Lists to geolocation, trend definitions and retweets. Now the microblogging giant looks to be taking extra steps to harness the power of its rapidly expanding data set.

If the company's own team list is to be believed, it has just picked up Utkarsh Srivastava, a highly respected senior research scientist at Yahoo!, who is best known for his work on building large-scale distributed systems, specifically his efforts with Hadoop.

Hadoop is an open-source framework, inspired by Google's MapReduce and Google File System, that enables applications to work across distributed server nodes and very large data sets, potentially ranging into the petabytes. Yahoo!, Google's on-and-off competitor, has been the company most associated with Hadoop. While at Yahoo!, Srivastava was one of the original designers of "Pig", an Apache project for analyzing large data sets that runs on top of Hadoop. (See also the research paper: Pig Latin: A Not-So-Foreign Language for Data Processing)

Srivastava, who holds a PhD in Computer Science from Stanford University, has been working at Yahoo! Research since 2006. (See his home page and LinkedIn profile)

Since it isn't yet known which aspects of Twitter Srivastava will be working on, it's premature to guess whether his efforts will be focused primarily on new initiatives or simply on helping the company scale with its growth. I can dream and hope that he will be the missing piece that brings Twitter's high-potential search engine fully online, but that is no doubt a big project.

Update: This hire has been confirmed by Srivastava and also covered by TechCrunch.