July 10, 2009

Real-time Search: What's Most Important Now, Not Most Accurate

This afternoon, at TechCrunch's Real-Time Crunchup event, representatives from many of the innovators in the real-time search space held a quick round table aimed at furthering the discussion. Moderator Erick Schonfeld framed it with a question, noting that some on the panel may believe real-time search is defined by Twitter Search, while others believe it is "everything on the Internet, but with a freshness or recency component". And while many different companies, including standard-bearers like Google and Microsoft, are looking to take on this new challenge, how they are doing it differs greatly.

Danny Sullivan, author of Search Engine Land, said, "We need better definitions of what it is, so as consumers and users, we understand what we are interacting with. Through Twitter and a few other services, you have the option to publish in a few seconds. Maybe you call it social sharing search." (He also posted a summary of the players last night.)

Some of the participants could be defined by how large a share of their data originated on Twitter, and by how they worked with that data, including how they filtered it.

"I would define it as what are people saying in real time about my topic," said Gerry Campbell of Collecta. "It's not what is most important, but it's what is in real time now."

This split between the "one right answer" often championed by the existing search leaders and what's most right "now" is helping to separate the old-school search engines from this new breed. But don't think that the more-established companies are taking this lying down.

Google's Matt Cutts, who has been at the company since 2000, said, "We have always talked about freshness of content." He relayed a history of his time at Google, saying they once had a "war room" devoted to how they could refresh their search index as frequently as once a month. By 2003, the company had moved from monthly updates to daily updates, and a few years later, in 2007, it integrated its Blog Search product into its main search results. "We have rearchitected our system to be as recent as possible," Cutts said.

As updates flow in at an ever-increasing pace from all corners of the Web, search engines have the daunting task of getting accurate responses out there while ignoring off-topic or harmful data, such as spam. Those who manage to get the formula right will have a serious leg up over those who don't filter well, whose results will be more noise than signal.

"Drinking from the firehose is a ticking time bomb," said Kimbal Musk of OneRiot. "Even by filtering 90 percent of what is going on with the Iran election, you're still only going to get a tiny slice, and a good portion of that is spam. If you don't filter content, you are going to get more and more spam." He later added, "If you stick to Twitter alone, you will have a spam-filled and biased data set."

With Twitter's API getting to a point where more and more companies are relying on it as their engine and data source, each is working off a common data set, and how they interact with the information will make the difference. And yes, Microsoft or Google may give you one result that is most accurate, but not for this moment, and not with any kind of impact from your friends or in terms of how that data is being interacted with in real time.

Sean Suchter of Microsoft clearly stated this information is becoming more relevant, saying, "The sentiment around a link could be changing, and that might become very relevant to a user."

In an isolated search world, where an index is an index and the right answer is the right answer, that might not matter. But in real time, it could matter immensely. As each of these companies works through its user interface and data sets, and improves its filters and social features, it should be very interesting to see how they separate from the pack and help define their goals.