May 06, 2009

Microsoft Live Search Employee Says Search Engine Analysis Flawed

On Tuesday evening, I attended a get-together with some of the minds behind Powerset, the $100 million natural language search acquisition slowly being integrated into Microsoft's Live Search engine. The goal? To keep the San Francisco startup natives feeling connected to their roots, rather than detached as part of the Redmond software monolith, while also giving interested tech reporters and bloggers an opportunity to see why Microsoft has any chance in the search market against its primary foe, Google. While I'll talk later about some of the statistics they offered, they conveyed a clear perception that traditional reviews of search engines are flawed in their simplicity and their context, and don't go deep enough to give a real understanding of what is a complex subject.

As discussed in detail on Senior Program Manager Mark Johnson's blog, Deliberate Ambiguity (see "How *not* to rate a search engine"), the feeling is that reviewers commonly run three queries before rating an engine "good" or "bad": their own name, a hot topic of the day, and something completely random. If the engine being tested doesn't match expectations, then it's off to the woodshed. (See Cuil, Searchme, and other engines that have not been embraced, for examples of just that.)


Is Live Search Feeling Lucky? They Say Try It for a Week.

Instead of a simple Live Search vs. Google three-query test, Mark recommended users try out Live Search for a full week, arguing that their own experience would trump any simple demo. And as we discussed yesterday, one person's experience can be wildly different from another's, people focus primarily on the first search result and ignore the bulk of the iceberg, and it can be extremely difficult to infer the intent and importance of results.

For example, if the working set of a search engine is Wikipedia, as it was for Powerset's prototype, the crawler will incorrectly conclude that the years associated with entries are of incredible importance, simply because of how often they are referenced. As a result, rather than World War Two events rising to the top in importance, computers are more likely to trust results labeled '1944' or '1942'. So how do you teach computers to think like humans and correctly rate influence and importance? Concluding his post, he adds: "this is rocket science."
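To make that concrete, here is a minimal sketch in Python of the naive "count the references" approach Mark is warning about, using a tiny made-up set of Wikipedia-style reference lists. The data and the scoring are illustrative assumptions on my part, not Powerset's or Live Search's actual ranking code:

```python
from collections import Counter

# Toy stand-in for the outgoing references found on a few Wikipedia-style
# entries. Hypothetical data and scoring -- not Powerset's actual pipeline.
references = {
    "Battle of Midway":     ["1942", "World War Two", "Pacific Ocean"],
    "Battle of Stalingrad": ["1942", "1943", "World War Two"],
    "Doolittle Raid":       ["1942", "Tokyo"],
    "Manhattan Project":    ["1942", "1945", "United States"],
    "Casablanca (film)":    ["1942", "Humphrey Bogart"],
    "Enrico Fermi":         ["1942", "Chicago Pile-1"],
}

def naive_importance(refs):
    """Score each target by how many entries reference it -- a raw count."""
    counts = Counter()
    for entry, outgoing in refs.items():
        counts.update(outgoing)
    return counts

# The year outranks every actual event, because nearly everything
# that happened in 1942 points back at it.
for target, count in naive_importance(references).most_common(5):
    print(f"{count}  {target}")
```

Run against the full encyclopedia, where nearly every biography and event article links to a year, a raw count like this would happily rank '1942' above the Battle of Midway itself, which is exactly the kind of misjudged importance Mark describes.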

At the quick meetup, questions about Microsoft's potential to dethrone Google were plentiful. We talked about how the once-mighty AltaVista had a significant lead, and how, even after Google's debut, it held an edge in image search and language translation before those too were eroded. Could Microsoft find a niche that Google just doesn't serve well? Could Microsoft become good enough that, if people tried Live Search for a full week, Google would seem 'less good'? I even overheard a comment at the event that in user groups, those surveyed preferred results displayed under a Google logo, even when the results shown were actually from Live Search.

The Google brand simply means trusted search for many people, and even the Microsoft monolith is finding the competition incredibly difficult. Google managed to find a market space that didn't need Microsoft and could compete on its own revenue terms, without being frozen out by a competitor offering a parallel product for free, a la Netscape. Even if Microsoft has built a better mousetrap, it will need to find a way to communicate that to users and give them a compelling reason to switch - and if Mark's words are to be believed, users and the press aren't doing the research needed to recognize that progress.