October 07, 2010

New Twitter Search Backend: Faster, Not Much Deeper

There are many things Twitter does very well, and some where the company and its service could stand to improve. As the company's popularity has grown, and its employee base has swelled, one of the biggest holes in its lineup has been its search engine. More than a year ago, after becoming increasingly frustrated with its shortcomings, I gave up and said it was very, very broken. Maybe that wasn't nice. But it was true.

With news today that Twitter has quietly introduced a new back end for its search engine, it is worth revisiting those complaints to see whether what the company has delivered solves them, or whether it is just a new back end built to handle the basics at a larger scale.

Issue Number One: Search Results Limited to a Few Days

At the time, I used an example to show how Erin Kotecki Vest (@queenofspain) was shown as having mentioned the word "Obama" five times in four days, when she had, of course, mentioned him much more often. At this point, Erin isn't talking about Obama quite as often, so a new, similar search is needed for testing.

I tested the word "Giants" from Mychael Urban, a Bay Area sports writer (@MUrbanCSN). The results went back six full days, covering seven tweets. Similarly, searching for the word "TechCrunch" from Mike Arrington (@arrington) delivered seven days' worth of results, and 33 total tweets.

It appears the "deeper" database goes back about a week, up from low points of two to three days.

Issue Number Two: Older Tweets Were Missing

Twitter Search has always offered the option to restrict a search to a range of dates. At the time, I used the above example with Erin, but searched from January 1st to May 21st of 2009. A search of this type failed, saying "the page you were looking for doesn't exist".

With the new database, these older tweets are still not available. For example, searching for "techcrunch" from "arrington" between June 1st and August 1st of this year says "no results".
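For reference, a date-bounded search like the one above can be written as a single query string. The sketch below is a minimal illustration in Python, assuming the from:, since:, and until: operators that Twitter's advanced search form produces. The build_query helper is a hypothetical name used only for this example, not part of any Twitter library.

```python
from datetime import date
from urllib.parse import urlencode


def build_query(term, author, since, until):
    """Join a keyword with from:, since:, and until: operators.

    The operator names are an assumption about the search syntax;
    this helper is hypothetical and only assembles a string.
    """
    parts = [
        term,
        f"from:{author}",
        f"since:{since.isoformat()}",
        f"until:{until.isoformat()}",
    ]
    return " ".join(parts)


# The search described above: "techcrunch" from "arrington",
# between June 1st and August 1st of 2010.
query = build_query("techcrunch", "arrington", date(2010, 6, 1), date(2010, 8, 1))
print(query)

# The same query, encoded as it would appear in a URL's query string.
encoded = urlencode({"q": query})
print(encoded)
```

Whatever the exact syntax, the point is that the request itself is well formed; it is the index behind it that has nothing to return for that date range.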

So this is not yet fixed.

Issue Number Three: References to People Were Not Tracked

At the time, I tested whether Twitter could find my references to an individual, like Mashable's @adamostrow, and it reported "no results". Tonight, I tried the same search with Robert Scoble (@scobleizer). Twitter again says there are no results, even though I sent a reply to Scoble as recently as October 3rd.

On the flip side, I could get Twitter to return results for mentions of @erickschonfeld, displaying three results in the last two days. For whatever reason, this particular query goes back at least two days, but fewer than five.

This appears to be a symptom of the short time span of the database rather than a separate issue.

Issue Number Four: Oprah Doesn't Exist In Twitter Search

For whatever reason, Twitter Search didn't display results from Oprah (@oprah) at the time. Tonight, I searched again, and found no results.

As Oprah's last tweet was September 22nd, it may simply not be recent enough to appear in the database, which would mean the problem isn't Oprah-specific. A similar search on my name goes back seven days.

I assume, in this case, that if Oprah had tweeted in the last week, she would be there. Other high-profile, high-subscriber accounts, like @aplusk, go back a week.
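To make that reasoning concrete, here is a minimal sketch of the recency check in Python. It assumes the index keeps roughly seven days of tweets (the estimate from the tests above, not a figure Twitter has published) and that "tonight" is October 7th, 2010. The in_index helper and INDEX_WINDOW constant are hypothetical names for illustration only.

```python
from datetime import date, timedelta

# Assumption: the search index holds about one week of tweets.
INDEX_WINDOW = timedelta(days=7)


def in_index(tweet_date, today, window=INDEX_WINDOW):
    """Return True if a tweet is recent enough to fall inside the window."""
    return today - tweet_date <= window


today = date(2010, 10, 7)          # assumed date of the test searches
oprah_last_tweet = date(2010, 9, 22)

age_days = (today - oprah_last_tweet).days
print(age_days)                            # 15
print(in_index(oprah_last_tweet, today))   # False
```

At 15 days old, Oprah's last tweet falls well outside a one-week window, which is consistent with her absence being a matter of timing rather than anything specific to her account.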



The post from Twitter Engineering focuses on how to keep the current capabilities available under increasing load. As Twitter's focus is on real-time, they note, so should their search engine be. This means, essentially, that older tweets are much less relevant than new tweets - so much so that they drop out of the index after a week's time, both on the old version and the new.

The notice that "older tweets are temporarily unavailable" appears any time you hit the end of the road on Twitter search. I had hoped the new database, which sounds technically impressive, would have moved the needle further into the past, but it hasn't yet. The team says "The first difference you might notice is the bigger index, which is now twice as long -- without making searches any slower." If this is the case, then they are, without saying it specifically, acknowledging that Twitter Search had degraded to only two or three days' worth of data, and now it is at seven days for most searches.

I give Twitter credit where it is due for the company's maturation and increased feature set, as well as reduced downtime, but it looks like the full functionality once promised by its search engine is not near fruition. Older tweets might be available on Google Realtime Search.