February 08, 2008

Warning: Google Reader Congestion of Up to Five Hours

Recently, Google has earned a lot of goodwill in the blogosphere for how rapidly it is indexing blog posts as part of its universal search. But while the search side is getting quicker and quicker, it can sometimes take several hours for a post to make its way from being published to hitting Google Reader, with no apparent cause.

It's enough to make me think we need heavy traffic advisories, or warnings that show when a specific hub is congested, the way we now have for airports and freeways.

Not too long ago, Google Reader added a seemingly small feature that shows when an item was published, and also when it hit Google Reader. Maybe they thought they were showing off how quickly items were indexed. But it will likely only serve to highlight the times when items aren't getting there fast at all.


Wow - That Timestamp Gave You Away, Google

Today, my post on AssetBar coming to Twitter's aid took more than five and a half hours to reach Google Reader. In the meantime, I saw the post indexed by FriendFeed and AssetBar, added to Spokeo, and listed under my blog on Technorati. In parallel, a response post at The Last Podcast hit Google Reader several hours earlier, but my original post was nowhere to be found.

Finally, despite being posted at 11:21 a.m., the piece didn't show up in Google Reader until 4:53 p.m., a virtual eternity in the rapid-fire blog world. In those five-plus hours, 37 different posts were added to TechMeme's river. In those five hours, I received 149 tweets on Twitter. In those five hours, my story went from what one could consider "breaking" to "tired".
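The gap between those two timestamps is easy to check. Here is a minimal sketch, assuming both times fall on the same day (February 8, 2008) and in the same time zone; the variable names are illustrative only:

```python
from datetime import datetime

# Times reported in the post; the shared date and time zone are assumptions.
published = datetime(2008, 2, 8, 11, 21)   # 11:21 a.m.
received = datetime(2008, 2, 8, 16, 53)    # 4:53 p.m.

delay = received - published
hours, remainder = divmod(int(delay.total_seconds()), 3600)
minutes = remainder // 60

print(f"Delay: {hours} hours, {minutes} minutes")
```

That works out to 5 hours and 32 minutes, in line with the "more than five and a half hours" figure.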

At times, it's been obvious to me that while Google Reader leads in offering a simplified user interface and ease of use, it lags other services badly in how quickly it fetches items. I often see stories hit the feed, and click through only to find out they already have dozens of comments, making me late to the conversation. Today, that gap was huge. Google didn't just show up late, it showed up last.

AssetBar Proposes Solution to Twitter Scaling Problem

While I've previously talked about AssetBar's capabilities as a next-generation social RSS feed reader, expected to open to the public soon, the secret sauce behind AssetBar's efforts is a distributed database system that eliminates a lot of the issues with traditional SQL or relational database environments. Their differentiated approach to the database means AssetBar is highly extensible, with futures not only in RSS feeds, but also in corporate intranets, Web office, and maybe... as a solution for popular services, like Twitter, that can't seem to stop going down.

As services built on SQL databases, like Twitter, strain under dramatic growth in users or activity, users can see downtime. In a post this morning, AssetBar proposes that it could act as a Twitter proxy: you would enter your Twitter credentials on their site, on their database, and it would interact with Twitter just as if you were on Twitter, but without being impacted by outages. If Twitter were down, your "tweets" would remain in queue, not blocked, as is the case today.
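To make the queueing idea concrete, here is a rough store-and-forward sketch. It is not AssetBar's design, which the post does not describe in detail, and it does not call any real Twitter API; the QueueingProxy class and the fake_send stand-in are hypothetical names made up for illustration:

```python
from collections import deque


class QueueingProxy:
    """Hypothetical store-and-forward proxy: holds messages while upstream is down."""

    def __init__(self, send):
        self.send = send          # callable that delivers one message upstream
        self.pending = deque()    # messages waiting for delivery, oldest first

    def post(self, message):
        """Accept a message immediately, then try to deliver everything queued."""
        self.pending.append(message)
        self.flush()

    def flush(self):
        """Deliver queued messages in order; stop at the first failure."""
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                return            # upstream still down; keep the rest queued
            self.pending.popleft()


# Stand-in for the real service, which this sketch never contacts.
delivered = []
state = {"up": False}


def fake_send(message):
    if not state["up"]:
        raise ConnectionError("upstream unavailable")
    delivered.append(message)


proxy = QueueingProxy(fake_send)
proxy.post("first tweet")     # queued, because the upstream is down
proxy.post("second tweet")    # queued behind the first
state["up"] = True
proxy.flush()                 # both delivered, in the original order
```

The point of the sketch is only that the user's post succeeds right away and delivery is retried later, so an outage delays a tweet rather than rejecting it.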

So what's the issue causing downtime in the first place?

AssetBar claims that because popular users like Jason Calacanis and Robert Scoble have gained more than 6,500 followers, and follow 6,500 folks, every single tweet is written and rewritten thousands and thousands of times. And just imagine if they're talking to each other. Double the problem. And writes are harder than reads. While it's just a 140-character message going out on the wire, multiply that by 6,500, and you're talking almost a million characters going somewhere. Divide that by 300, roughly the word count of a double-spaced page in Microsoft Word, and a well-populated tweet would have about 3,000 pages of impact, the equivalent of reading the fabled monolith novel War and Peace more than two times.
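The back-of-the-envelope numbers can be laid out explicitly. This is a rough sketch using the figures quoted above; the five-characters-per-word average is an assumption added here, included only to show how much the page estimate depends on whether characters or words are being counted:

```python
TWEET_CHARS = 140        # maximum tweet length cited in the post
FOLLOWERS = 6_500        # follower count cited in the post
WORDS_PER_PAGE = 300     # page size cited in the post
CHARS_PER_WORD = 5       # assumption: rough average, not from the post

# One copy of the message is written per follower.
total_chars = TWEET_CHARS * FOLLOWERS

# Estimate 1: treat every character as if it were a word.
pages_if_chars_were_words = total_chars / WORDS_PER_PAGE

# Estimate 2: convert characters to words first, using the assumed average.
pages_by_word_count = total_chars / CHARS_PER_WORD / WORDS_PER_PAGE

print(total_chars)                        # 910000
print(round(pages_if_chars_were_words))   # 3033
print(round(pages_by_word_count))         # 607
```

Either way, the fan-out is what matters: one short message turns into thousands of separate writes.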

But AssetBar says they have a smarter approach, using their database. They write, "It so happens that our new distributed database technology is rather well suited for twitter-style high-volume reliable messaging."

Would they try to compete with Twitter? No way. They say "Twitter is the new mail", and it is now mission critical for many people. They want to solve the downtime issues, for the community.

Maybe they're on to something. Check out their full post, "Twitter-proxy: Any Interest?", and give them feedback as to whether you think they should get in the ballgame and help Twitter-holics out. They would first look for the community's blessing, and get a nod from Team Twitter, before moving forward. But if their database is as unique and strong as they say it is, it could get real interesting real soon.

February 07, 2008

What If You Only Subscribed to Shared Item Feeds?

Last night, we discussed the importance of a well-kept shared link blog in Google Reader. Humans can still play an important role in filtering out the best of feeds from the rest, as smart folks can trump even the best written code in terms of determining humor, originality and insight.

The ease of creating and subscribing to link blogs in Google Reader has led some people to actively search out these link blogs and, instead of subscribing feed by feed, rely on the selections of others.

One blogger, with the nickname of "SeekGround", says he has subscribed to more than 300 individual shared item feeds, which he displays on his blog - an amazing number. I have to assume there are a number of commonly-subscribed feeds that would result in duplication, but SeekGround says he goes through them, primarily on his mobile phone, and shares those items he finds most useful.

Shockingly, despite his having more than 300 individual feeds, it looks like his interests overlap most with mine, of all people. In an insightful post, "Google Reader, Shared Items and Mobility", the blogger reveals that over the last 30 days, he's shared 35 items from me, 19 from Frederic Lardinois of The Last Podcast, and 17 from Mike Reynolds, taking first, second and third, respectively.
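For anyone curious how a tally like that might be built across hundreds of overlapping shared feeds, here is a small sketch. It is not SeekGround's method, which the post doesn't spell out; the records, feed owners and URLs below are made-up placeholders, and deduplicating by item URL is an assumption:

```python
from collections import Counter

# Hypothetical records: (shared-feed owner, item URL, original author).
shared = [
    ("alice", "http://example.com/a", "Author One"),
    ("bob",   "http://example.com/a", "Author One"),   # same item, second feed
    ("bob",   "http://example.com/b", "Author Two"),
    ("carol", "http://example.com/c", "Author One"),
]

seen = set()
unique_items = []
for owner, url, author in shared:
    if url in seen:
        continue              # already surfaced by another shared feed
    seen.add(url)
    unique_items.append((url, author))

# Count unique items per original author, most-shared first.
by_author = Counter(author for _, author in unique_items)
print(by_author.most_common())
```

With duplicates dropped first, an author only gets credit once per item, no matter how many link blogs picked it up.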

Kindly, he writes, "I think that Louis Gray is making waves in the community lately and he may soon find himself considered part of the A-List rather than his self-stated position as a B-Lister."

I don't know about that... but it's fun to see SeekGround taking a new approach to consuming feeds, and finding so many shared items in common. While his blog was a relative unknown before tonight, with a Technorati Authority of "Zero", I have to expect that will change. While some entrepreneurs are setting new bars in content creation, others are changing the world of content consumption.

Maybe, over time, there will be a big shift in the balance between those who are the content creators and filters, and those who are the consumers and readers. With Feedheads, Shared Reader, ReadBurner and RSSMeme out there now, link blogs are becoming a very big deal.

Also see:
Last Podcast: Shared Feeds, RSSmeme and Ecosystems
louisgray.com: How Soon Until People Demand Link Blog Portability?
louisgray.com: What I'm Reading and Sharing on Google Reader

My shared items link blog is here: http://www.google.com/reader/shared/05763917848110205585

Limelight Networks Searchme Spider Picking Up Speed

For the better part of this year, I've seen some odd traffic to my site from Limelight Networks, a content delivery network similar to the better-known Akamai. Multiple times a day, I would see a visit, with no originating page, drop in on the site, look at a page or two, and then, just as quickly, leave. Later, this pace quickened, to the point that if I saw a visitor come in with no referrer, I assumed it was llnw.net. But, in the last few days, the rate has dramatically accelerated, to the point where this spider from llnw.net is more than 10 percent of my total traffic, and it rates as the number one domain accessing my site, ahead of even Comcast.

So what the heck is it doing?

Well, it's not 100% clear. The assumed spider drops in and advertises itself as Linux UNIX, running Mozilla 1.8.1.11, and displaying a monitor with the odd square resolution of 1300 x 1300. It generally looks at one page, takes off, and then comes back in a few minutes to get another one. No real rhyme or reason, and it's just as happy to suck down old pages as new ones.
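If your stats package lets you export visits, the signature above is simple enough to filter on. This is only a sketch: the record layout and field names are hypothetical, the sample rows are invented, and the rule (no referrer plus the 1300 x 1300 resolution) is a heuristic drawn from what I'm seeing, not a confirmed fingerprint of any particular robot:

```python
# Hypothetical visit records, as a stats package might export them.
visits = [
    {"domain": "llnw.net",    "referrer": "",                    "resolution": "1300x1300"},
    {"domain": "comcast.net", "referrer": "http://example.com/", "resolution": "1280x1024"},
    {"domain": "llnw.net",    "referrer": "",                    "resolution": "1300x1300"},
    {"domain": "comcast.net", "referrer": "",                    "resolution": "1024x768"},
]


def looks_like_spider(visit):
    """Heuristic only: no referrer plus the unusual square resolution."""
    return visit["referrer"] == "" and visit["resolution"] == "1300x1300"


suspects = [v for v in visits if looks_like_spider(v)]
share = len(suspects) / len(visits)
print(f"{len(suspects)} of {len(visits)} visits flagged ({share:.0%})")
```

Requiring both signals keeps ordinary visitors who simply arrive without a referrer from being counted as the spider.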

So, is it caching my blog so that customers of Limelight Networks can access the pages faster? Is it taking a graphical snapshot, in the same way that www.archive.org has done to show how Web sites looked over time? I'm not exactly sure.

One theory, voiced in a Webmaster World forum thread titled "Unusual Traffic from Limelight Networks", says the activity is from a robot called "Searchme", an LLNW client. Going to www.searchme.com shows this as a possibility. Searchme, Inc. says, "Searchme delivers more meaningful and targeted search results to its users," and that its "intuitive category suggest technology provides users with a dynamic and rewarding search experience by delivering relevant results that are tailored specifically to their unique areas of interest."

But... alas, no search engine and no demo, yet.


LLNW.net keeps hitting the site, every few minutes.

While I'm sure my site and others get hammered all day long by Google and Yahoo! spiders, they don't trigger my Web statistics software to think they're actual people, as LLNW is doing. It could be because their spider acts so human-like that the JavaScript code I use to track accesses is fooled. But regardless, the activity is picking up steam, and at this pace, I wouldn't be surprised to see LLNW account for one of every five visits here, even if they are "junk" visits.


LLNW.net is grabbing 13% of my traffic, beating out even Comcast!

The question is, if it is Searchme who is pushing this spider, and if they are indeed planning to reveal their work at some point, will the world have the need for yet another search engine? I have no idea. But if these oddities are any indicator, something's going on worth watching.

For those of you who have blogs and Web sites that track this detail of activity, are you also seeing the LLNW traffic, and has it increased over time? Also, has anybody ever seen Searchme in action who can let us know what they're doing?