I've recently been thinking about the problems facing search and navigation systems that depend on metadata that the creator of the content applies to it. This includes systems like Technorati Tags, which searches the <category> elements in various RSS feeds, and folksonomies like del.icio.us, which searches tags applied to links submitted by users.
A few months ago I wrote a post entitled Technorati Tags: Why Do Bad Ideas Keep Resurfacing?, which pointed out that Technorati Tags had the same problems that had plagued previous metadata self-annotation schemes on the Web, such as HTML META tags. The main problem is that People Lie. Since then I've seen a number of complaints from developers of search engines that depend on RSS metadata.
In a comment on a post entitled Blogspot Spam in Matthew Mullenweg's weblog, Bob Wyman of PubSub.com writes:
A very high percentage of the spam blogs that we process at PubSub.com also come from blogspot. We’ve got more serious “problems” in Japan and China, however, for the English language, blogspot is pretty much “spamspot.” It is, as always, disappointing to see people abuse a good and free service like that offered by Google/Blogspot in such a way.
In a post entitled Turning Blogspot Off, Scott Johnson of Feedster wrote:
All Blogspot blogs right now are included in every Feedster search by default. And now, due to the massive problems with spam on Blogspot, we're actually at the point of saying "Why don't we make searching Blogspot optional for all Feedster users". What's going on is that spammers have learned how to massively exploit Blogspot -- to the point where at times 90% of the blog traffic we get from Blogspot is spam.
Now that's bad. Actually this spam issue just plain sucks. And its starting to ruin the user experience that people have with Feedster.
The main reason these spam blogs haven't started affecting the Technorati Tags feature is that Blogspot doesn't support categories. However, it is clear that the same problems search engines faced when they decided to trust HTML metadata are beginning to show up when it comes to searching RSS metadata. This is one place where established search engines would have a leg up on upstarts like Feedster and PubSub if they got into the RSS search market, since they've already had to adapt to all sorts of 'search engine optimization' tricks.
On a related note, consider the above information about the high number of spam blogs on Google's Blogspot service alongside the recent article Bloggers Pitch Fits Over Glitches, which among other things states:
In fact, enter "Blogger sucks" in Google and you get 720,000 results, with most of the entries on the first few pages (read: the most popular) dedicated to these exasperating tech snafus. It can make for some pretty ugly reading. Imagine what they might say if they actually paid for the service?
But if you look at Blogger's status page, which lists service outages, you can see why they are so mad.
It seems that Doc Searls may have been onto something about Google quitting innovating in Blogger.