Every once in a while I see the developer of a news aggregator decide to add a 'feature' that needlessly chews up a web server's bandwidth in a manner one could fairly call rude. The first instance I remember was Syndirella, which had a feature that let you syndicate an HTML page and then specify regular expressions for which parts of the page you wanted it to treat as titles and content. There are three reasons I consider this rude:
- If a site hasn't put up an RSS feed, it may be because they don't want to deal with the bandwidth costs of clients repeatedly hitting their site on behalf of a few users
- An HTML page is often larger than the corresponding RSS feed. The Slashdot RSS feed is about 2K, while the raw HTML of the Slashdot front page alone is about 40K
- An HTML page can change far more often than the corresponding RSS feed, because of things like rotating ads or trackback links in blogs that would not affect the feed
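To put the size difference in perspective, here is a rough back-of-the-envelope sketch using the 2K and 40K figures quoted above. The subscriber count and hourly polling interval are hypothetical numbers picked purely for illustration, and the model assumes every poll downloads the full page.

```python
# Rough bandwidth comparison using the sizes quoted in the post:
# ~2 KB for the Slashdot RSS feed vs ~40 KB for the front-page HTML.
RSS_KB = 2
HTML_KB = 40

def daily_kb(page_kb: float, subscribers: int, polls_per_day: int) -> float:
    """Kilobytes served per day if every subscriber downloads the whole page on every poll."""
    return page_kb * subscribers * polls_per_day

subscribers = 1000   # hypothetical number of aggregator users
polls_per_day = 24   # hypothetical: each aggregator polls hourly

rss = daily_kb(RSS_KB, subscribers, polls_per_day)
html = daily_kb(HTML_KB, subscribers, polls_per_day)
print(f"RSS:  {rss:,.0f} KB/day")
print(f"HTML: {html:,.0f} KB/day ({html / rss:.0f}x the RSS cost)")
```

Whatever numbers you plug in, scraping the HTML costs the site twenty times what the feed would, and that is before counting the extra downloads caused by the page changing more often.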
For these reasons I think the right thing to do when a site doesn't support RSS is to send them a request to add a feed, highlighting its benefits, instead of eating up their bandwidth.
The second instance I've seen of what I'd call rude bandwidth behavior is a feature of NewsMonster that Mark Pilgrim complained about last year: every time it finds a new item in your RSS feed, it automatically downloads the linked HTML page (as specified in the RSS item's link element) along with all the relevant stylesheets, JavaScript files, and images. Considering that the user may never click through to the web site from the RSS view, this is potentially hundreds of unnecessary files downloaded by the aggregator each day. This is not an exaggeration: I'm subscribed to a hundred feeds in my aggregator and there is an average of two posts a day to each feed, so downloading the accompanying content and images means literally hundreds of files in addition to the RSS feeds themselves.
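The arithmetic behind "hundreds of files" can be sketched as follows. The 100 feeds and two posts a day come from the paragraph above; the number of assets per linked page and the number of items the user actually clicks are hypothetical values chosen only to illustrate the difference between prefetching everything and fetching on demand. This is a toy model, not a description of how NewsMonster is implemented.

```python
# Toy model of extra HTTP requests an aggregator makes per day, beyond polling the feeds.
FEEDS = 100                # from the post
POSTS_PER_FEED_PER_DAY = 2 # from the post
ASSETS_PER_PAGE = 5        # hypothetical: stylesheets, scripts and images per linked page

def requests_per_day(feeds, posts_per_feed, assets_per_page, prefetch, clicked=0):
    """Count page + asset requests per day.

    prefetch=True: download every linked page and its assets as soon as the item appears.
    prefetch=False: download only the pages the user actually clicks through to.
    """
    pages = feeds * posts_per_feed if prefetch else clicked
    return pages * (1 + assets_per_page)

eager = requests_per_day(FEEDS, POSTS_PER_FEED_PER_DAY, ASSETS_PER_PAGE, prefetch=True)
lazy = requests_per_day(FEEDS, POSTS_PER_FEED_PER_DAY, ASSETS_PER_PAGE,
                        prefetch=False, clicked=10)  # hypothetical: user opens 10 items
print(eager, lazy)  # 1200 60
```

Even with zero assets per page, prefetching means 200 page downloads a day for this subscription list, most of which nobody reads.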
The newest instance of unnecessary bandwidth hogging I've seen from a news aggregator was pointed out in Phil Ringnalda's comments about excessive hits from NewzCrawler, which I'd also seen in my referrer logs and had been puzzled about. According to an answer on the NewzCrawler support forums, when NewzCrawler updates a channel that supports wfw:commentRss, it first updates the main feed and then updates the comment feeds. Repeatedly downloading the comments feed for every entry in my blog when the user hasn't asked for them is unnecessary and, quite frankly, wasteful.
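To make the cost concrete, here is a minimal sketch, assuming an RSS 2.0 feed whose items carry wfw:commentRss elements in the `http://wellformedweb.org/CommentAPI/` namespace. It shows that every item advertising a comment feed adds one more request per refresh if they are all polled unconditionally, and contrasts that with a hypothetical `comment_feeds_to_fetch` helper that only fetches comment feeds for items the user has opened. The sample feed and helper names are invented for illustration; this is not NewzCrawler's actual code.

```python
import xml.etree.ElementTree as ET

# Namespace URI for the wfw:commentRss element.
WFW = "http://wellformedweb.org/CommentAPI/"

# Hypothetical two-item feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:wfw="http://wellformedweb.org/CommentAPI/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <wfw:commentRss>http://example.com/1/comments.xml</wfw:commentRss>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <wfw:commentRss>http://example.com/2/comments.xml</wfw:commentRss>
    </item>
  </channel>
</rss>"""

def comment_feeds(rss_text):
    """Map each item's link to its wfw:commentRss URL (items without one are skipped)."""
    root = ET.fromstring(rss_text)
    feeds = {}
    for item in root.iter("item"):
        url = item.findtext(f"{{{WFW}}}commentRss")
        if url:
            feeds[item.findtext("link")] = url.strip()
    return feeds

def comment_feeds_to_fetch(rss_text, opened_links):
    """Hypothetical polite policy: only fetch comment feeds for items the user opened."""
    return [url for link, url in comment_feeds(rss_text).items() if link in opened_links]

# Polling every comment feed costs one extra request per item on each refresh.
print(len(comment_feeds(SAMPLE_FEED)))  # 2
# Fetching on demand touches only what the user asked for.
print(comment_feeds_to_fetch(SAMPLE_FEED, {"http://example.com/2"}))
# ['http://example.com/2/comments.xml']
```

On a blog with dozens of entries in its feed, the unconditional approach multiplies the number of requests per refresh by that many, which is exactly the kind of traffic that shows up in referrer logs.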
Someone really needs to set up an aggregator hall of shame.