A few months ago in my post GMail Domain Change Exposes Bad Design and Poor Code, I wrote Repeat after me, a web page is not an API or a platform. It seems some people are still learning this lesson the hard way. In the post The danger of running a remix service Richard MacManus writes
Populicio.us was a service that used data from social bookmarking site del.icio.us, to create a site with enhanced statistics and a better variety of 'popular' links. However the Populicio.us service has just been taken off air, because its developer can no longer get the required information from del.icio.us. The developer of Populicio.us wrote:
"Del.icio.us doesn't serve its homepage as it did and I'm not able to get all needed data to continue Populicio.us. Right now Del.icio.us doesn't show all the bookmarked links in the homepage so there is no way I can generate real statistics."
This plainly illustrates the danger for remix or mash-up service providers who rely on third party sites for their data. del.icio.us can not only giveth, it can taketh away.
It seems Richard Macmanus has missed the point. The issue isn't depending on a third party site for data. The problem is depending on screen scraping their HTML webpage. An API is a service contract which is unlikely to be broken without warning. A web page can change depending on the whims of the web master or graphic designer behind the site.
Versioning APIs is hard enough, let alone trying to figure out how to version an HTML website so screen scrapers are not broken. Web 2.0 isn't about screenscraping. Turning the Web into an online platform isn't about legitimizing bad practices from the early days of the Web. Screen scraping needs to die a horrible death. Web APIs and Web feeds are the way of the future.