Microformats vs. XML: Was the XML Vision Wrong?

January 12, 2006

@ 09:58 PM

Over a year ago, I wrote a blog post entitled SGML on the Web: A Failed Dream? in which I asked whether the original vision of XML had failed. Below are excerpts from that post:

The people who got together to produce the XML 1.0 recommendation were motivated to do this because they saw a need for SGML on the Web. Specifically, their discussions focused on two general areas:
Classes of software applications for which HTML was an inadequate information format
Aspects of the SGML standard itself that impeded SGML's acceptance as a widespread information technology

The first discussion established the need for SGML on the web. By articulating worthwhile, even mission-critical work that could be done on the web if there were a suitable information format, the SGML experts hoped to justify SGML on the web with some compelling business cases.

The second discussion raised the thornier issue of how to "fix" SGML so that it was suitable for the web.

And thus XML was born.
...
The W3C's attempts to get people to author XML directly on the Web have mostly failed, as can be seen from the dismal adoption rate of XHTML. In fact, many [including myself] have come to the conclusion that the benefits of adopting XHTML are too low compared to the costs, if not non-existent. There was once an expectation that content producers would be able to place documents conforming to their own XML vocabularies on the Web, with display handled entirely by stylesheets, but this has yet to become widespread. At least one member of a W3C working group has even called this a bad practice, since it means that User Agents that aren't sophisticated enough to understand style sheets are left out in the cold.

Interestingly enough, although XML has not been as successful as its originators initially expected as a markup language for authoring documents on the Web, it has found significant success as the successor to the Comma Separated Value (CSV) file format. XML's primary usage on the Web and even within internal networks is for exchanging machine-generated, structured data between applications. Speculatively, the largest usage of XML on the Web today is RSS, and it conforms to this pattern.
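
As a rough illustration of the "XML as the successor to CSV" pattern, here is a minimal Python sketch that reads records out of an RSS-style document and writes the same records back out as CSV. The feed content, the URLs, and the helper names rss_items and to_csv are all invented for this example; only the standard library modules are real.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical feed content, invented for illustration only.
RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def rss_items(xml_text):
    """Return (title, link) pairs for every <item> element in the document."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


def to_csv(rows):
    """Serialize the same records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["title", "link"])
    writer.writerows(rows)
    return buf.getvalue()


items = rss_items(RSS)
print(to_csv(items))
```

Both forms carry the same machine-generated rows; the XML version simply names each field and allows nesting, which is roughly what made it attractive for exchanging data between applications.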

These thoughts were rekindled when I read Tim Bray's recent post Don’t Invent XML Languages, in which he argues that people should stop designing new XML formats. For designing new data formats for the Web, he advocates the use of microformats instead of XML.

The vision behind microformats is completely different from the XML vision. The original XML inventors started with the premise that HTML is not expressive enough to describe every possible document type that would be exchanged on the Web. Proponents of microformats argue that one can layer additional semantics over HTML, and thus that HTML is expressive enough to represent every possible document type that could be exchanged on the Web. I've always considered it a gross hack to think that, instead of having an HTML web page for my blog and a separate Atom/RSS feed, I should have a single HTML page with <div class="rss:item"> or <h3 class="atom:title"> embedded in it. However, given that one of the inventors of XML (Tim Bray) is now advocating this approach, I wonder if I'm simply clinging to old ways and have become the kind of intellectual dinosaur I bemoan.
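
To make the idea concrete, here is a minimal Python sketch of what consuming such a page might look like: a parser walks ordinary HTML and pulls out the elements whose class attribute carries the extra semantics. The class names rss:item and atom:title are just the ones I made up above, not a real microformat, and the page content and the names TitleExtractor and extract_titles are likewise assumptions for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page. The class names "rss:item" and "atom:title" are the
# made-up examples from the post, not a published microformat.
PAGE = """<html><body>
<div class="rss:item"><h3 class="atom:title">First post</h3><p>Body one</p></div>
<div class="rss:item"><h3 class="atom:title">Second post</h3><p>Body two</p></div>
</body></html>"""


class TitleExtractor(HTMLParser):
    """Collect the text of every <h3> whose class list includes atom:title."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and "atom:title" in classes:
            self._in_title = True
            self._buf = []

    def handle_data(self, data):
        if self._in_title:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_title:
            self.titles.append("".join(self._buf).strip())
            self._in_title = False


def extract_titles(html_text):
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.titles


print(extract_titles(PAGE))
```

The same page still renders as a normal blog in a browser, which is the appeal of the approach; the cost is that the "feed" is now recovered by convention from class names rather than from a dedicated vocabulary.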