There have been a number of amusing discussions in the recent back
and forth between Robert Scoble and several others on whether OPML is a
crappy XML format. In posts such as "OPML 'crappy' Robertson says" and "More on crappy formats", Robert
defends OPML. I've seen some really poor arguments made as people
rushed to bash Dave Winer and OPML but none made me want to
join the discussion until this morning.
In the post "Some one has to say it again…", brainwagon writes:
Take for example Mark Pilgrim's comments:
I just tested the 59 RSS feeds I subscribe to in my news
aggregator; 5 were not well-formed XML. 2 of these were due to
unescaped ampersands; 2 were illegal high-bit characters; and then
there's The Register (RSS), which publishes a feed with such a wide
variety of problems that it's typically well-formed only two days each
month. (I actually tracked it for a month once to test this. 28 days
off; 2 days on.) I also just tested the 100 most recently updated RSS
feeds listed on blo.gs (a weblog tracking site); 14 were not
well-formed XML.
The reason just isn't that programmers are lazy (we are, but we
also like stuff to work). The fact is that the specification itself is
ambiguous and weak enough that nobody really knows what it means. As a
result, there are all sorts of flavors of RSS out there, and parsing
them is a big hassle.
The promise of XML was that you could ignore the format and
manipulate data using standard off-the-shelf-tools. But that promise is
largely negated by the ambiguity in the specification, which results in
ill-formed RSS feeds, which cannot be parsed by standard XML feeds.
Since Dave Winer himself managed to get it wrong as late as the date of
the above article (probably due to an error that I myself have done,
cutting and pasting unsafe text into Wordpress) we really can't say
that it's because people don't understand the specification unless we
are willing to state that Dave himself doesn't understand the
specification.
As someone who has (i) written a moderately popular RSS
reader and (ii) worked on the XML team at Microsoft for three years, I
know a thing or two about XML-related specifications. Blaming malformed
XML in RSS feeds on the RSS specification is silly. That's like blaming
the W3C's HTML specification for the large number of HTML pages that
don't validate, when the real cause is that web browsers try to render
invalid pages instead of raising an error. If web browsers refused to
render invalid pages, such pages wouldn't exist on the Web.
Similarly, if every aggregator rejected invalid feeds then
invalid feeds wouldn't exist. However, just like in the browser wars,
aggregator authors consider it a competitive advantage to be able to
handle malformed feeds. This has nothing to do with the quality of the
RSS specification [or the HTML specification]; it is all
about applications trying to gain market share.
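To make the strict-versus-lenient trade-off concrete, here is a minimal Python sketch. It is only an illustration, not how any particular aggregator works: the feed fragment, the `BARE_AMP` pattern and the `parse_strict` / `parse_lenient` helpers are all made up for this example. The strict path behaves the way a conforming XML parser does and fails on an unescaped ampersand; the lenient path repairs that one class of error and tries again.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical feed fragment with an unescaped ampersand in the title.
BROKEN_FEED = """<rss version="2.0"><channel>
<title>News & Views</title>
<link>http://example.com/</link>
</channel></rss>"""

# An ampersand that does not begin an entity or character reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def parse_strict(text):
    """Parse the way a conforming XML parser does: any error is fatal."""
    return ET.fromstring(text)


def parse_lenient(text):
    """Try strict parsing first; on failure, escape bare ampersands and retry.

    Returns (root_element, was_repaired).
    """
    try:
        return ET.fromstring(text), False
    except ET.ParseError:
        repaired = BARE_AMP.sub("&amp;", text)
        return ET.fromstring(repaired), True


if __name__ == "__main__":
    try:
        parse_strict(BROKEN_FEED)
    except ET.ParseError as err:
        print("strict parser rejected the feed:", err)

    root, was_repaired = parse_lenient(BROKEN_FEED)
    print("lenient parser repaired the feed:", was_repaired)
    print("title:", root.find("channel/title").text)
```

The lenient version only handles bare ampersands; illegal high-bit characters and the other problems mentioned in the quote would each need their own workaround, which is exactly the kind of work aggregator authors take on to keep users happy.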
As for whether OPML is a crappy spec? I've had to read a lot of
technology specifications in my day from W3C recommendations and IETF
RFCs to API documentation and informal specs. They all suck in their
own ways. However, experience has taught me that the bigger the spec,
the more it sucks. Given that, I'd rather have a short, human-readable
spec that sucks a little (e.g. RSS, XML-RPC, OPML, etc.) than a large,
jargon-filled specification which sucks a whole lot more (e.g. WSDL, XML
Schema, C++, etc.). Then there's the issue of using the right tool for
the job, but I'll leave that rant for another day.