More on RDF, The Semantic Web and Perpetual Motion Machines

November 11, 2003

@ 02:03 PM

My post from yesterday garnered a couple of responses from the RDF crowd who questioned the viability of the approaches I described. Below I take a look at some of their arguments and relate them to practical examples of exchanging information using XML I have encountered in my regular development cycle.

Shelley Powers writes

One last thing: I wanted to also comment on Dare Obasanjo's post on this issue. Dare is saying that we don't need RDF because we can use transforms between different data models; that way everyone can use their own XML vocabulary. This sounds good in principle, but from previous experience I've had with this type of effort in the past, this is not as trivial as it sounds. By not using an agreed on model, not only do you now have to sit down and work out an agreement as to differences in data, you also have to work out the differences in the data model, too. In other words -- you either pay upfront, once; or you keep paying in the end, again and again. Now, what was that about a Perpetual Motion Machine, Dare?

In responding to Shelley's post it is easier for me to use a concrete example. RSS Bandit uses a custom format that I came up with for describing a user's list of subscribed feeds. However, in the wild, other news aggregators use differing formats such as OPML and OCS. To ensure that users of other aggregators can try out RSS Bandit without having to manually enter all their feeds, I support importing feed subscription lists in both the OPML and OCS formats, even though these are distinct from the format and data model I use internally. The import is done by applying an XSLT stylesheet to the input OPML or OCS file to convert it to my internal format, then converting that XML into the RSS Bandit object model. Each stylesheet took me about 15 to 30 minutes to write. This is the XML-based solution.
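To give a feel for how small such an import step is, here is a rough sketch of the OPML case. It is written in Python with the standard library's ElementTree standing in for the XSLT stylesheet, and the feeds/feed/title/link target vocabulary is made up for illustration; it is not RSS Bandit's actual subscription format.

```python
import xml.etree.ElementTree as ET

# A small OPML subscription list of the kind other aggregators export.
OPML_SAMPLE = """<opml version="1.1">
  <head><title>My subscriptions</title></head>
  <body>
    <outline title="News">
      <outline title="Example Feed" xmlUrl="http://example.org/rss.xml"/>
    </outline>
    <outline text="Another Feed" xmlUrl="http://example.com/index.rdf"/>
  </body>
</opml>"""


def opml_to_feedlist(opml_text):
    """Map every OPML outline that carries an xmlUrl to a <feed> element.

    The <feeds>/<feed>/<title>/<link> vocabulary is hypothetical and only
    stands in for an aggregator's internal format.
    """
    source = ET.fromstring(opml_text)
    feeds = ET.Element("feeds")
    # iter() also walks nested outlines, so category folders are flattened.
    for outline in source.iter("outline"):
        url = outline.get("xmlUrl")
        if not url:
            continue  # a folder or some other non-feed outline
        feed = ET.SubElement(feeds, "feed")
        # Exporters differ on whether the name lives in title or text.
        title = outline.get("title") or outline.get("text") or url
        ET.SubElement(feed, "title").text = title
        ET.SubElement(feed, "link").text = url
    return feeds


feedlist = opml_to_feedlist(OPML_SAMPLE)
print(ET.tostring(feedlist, encoding="unicode"))
```

The whole mapping is a walk over the outline elements plus a handful of renames, which is why a stylesheet doing the same job is quick to write.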

Folks like Shelley believe my problem could be better solved by RDF and other Semantic Web technologies. For example, if my internal format was RDF/XML and I was trying to import an RDF-based format such as OCS then instead of using a language like XSLT that performs a syntactic transform of one XML format to the other I'd use an ontology language such as OWL to map between the data models of my internal format and OCS. This is the RDF-based solution.

Right off the bat, it is clear that both approaches share certain drawbacks. In both cases, I have to come up with a transformation from one representation of a feed list to another. Ideally, for popular formats there would be standard transformations written by others for moving from one popular format to another (e.g. I don't have to write a transformation for WordML to HTML but do for WordML to my custom document format), so developers who stick to popular formats simply have to locate the transformation as opposed to actually authoring it themselves.

However, the semantics-based approach has further drawbacks that the XML-based syntactic approach does not. Where the mapping is more than a statement of equivalence between similarly structured elements (e.g. the equivalent of element renaming, such as stating that a url element and a link element are equivalent), an ontology language is insufficient, whereas a Turing-complete transformation language like XSLT is not. Another example from RSS Bandit illustrates this. RSS 2.0 feeds commonly specify the date an item was posted in one of two ways: with the pubDate element, which is described as containing a string in the RFC 822 format, or with the dc:date element, which is described as containing a string in the ISO 8601 format. Thus even though the two elements are semantically equivalent, syntactically they are not. A syntactic transformation still has to be applied after the semantic one if an application is to treat pubDate and dc:date as equivalent. So instead of making one pass with an XSLT stylesheet, as in the XML-based solution, the RDF-based solution needs two transformation techniques, and it is quite likely that one of them would be XSLT.
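The date mismatch is easy to see in code. The sketch below is Python rather than XSLT, and the function name and sample feed are invented for illustration. It shows the syntactic step that remains even once pubDate and dc:date have been declared equivalent: each value needs its own parser before the two can be compared.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# ElementTree spells a namespaced element as {namespace-uri}local-name.
DC_DATE = "{http://purl.org/dc/elements/1.1/}date"

# Two items posted at the same instant, dated in the two common styles.
RSS_ITEMS = """<channel xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item><title>A</title><pubDate>Tue, 11 Nov 2003 06:03:00 -0800</pubDate></item>
  <item><title>B</title><dc:date>2003-11-11T14:03:00Z</dc:date></item>
</channel>"""


def item_posted_utc(item):
    """Return the item's posting time as an aware UTC datetime, or None."""
    pub = item.findtext("pubDate")
    if pub:
        # RFC 822 style, e.g. "Tue, 11 Nov 2003 06:03:00 -0800"
        parsed = parsedate_to_datetime(pub.strip())
    else:
        dc = item.findtext(DC_DATE)
        if not dc:
            return None
        # ISO 8601 style; older Pythons reject a trailing "Z", so
        # rewrite it as an explicit UTC offset before parsing.
        parsed = datetime.fromisoformat(dc.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a value with no zone information is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


items = ET.fromstring(RSS_ITEMS).findall("item")
print([item_posted_utc(item).isoformat() for item in items])
```

Note that datetime.fromisoformat only accepts a subset of ISO 8601 on older Python versions, so reduced forms such as a bare year and month would need extra handling. That is one more reminder that the work here is syntactic.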

The other practical concern is that I already know XSLT and have good books to choose from to learn more about it, such as Michael Kay's XSLT: Programmer's Reference and Jeni Tennison's XSLT and XPath On The Edge, as well as mailing lists such as xsl-list where experts can help answer tough questions.

From where I sit, picking an XML-based solution over an RDF-based one for problems that involve interchange of XML documents just makes a lot more sense. I hope this post helps clarify my original points.

Ken MacLeod also wrote

In his article, Dare suggests that XSLT can be used to transform to a canonical format, but doesn't suggest what that format should be or that anyone is working on a common, public repository of those transforms.

The transformation is to whatever target format the consumer is comfortable dealing with. In RSS Bandit the transformations are OCS/OPML to my internal feed list format and RSS 1.0 to RSS 2.0. There is no canonical transformation to one Über XML format that will solve everyone's problems. As for keeping a common, public repository of such transformations, that is an interesting idea which I haven't seen anyone propose in the past. A publicly accessible database of XSLT stylesheets for transforming between RSS 1.0 and RSS 2.0, WordML to HTML, etc. would be a useful addition to the XML community.

Sam Ruby muddies the waters in his post Blind Spots and subsequent comments in that thread by confusing the use cases around XML as a data interchange format and XML as a data storage format. My comments above have been about XML as a data interchange format. I'll probably post more in future about RDF vs. XML as a data storage format, using the thread in Sam's blog for context.

Categories: XML


Tuesday, 11 November 2003 16:47:50 (GMT Standard Time, UTC+00:00)

My my, I happen to agree with most of what you're saying here ;-)

As it happens, I'm using XSLT in my aggregator code too, to convert RSS x.x into RSS 1.0, so that it can be read as RDF. For many problems, solutions using tools like XSLT are much easier to implement.

The thing is, what do you do with other forms of data? I'm using an RDF store, so I can take in stuff like FOAF or iCal RDF without any transformation. Ok, so perhaps if I was using an XML DB then I could do the same with other XML formats. But it is at that point where the RDF tools become the easier option. Let's say an RSS channel has an author identified as an foaf:Person. I can get the FOAF profile of the person, and from that I could get that person's iCal schedule, and find out where they're going to be for the next few days. All with queries and simple inference over the RDF store.

I'm not entirely sure how you'd go about doing something like that using XSLT, though as it's Turing-complete I guess you could...

Danny


Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson
