I've been a long time skeptic when it comes to RDF and the Semantic Web. Every once in a while I wonder if perhaps what I have a problem with is the W3C's vision of the Semantic Web as opposed to RDF itself. However in previous attempts to explore RDF I've been surprised to find that its proponents seem to ignore some of the real world problems facing developers when trying to use RDF as a basis for information integration.
Recently I've come across blog posts by RDF proponents who've begun to question the technology. The first is the blog post entitled Crises by Ian Davis where he wrote
We were discussing the progress of the Dublin Core RDF task force and there were a number of agenda items under discussion. We didn’t get past the first item though - it was so hairy and ugly that no-one could agree on the right approach. The essence of the problem is best illustrated by the dc:creator
term. The current definition says An entity primarily responsible for making the content of the resource.
The associated comments states Typically, the name of a Creator should be used to indicate the entity
and this is exactly the most common usage. Most people, most of the time use a person’s name as the value of this term. That’s the natural mode if you write it in an HTML meta tag and it’s the way tens or hundreds of thousands of records have been written over the past six years...Of course, us RDFers, with our penchant for precision and accuracy take issue with the notion of using a string to denote an “entity”. Is it an entity or the name of an entity. Most of us prefer to add some structure to dc:creator
, perhaps using a foaf:Person
as the value. It lets us make more assertions about the creator entity.
The problem, if it isn’t immediately obvious, is that in RDF and RDFS it’s impossible to specify that a property can have a literal value but not a resource or vice versa. When I ask “what is the email address of the creator of this resource?” what should the (non-OWL) query engine return when the value of creator is a literal? It isn’t a new issue, and is discussed in-depth on the FOAF wiki.
There are several proposals for dealing with this. The one that seemed to get the most support was to recommend the latter approach and make the first illegal. That means making hundreds of thousands of documents invalid. A second approach was to endorse current practice and change the semantics of the dc:creator
term to explictly mean the name of the creator and invent a new term (e.g. creatingEntity
) to represent the structured approach.
...
That’s when my crisis struck. I was sitting at the world’s foremost metadata conference in a room full of people who cared deeply about the quality of metadata and we were discussing scraping data from descriptions! Scraping metadata from Dublin Core! I had to go check the dictionary entry for oxymoron just in case that sentence was there! If professional cataloguers are having these kinds of problems with RDF then we are fucked.
It says to me that the looseness of the model is introducing far too much complexity as evidenced by the difficulties being experienced by the Dublin Core community and the W3C HTML working group. A simpler RDF could take a lot of this pain away and hit a sweet spot of simplicity versus expressivity.
Ian Davis isn't the only RDF head wondering whether there is too much complexity involved when trying to use RDF to get things done. Uche Ogbuji also has a post entitled Is RDF moving beyond the desperate hacker? And what of Microformats? where he writes
I've always taken a desperate hacker approach to RDF. I became a convert to the XML way of expressing documents right away, in 1997. As I started building systems that managed collections of XML documents I was missing a good, declarative means for binding such documents together. I came across RDF, and I was sold. I was never really a Semantic Web head. I used RDF more as a desperate hacker with problems in a fairly well-contained domain.
...
I've developed an overall impression of dismay at the latest RDF model semantics specs. I've always had a problem with Topic Maps because I think that they complicate things in search of an unnecessary level of ontological purity. Well, it seems to me that RDF has done the same thing. I get the feeling that in trying to achieve the ontological purity needed for the Semantic Web, it's starting to leave the desperate hacker behind. I used to be confident I could instruct people on almost all of RDF's core model in an hour. I'm no longer so confident, and the reality is that any technology that takes longer than that to encompass is doomed to failure on the Web. If they think that Web punters will be willing to make sense of the baroque thicket of lemmas (yes, "lemmas", mi amici docte) that now lie at the heart of RDF, or to get their heads around such bizarre concepts as assigning identity to literal values, they are sorely mistaken. Now I hear the argument that one does not need to know hedge automata to use RELAX NG, and all that, but I don't think it applies in the case of RDF. In RDF, the model semantics are the primary reason for coming to the party. I don't see it as an optional formalization. Maybe I'm wrong about that and it's the need to write a query language for RDF (hardly typical for the Web punter) that is causing me to gurgle in the muck. Assuming it were time for a desperate hacker such as me to move on (and I'm not necessarily saying that I am moving on), where would he go from here?
Uche is one of the few RDF heads whose opinions seem grounded in practicality (Joshua Allen is another) so it is definitely interesting to see him begin to question whether RDF is the right path.
I definitely think there is some merit to disconnecting RDF from the Semantic Web and seeing if it can hang on its own from that perspective. For example, XML as a Web document format is mostly dead-on-arrival but it has found a wide variety of uses as a general data interchange format instead. I've wondered if there is similar usefulness lurking within RDF once it loses its Semantic Web baggage.