Tim Berners-Lee on the Failure of URIs as Identifiers in RDF

October 27, 2004

@ 03:30 PM

Almost two years ago I wrote a blog entry entitled Useful vs. Useless Abstractions in which I argued that the invention of URIs by the IETF/W3C crowd to replace the combination of URLs and URNs was a step backwards. I wrote:

URIs are a merger of the syntax of URLs and URNs which seem to have been repurposed from their original task of identifying and locating network retrievable documents to being more readable versions of UUIDs which can be used to identify any person, place or thing regardless of whether it is a file on the Internet or a feeling in your heart.

This addition to the URN/URL abstraction seemed to address some of the bits which may have been considered to be leaky (if I enter http://www.yahoo.com in my browser and it loads it from its cache then the URL isn't acting as a location but as an identifier). Others also saw URIs as a way to provide user-friendly UUIDs for use on the Web. So far I've come into contact with URIs in two aspects of my professional experience and they have both left a bad taste in my mouth.

URIs and the Semantic Web: Ambiguity²

One problem with URIs is that they don't uniquely identify a single thing. Consider the following hyperlinked statements

Dare is a Georgia Tech alumnus.

Dare's website is valid XHTML.

In the above statements I use the URI "http://www.25hoursaday.com" to identify both myself and my web page. This is a bad thing for the Semantic Web. If you read Aaron Swartz's excellent primer on the Semantic Web you will notice the passage where he talks about RDF and its dependence on URIs:
...
Now consider...
<http://aaronsw.com/> <http://love.example.org/terms/reallyLikes> <http://www.25hoursaday.com/> .

Can you tell whether Aaron really likes my website or me personally from the above RDF statement? Neither can I. This inherent ambiguity is yet another problem with the vision of the Semantic Web and with the current crop of Semantic Web technologies, which are overly dependent on URIs.
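To make the ambiguity concrete, here is a minimal Python sketch that models an RDF statement as a plain (subject, predicate, object) tuple. It is not a real RDF library: the Triple class and the variable names are hypothetical, and only the three URIs come from the statement above. Whatever Aaron meant, the data he publishes is the same three strings, so no consumer of that data can recover which reading was intended.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical stand-in for an RDF statement: three URI strings."""
    subject: str
    predicate: str
    obj: str


AARON = "http://aaronsw.com/"
REALLY_LIKES = "http://love.example.org/terms/reallyLikes"
DARE = "http://www.25hoursaday.com/"  # used for both the person and the web site

# Reading 1: Aaron likes the web site (the bits served from the URI).
likes_site = Triple(AARON, REALLY_LIKES, DARE)

# Reading 2: Aaron likes Dare the person, identified by the same URI.
likes_person = Triple(AARON, REALLY_LIKES, DARE)

# The two intended meanings produce identical data...
print(likes_site == likes_person)        # True
# ...and collapse into one statement when a graph is modeled as a set of triples.
print(len({likes_site, likes_person}))   # 1
```

Any disambiguation has to come from outside the triple itself, for example from a different URI or from additional statements.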

Over the past few years that I've been on the W3C Technical Architecture Group (TAG) mailing list, I've seen this inherent ambiguity of URIs result in many lengthy, seemingly never-ending discussions about how to work around this problem or whether it is even a problem at all. The discussion thread entitled "Information resources?", which morphed into the "referendum on httpRange-14" thread, is the latest incarnation of this permathread on the WWW-TAG mailing list.

I was much heartened to see that Tim Berners-Lee is beginning to recognize some of the problems caused by the inherent ambiguity of URIs. In his most recent response to the "referendum on httpRange-14" thread he writes:

> It is a best practice that there be some degree of consistency
> in the representations provided via a given URI.

Absolutely.

> That applies *both* when a URI identifies a picture of
> a dog *and* when a URI identifies the dog itself.
>
> *All* URIs which offer consistent, predictable representations will be
> *equally* beneficial to users, no matter what they identify.

Now here seems to be the crunch.
The web architecture relies, we agree I think, on this consistency
or predictability of representations of a given URI.

The use of the URI in the web is precisely that it is associated
with that class of representations which could be returned for it.

Because the "class of representations which could be returned"
is a rather clumsy notion, we define a conceptual thing
which is related to any valid representation associated with the URI,
and as the essential property of the class is a similarity in
information content, we call the thing an Information Resource.

So a URI is a string whose sole use in the web architecture
is to denote that information resource.

Now if you say in the semantic web architecture that the same will
identify a dog, you have a conflict.

>
>> The current web relies on people getting the same information from
>> reuse of the same URI.
>
> I agree. And there is a best practice to reinforce and promote this.
>
> And nothing pertaining to the practice that I and others employ, by
> using http: URIs to identify non-information resources, in any way
> conflicts with that.

Well, it does if the semantic web can talk about the web, as the
semantic web can't be ambiguous about what an identifier identifies in the way that
one can in english.

I want my agent to be able to access a web page, and then use the URI
to refer to the information resource without having to go and find some
RDF somewhere to tell it whether in fact it would be mistaken.

I want to be able to model lots and lots of uses of URIs in existing
technology in RDF. This means importing them wholesale,
it needs the ability to use a URI as a URI for the web page without
asking anyone else.

The saga continues. The ambiguity of URIs has also been a problem for XML namespaces, since users of XML often assume that a namespace URI should lead to a network-retrievable document when accessed. Because namespace names are URIs, this isn't necessarily true. If they were URLs they would be retrievable, and if they were URNs they would not be.
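A minimal sketch with Python's standard xml.etree.ElementTree shows what an XML processor actually does with a namespace URI: it treats it as an opaque string, matches on it character by character, and never dereferences it. The XSLT namespace below is the real one; the urn:example: namespace is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# The namespace name looks like a URL, but parsing never fetches it.
stylesheet = ET.fromstring(
    f'<xsl:stylesheet xmlns:xsl="{XSLT_NS}" version="1.0">'
    '<xsl:template match="/"/>'
    '</xsl:stylesheet>'
)
print(stylesheet.tag)  # {http://www.w3.org/1999/XSL/Transform}stylesheet

# Lookups match on the namespace string itself, not on anything it points to.
template = stylesheet.find("xsl:template", {"xsl": XSLT_NS})
print(template is not None)  # True

# A (hypothetical) URN namespace behaves identically, although it names no document.
doc = ET.fromstring('<x:root xmlns:x="urn:example:not-a-document"/>')
print(doc.tag)  # {urn:example:not-a-document}root
```

Swapping in any other namespace string, retrievable or not, changes nothing about how the document is processed, which is why a namespace URI's syntax says nothing about whether a document lives at that address.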

Categories: Technology

Tracked by:
http://practicalrdf.info/archives/2004/10/faulty-uris/ [Pingback]


Wednesday, 27 October 2004 17:04:53 (GMT Daylight Time, UTC+01:00)

You nailed it again!
http://www.kbcafe.com/rss/?guid=20041027090343

Randy Charles Morin

Thursday, 28 October 2004 11:57:31 (GMT Daylight Time, UTC+01:00)

I have a discussion of this subject here:

http://www.edavies.nildram.co.uk/2003/08/08/uri-comments/

I'm far from happy about the last few paragraphs - there is at least one simple syntax error and the overall idea is not good.

However, I stick with the principle that there should be a clear distinction in web protocols between retrieving a resource and retrieving information about a resource. Having such a distinction would go some way to clearing up the confusion between a URI for a person and a URI for that person's web site and equally between the URI for a namespace and the URI for information about a namespace.

Ed Davies

Thursday, 28 October 2004 15:24:05 (GMT Daylight Time, UTC+01:00)

Dare, the ambiguities of using a website URI to identify a person are well known, which is why FOAF uses a different technique (a set of properties unique to that person). See:
http://rdfweb.org/mt/foaflog/archives/000039.html

This is a different issue than the difference between URIs and URLs. You could say that http://www.25hoursaday.com/Dare identifies you.

It's also worth noting that the practical impact of the TAG debate is virtually nil...

Danny

Thursday, 28 October 2004 17:29:41 (GMT Daylight Time, UTC+01:00)

"The ambiguity of URIs have also been a problem in XML namespaces since users of XML often wonder assume a namespace URI should lead to a network retrievable document when accessed."

One acronym: RDDL. Not that it's really relevant to the argument against URIs to begin with.

rddl

Thursday, 28 October 2004 18:55:47 (GMT Daylight Time, UTC+01:00)

Danny,
You seem to miss the forest for the trees. The point is that if you use a URI that points to a network retrievable resource to identify some abstract concept then it is ambiguous when statements about that URI are about the bits dispensed from that URI and when they are about the abstract concept. The fact that FOAF has some hacks around this is interesting but it just goes to show that RDF is broken as designed. It seems that Tim Berners-Lee is finally realizing that.

You are right that the TAG discussion won't have much practical impact. None of the TAG discussions ever do. That just means RDF is probably going to stay broken.

kpako@yahoo.com (Dare Obasanjo)

Thursday, 28 October 2004 21:56:17 (GMT Daylight Time, UTC+01:00)

What's great is that some XML namespace URIs are ALSO URLs.

Go to this web address:
http://www.w3.org/1999/XSL/Transform

This is the URI for XSLT but if you navigate to that address as a URL you will actually get a document telling you that it is the namespace of XSLT.

Let's see a chunk of software figure out whether a user who cites that URI means XSLT the technology or the document located at the URL!

Don Kackman

Thursday, 28 October 2004 23:51:22 (GMT Daylight Time, UTC+01:00)

"The point is that if you use a URI that points to a network retrievable resource to identify some abstract concept then it is ambiguous when statements about that URI are about the bits dispensed from that URI and when they are about the abstract concept."

Dare,

As far as Web and SemWeb architectures go, statements about that URI are about what that URI denotes, not the served representations (in RDF terms, those are literals and you can't make statements about them). That leaves the distinction between a document or page and everything that isn't one as the thing to get confused about. That ambiguity, however, is not a modelling problem induced by either RDF *or* URIs. It's induced by use of language. Anyone arguing that such ambiguity can or should be eliminated via formal languages (RDF) or tokens (URIs) is in error; they might as well be designing zero latency networks or perpetual motion machines.

"You are right that the TAG discussion won't have much practical impact. None of the TAG discussions ever do. That just means RDF is probably going to stay broken."

Yes, there is an issue here, but no it is not caused by URIs or RDF. The ambiguity you're talking about is a feature of any language that allows denotation or pointing or proper names. A large chunk of 20th century logic and philosophy was given over to sorting out the wood from the trees on such matters. Some of the people who worked on RDF during 2000-3 are well acquainted with that body of work and its consequences for computerized data. Today, most of the logical bunk in RDF has been elided, reification being a notable exception.

As for URIs, my problem with them is that too many people are trying to have their cake and eat it with them, which causes confusion and an amount of waffling and muddled thinking. What's broken are people's expectations of what can be expressed.

Bill de hÓra

Dare Obasanjo's weblog