Metadata Quality, Events Databases and Live Clipboard

April 4, 2006

@ 02:18 AM

In his post Exploring Live Clipboard, Jon Udell shares a screencast he made about Live Clipboard. He writes:

I've been experimenting with microformats since before they were called that, and I'm completely jazzed about Live Clipboard. In this screencast I'll walk you through examples of Live Clipboard in use, show how the hCalendar payload is wrapped, grab hCalendar data from Upcoming and Eventful, convert it to iCalendar format for insertion into a calendar program, inject it natively into Live Clipboard, and look at Upcoming and Eventful APIs side-by-side.
All this leads up to a question: How can I copy an event from one of these services and paste it into another? My conclusion is that adopting Live Clipboard and microformats will be necessary but not sufficient. We'll also need a way to agree that, for example, this venue is the same as that venue. At the end, I float an idea about how we might work toward such agreements.

The problem Jon Udell describes is a classic one when mapping data between different domains. I wrote about it a few months ago in my post Metadata Quality and Mapping Between Domain Languages:

The problem Stefano has pointed out is that just being able to say that two items are semantically identical (i.e. an artist field in dataset A is the same as the 'band name' field in dataset B) doesn't mean you won't have to do some syntactic mapping as well (i.e. alter artist names of the form "ArtistName, The" to "The ArtistName") if you want an accurate mapping.
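To make the quoted point concrete, here is a minimal Python sketch of the two steps involved: the semantic mapping (the 'artist' field becomes the 'band name' field) and the syntactic mapping (rewriting "ArtistName, The" as "The ArtistName"). The function names and the record layout are hypothetical, chosen only for illustration, and the rule handles just the one pattern mentioned in the quote.

```python
import re

# Matches names of the form "ArtistName, The" (the only pattern handled here).
_TRAILING_THE = re.compile(r"^(?P<name>.+),\s*The$")


def normalize_artist(name):
    """Rewrite 'ArtistName, The' as 'The ArtistName'; leave other names alone."""
    cleaned = name.strip()
    match = _TRAILING_THE.match(cleaned)
    if match is None:
        return cleaned
    return "The " + match.group("name")


def map_record_a_to_b(record_a):
    """Hypothetical mapping from dataset A's schema to dataset B's schema.

    Semantic step: 'artist' in A corresponds to 'band name' in B.
    Syntactic step: the value itself is normalized as well.
    """
    return {"band name": normalize_artist(record_a["artist"])}


print(map_record_a_to_b({"artist": "Beatles, The"}))  # {'band name': 'The Beatles'}
```

Even this tiny rule has to be written by hand for each quirk in each dataset, which is the point of the quote.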

This is the big problem with data mapping. In Jon's example, the location is called "Colonial Theater" in Upcoming and "Colonial Theater (New Hampshire)" in Eventful. Eventful has a full street address, while Upcoming provides only the street name. Little differences like these are what make data mapping a hard problem. Jon's solution is for the community to come up with global identifiers for venues as tags (e.g. Colonial_Theater_NH_03431) instead of waiting for technologists to come up with a solution. That's good advice, because there really isn't a good technological solution for this problem. Even RDF/Semantic Web junkies like Danny Ayers, in posts like Live clipboard and identifying things, start with assumptions like "every venue has a unique identifier, which is its URI". Of course, this ignores the fact that coming up with a global, unique identification scheme for the Web is the problem in the first place. The problem with Jon's approach is the same one pointed out in almost every critique of folksonomies: people won't use the same tags for the same concept. Jon might use Colonial_Theater_NH_03431 while I use Colonial_Theater_95_Maine_Street_NH_03431, which leaves us with the same problem of inconsistent identifiers being used for the same venue.
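The mismatch is easy to reproduce. The Python sketch below uses only the names and tags mentioned above; the helper functions and the matching heuristics are my own assumptions for illustration, not anything Upcoming, Eventful or Live Clipboard provides.

```python
import re


def normalize_venue_name(name):
    """Drop parenthetical qualifiers such as '(New Hampshire)' and fold case."""
    without_parens = re.sub(r"\s*\([^)]*\)", "", name)
    return " ".join(without_parens.lower().split())


def tag_tokens(tag):
    """Split an underscore-separated tag into a set of lowercase tokens."""
    return set(token.lower() for token in tag.split("_") if token)


def tags_may_match(tag_a, tag_b):
    """Crude heuristic (an assumption): one tag's tokens are contained in the other's."""
    a, b = tag_tokens(tag_a), tag_tokens(tag_b)
    return a <= b or b <= a


upcoming_name = "Colonial Theater"
eventful_name = "Colonial Theater (New Hampshire)"
print(upcoming_name == eventful_name)  # False
print(normalize_venue_name(upcoming_name) == normalize_venue_name(eventful_name))  # True

jon_tag = "Colonial_Theater_NH_03431"
my_tag = "Colonial_Theater_95_Maine_Street_NH_03431"
print(jon_tag == my_tag)  # False
print(tags_may_match(jon_tag, my_tag))  # True
```

The heuristics paper over this one example, but they are exactly the kind of custom code that has to be rewritten for the next pair of services. The token-subset check would also happily match two different venues that share a name and ZIP code.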

I assume that for the near future we will continue to see custom code being written to make data integration across domains work. Unfortunately, no developments on the horizon look likely to make this problem go away.

PS: Ray Ozzie covers some of the recent developments in the world of Live Clipboard in his post Wiring Progress. Check it out.

Categories: Technology | Web Development


Dare Obasanjo's weblog