In a recent post entitled XML For You and Me, Your Mama and Your Cousin Too I wrote
The main problem is that there are a number of websites which have the same information but do not provide a uniform way to access this information and when access mechanisms to information are provided do not allow ad-hoc queries. So the first thing that is needed is a shared view (or schema) of what this information looks like which is the shared information model Adam talks about...
Once an XML representation of the relevant information users are interested has been designed (i.e. the XML schema for books, reviews and wishlists that could be exposed by sites like Amazon or Barnes & Nobles) the next technical problem to be solved is uniform access mechanisms... Then there's deployment, adoption and evangelism...
We still need a way to process the data exposed by these web services in arbitrary ways. How does one express a query such as "Find all the CDs released between 1990 and 1999 that Dare Obasanjo rated higher than 3 stars"?..
At this point if you are like me you might suspect that defining that the web service endpoints return the results of performing canned queries which can then be post processed by the client may be more practical then expecting to be able to ship arbitrary SQL/XML, XQuery or XPath queries to web service end points.
The main problem with what I've described is that it takes a lot of effort. Coming up with standardized schema(s) and distributed computing architecture for a particular industry then driving adoption is hard even when there's lots of cooperation let alone in highly competitive markets.
A few days ago I got a response to this post from Michael Brundage, author of XQuery : The XML Query Language and a lead developer of the XML<->relational database technologies the WebData XML team at Microsoft produces, on a possible solution to this problem that doesn't require lots of disparate parties to agree on schemas, data model or web service endpoints. Michael wrote
Dare, there's already a solution to this (which Adam created at MS five years ago) -- virtual XML views to unify different data sources. So Amazon and BN and every other bookseller comes up with their own XML format. Somebody else comes along and creates a universal "bookstore" schema and maps each of them to it using an XML view. No loss of performance in smart XML Query implementations.
And if that universal schema becomes widely adopted, then eventually all the booksellers adopt it and the virtual XML views can go away. I think eventually you'll get this for documents, where instead of translating WordML to XHTML (as Don is doing), you create a universal document schema and map both WordML and XHTML into it. (And if the mappings are reversible, then you get your translators for free.)
This is basically putting an XML Web Service front end that supports some degree of XML query on aggegator sites such as AddALL or MySimon. I agree with Michael that this would be a more bootstrapable approach to the problem than trying to get a large number of sites to support a unified data model, query interface and web service architecture.
Come to think of it we're already halfway there to creating something similar for querying information in RSS feeds thanks to sites such as Feedster and Technorati. All that is left is for either site or others like them to provide richer APIs for querying and one would have the equivalent of an XML View of the blogosphere (God, that is such a pretensious word) which you could query to your heart's delight.
Interesting...