Jonathan Marsh On XInclude and XML Schema

May 18, 2005

@ 02:01 PM

It seems Jonathan Marsh has joined the blogosphere with his new blog Design By Committee. If you don't know Jonathan Marsh, he's been one of Microsoft's representatives at the W3C for several years and has been an editor of a variety of W3C specifications including XML Base, XPointer Framework, and XInclude.

In his post XML Base and open content models, Jonathan writes:

There is a current controversy about XInclude adding xml:base attributes whenever an inclusion is done. If your schema doesn't allow those attributes to appear, your document won't validate. This surprises some people, since the invalid attributes were added by a previous step in the processing chain (in this case XInclude), rather than by hand. As if that makes a difference to the validator!

Norm Walsh, after a false start, correctly points out this behavior was intentional. But he doesn't go the next step to say that this behavior is vital! The reason xml:base attributes are inserted is to keep references and links from breaking. If the included content has a relative URI, and the xml:base attribute is omitted, the link will no longer resolve - or worse, it will resolve to the wrong thing. Can you say "security hole"?

Sure it's inconvenient to fail validation when xml:base attributes are added, especially when there are no relative URIs in the included content (and thus the xml:base attributes are unnecessary.) But hey, if you wanted people or processes to add attributes to your content model, you should have allowed them in the schema!
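To make the broken-link concern concrete, here is a minimal Python sketch of how one relative reference resolves with and without the base URI that xml:base records. The document locations and the link are hypothetical, chosen only for illustration, and `urljoin` stands in for the URI resolution an XML processor would perform.

```python
from urllib.parse import urljoin

# Hypothetical locations, used only for illustration.
including_doc = "http://example.org/book/main.xml"
included_doc = "http://example.org/book/chapters/ch1.xml"
relative_link = "images/fig1.png"  # a relative URI inside the included content

# With xml:base pointing at the included document, the link resolves
# against the location the content originally came from.
with_base = urljoin(included_doc, relative_link)

# Without xml:base, the link resolves against the including document,
# which is a different (and possibly nonexistent or wrong) resource.
without_base = urljoin(including_doc, relative_link)

print(with_base)
print(without_base)
```

The two results differ only because the base changed, which is exactly the situation the inserted xml:base attribute is meant to prevent.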

I agree that the working group tried to address a valid concern. But this seems to me to be a case of the solution being worse than the problem. To handle a situation for which workarounds exist in practice (for example, document authors can use absolute URIs instead of relative URIs in their documents), the XInclude working group handicapped the use of XInclude as part of the processing chain for documents that will be validated with XML Schema.

Since the problem they were trying to solve exists in instance documents, even if document authors don't follow a general guideline of favoring absolute URIs over relative URIs, these URIs can be expanded in a single pass using XSLT before the documents are processed further up the chain by XInclude. On the other hand, if a schema doesn't allow xml:base attributes everywhere (which is true of basically every XML format in existence), then one cannot use XInclude as part of the pipeline that creates the document if the final document will undergo schema validation.
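The single-pass expansion described above could look roughly like the following. This is a sketch in Python rather than XSLT, and it rests on assumptions that are not in the original post: the function name `absolutize_links` is hypothetical, the list of URI-bearing attributes (`href`, `src`) is a guess that a real vocabulary would have to replace, and any xml:base attributes already present in the input are ignored.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Assumption: these attribute names carry URIs in the vocabulary at hand.
URI_ATTRS = ("href", "src")


def absolutize_links(xml_text, base_uri, uri_attrs=URI_ATTRS):
    """Rewrite relative URIs in the named attributes as absolute URIs.

    Hypothetical helper: run it on a document before that document is
    pulled into another one, so its links no longer depend on a base URI.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        for name in uri_attrs:
            value = elem.get(name)
            if value is not None:
                # urljoin leaves already-absolute URIs unchanged.
                elem.set(name, urljoin(base_uri, value))
    return ET.tostring(root, encoding="unicode")


chapter = (
    '<chapter>'
    '<img src="images/fig1.png"/>'
    '<a href="http://example.com/x">x</a>'
    '</chapter>'
)
result = absolutize_links(chapter, "http://example.org/book/chapters/ch1.xml")
print(result)
```

Once every link is absolute, the included content no longer needs xml:base to resolve correctly. Note that this sketch also rewrites fragment-only references such as `#sec1` into full URIs, which may or may not be what a given pipeline wants.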

I think the working group optimized for an edge case but ended up breaking a major scenario. Unfortunately this happens a lot more than it should in W3C specifications.

Categories: XML


Dare Obasanjo's weblog