XML Specs That Give You Nightmares - Dare Obasanjo's weblog

November 4, 2004

@ 06:10 PM

Many times when implementing XML specifications, I've come up against features that just seem infeasible or impractical to implement. However, none of them have given me nightmares the way some specs have for my friend Mike Vernal, a program manager on the Indigo team at Microsoft. In his post "could you stop the noise, i'm trying to get some rest ..." he talks about spending nights tossing and turning, having nightmares about how the SOAP mustUnderstand header attribute should be processed. In his post "More SOAP Sleepness", he mentions sleepless nights spent worrying about the behavior of SOAP intermediaries as described in Section 2.7: Relaying SOAP Messages.

This isn't to say I didn't have sleepless nights over implementing XML specifications when I worked on the XML team at Microsoft. One issue that consumed far more of my time than was reasonable is explained in Derek Denny-Brown's post Loving and Hating XML Namespaces:

Namespaces and your XML store
For example, load this document into your favorite XML store API (DOM/XmlBeans/etc.):
<book title='Loving and Hating XML Namespaces'>
   <author>Derek Denny-Brown</author>
</book>
Then add the attribute named "xmlns" with value "http://book" to the <book> element. What should happen? Should that change the namespaces of the <book> and <author> elements? Then what happens if someone adds the element <ISBN> (with no namespace) under <book>? Should the new element automatically acquire the namespace "http://book", or should the fact that you added it with no namespace mean that it preserves its association with the empty namespace?

In MSXML, we tried to completely disallow editing of namespace declarations, and mostly succeeded. There was one case, which I missed, and we have never been able to fix it because so many people found it and exploited it. The W3C's XML DOM spec basically says that element/attribute namespaces are assigned when the nodes are created and never change, but it is not clear about what happens when a namespace declaration is edited.

Then there is the problem of edits that introduce elements in a namespace that does not have an existing namespace declaration:
<a xmlns:p="http://p/">
  <b>
    ...
      <c p:x="foo"/>
    ...
  </b>
</a>
If you add attribute "p:z" in namespace "bar" to element <b>, what should happen to the p:x attribute on <c>? Should the implementation scan the entire content of <b> just in case there is a reference to prefix "p"?

Or what about conflicts? Add attribute "p:z" in namespace "bar" to the below sample... what should happen?
<a xmlns:p="http://p/" p:a="foo"/>
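Derek's <book> scenario is easy to reproduce. The sketch below uses Python's standard-library xml.dom.minidom purely as a stand-in DOM Level 2 implementation; MSXML and System.Xml.XmlDocument may behave differently. It follows the rule Derek describes, that namespaces are fixed when nodes are created: adding the xmlns attribute changes nothing in memory, yet serializing and reparsing the very same tree moves every element, including the newly added <ISBN>, into http://book.

```python
from xml.dom.minidom import parseString

XMLNS = "http://www.w3.org/2000/xmlns/"  # namespace of namespace declarations

# Derek's sample: <book> and <author> start out in no namespace.
doc = parseString(
    "<book title='Loving and Hating XML Namespaces'>"
    "<author>Derek Denny-Brown</author></book>"
)
book = doc.documentElement

# Add a default namespace declaration after the nodes already exist.
book.setAttributeNS(XMLNS, "xmlns", "http://book")

# Add <ISBN> explicitly in no namespace.
isbn = doc.createElementNS(None, "ISBN")
book.appendChild(isbn)

# In memory, namespaces were assigned at creation and did not change.
print(repr(book.namespaceURI))  # None
print(repr(isbn.namespaceURI))  # None

# The serialized text says otherwise: reparsing puts everything in http://book.
reparsed = parseString(doc.toxml()).documentElement
print(reparsed.namespaceURI)                                     # http://book
print(reparsed.getElementsByTagName("ISBN")[0].namespaceURI)     # http://book
```

The in-memory tree and its own serialization disagree about which namespace every element is in, which is exactly the ambiguity the questions above are getting at.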

This problem really annoyed me while I was the PM for the System.Xml.XmlDocument class and the short-lived System.Xml.XPath.XPathDocument2. With the former, I found that once you started adding, modifying, and deleting namespace declarations, the results would most likely be counter-intuitive and just plain wrong. Of course, the original W3C DOM spec predated XML namespaces, and trying to merge them in after the fact was probably a bad idea. With the latter class, it seemed the best we could do was to prevent editing of namespace nodes as much as possible. This is the track we decided to follow with the newly editable System.Xml.XPath.XPathNavigator class in the .NET Framework.
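For a store that does allow these edits, the p:z examples quoted above imply real bookkeeping. Here is a minimal sketch in Python over xml.dom.minidom, which itself does no namespace fixup (setAttributeNS simply stores the qualified name it is given). The helpers in_scope_uri and nodes_rebound_by are hypothetical, handle only named prefixes (not the default namespace), and are not how MSXML or System.Xml work. Note that answering Derek's question about <b> really does mean walking its entire subtree.

```python
from xml.dom.minidom import parseString

XMLNS = "http://www.w3.org/2000/xmlns/"  # namespace of xmlns:* attributes


def in_scope_uri(elem, prefix):
    """Hypothetical helper: URI bound to `prefix` at `elem`, or None if unbound."""
    node = elem
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        decl = node.getAttributeNodeNS(XMLNS, prefix)
        if decl is not None:
            return decl.value
        node = node.parentNode
    return None


def nodes_rebound_by(elem, prefix, new_uri):
    """Hypothetical helper: qualified names in elem's subtree whose meaning
    would change if xmlns:<prefix>="<new_uri>" were declared on elem."""
    hits = []

    def visit(node):
        if node.nodeType != node.ELEMENT_NODE:
            return
        # A closer redeclaration of the prefix shields this whole subtree.
        if node is not elem and node.getAttributeNodeNS(XMLNS, prefix) is not None:
            return
        attrs = node.attributes
        named = [node] + [attrs.item(i) for i in range(attrs.length)]
        for n in named:
            if n.prefix == prefix and n.namespaceURI != new_uri:
                hits.append(n.nodeName)
        for child in node.childNodes:
            visit(child)

    visit(elem)
    return hits


# First example: add p:z in namespace "bar" to <b>.
doc = parseString('<a xmlns:p="http://p/"><b><c p:x="foo"/></b></a>')
b = doc.getElementsByTagName("b")[0]
print(in_scope_uri(b, "p"))             # http://p/
print(nodes_rebound_by(b, "p", "bar"))  # ['p:x']

# Second example: p is already bound on the very element being edited.
doc2 = parseString('<a xmlns:p="http://p/" p:a="foo"/>')
a = doc2.documentElement
print(in_scope_uri(a, "p"))             # http://p/
```

Given those results, an implementation adding p:z in "bar" is left with two options: reject the edit, or write the attribute under a fresh prefix so existing bindings stay intact. Either is a lot more work than the single attribute assignment the user asked for.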

This isn't the most sleep-depriving issue I had to deal with when trying to reflect the decisions in various XML specifications in .NET Framework APIs. Unsurprisingly, the spec that caused the most debate amongst our team, when we tried to figure out how to implement its features over an XML store, was the W3C XML Schema Recommendation Part 1: Structures. The specific area was the section on contributions to the Post-Schema-Validation Infoset (PSVI), and the specific infoset contribution that caused so much consternation was the validity property.

After schema validation, an XML element or attribute should have additional validation-related metadata added to it, such as its type, the default value specified in the schema (if any), and whether it is valid according to its type. Although the validity property is trivial to implement on a read-only API such as the System.Xml.XmlReader class, it was unclear what the right way to expose it would be in other representations of XML data such as the System.Xml.XmlDocument class. The basic problem is: "What happens to the validity property of the element or attribute, and to those of all its ancestors, once the node is updated?"

If I change the value of an age element, which is typed as an integer, from 17 to "seventeen", what should happen? Should the DOM test every edit to make sure it is valid for that type and reject it otherwise? Or should the edit be allowed, with the validity property of the element and all its ancestors changed accordingly? What if there is a name element with required first and last child elements, and the user wants to delete the first element and replace it with a different one? How would that be reflected in the validity property of the name element?
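The two options raised above can be made concrete with a toy model. This is a minimal sketch in Python, not what System.Xml does: Node, set_text_eager, set_text_lazy, and remove_child_eager are hypothetical names, the xs:integer check is lexical only, and the name content model is hard-coded. The only part taken from the spec is the set of values the PSVI [validity] property can hold: valid, invalid, and notKnown.

```python
import re
from enum import Enum


class Validity(Enum):
    # The three values the PSVI [validity] property can take.
    VALID = "valid"
    INVALID = "invalid"
    NOT_KNOWN = "notKnown"


def is_integer(node):
    # Lexical check only for xs:integer: optional sign followed by digits.
    return node.text is not None and re.fullmatch(r"[+-]?[0-9]+", node.text) is not None


def first_then_last(node):
    # Hypothetical content model for <name>: exactly <first> then <last>.
    return [c.name for c in node.children] == ["first", "last"]


class Node:
    """Hypothetical element in an editable store that caches its validity."""

    def __init__(self, name, text=None, check=None, children=()):
        self.name, self.text, self.check = name, text, check
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self
        self.validity = Validity.NOT_KNOWN

    def validate(self):
        """Revalidate this whole subtree and return this node's validity."""
        child_results = [c.validate() for c in self.children]  # visit every child
        ok = (self.check is None or self.check(self)) and all(
            r is Validity.VALID for r in child_results)
        self.validity = Validity.VALID if ok else Validity.INVALID
        return self.validity

    def mark_unknown(self):
        """Set this node and every ancestor to notKnown."""
        node = self
        while node is not None:
            node.validity = Validity.NOT_KNOWN
            node = node.parent


def set_text_eager(node, text):
    """Strategy A: refuse any value that fails the node's own type check."""
    old = node.text
    node.text = text
    if node.check is not None and not node.check(node):
        node.text = old
        raise ValueError(f"{text!r} is not valid for <{node.name}>")


def set_text_lazy(node, text):
    """Strategy B: accept the edit, then downgrade validity up the tree."""
    node.text = text
    node.mark_unknown()


def remove_child_eager(parent, child):
    """Strategy A for structural edits: every intermediate state must be valid."""
    index = parent.children.index(child)
    del parent.children[index]
    if parent.check is not None and not parent.check(parent):
        parent.children.insert(index, child)
        raise ValueError(f"removing <{child.name}> leaves <{parent.name}> invalid")
    child.parent = None


age = Node("age", "17", is_integer)
name = Node("name", check=first_then_last,
            children=[Node("first", "Derek"), Node("last", "Denny-Brown")])
person = Node("person", children=[name, age])
print(person.validate().value)           # valid

try:
    set_text_eager(age, "seventeen")
except ValueError as err:
    print(err)                           # 'seventeen' is not valid for <age>

set_text_lazy(age, "seventeen")
print(age.validity.value, person.validity.value)  # notKnown notKnown
print(person.validate().value)           # invalid

try:
    remove_child_eager(name, name.children[0])  # step one of replacing <first>
except ValueError as err:
    print(err)                           # removing <first> leaves <name> invalid
```

Strategy A keeps the validity property trustworthy but makes the delete-then-insert replacement of <first> impossible without some kind of batched edit. Strategy B allows any sequence of edits but leaves notKnown values behind until something revalidates, which may mean walking the whole document.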

None of the answers we came up with were satisfactory. In the end, we were stuck between a rock and a hard place, so we settled on a compromise. I believe we debated this issue every other month for about a year.

Categories: XML
