There are several
different data models for XML even within the W3C.
Each of these data models has a different
idea of what constitutes a
node or, more generally, a significant item in an XML
document. The XPath
1.0 data model has 7 node types (root, element,
attribute, namespace, text, comment and processing
instruction), which is similar to the types and
number of nodes in the XQuery
data model, except that the root node is renamed
to the document node to more accurately reflect the
fact that it represents the entire XML
document.
On the other hand, the W3C Document Object Model
has
12 node types (document, element, attribute,
text, comment, processing instruction, CDATA
section, entity, entity reference, doctype,
notation, and document fragment).
What tends to cause confusion is mixing
data models, as is the case when performing XPath over
the DOM. In such cases discrepancies in the data
models may cause problems or lead to some
confusion. The following example illustrates such a
point of confusion:

using System;
using System.Xml;

class Test{
    public static void Main(string[] args){
        XmlDocument doc = new XmlDocument();
        // A text node, a CDATA section and another text node under <root>
        doc.LoadXml("<root>Sam <![CDATA[ I ]]> Am</root>");
        // How many children does XPath see?
        Console.WriteLine(doc.SelectNodes("/root/child::node()").Count);
        foreach(XmlNode xn in doc.SelectNodes("/root/child::node()")){
            Console.WriteLine(xn.OuterXml);
        }
    }
}

Now the question is: what should the output of
the program be?

A. 3
   Sam <![CDATA[ I ]]> Am
B. 1
   Sam I Am
C. 1
   Sam

Contrary to most expectations the answer is
C.
From a DOM perspective, answer A seems obvious
because the root element does have
three DOM nodes as children: a text node containing
the string "Sam", the CDATA section, and another
text node containing the string "Am". The problem
with A is that the XPath data model does not have
CDATA sections, so an XmlCDataSection instance cannot
be returned by an XPath query.
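
The DOM view described above can be checked directly, without any XPath involved. The following is a minimal sketch; the class name DomChildren and the method Describe are hypothetical names chosen for illustration. It walks the ChildNodes of the document element and records each child's NodeType next to its OuterXml.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class DomChildren
{
    // Returns one "NodeType: OuterXml" entry per DOM child of the
    // document element. No XPath is involved; this is the pure DOM view.
    public static List<string> Describe(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var result = new List<string>();
        foreach (XmlNode child in doc.DocumentElement.ChildNodes)
        {
            result.Add(child.NodeType + ": " + child.OuterXml);
        }
        return result;
    }
}
```

Calling DomChildren.Describe on the document from the example and printing each entry should list three children: a Text node, a CDATA node, and a second Text node.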
B seems like the logical answer because the XPath
data model explicitly states that CDATA sections
are removed and adjacent text nodes are merged. The
problem with B is that the original document did
not contain a text node containing the string "Sam
I Am", so the XPath query would have to
create a new node. Even worse, one wonders what
would happen when an attempt is made to access the
ParentNode property of the returned XmlNode object.
Should it point at the original root
element in the DOM even though the newly created
node is technically not one of its child
nodes?
C is the compromise answer. It returns something
that makes sense to the XPath data model (a text
node) but acts only as a selection of a child node of
the root element, without creating a
brand new DOM node whose parentage is
questionable.
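
One way to see that the query selects an existing child rather than manufacturing a new node is to compare object identities. This is a sketch under the assumption that SelectNodes hands back the underlying DOM objects; the class name XPathOverDom and the method SelectsExistingFirstChild are hypothetical names chosen for illustration.

```csharp
using System;
using System.Xml;

static class XPathOverDom
{
    // True when the first node selected by the XPath query is the very same
    // object as the first DOM child of the document element, and its
    // ParentNode is that element.
    public static bool SelectsExistingFirstChild(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNode root = doc.DocumentElement;
        XmlNodeList selected = doc.SelectNodes("/root/child::node()");

        return selected.Count > 0
            && object.ReferenceEquals(selected[0], root.FirstChild)
            && object.ReferenceEquals(selected[0].ParentNode, root);
    }
}
```

If the method returns true for the document from the example, the ParentNode question raised under answer B never comes up, because the node handed back is one the DOM already owned.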
I love my job. :)