Random Acts Of XSLT Geekery - Dare Obasanjo's weblog

April 19, 2003

@ 12:58 AM

Using the Right Tool for the Job

Kirk Allen Evans has a post on using XSLT to convert CSV to XML. Although the stylesheet shows off some cool XSLT tricks, it is probably the most convoluted way to go about solving this problem; it fails to utilize the primary strengths of XSLT and instead highlights its weaknesses.

XSLT is primarily designed for processing XML, specifically converting XML input to XML output. It deals very poorly with text processing, as evidenced by the large, complex stylesheet needed to perform the equivalent of the following lines of C# code:

    using System;
    using System.IO;
    using System.Xml;

    class Test {

      public static void Main(string[] args) {

        // Two CSV records, one per line; the line break inside the verbatim string is literal.
        string csv = @"1,5 main st, ,Cumming, GA, 30040,Kirk Evans
    2,13 elm st, ,Anywhere, NJ, 07825,Bob Smith";

        StringWriter writer = new StringWriter();
        XmlTextWriter xmlWriter = new XmlTextWriter(writer);
        xmlWriter.Formatting = Formatting.Indented;

        xmlWriter.WriteStartElement("root");

        // One <row> per line, one <elem> per comma-separated field.
        foreach (string row in csv.Split(new char[] { '\n' })) {
          xmlWriter.WriteStartElement("row");
          foreach (string elem in row.Split(new char[] { ',' })) {
            // Trim() drops the padding around each field (and any stray '\r').
            xmlWriter.WriteElementString("elem", elem.Trim());
          }
          xmlWriter.WriteEndElement();
        }

        xmlWriter.WriteEndElement();

        Console.WriteLine(writer.ToString());
      }
    }

Both approaches produce the same results, except that the XSLT approach can be understood by only a handful of people, while the above approach can be understood by anyone with formal programming experience and a rudimentary grasp of XML.

Of course, no discussion of materializing CSV as XML is complete without a pointer to Chris Lovett's XmlCsvReader, which allows one to read a CSV file as if it were an XML document.


Data Modelling With XML

Data modelling for XML formats is still mostly a black art, unlike relational data modelling, which has techniques like normalization to guide people toward the right way to represent their data. Christian Romney has a post on designing XML data formats that shows his heart is in the right place, but I disagree with its conclusions. Basically, Christian states that given a choice between

    <NumberOfRooms>424</NumberOfRooms>
    <Rise>Mid</Rise>
    <MaximumOccupancy>3 (3 adults/2 children)</MaximumOccupancy>

and

    <hotelInformation>
      <item name="Number of Rooms">424 </item>
      <item name="Rise">Mid </item>
      <item name="Maximum Occupancy">3 (3 adults/2 children) </item>
    </hotelInformation>

one should pick the latter. There are many reasons why the latter is the worse choice, from the perspective of both human readability and programmatic interaction. However, before I mention those, I'd like to look at the reason Christian thinks the latter is a better format.

Christian states "The reason the second implementation is *better* is that the addition of an additional element (a new piece of information) does not require any change to the XSL stylesheet". This is true only because of the design of his stylesheet. On the other hand, if he used the following XSLT stylesheet

    <xsl:template match="hotelInformation">
      <table>
        <xsl:for-each select="*">
          <tr>
            <td><xsl:value-of select="local-name()" /></td>
            <td><xsl:value-of select="." /></td>
          </tr>
        </xsl:for-each>
      </table>
    </xsl:template>

additions to the former format would not require any changes to the stylesheet. The only thing of note about my stylesheet fragment is that I assumed the former format has a root element named hotelInformation as the latter does (otherwise it wouldn't be well-formed XML).

Given that both formats are similarly resistant to change, why would one pick the former over the latter? Let me count the ways: the former is less verbose and less cluttered with redundant information, making it easier to read; the former format and its stylesheet can use XML namespaces for extensibility with ease, while the latter cannot; and finally, when programmatically processing XML, it is more cumbersome to take actions based on names held in attribute values than on element names (this is especially true with SAX).
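To make the SAX point concrete, here is a minimal sketch in Python, using the standard xml.sax module rather than the C# from earlier in this post. It pulls the room count out of both formats. The handler classes and the room_count helper are hypothetical names made up for illustration, and both samples assume a hotelInformation root element.

```python
import xml.sax

# Cut-down versions of the two formats discussed above.
ELEMENT_STYLE = """<hotelInformation>
  <NumberOfRooms>424</NumberOfRooms>
  <Rise>Mid</Rise>
</hotelInformation>"""

ATTRIBUTE_STYLE = """<hotelInformation>
  <item name="Number of Rooms">424</item>
  <item name="Rise">Mid</item>
</hotelInformation>"""


class ElementStyleHandler(xml.sax.ContentHandler):
    """Collects the room count when each field has its own element name."""

    def __init__(self):
        super().__init__()
        self.in_rooms = False
        self.rooms = ""

    def wants(self, name, attrs):
        # The element name alone identifies the field.
        return name == "NumberOfRooms"

    def startElement(self, name, attrs):
        self.in_rooms = self.wants(name, attrs)

    def characters(self, content):
        if self.in_rooms:
            self.rooms += content

    def endElement(self, name):
        self.in_rooms = False


class AttributeStyleHandler(ElementStyleHandler):
    """Same job when every field is a generic <item> element."""

    def wants(self, name, attrs):
        # Every field is an <item>, so the handler must also look inside
        # the attribute list and compare a string value.
        return name == "item" and attrs.get("name") == "Number of Rooms"


def room_count(document, handler):
    """Hypothetical helper: run one handler over one document."""
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.rooms.strip()
```

Both handlers arrive at the same answer, but the second one has to test two things for every start tag, and the field name it compares against is free text with spaces in it rather than an XML name.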


QNames in Content

One of the most aggravating problems that has arisen from the existence of the Namespaces in XML recommendation is the use of Qualified Names (QNames) as identifiers in content. QNames in content have proven to be very useful for XSLT and W3C XML Schema, but they cause a number of problems when documents that use them are processed, particularly in certain edge cases. An example of such a problem is how to emit attributes whose values are QNames into another XML document.
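A minimal Python sketch of why this is awkward, using the standard xml.dom.minidom module. The field vocabulary, the http://example.org/other namespace and the helper names are all made up for illustration. The attribute text is identical in both documents, yet it expands to a different name in each, because the meaning of the prefix lives in the surrounding namespace declarations and not in the attribute value itself.

```python
from xml.dom import minidom

# A made-up vocabulary whose "type" attribute holds a QName.
SOURCE = """<schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <field name="title" type="xs:string"/>
</schema>"""

# The same attribute copied verbatim into a document that happens to
# bind the "xs" prefix to a different (hypothetical) namespace.
TARGET = """<report xmlns:xs="http://example.org/other">
  <field name="title" type="xs:string"/>
</report>"""


def resolve_qname(element, qname):
    """Expand a prefixed name using the declarations in scope at element.

    Returns a (namespace URI, local name) pair. Unprefixed names come back
    with no namespace, since whether the default namespace applies to a
    QName in content depends on the vocabulary.
    """
    prefix, _, local = qname.rpartition(":")
    if not prefix:
        return (None, local)
    declaration = "xmlns:" + prefix
    node = element
    # Walk up the ancestors until a declaration for the prefix is found.
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        if node.hasAttribute(declaration):
            return (node.getAttribute(declaration), local)
        node = node.parentNode
    return (None, local)


def type_of_first_field(document_text):
    """Hypothetical helper: resolve the type QName of the first field."""
    doc = minidom.parseString(document_text)
    field = doc.getElementsByTagName("field")[0]
    return resolve_qname(field, field.getAttribute("type"))
```

A tool that copies such an attribute into another document therefore has to carry the namespace binding along with it, or rewrite the prefix, and a generic XML processor has no way of knowing which attribute values need that treatment.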

Gudge had that problem and found a solution. A nice bit of XSLT hackery.



Disclaimer: The above comments do not represent the thoughts, intentions, plans or strategies of my employer. They are solely my opinion.