Charles Cook has a blog post on XML and Performance in which he writes:
XML-based Web Services look great in theory but I had one nagging thought last week while on the WSA course: what about performance? From my experience with VoiceXML over the last year it is obvious that processing XML can soak up a lot of CPU and I was therefore interested to see this blog post by Jon Udell in which he describes how Groove had problems with XML:
Sayonara, top-to-bottom XML
I don't believe that I pay a performance penalty for using XML, and depending on how you use XML, you may not believe that you do either. But don't tell that to Jack Ozzie. The original architectural pillars of Groove were COM, for software extensibility, and XML, for data extensibility. In V3 the internal XML datastore switches over to a binary record-oriented database.
You can't argue with results: after beating his brains out for a couple of years, Jack can finally point to a noticeable speedup in an app that has historically struggled even on modern hardware. The downside? Debugging. It was great to be able to look at an internal Groove transaction and simply be able to read it, Jack says, and now he can't. Hey, you've got to break some eggs to make an omelette.
Is a binary representation of the XML Infoset a useful way of improving performance when handling XML? Would it make a big enough difference?
For the specific case of Groove, I'd be surprised if they used a binary representation of the XML Infoset as opposed to a binary representation of their application object model. Many applications that use XML for data storage or configuration immediately load that data into application objects. That extra layer of processing can be avoided by skipping the XML reading and writing step and reading and writing serialized binary objects directly. If performance is that important to your application and there are no interoperability requirements, it is a better choice to serialize binary objects than to pay the overhead of XML serialization and deserialization. The main benefit of using XML in such scenarios is that there is often existing infrastructure for working with it, such as parsers, XML serialization toolkits and configuration handlers. If your performance requirements are so strict that the cost of going from XML to application objects is too high, then getting rid of the step in the middle is a wise decision, although, as Jon Udell points out, you lose the ease of debugging that comes with a text-based format.
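To make the trade-off concrete, here is a minimal Python sketch of the two paths: the same application object written through XML and written directly as a binary record. Everything in it is an assumption for illustration (the `Contact` class, its fields, and the record layout are hypothetical and have nothing to do with Groove's actual datastore).

```python
import struct
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Contact:
    """Hypothetical application object, used only for this illustration."""
    contact_id: int
    name: str
    balance: float


def to_xml(contact: Contact) -> bytes:
    """Object -> XML text: readable, but must be built and later re-parsed."""
    elem = ET.Element("contact", id=str(contact.contact_id))
    ET.SubElement(elem, "name").text = contact.name
    ET.SubElement(elem, "balance").text = repr(contact.balance)
    return ET.tostring(elem, encoding="utf-8")


def from_xml(data: bytes) -> Contact:
    """XML text -> object: parse the tree, then convert each string field."""
    elem = ET.fromstring(data)
    return Contact(
        contact_id=int(elem.get("id")),
        name=elem.findtext("name"),
        balance=float(elem.findtext("balance")),
    )


# Assumed fixed header: uint32 id, float64 balance, uint16 name length
# (little-endian, no padding), followed by the UTF-8 bytes of the name.
_HEADER = struct.Struct("<IdH")


def to_binary(contact: Contact) -> bytes:
    """Object -> binary record: no markup, no string formatting of numbers."""
    name_bytes = contact.name.encode("utf-8")
    header = _HEADER.pack(contact.contact_id, contact.balance, len(name_bytes))
    return header + name_bytes


def from_binary(data: bytes) -> Contact:
    """Binary record -> object: fields are unpacked straight into place."""
    contact_id, balance, name_len = _HEADER.unpack_from(data)
    start = _HEADER.size
    name = data[start:start + name_len].decode("utf-8")
    return Contact(contact_id=contact_id, name=name, balance=balance)


if __name__ == "__main__":
    original = Contact(contact_id=42, name="Ada", balance=12.5)
    xml_form = to_xml(original)
    binary_form = to_binary(original)
    print(xml_form)     # human-readable markup
    print(binary_form)  # opaque bytes
    assert from_xml(xml_form) == original
    assert from_binary(binary_form) == original
```

Both round trips give back the same object, but only the XML form can be read at a glance in a log or a debugger, which is exactly the debugging cost described above. The binary record, in turn, is tied to this one layout, so it only makes sense when no other application needs to read it.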
If you are considering using XML in your applications, always take the XML Litmus Test.