Interfaces vs. Abstract Classes
When I first started working with the .NET Framework I was surprised at the number of abstract base classes that I came across. Coming from a background working with Java, I had come to expect that when one wants to indicate that a class can be expected to have certain behavior, one uses an interface. When I first confronted some folks at work about this they pointed out that "abstract classes version better than interfaces". This sounded like Greek to me. All I knew was that a design based on abstract base classes meant that classes that wanted to expose the behavior of some of the .NET Framework classes had to inherit from some abstract base class (thus using up the single chance of inheritance they have, since multiple inheritance is not supported in the CLR) instead of simply implementing an interface.
I constantly looked at the fact that org.w3c.dom.Node was an interface in Java implementations while System.Xml.XmlNode was an abstract base class as a deficiency of our model. This meant that in Java, if one wanted to make an object look like a DOM node, all one had to do was implement an interface, while in the .NET Framework one had to derive from an abstract class, thus choosing to become part of the DOM class hierarchy. This was especially annoying for classes that had to be part of some application-specific class hierarchy.
Over time, as I've started taking over design ownership of some .NET Framework XML APIs, the quote "abstract classes version better than interfaces" has begun to ring true. The problem with interfaces is that once you ship them it isn't possible to change them in future versions without breaking implementers, while the same is not the case with abstract base classes.
Consider the following contrived example. Imagine an API that ships with an IEmployee interface containing three methods, getFirstName(), getLastName() and getSalary(), which is implemented by Manager and RegularEmployee classes in two separate applications. Now imagine that in the next version of the API the designers realize they made an oversight and there is a need for a getFullName() method in the IEmployee interface because many applications tend to require this information. However, there is no way to add this method to the next version of the API without breaking the third parties that have implemented the IEmployee interface, because their classes will not have the getFullName() method or, in the worst case, will have implemented it with different semantics.
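To make the scenario concrete, here's a minimal C# sketch of the situation (the method bodies are obviously hypothetical):

public interface IEmployee {
    string getFirstName();
    string getLastName();
    decimal getSalary();
}

// A third-party class compiled against version 1 of the API.
public class Manager : IEmployee {
    public string getFirstName() { return "Jane"; }
    public string getLastName()  { return "Doe"; }
    public decimal getSalary()   { return 100000m; }
}

If version 2 of the API adds getFullName() to IEmployee, the Manager class above no longer compiles, because it doesn't implement the new method.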
On the other hand, if IEmployee was an abstract base class then getFullName() could be added in a future version, and as long as a virtual implementation was provided, implementers of the original version of IEmployee would not be broken.
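Here's the same sketch with the abstract class approach (keeping the IEmployee name from the example, even though the I prefix is conventionally reserved for interfaces in the .NET Framework):

public abstract class IEmployee {
    public abstract string getFirstName();
    public abstract string getLastName();
    public abstract decimal getSalary();

    // Added in version 2. Because a virtual implementation ships
    // with the method, existing subclasses inherit it unchanged
    // and continue to compile and run.
    public virtual string getFullName() {
        return getFirstName() + " " + getLastName();
    }
}

Subclasses that know about the new method can override it; everyone else gets the default behavior for free.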
This may not seem like a big deal to people who
typically develop applications instead of libraries
but to a library designer this is actually an
important concern.
Going back to my original example involving org.w3c.dom.Node and System.Xml.XmlNode, I now realize that this is not as much of a problem as I originally thought. An application object that wanted to provide an XML view of itself by looking like a DOM node could use composition: a member or property of the class could actually derive from System.Xml.XmlNode, and the object could expose that. This actually leads to better factoring because the application object isn't cluttered with DOM-specific code alongside its application-specific methods and properties.
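A minimal sketch of the composition approach (the PurchaseOrder class and its members are hypothetical, and for simplicity the XML view is built with an XmlDocument rather than a custom XmlNode subclass):

using System.Xml;

public class PurchaseOrder {
    // Application-specific state, uncluttered by DOM code.
    private string customer;
    private decimal total;

    public PurchaseOrder(string customer, decimal total) {
        this.customer = customer;
        this.total = total;
    }

    // The XML view is exposed as a property returning a DOM node.
    public XmlNode XmlView {
        get {
            XmlDocument doc = new XmlDocument();
            XmlElement root = doc.CreateElement("purchaseOrder");
            root.SetAttribute("customer", customer);
            root.SetAttribute("total", total.ToString());
            doc.AppendChild(root);
            return root;
        }
    }
}

Consumers that want to treat the object as XML use its XmlView property, while the rest of the class stays focused on the application domain.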
XML Stream Processing: Tim Bray Needs to Get Out More
For the past few years the mantra for processing XML has been "if you want random access to XML then you should store it all in memory and manipulate it via the DOM, and if you want efficient forward-only (i.e. streaming) access to an XML document then you use SAX". The problem with this picture is that SAX is non-idiomatic and awkward for a large number of programmers, including the illustrious Tim Bray and many of the users of Microsoft XML technologies. This is a problem that many designers of XML APIs have realized, which has led to the advent of pull-based XML parsers as opposed to the push-based model of SAX.
In Tim Bray's article he describes the following code sample as a Nirvana of sorts when it comes to processing XML streams:

while (<STDIN>) {
    next if (X<meta>X);
    if (X<h1>|<h2>|<h3>|<h4>X)
        { $divert = 'head'; }
    elsif (X<img src="/^(.*\.jpg)$/i>X)
        { &proc_jpeg($1); }
    # and so on...
}
The interesting thing is that processing XML in this model can be done with ease using C# and the System.Xml.XmlReader class in the .NET Framework. Below is a code fragment that is roughly equivalent to his dream processing model.
/* Assumes an XmlReader called reader has already been
   opened over an XML data source */
while (reader.Read()) {
    if (reader.NodeType == XmlNodeType.Element) {
        // skip <meta> elements
        if (reader.Name == "meta") {
            continue;
        }
        if (reader.Name == "h1" || reader.Name == "h2" ||
            reader.Name == "h3" || reader.Name == "h4") {
            divert = "head";
        } else if (reader.Name == "img") {
            // grab the src attribute and process JPEGs
            string jpegurl = reader.GetAttribute("src");
            if (jpegurl != null && jpegurl.EndsWith(".jpg")) {
                ProcessJpeg(jpegurl);
            }
        }
    }
}
I believe the above idioms should also be possible in Java-based XML pull parsers, but I can't confirm this. It's quite amusing to me that a post which is really Tim Bray complaining about the crappy APIs he is used to for processing XML got so much traction on Slashdot and Jon Udell's blog as an indictment of XML.
The posting by Tim Bray was really an announcement that he is disconnected from the current landscape of XML technologies. I also liked the following quote:

    The notion that there is an "XML data model" is silly and unsupported by real-world evidence. The definition of XML is syntactic: the "Infoset" is an afterthought and in any case is far indeed from being a data model specification that a programmer could work with. Empirical evidence: I can point to a handful of different popular XML-in-Java APIs each of which has its own data model and each of which works. So why would you think that there's a data model there to build a language around?
I'm stunned by how quickly he contradicted himself. However, I am more stunned by the fact that he thinks that the users of XPath and XSLT (i.e. the XPath 1.0 data model), the W3C DOM (aka the XML Document Object Model) or future users of XQuery, XPath 2.0 and XSLT 2.0 (aka the XQuery and XPath 2.0 data model) don't count as "real-world evidence". There are also those who consider the XML Infoset and the Post Schema Validation Infoset (PSVI) to be XML data models, which would mean the users of W3C XML Schema don't count as "real-world evidence" either. I am quite curious about what XML world he lives in where all these users of XML technologies don't count as real-world evidence.
It's always comforting to know that the people who are writing the specs that will affect thousands of developers and millions of users are so in touch with their user base and aware of the goings-on in their industry.
--
Get yourself a News Aggregator and subscribe to my RSS feed.

Disclaimer: The above comments do not represent the thoughts, intentions, plans or strategies of my employer. They are solely my opinion.