Interfaces vs. Abstract Classes
When I first started working with the .NET Framework I was surprised at the number of abstract base classes that I came across. Coming from a background working with Java, I had come to expect that when one wants to indicate that a class can be expected to have certain behavior, one uses an interface. When I first confronted some folks at work about this they pointed out that "abstract classes version better than interfaces". This sounded like Greek to me. All I knew was that a design based on abstract base classes meant that classes that wanted to expose the behavior of some of the .NET Framework classes had to inherit from some abstract base class (thus using up the single chance of inheritance they have, since multiple inheritance is not supported in the CLR) instead of simply implementing an interface.
I constantly looked at the fact that org.w3c.dom.Node was an interface in Java implementations while System.Xml.XmlNode was an abstract base class as a deficiency of our model. This meant that in Java, if one wanted to make an object look like a DOM node, all one had to do was implement an interface, while in the .NET Framework one had to derive from an abstract class, thus choosing to become part of the DOM class hierarchy. This was especially annoying for classes that had to be part of some application-specific class hierarchy.
Over time, as I've started taking over design ownership of some .NET Framework XML APIs, the quote "abstract classes version better than interfaces" has begun to ring true. The problem with interfaces is that once you ship them it isn't possible to change them in future versions without breaking implementers, while the same is not the case with abstract base classes.
Consider the following contrived example. Imagine an API that ships with an IEmployee interface containing three methods, getFirstName(), getLastName() and getSalary(), which is implemented by Manager and RegularEmployee classes in two separate applications. Now imagine that in the next version of the API the designers realize they made an oversight and there is a need for a getFullName() method in the IEmployee interface because many applications tend to require this information. However, there is no way to add this method to the next version of the API without breaking the third parties that have implemented the IEmployee interface, because their classes will not have the getFullName() method or, in the worst case, will have implemented it with different semantics.
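To make the scenario concrete, here's a minimal C# sketch of the situation (the method bodies are obviously hypothetical):

public interface IEmployee {
    string getFirstName();
    string getLastName();
    decimal getSalary();
}

// A third-party class compiled against version 1 of the API.
public class Manager : IEmployee {
    public string getFirstName() { return "Jane"; }
    public string getLastName()  { return "Doe"; }
    public decimal getSalary()   { return 100000m; }
}

If version 2 of the API adds getFullName() to IEmployee, the Manager class above no longer compiles, because it doesn't implement the new method.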
On the other hand, if IEmployee was an abstract base class then getFullName() could be added in a future version, and as long as a virtual implementation was provided, implementers of the original version of IEmployee would not be broken.
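Here's the same sketch with the abstract class approach (keeping the IEmployee name from the example, even though the I prefix is conventionally reserved for interfaces in the .NET Framework):

public abstract class IEmployee {
    public abstract string getFirstName();
    public abstract string getLastName();
    public abstract decimal getSalary();

    // Added in version 2. Because a virtual implementation ships
    // with the method, existing subclasses inherit it unchanged
    // and continue to compile and run.
    public virtual string getFullName() {
        return getFirstName() + " " + getLastName();
    }
}

Subclasses that know about the new method can override it; everyone else gets the default behavior for free.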
This may not seem like a big deal to people who
typically develop applications instead of libraries
but to a library designer this is actually an
important concern.
Going back to my original example involving org.w3c.dom.Node and System.Xml.XmlNode, I now realize that this is not as much of a problem as I originally thought. An application object that wanted to provide an XML view of itself by looking like a DOM node could use composition: a member or property of the class could actually derive from System.Xml.XmlNode, and the object could expose that. This actually leads to better factoring because the application object isn't cluttered with DOM-specific code alongside its application-specific methods and properties.
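A minimal sketch of the composition approach (the PurchaseOrder class and its members are hypothetical, and for simplicity the XML view is built with an XmlDocument rather than a custom XmlNode subclass):

using System.Xml;

public class PurchaseOrder {
    // Application-specific state, uncluttered by DOM code.
    private string customer;
    private decimal total;

    public PurchaseOrder(string customer, decimal total) {
        this.customer = customer;
        this.total = total;
    }

    // The XML view is exposed as a property returning a DOM node.
    public XmlNode XmlView {
        get {
            XmlDocument doc = new XmlDocument();
            XmlElement root = doc.CreateElement("purchaseOrder");
            root.SetAttribute("customer", customer);
            root.SetAttribute("total", total.ToString());
            doc.AppendChild(root);
            return root;
        }
    }
}

Consumers that want to treat the object as XML use its XmlView property, while the rest of the class stays focused on the application domain.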
XML Stream Processing: Tim Bray Needs to Get Out More
For the past few years the mantra for processing XML has been "if you want random access to XML then you should store it all in memory and manipulate it via the DOM, and if you want efficient forward-only (i.e. streaming) access to an XML document then you use SAX". The problem with this picture is that SAX is non-idiomatic and awkward for a large number of programmers, including the illustrious Tim Bray and many of the users of Microsoft XML technologies. This is a problem that many designers of XML APIs have realized, which has led to the advent of pull-based XML parsers as opposed to the push-based model of SAX.
In Tim Bray's article he describes the following code sample as a Nirvana of sorts when it comes to processing XML streams:

while (<STDIN>) {
    next if (X<meta>X);
    if (X<h1>|<h2>|<h3>|<h4>X)
        { $divert = 'head'; }
    elsif (X<img src="/^(.*\.jpg)$/i>X)
        { &proc_jpeg($1); }
    # and so on...
}
The interesting thing is that processing XML in this model can be done with ease using C# and the System.Xml.XmlReader class in the .NET Framework. Below is a code fragment that is roughly equivalent to his dream processing model.
/* Assumes an XmlReader called reader has already been
   opened over an XML data source */
while (reader.Read()) {
    if (reader.NodeType == XmlNodeType.Element) {
        // skip <meta> elements
        if (reader.Name == "meta") {
            continue;
        }
        if (reader.Name == "h1" || reader.Name == "h2" ||
            reader.Name == "h3" || reader.Name == "h4") {
            divert = "head";
        } else if (reader.Name == "img") {
            // grab the src attribute and process JPEGs
            string jpegurl = reader.GetAttribute("src");
            if (jpegurl != null && jpegurl.EndsWith(".jpg")) {
                ProcessJpeg(jpegurl);
            }
        }
    }
}
I believe the above idioms should also be possible in Java-based XML pull parsers, but I can't confirm this. It's quite amusing to me that a post which is really Tim Bray complaining about the crappy APIs he is used to for processing XML got so much traction on Slashdot and Jon Udell's blog as an indictment of XML.
The posting by Tim Bray was really an announcement that he is disconnected from the current landscape of XML technologies. I also liked the following quote:

    The notion that there is an "XML data model" is silly and unsupported by real-world evidence. The definition of XML is syntactic: the "Infoset" is an afterthought and in any case is far indeed from being a data model specification that a programmer could work with. Empirical evidence: I can point to a handful of different popular XML-in-Java APIs each of which has its own data model and each of which works. So why would you think that there's a data model there to build a language around?
I'm stunned by how quickly he contradicted himself. However, I am more stunned by the fact that he thinks that the users of XPath and XSLT (i.e. the XPath 1.0 data model), the W3C DOM (aka the XML Document Object Model) or future users of XQuery, XPath 2.0 and XSLT 2.0 (aka the XQuery and XPath 2.0 data model) don't count as "real-world evidence". There are also those who consider the XML Infoset and the Post Schema Validation Infoset (PSVI) to be XML data models, which would mean the users of W3C XML Schema don't count as "real-world evidence" either. I am quite curious about what XML world he lives in where all these users of XML technologies don't count as real-world evidence.
It's always comforting to know that the people who are writing the specs that will affect thousands of developers and millions of users are so in touch with their user base and aware of the goings-on in their industry.
--
Get yourself a News Aggregator and subscribe to my RSS feed.

Disclaimer: The above comments do not represent the thoughts, intentions, plans or strategies of my employer. They are solely my opinion.