June 15, 2004
@ 04:17 PM

The ongoing conversation between Jeremy Mazner and Jon Udell about the capabilities of WinFS deepened this morning with Jeremy's post "Did I misunderstand Udell's argument against WinFS?", which was followed up by Jon's post "When a journalist blogs". In his post Jon asks:

We have standard query languages (XPath, XQuery), and standard ways of writing schemas (XSD, Relax), and applications (Office 2003) that with herculean effort have been adapted to work with these query and schema languages, and free-text search further enhancing all this goodness. Strategically, why not build directly on top of these foundations?

Tactically, why do I want to write code like this:

public class Person
  {
  [XmlAttribute()] public string Title;
  [XmlAttribute()] public string FirstName;
  [XmlAttribute()] public string MiddleName;
  [XmlAttribute()] public string LastName;
  ....

in order to consume data like this?

<People>
  <Person
    DisplayName="Woodgrove Bank"
    IMAddress="Support@woodgrovebank.com"
    UserTile=".\user_tiles\Adventure Works.jpg">
    <EmailAddresses>
        <EmailAddress
            Type="Work"
            Address="mortgage@woodgrovebank.com"/>
        <EmailAddress
            Type="Primary"
            Address="Support@woodgrovebank.com"/>
   </EmailAddresses>

I believe two things to be true. First, we have some great XML-oriented data management technologies. Second, the ambitious goals of WinFS cannot be met solely with those technologies. I'm trying to spell out where the line is being drawn between interop and functionality, and why, and what that will mean for users, developers, and enterprises.

Jon asks several questions and I'll try to answer all the ones I can. The first question, about why WinFS doesn't build on XML, XQuery and XSD instead of items, OPath and the WinFS schema language, is something that the WinFS folks will have to answer. Of course, Jon could also ask why it doesn't build on RDF, RDQL [or any of the other RDF query languages] and RDF Schema, which is a related question that naturally follows from the answer to Jon's question.

The second question is why one would want to program against a Person object when one has a <Person> element. This question has an easy answer which unfortunately doesn't sit well with me. The fact is that developers prefer programming against objects to programming with XML APIs. No XML API in the .NET Framework (XmlReader, XPathNavigator, XmlDocument, etc.) comes close to the ease of use of programming against strongly typed objects in the general case. Addressing this failing [and it is a failing] is directly my responsibility since I'm responsible for core XML APIs in the .NET Framework. Coincidentally, we just had a review with our new general manager yesterday and this same issue came up and he asked what we plan to do about this in future releases. I have some ideas. The main problem with using objects to program against XML is that although objects work well for programming against data-centric XML (rigidly structured tabular data such as the data in an Excel spreadsheet, a database dump or serialized objects), there is a significant impedance mismatch when trying to use strongly typed objects to program against document-centric XML (semi-structured data such as a Word document). However, the primary scenarios the WinFS folks want to tackle are about rigidly structured data, which works fine with objects as the primary programming model.
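To make the contrast concrete, here is a minimal sketch of the two styles. It is written in Python rather than against the .NET XML APIs, purely as an illustration of the general point. The sample data is a trimmed, closed-off version of the fragment Jon quotes, and every name in the code (SAMPLE, EmailAddress, Person, load_people, primary_email_via_xml_api) is hypothetical rather than taken from WinFS or the .NET Framework.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

# Trimmed version of the quoted fragment, with closing tags added so it parses.
SAMPLE = """
<People>
  <Person DisplayName="Woodgrove Bank" IMAddress="Support@woodgrovebank.com">
    <EmailAddresses>
      <EmailAddress Type="Work" Address="mortgage@woodgrovebank.com"/>
      <EmailAddress Type="Primary" Address="Support@woodgrovebank.com"/>
    </EmailAddresses>
  </Person>
</People>
"""


def primary_email_via_xml_api(xml_text: str) -> Optional[str]:
    """XML-API style: the caller must know element names, paths and attributes."""
    root = ET.fromstring(xml_text)
    node = root.find("./Person/EmailAddresses/EmailAddress[@Type='Primary']")
    return node.get("Address") if node is not None else None


@dataclass
class EmailAddress:
    type: str
    address: str


@dataclass
class Person:
    display_name: str
    im_address: str
    email_addresses: List[EmailAddress] = field(default_factory=list)

    def primary_email(self) -> Optional[str]:
        """Object style: the caller works with named fields, not paths."""
        for email in self.email_addresses:
            if email.type == "Primary":
                return email.address
        return None


def load_people(xml_text: str) -> List[Person]:
    """Map the rigidly structured XML onto typed objects once, up front."""
    people = []
    for p in ET.fromstring(xml_text).findall("Person"):
        emails = [
            EmailAddress(e.get("Type"), e.get("Address"))
            for e in p.findall("./EmailAddresses/EmailAddress")
        ]
        people.append(Person(p.get("DisplayName"), p.get("IMAddress"), emails))
    return people


if __name__ == "__main__":
    print(primary_email_via_xml_api(SAMPLE))
    print(load_people(SAMPLE)[0].primary_email())
```

Both calls return the same address. The difference is that the first bakes path strings into every call site, while the second pays the mapping cost once and then offers ordinary fields and methods. The mapping in load_people is only this easy because the data is rigidly structured; a document with free-form mixed content would not reduce to a fixed set of fields so neatly.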

Jon says that he is trying to draw the line between interop and functionality. I'm curious as to what he means by interop in this case. The fact that WinFS is based on items, OPath and WinFS schema doesn't mean that WinFS data cannot be exchanged in an interoperable manner (e.g. some form of XML export and import) nor does it mean that non-Microsoft applications cannot interact with WinFS. I should clarify that I have no idea what the WinFS folks consider their primary interop scenarios but I don't think the way WinFS is designed today means it cannot interoperate with other platforms or data models.

I suspect that Jon doesn't really mean interop when he says so. I believe he is using the word the same way Java people use it, where it really means 'One Language, One Programming Model, One Platform' everywhere instead of being able to communicate between disparate end points. In this case the language is XML and the platform is the XML family of technologies.