I've been spending some time over the past couple of months thinking about Web services and Web APIs. Questions like when a web site should expose an API, what form the API should take and what technologies/protocols should be used are topics I've rehashed quite a lot in my head. Recently I came to the conclusion that if one is going to provide a Web service that is intended to be consumed by as many applications as possible, then one should consider exposing the API using multiple protocols. I felt that at least two protocols should be chosen: SOAP over HTTP (for the J2EE/.NET crowd) and Plain Old XML (POX) over HTTP (for the Web developer crowd).
However, I've recently started spending a bunch of time writing JavaScript code for various Windows Live gadgets and I've begun to appreciate the simplicity of using JSON over parsing XML by hand in my gadgets. I've heard similar comments echoed by co-workers such as Matt, who's been spending a bunch of time writing JavaScript code for Live Clipboard, and Yaron Goland, who's one of the minds working on the Windows Live developer platform. JSON has similar goals to XML-RPC and W3C XML Schema in that it provides a platform-agnostic way to transfer data which is encoded as structured types consisting of name<->value pairs and collections of name<->value pairs. It differs from XML-RPC by not getting involved with defining a mechanism for remote procedure calls, and from W3C XML Schema by being small, simple and focused.
Once you start using JSON in your AJAX apps, it gets pretty addictive and it begins to seem like a hassle to parse XML, even when it's just plain old XML such as RSS feeds rather than complex crud like SOAP packets. However, being an XML geek, there are a couple of things I miss from XML that I'd like to see in JSON, especially if its usage grows to become as widespread as XML's is on the Web today. Yaron Goland feels the same way and has started a series of blog posts on the topic.
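The difference is easy to see in code. Here's a minimal sketch, with an invented feed snippet, comparing plain property access on parsed JSON against digging the same value out of an XML DOM (I'm using the json.org-style JSON.parse and the browser's DOMParser for brevity):

// Hypothetical payloads carrying the same data; the field names are invented.
var jsonText = '{"channel": {"item": [{"title": "Hello", "link": "http://example.com/1"}]}}';
var xmlText = '<rss><channel><item><title>Hello</title>' +
              '<link>http://example.com/1</link></item></channel></rss>';

// JSON: parse once, then it's ordinary property access.
var feed = JSON.parse(jsonText);
var titleFromJson = feed.channel.item[0].title;

// XML: walk the DOM by hand to dig out the same value.
var doc = new DOMParser().parseFromString(xmlText, "text/xml");
var titleFromXml = doc.getElementsByTagName("item")[0]
                      .getElementsByTagName("title")[0].textContent;

// Both yield "Hello"; the JSON version needs no tag-name lookups at all.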
In his blog post entitled Adding Namespaces to JSON, Yaron Goland writes:
The Problem
If two groups both create a name "firstName" and each gives it a different syntax and semantics, how is someone handed a JSON document supposed to know which group's syntax/semantics to apply? In some cases there might be enough context (e.g. the data was retrieved from one of the group's servers) to disambiguate the situation, but it is increasingly common for distributed services to be created where the original source of some piece of information can trivially be lost somewhere down the processing chain. It therefore would be extremely useful for JSON documents to be 'self describing' in the sense that one can look at any name in a JSON document in isolation and have some reasonable hope of determining if that particular name represents the syntax and semantics one is expecting.
The Proposed Solution
It is proposed that JSON names be defined as having two parts, a namespace name and a local name. The two are combined as namespace name + "." + local name to form a fully qualified JSON name. Namespace names MAY contain the "." character. Local names MUST NOT contain the "." character. Namespace names MUST consist of the reverse listing of subdomains in a fully qualified DNS name. E.g. org.goland or com.example.bigfatorg.definition.
To enable space savings and to increase both the readability and write-ability of JSON, a JSON name MAY omit its namespace name along with the "." character that concatenated it to its local name. In this case the namespace of the name is logically set to the namespace of the name's parent object. E.g.
{ "org.goland.schemas.projectFoo.specProposal" :
"title": "JSON Extensions",
"author": { "firstName": "Yaron",
"com.example.schemas.middleName":"Y",
"org.goland.schemas.projectFoo.lastName": "Goland",
}
}
In the previous example the name firstName, because it lacks a namespace, takes on its parent object's namespace. That parent is author, which also lacks a namespace, so recursively author looks to its parent specProposal, which does have a namespace, org.goland.schemas.projectFoo. middleName introduces a new namespace, "com.example.schemas"; if the value were an object then the names in that object would inherit the com.example.schemas namespace. Because the use of the compression mechanism is optional, the lastName value can be fully qualified even though it shares the same namespace as its
parent.
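To make sure I understand the inheritance rule, here's a minimal sketch in JavaScript (the function and variable names are my own) that expands every name in a document to its fully qualified form. Since local names must not contain "." while namespace names may, the last "." in a name is the split point:

function qualify(obj, inheritedNs) {
  var result = {};
  for (var name in obj) {
    var dot = name.lastIndexOf(".");
    // A dotted name carries its own namespace; a bare name inherits
    // the namespace of its parent object, per the proposal.
    var ns = dot >= 0 ? name.substring(0, dot) : inheritedNs;
    var local = dot >= 0 ? name.substring(dot + 1) : name;
    var value = obj[name];
    // Members of a child object inherit that member's namespace.
    result[ns + "." + local] =
      (value !== null && typeof value === "object" && !(value instanceof Array))
        ? qualify(value, ns)
        : value;
  }
  return result;
}

// Applied to Yaron's example above, qualify(doc, "") maps "title" to
// "org.goland.schemas.projectFoo.title" and "firstName" to
// "org.goland.schemas.projectFoo.firstName", while middleName keeps its
// explicit com.example.schemas namespace.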
My main problem with the above approach is echoed by the first comment in response to Yaron's blog post: the namespace scheme defined above isn't completely compatible with XML namespaces, whose names are URIs rather than reverse-DNS strings. This means that if I have a Web service that emits both XML and JSON, I'll have to use different namespace names for the same elements (e.g. something like http://www.example.com/schemas on the XML side versus com.example.schemas on the JSON side) even though all that differs is the serialization format. Aside from the disagreement over the syntax of namespace names, I think this would be a worthwhile addition to JSON.
In another blog post entitled Adding Extensibility to JSON Data Formats, Yaron Goland writes:
The Problem
How does one process JSON messages so that they will support both backwards and forwards compatibility? That is, how does one add new content into an existing JSON message format such that those who do not understand the extended content will be able to safely ignore it?
The Proposed Solution
In the absence of additional information providing guidance on how to handle unrecognized members, a JSON processor compliant with this proposal MUST ignore any members whose names are not recognized by the processor.
For example, if a processor was expecting to receive an object that contained a single member with the name "movieTitle" and instead it receives an object with multiple members including "movieTitle", "producer" and "director", then the JSON processor would, by default, act as if the "producer" and "director" members were not present.
An exception to this situation would be a member named "movie" whose value is an object where the semantics of the members of that object is "the local names of the members of this object are suitable for presenting as titles and their values as text under those titles". In that case, regardless of the processor's direct knowledge of the semantics of the members of the object (e.g. the processor may actually know about "movieTitle" but not "producer" or "director"), the processor can still process the unrecognized members because it has additional information about how to process them.
This requirement does not apply to incorrect usage of recognized names. For example, if the definition of an object only allowed a single "movieTitle" member, then having two "movieTitle" members is simply an error and the ignore rule does not apply.
This specification does not require that ignored members be removed from the JSON structure. It is quite possible that other processors who will deal with the message may recognize members the current processor does not. Therefore it would make sense to let unrecognized members remain in the JSON structure so that others who process the structure may benefit from the extended information.
Definition: Simple value - A value of a type other than array or object.
If a JSON processor encounters an array where it had expected to encounter a simple value, the processor MUST retrieve the first simple value in the array, treat that as the value it was expecting, and ignore the other elements in the array.
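As I read the proposal, a compliant consumer would look something like the following sketch (the function and field names beyond "movieTitle", "producer" and "director" are my own): it reads only the members it knows about, leaves the rest of the structure intact for downstream processors, and applies the array-to-simple-value rule.

// Per the proposal, an array found where a simple value was expected
// stands in for its first simple (non-array, non-object) element.
function asSimpleValue(value) {
  if (value instanceof Array) {
    for (var i = 0; i < value.length; i++) {
      var v = value[i];
      if (v === null || typeof v !== "object") return v; // first simple value
    }
    return undefined; // no simple value present
  }
  return value;
}

function readMovie(message) {
  // Read only the member we understand; unrecognized members such as
  // "producer" or "director" are simply never looked at, and the message
  // object itself is left untouched for any downstream processor.
  return { movieTitle: asSimpleValue(message.movieTitle) };
}

// A v2 message with members a v1 processor doesn't know about:
var msg = { "movieTitle": ["Brick", "Brick (2005)"], "producer": "Someone" };
// readMovie(msg).movieTitle === "Brick"; "producer" is ignored but preserved in msg.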
Again, it looks like I'm going to go ahead and parrot the same feedback as a commenter on the original blog post. Defining an extensibility model where simple types can be converted to arrays in a future version seems like overkill and unnecessary complexity. It's not like it's that hard to add another field to the type. The other thing I wondered about this blog post is that it seems to define a problem that doesn't really exist. It's not like there are specialized JSON parsers in widespread use that barf when they see a field they don't understand. Requiring that the fields of various types be defined up front, and barfing when encountering undefined fields over the wire, is primarily a limitation of statically typed languages and isn't really a problem for dynamic languages like JavaScript. Or am I missing something?