James Snell has a blog post entitled Batch! which discusses the batch processing model in GData APIs. He provides a sample of a GData batch request and points out the following:
If the mere sight of this doesn’t give you shivers and shakes, let me give you a few reasons why it should:
- It’s not valid Atom. Note, for instance, the first entry in the feed. An Atom entry has an id, a title, some content, an author, some links, maybe some categories, etc. If the type of object you want to represent does not also have those things, Atom is not the right format to use.
- It only works with Atom. What about binary resources like JPEGs? I guess we could base64 encode the binary data and stuff that into our invalid Atom entries, but doing so would suck.
- We can’t use ETags and conditional requests [see the sketch after this list]
- I’m sure there are more reasons, but these should be enough to convince you that a better approach is needed.
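On the ETag point: here is a minimal sketch of the conditional update that the Atom Publishing Protocol enables against an individual resource, written in Python with the requests library (the URL is hypothetical). A GET returns an ETag identifying that version of the entry, and the subsequent PUT carries it in an If-Match header so the server can reject the update if someone else changed the entry in the meantime. Once fifty entries are stuffed into a single batch feed, there is no standard place to carry a per-entry If-Match, so this optimistic concurrency check is lost.

```python
import requests

# Hypothetical AtomPub member URI for a single entry.
ENTRY_URL = "http://example.com/service/entries/1"

# GET the entry; the server identifies this version with an ETag.
response = requests.get(ENTRY_URL)
etag = response.headers["ETag"]
entry_xml = response.text

# ... modify entry_xml locally ...

# PUT the modified entry back only if no one else has changed it.
# If the entry was updated in the meantime, the server responds with
# 412 Precondition Failed instead of silently clobbering the other edit.
update = requests.put(
    ENTRY_URL,
    data=entry_xml,
    headers={
        "Content-Type": "application/atom+xml;type=entry",
        "If-Match": etag,
    },
)

if update.status_code == 412:
    print("Lost the race: re-fetch the entry and retry the edit.")
```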
In a previous post entitled One Protocol to Rule Them All and in the Darkness Bind Them, I pointed out that since the Atom Publishing Protocol is not a good fit for interacting with data types that aren’t microcontent, it would need to be embraced and extended to satisfy those needs. This leads to problems because different vendors will embrace and extend it in different ways, which fragments interoperability.
An alternative approach would be for vendors to utilize protocols that are better suited for the job instead of creating incompatible versions of a standard protocol. However, the response I’ve seen from various people is that it is better to have multiple slightly incompatible implementations of a single standard than multiple completely incompatible proprietary technologies. I’ve taken to calling this “the ODF vs. OOXML lesson”. It also explains why there was so much heat in the RSS vs. Atom debates but much less in the debates over Yahoo’s podcasting extensions vs. Apple’s podcasting extensions to RSS.
Let’s take it as a given, then, that there will be multiple proprietary extensions to a standard protocol and that this is preferable to the alternative. What ground rules should we have to ensure interoperability isn’t completely thrown out the window? A fundamental one is that vendors should actually provide standards-compliant implementations of the protocol before deciding to embrace and extend it. That way, clients and services that conform to the standard can still interoperate with them. In this regard, GData falls down horribly, as James Snell points out.
Given that Joe Gregorio now works at Google and is a co-author of RFC 5023 (the Atom Publishing Protocol), I assume it is just a matter of time before Google fixes this brokenness. The only question is when.
PS: Defining support for batch operations in a standard way is going to be rather difficult, primarily because of failure modes. Given the perennial tension between consistency and availability in distributed systems, some folks will want a failure in any one of the batched operations to result in the equivalent of a rollback, while others don’t care if one or two operations out of a batch of fifty fail. Then there are folks in the middle for whom which failure mode they want “depends on the context”. The sketch below illustrates the two extremes.
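To make the dilemma concrete, here is a purely illustrative Python sketch (not any actual GData or APP mechanism) of the two extreme policies a batch protocol could standardize: atomic, where the first failure undoes everything that already succeeded, and best-effort, where every operation is attempted and the client gets a per-operation status, much like a per-entry status code in a batch response.

```python
from typing import Callable, List, Tuple

def run_batch(
    operations: List[Callable[[], None]],
    undos: List[Callable[[], None]],
    atomic: bool,
) -> List[Tuple[int, str]]:
    """Apply a batch of operations under one of two failure-mode policies.

    atomic=True  -> all-or-nothing: on the first failure, undo every
                    operation that already succeeded (a rollback).
    atomic=False -> best-effort: attempt every operation and report
                    each one's success or failure independently.
    """
    results: List[Tuple[int, str]] = []
    completed: List[int] = []
    for i, op in enumerate(operations):
        try:
            op()
            completed.append(i)
            results.append((i, "ok"))
        except Exception as exc:
            results.append((i, f"failed: {exc}"))
            if atomic:
                # Roll back the already-applied operations, most recent
                # first, then abandon the rest of the batch.
                for j in reversed(completed):
                    undos[j]()
                return results
    return results
```

The folks in the middle would want to choose the policy per batch, or even per operation, which is exactly the kind of knob that is hard to get multiple vendors to agree on.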
Now playing: Method Man - Say (feat. Lauryn Hill)