GData isn't a Best Practice Implementation of the Atom Publishing Protocol

June 11, 2007

@ 03:00 AM

I recently posted a blog post entitled Why GData/APP Fails as a General Purpose Editing Protocol for the Web, which pointed out some limitations of the Atom Publishing Protocol (APP), and of Google's implementation of it in GData, as a general purpose protocol for updating data stores on the Web. There were a lot of good responses to my post from developers knowledgeable about APP, including the authors of the specification, Bill de hÓra and Joe Gregorio. Below are links to some of these responses:

Joe Gregorio: In which we narrowly save Dare from inventing his own publishing protocol
Bill de hÓra: APP on the Web has failed: miserably, utterly, and completely
David Megginson: REST, the Lost Update Problem, and the Sneakernet Test
Bill de hÓra: Social networks, web publishing and strategy tax
James Snell: Silly

There was also a post by Tim Bray entitled So Lame which questions my motives for writing the post and implies that it is some sinister plot by Microsoft to make sure that we use proprietary technologies to lock users in. I guess I should have given more background in my previous post. The fact is that lots of people have embraced building RESTful Web Services in a big way. My primary concern now is that we don't end up seeing umpteen different RESTful protocols from Microsoft [thus confusing our users and ourselves] and instead standardize on one or two. For example, right now we already have Atom+SSE, Web3S and Project Astoria as three completely different RESTful approaches for updating or retrieving data from a Microsoft data source on the Web. In my mind, that's two too many and that's just the stuff we've made public so there could be more. I'd personally like to see us reduce the set of RESTful protocols coming out of Microsoft to one and even better end up reusing existing Web standards, if possible. Of course, this is an aspiration and it is quite possible that all of these protocols are different for a reason (e.g. we have FTP, SMTP, and HTTP which all can be used to transfer files but have very different use cases) and there is no hope for unification let alone picking some existing standard. My previous post was intended to point out the limitations I and others had noticed with using the Atom Publishing Protocol (APP) as a general protocol for updating data stores that didn't primarily consist of authored content. The point of the post was to share these learnings with other developers working in this space and get feedback from the general developer community just in case there was something wrong with my conclusions.

Anyway, back to the title of this post. In my previous post I pointed out the following limitations of APP as a general purpose protocol for editing Web content:

Mismatch with data models that aren't microcontent
Lack of support for granular updates to fields of an item
Poor support for hierarchy

I have to admit that a lot of my analysis was done on GData because I incorrectly assumed that it is a superset of the Atom Publishing Protocol. After a closer reading of the ~~fifteenth~~ most recent draft of the APP specification, spurred by the responses to my post from various members of the Atom community, it seems clear that the approaches chosen by Google in GData run counter to the recommendations of Atom experts, including both authors of the spec.

For problem #1, the consensus from Atom experts was that instead of trying to map a distinct concept such as a Facebook user to an Atom entry complete with a long list of proprietary extensions to the atom:entry element, one should instead create a specific data format for that type and then treat it as a distinct media type that is linked from atom:entry. Thus in the Facebook example from my previous post, one would have a distinct user.xml file and a corresponding atom:entry which linked to it for each user of the system. Contrast this with the use of the gd:ContactSection in an atom:entry for representing a user. It also seems that the GData solution to the problem of what to put in elements such as atom:author and atom:summary, which are required by the specification but make no sense outside of content/microcontent editing scenarios, is to omit them. It isn't spec compliant but I guess it is easier than putting in nonsensical values to satisfy some notion of a valid feed.
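To make the linking approach concrete, here is a minimal Python sketch of an atom:entry that carries no user data itself and instead points at a separate user.xml document through an out-of-line atom:content element. The function name, the urn:example identifiers, the example.org URL and the application/x-example-user+xml media type are all hypothetical placeholders; this is one possible reading of the advice, not a format taken from the APP draft or from GData.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)


def build_user_entry(user_id, display_name, updated, user_doc_url,
                     media_type="application/x-example-user+xml"):
    """Build an atom:entry that points at a separate user document.

    All identifiers, URLs and the media type are hypothetical placeholders.
    """
    def atom(tag):
        return f"{{{ATOM_NS}}}{tag}"

    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = f"urn:example:user:{user_id}"
    ET.SubElement(entry, atom("title")).text = display_name
    ET.SubElement(entry, atom("updated")).text = updated
    author = ET.SubElement(entry, atom("author"))
    ET.SubElement(author, atom("name")).text = display_name
    # Atom asks for a summary when the content is out of line (has a src).
    ET.SubElement(entry, atom("summary")).text = f"Profile of {display_name}"
    # Empty content element: the actual user data lives at user_doc_url.
    ET.SubElement(entry, atom("content"),
                  {"type": media_type, "src": user_doc_url})
    return entry


entry = build_user_entry("42", "Example User", "2007-06-11T03:00:00Z",
                         "http://example.org/users/42/user.xml")
print(ET.tostring(entry, encoding="unicode"))
```

Note that the entry still needs values for title, author and summary, which is exactly where the "nonsensical values" question comes from when the item isn't authored content.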

For problem #2, a number of folks pointed out that conditional PUT requests using ETags and the If-Match header are actually in the spec. This was my oversight: I skipped the section because its title, "Caching and Entity Tags", didn't imply that it had anything to do with the lost update problem. I actually haven't found a production implementation of APP that supports conditional PUTs, but this shouldn't be hard to implement for services that require this functionality. This definitely makes the lost update problem more tractable. However, a model where a client can just say "update the user's status message to X" still seems more straightforward than one where the client says "get the entire user element", "update the user's status message to X on the client", "replace the user on the server with my version of the user", and potentially "there is a version mismatch so merge my version of the user with the most recent version of the user from the server and try again". The mechanism GData uses for solving the lost update problem is described in the documentation topic on Optimistic concurrency (versioning). Instead of using ETags and If-Match, GData appends a version number to the URL to which the client publishes the updated atom:entry and then cries foul if the client publishes to a URL with an old version number. I guess you could consider this a different implementation of conditional PUTs from what is recommended in the most recent version of the APP draft spec.
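The conditional PUT flow can be sketched with a small in-memory stand-in for a server. This is only an illustration under stated assumptions: EntryStore, PreconditionFailed and the /users/42 URI are made-up names, the ETag is simply a hash of the body, and no real HTTP traffic or APP implementation is involved.

```python
import hashlib


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""


class EntryStore:
    """Hypothetical in-memory server that honours If-Match on PUT."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _etag(body):
        # Assumption: the ETag is derived from the representation itself.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:16]
        return f'"{digest}"'

    def create(self, uri, body):
        self._entries[uri] = body
        return self._etag(body)

    def get(self, uri):
        body = self._entries[uri]
        return body, self._etag(body)

    def put(self, uri, body, if_match):
        # Reject the update if the client edited a stale copy.
        if if_match != self._etag(self._entries[uri]):
            raise PreconditionFailed(uri)
        self._entries[uri] = body
        return self._etag(body)


store = EntryStore()
store.create("/users/42", "<user><status>at work</status></user>")

# Two clients fetch the same version of the user.
_, etag_a = store.get("/users/42")
_, etag_b = store.get("/users/42")

# Client A wins the race; client B's PUT is refused instead of
# silently overwriting A's change.
store.put("/users/42", "<user><status>at lunch</status></user>", if_match=etag_a)
try:
    store.put("/users/42", "<user><status>at home</status></user>", if_match=etag_b)
except PreconditionFailed:
    print("client B must re-fetch, merge and retry")
```

Even in this toy version the client has to send the whole user element and handle the retry itself, which is the extra work the "just update the status message" model avoids.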

For problem #3, the consensus seemed to be to use atom:link elements to show hierarchy, similar to what has been done in the Atom threading extensions. I don't question the value of linking and think this is a fine approach for the most part. However, the fact is that in certain scenarios [especially high traffic ones] it is better for the client to be able to make requests like "give me the email with message ID 6789 and all the replies in that thread" than "give me all the emails and I'll figure out the hierarchy I'm interested in myself by piecing together link relationships". I notice that GData completely punts on representing hierarchy in the MessageKind construct which is intended for use in representing email messages.
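As a rough sketch of what "piecing together link relationships" means for the client, the Python below rebuilds one thread from a flat list of messages, each carrying only a pointer to its parent. The dictionary shape (id and in_reply_to keys) and the message IDs are assumptions made for illustration; they are not the wire format of the threading extensions or of GData.

```python
from collections import defaultdict


def thread_for(entries, root_id):
    """Return the IDs in the thread rooted at root_id, parents before replies.

    `entries` is a flat list of dicts with hypothetical keys "id" and
    "in_reply_to" (None for a message that starts a thread).
    """
    # Index replies by the message they answer.
    children = defaultdict(list)
    for entry in entries:
        if entry["in_reply_to"] is not None:
            children[entry["in_reply_to"]].append(entry["id"])

    # Depth-first walk from the root, keeping replies in feed order.
    thread, stack = [], [root_id]
    while stack:
        current = stack.pop()
        thread.append(current)
        stack.extend(reversed(children[current]))
    return thread


inbox = [
    {"id": "6789", "in_reply_to": None},
    {"id": "6790", "in_reply_to": "6789"},
    {"id": "7000", "in_reply_to": None},
    {"id": "6791", "in_reply_to": "6790"},
    {"id": "6792", "in_reply_to": "6789"},
]
print(thread_for(inbox, "6789"))
```

The walk itself is cheap; the cost is that the client first has to download every message in the collection, including unrelated ones such as 7000, before it can answer a question about a single thread.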

Anyway I've learned my lesson and will treat the Atom Publishing Protocol (APP) and GData as separate protocols instead of using them interchangeably in the future.

Categories: XML Web Services


Monday, 11 June 2007 14:01:42 (GMT Daylight Time, UTC+01:00)

That's what makes you such a great blogger, Dare. You aren't afraid to postulate, take a stand, and ask questions. And it's the best thing as far as learning is concerned -- not just your learning. You made me study APP spec over this weekend, which ultimately is a good thing.

And just now, you made me look into Astoria.

Dimitri Glazkov

Monday, 11 June 2007 16:30:20 (GMT Daylight Time, UTC+01:00)

Actually, in one of the writeups to the recent Google Developer Day (can't find it anymore) I read that they considered their "versioned edit uri"-approach a mistake and were looking into ways to change it. But as I said I can't find it. Magically disappeared from the search index ... ?

Matthias Ernst

Monday, 11 June 2007 16:36:22 (GMT Daylight Time, UTC+01:00)

Dare, I'd like to echo Dimitri's sentiments above: I enjoyed these posts and give you a lot of credit for making them. Personally, I wouldn't want to deal with the immaturity and vitriol shown in the responses, particularly over such a technical issue that really does deserve some objective analysis and, yes, criticism. At this moment, I feel as though APP has a lot of momentum, and it would be one of the first places I looked for a standardized protocol for information sharing, so I appreciate that you bring up some of the issues that you've encountered. Even if APP backers can answer these questions, I am, frankly, much more leery of the APP community after this episode. To call someone an idiot, in various ways, for asking specific questions is not the hallmark of a productive community.

Thanks again for posting about these issues, rather than keeping them behind closed doors.

Anthony Cowley

Monday, 11 June 2007 17:28:17 (GMT Daylight Time, UTC+01:00)

Re #1, yuck. I want to get the whole document, not an index of the document with links to each paragraph, and then have to chase down all those links (which may themselves link more...). It's not just that it's a pain, it doesn't scale to GYM-sized site traffic. You need a way to inline the hierarchy (and perhaps control the depth/paging of results).

You already know this, but often it's not about being technologically right. Even if, when your analysis is done, you've found that APP is the worst of all these formats, it may still be the right choice. Personally I wouldn't have chosen HTML, and I might have gone with HyperG or some other bidirectional URL mechanism, and in both cases I would have been wrong. Simplicity wins.

And finally, questioning and exploring ideas will always make some people feel uncomfortable/threatened. The more emotional a response you get, the more on target you were with your questions. There are no stupid questions, only stupid answers.

Everyone has different context. Tim Bray seems to think you speak for all of Microsoft and sees sinister plots behind your words (and then he calls *you* clueless?). Those of us who know you know that you are not an evil plotter (although there was that time you played Nourish, which I think I had just mocked you for drafting, in response to my finishing attack and stayed alive for the winning counter-attack. That was kind of evil.)

So while I agree it's good to provide your readers, especially the ones who are a little clueless, with some additional context, ultimately you'll never please them all. You can lead people to knowledge, but not force understanding upon them.

Please keep asking uncomfortable questions of all of us. I've been on the receiving end of your challenging questions, and am the better for it.

Michael Brundage

Monday, 11 June 2007 18:21:47 (GMT Daylight Time, UTC+01:00)

problem #1 is the key point. If they are suggesting that an Atom collection is not a collection of items but a collection of item *references* then it's a non-starter as a general purpose mechanism.

Sure you could embed content using a different namespace, but then what good is the MIME type doing you?

Winter

Monday, 11 June 2007 18:22:55 (GMT Daylight Time, UTC+01:00)

"""However, the fact is that in certain scenarios [especially high traffic ones] it is better for the client to be able to make requests like "give me the email with message ID 6789 and all the replies in that thread" """

I won't argue with that scenario, there are definitely scenarios where pure linking will be a pain. There certainly wasn't enough consensus to put a hierarchical solution into the core protocol, but there are well defined extension points in the APP where such functionality could be added.

I would look at your specific example and lean towards looking at that as a search function and not a publishing function, and so would look at leveraging OpenSearch for that, which is one of the extensions to the APP that I would expect to see appear in short order.

Joe Gregorio

Monday, 11 June 2007 19:09:20 (GMT Daylight Time, UTC+01:00)

"I would look at your specific example and lean towards looking at that as a search function and not a publishing function, and so would look at leveraging OpenSearch for that, which is one of the extensions to the APP that I would expect to see appear in short order."

But even with this extension it sounds like the best APP can do is return a block of item *references*. How would the actual data be obtained?

Winter

Monday, 11 June 2007 20:59:18 (GMT Daylight Time, UTC+01:00)

> But even with this extension it sounds like the best
> APP can do is return a block of item *references*. How
> would the actual data be obtained?

The client would ask for the application/x-custom-mimetype representation of the resource, rather than the application/atom+xml one.

Brendan Taylor

Monday, 11 June 2007 23:48:18 (GMT Daylight Time, UTC+01:00)

Thanks Dare. Couldn't have expressed it better than Dimitri. Keep pushing and hope that we'll eventually end up with a general purpose publishing protocol.

Chui

Tuesday, 12 June 2007 07:26:16 (GMT Daylight Time, UTC+01:00)

Brendan, I believe Winter is really asking: How would APP return a container and all its content in a single request?

I suppose you could invent application/x-my-full-hierarchy-type but then you're not really using APP, you're just issuing a standard GET for your own XML microformat.

APP needs to provide clients ways to control inlining hierarchies and paging result sets (things the database community has done for years -- we're talking selection, projection, top, ...). Without that, APP isn't going to go very far, or worse everyone's going to use it as a starting point and then embrace and extend. Yuck.

This was one of Dare's points, but it seems some people didn't get it, and others disagree with the premise (that such features are required).

Michael Brundage

Tuesday, 12 June 2007 11:07:45 (GMT Daylight Time, UTC+01:00)

@Michael:

> (...) everyone's going to use it as a starting point and then embrace and extend.

And that's a bad thing?

It seems Dare has shown that there is a great divide in the developer community. It is usually a good thing when people question things. However, why not have done that within the WG? APP is now at its 15th draft and is pretty much done, and the WG has been around for more than 4 years, so why wait until now to make those comments?

When I read comments going in the direction of Dare it seems that they come from people who have not yet reviewed or implemented the protocol. I mean when you write this:

> APP needs to provide clients ways to control inlining hierarchies and paging result sets (things the database community has done for years -- we're talking selection, projection, top, ...).

1. APP does have paging results already.
2. Why on Earth do you want to stuff a protocol with so many layers which have nothing to do with it (selections, projections, top, why?).

This is ultimately a per-domain problem that must not be dealt with by the protocol. I just don't understand how you could want this.

I would instead understand a specification atop APP that deals with those topics.

Dare was right to ask the questions he asked. The problem comes when you say "APP has failed" and admit you've only reviewed the protocol over a weekend. Where is the credibility?

Dare has missed the philosophy behind APP and complains it doesn't fit its model. What did you expect from people who've been working on it for so long? Had Dare actively helped the WG and concluded there was a failure the response level would have been much different.

Sylvain Hellegouarch

Tuesday, 12 June 2007 11:13:25 (GMT Daylight Time, UTC+01:00)

Forgive me for the spelling/grammar mistakes. :/

Sylvain Hellegouarch

Tuesday, 12 June 2007 17:14:29 (GMT Daylight Time, UTC+01:00)

Sylvain,
So it seems you agree that APP isn't a general purpose editing protocol for all Web data stores.

I didn't realize I had to join a working group before evaluating a technology. Did you join the HTML working group before choosing to use HTML? :)

Dare Obasanjo

Tuesday, 12 June 2007 21:56:16 (GMT Daylight Time, UTC+01:00)

Sylvain,

I recommend you abandon the fallacious arguments (such as ad hominem and "it's too late to argue now") and nitpicking, and instead focus on the weightier technical points being debated.

You seem to think that Dare and I are unfamiliar with partial GET and the next link in APP collections. Perhaps instead you should give us some credit for knowing just a little bit about what we're talking about and listen to the message we're trying to convey.

When I tell you that APP doesn't have the pagination and inlining control that vendors need, I'm not speaking in abstract terms or from a position of ignorance. I'm looking at specific real-world and large-scale scenarios that APP does not enable.

Of all APP's faults, I think its notion of a collection of references instead of a full collection with complete (or partial, client-selectable) hierarchy in a single GET is one of its greatest failings. Astoria has some neat ideas here, and so did XQuery before it (not the final language, but some of the interim drafts).

The major vendors -- not just Microsoft -- will never settle for a protocol that requires issuing N+1 GETs to fully retrieve a collection of N complex items. Maybe if they don't do due diligence, they'll temporarily adopt it until they realize the performance implications. This is the kind of seemingly trivial but actually important issue that will prevent APP's widespread adoption.

Understand that APP, Astoria, XQuery, etc. are not destinations. They're just steps along the software development journey. We're all kind of wondering whether APP is an interesting place to stop for awhile, or merely a novelty roadside attraction. Stand by the side of the road and yell at the passing travelers if you want, but then you're just reinforcing the latter impression.

Michael Brundage

Wednesday, 13 June 2007 04:02:14 (GMT Daylight Time, UTC+01:00)

As I already asked over on Joe Gregorio's blog, any chance you guys could call it "AtomPP" instead of "APP?" That's the term some (enlightened?) people are using on [rest-discuss]. If the problem with the term "APP" isn't obvious, it's that it's too easily confused with the short form of the word "APPlication."

Mike Schinkel

Wednesday, 13 June 2007 06:00:19 (GMT Daylight Time, UTC+01:00)

Michael:

“Embracing and extending” AtomPP is what you are *supposed* to do. It took almost 5 years to get to Last Call, and that is *without* the working group descending into the hierarchy rat hole. Or any number of other rat holes. We could have wasted a few more years doing design by committee; that would have served precisely no one. At this point no one has a good grasp on how to do hierarchies generically. Various niches have their particular implementations, but none of them is proven to work at web scale; so it’s better if AtomPP gets out of the way and lets the niches do their thing for now.

The goal was to avoid design by committee and just codify experience, but that does not exist yet for all the problems in the domains where AtomPP is applicable. In a few years, or maybe half a decade, or maybe a full one, whenever we know more about hierarchies or partial updates or batch transfer or whatever other thorny issue is open today, others can come back and write another spec on top of AtomPP to deal with them. It will happen in due time, after the market has shaken out.

Dare:

AtomPP hasn’t *failed* for the absence of some more complex features – it’s why it *succeeds*!

Aristotle Pagaltzis

Wednesday, 13 June 2007 08:14:43 (GMT Daylight Time, UTC+01:00)

@Dare and @Michael,

I apologize for sounding a bit childish and of course I do accept the fact you don't need to participate in a WG to have the right to criticize it.

I guess as usual it's merely a question of miscommunication rather than anything else ;)

However Aristotle precisely expresses my belief that it's thanks to that lack of complex features that AtomPP may succeed in the long run.

Some protocols were specified in such a great detail of features that they've become almost impossible to be used out of the context they had been brewed in.

The problem is that we cannot foresee all the potential contexts in which a protocol could fit in and that means that making it simple assures the protocol could actually be flexible enough throughout the years.

Otherwise, why didn't we use WebDAV more broadly over the years?

> So it seems you agree that APP isn't a general purpose editing protocol for all Web data stores.

As a matter of fact I don't think indeed that AtomPP is the response to all use cases but I do believe it's flexible enough to be considered anyway and that's why I trust it to succeed.

Anyway, I guess some of us (me for instance ;)) have been a little emotional about it and it's never a good attitude.

Sylvain Hellegouarch

Wednesday, 13 June 2007 09:07:09 (GMT Daylight Time, UTC+01:00)

Oh and btw, Dare:

Thanks for your thoughtful recantation and restated position. I sided with Tim in his entry’s comments because incompetence did not seem like a reasonable assumption, and I had to wonder whether malice was more so; this post completely clears things up. Sorry for the flak you caught over what turned out to have been a simple misunderstanding.

Aristotle Pagaltzis

Wednesday, 13 June 2007 16:59:05 (GMT Daylight Time, UTC+01:00)

[I thought I posted this comment before, but I don't see it in the list of comments, so I am posting it again. Sorry if this is a duplicate.]

"[C]ertain limitations in the Atom Publishing Protocol become quite obvious when you get outside of blog editing scenarios for which the protocol was originally designed. For this reason, we will likely standardize on a different RESTful protocol which I'll discuss in a later post."

Given your deeper understanding of AtomPP, is it still as likely that your team will standardize on a different RESTful protocol? Inquiring minds want to know!

Nick Gall



Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson
