Thoughts on Google's Proposal for Granular Updates in AtomPub

February 16, 2008

@ 07:29 PM

Via Sam Ruby's post Embrace, Extend then Innovate I found a link to Joe Gregorio's post entitled How to do RESTful Partial Updates. Joe's post is a recommendation of how to extend the Atom Publishing Protocol (RFC 5023) to support updating the properties of an entry without having to replace the entire entry. Given that Joe works for Google on GData, I have assumed that Joe's post is Google's attempt to float a trial balloon before extending AtomPub in this way. This is a more community centric approach than the company has previously taken with GData, OpenSocial, etc where these protocols simply appeared out of nowhere with proprietary extensions to AtomPub with an FYI to the community after the fact.

The Problem Statement

In the Atom Publishing Protocol, an atom:entry represents an editable resource. When editing that resource, it is intended that an AtomPub client should download the entire entry, edit the fields it needs to change and then use a conditional PUT request to upload the changed entry.

So what's the problem? Below is an example of the results one could get from invoking the users.getInfo method in the Facebook REST API.

<user>
    <uid>8055</uid>
    <about_me>This field perpetuates the glorification of the ego. Also, it has a character limit.</about_me>
    <activities>Here: facebook, etc. There: Glee Club, a capella, teaching.</activities>
    <birthday>November 3</birthday>
    <books>The Brothers K, GEB, Ken Wilber, Zen and the Art, Fitzgerald, The Emporer's New Mind, The Wonderful Story of Henry Sugar</books>
    <current_location>
        <city>Palo Alto</city>
        <state>CA</state>
        <country>United States</country>
        <zip>94303</zip>
    </current_location>
    <first_name>Dave</first_name>
    <interests>coffee, computers, the funny, architecture, code breaking,snowboarding, philosophy, soccer, talking to strangers</interests>
    <last_name>Fetterman</last_name>
    <movies>Tommy Boy, Billy Madison, Fight Club, Dirty Work, Meet the Parents, My Blue Heaven, Office Space </movies>
    <music>New Found Glory, Daft Punk, Weezer, The Crystal Method, Rage, the KLF, Green Day, Live, Coldplay, Panic at the Disco, Family Force 5</music>
    <name>Dave Fetterman</name>
    <profile_update_time>1170414620</profile_update_time>
    <relationship_status>In a Relationship</relationship_status>
    <religion/>
    <sex>male</sex>
    <significant_other_id xsi:nil="true"/>
    <status>
        <message>Pirates of the Carribean was an awful movie!!!</message>
    </status>
</user>

If this user were represented as an atom:entry, then each time an application wanted to edit the user's status message it would need to download the entire data for the user with its over two dozen fields, change the status message in an in-memory representation of the XML document, and then upload the entire user atom:entry back to the server. This is a fairly expensive way to change a status message compared to how this is approached in other RESTful protocols (e.g. PROPPATCH in WebDAV).
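
To make the cost concrete, here is a rough Python sketch of that edit cycle. It is not taken from any AtomPub library: the document is an abbreviated copy of the Facebook result above, and the function name edit_status and the replacement message are made up for illustration. The point is only that the whole document travels in both directions even though a single text node changes.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the users.getInfo result above (most fields omitted).
USER_XML = """<user>
    <uid>8055</uid>
    <first_name>Dave</first_name>
    <last_name>Fetterman</last_name>
    <status>
        <message>Pirates of the Carribean was an awful movie!!!</message>
    </status>
</user>"""


def edit_status(full_document: str, new_message: str) -> str:
    """Return the complete document with only the status message changed."""
    root = ET.fromstring(full_document)
    root.find("status/message").text = new_message
    return ET.tostring(root, encoding="unicode")


new_message = "Trying out AtomPub"  # hypothetical new status
updated = edit_status(USER_XML, new_message)

# Bytes that travel in each direction versus bytes that actually changed.
downloaded = len(USER_XML.encode("utf-8"))
uploaded = len(updated.encode("utf-8"))
changed = len(new_message.encode("utf-8"))
print(f"downloaded={downloaded} uploaded={uploaded} changed={changed}")
```

With the full record instead of this abbreviated one, the gap between what is transferred and what is changed only gets wider.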

Previous Discussions on this Topic: When the Shoe is on the Other Foot

A few months ago I brought up this issue as one of the problems encountered when using the Atom Publishing Protocol outside of blog editing contexts in my post Why GData/APP Fails as a General Purpose Editing Protocol for the Web. In that post I wrote

Lack of support for granular updates to fields of an item: As mentioned in the previous section editing an entry requires replacing the old entry with a new one. The expected client interaction with the server is described in section 5.4 of the current APP draft and is excerpted below.
Retrieving a Resource
Client                                     Server
  |                                           |
  |  1.) GET to Member URI                    |
  |------------------------------------------>|
  |                                           |
  |  2.) 200 Ok                               |
  |      Member Representation                |
  |<------------------------------------------|
  |                                           |
The client sends a GET request to the URI of a Member Resource to retrieve its representation.

The server responds with the representation of the Member Resource.

Editing a Resource
Client                                     Server
  |                                           |
  |  1.) PUT to Member URI                    |
  |      Member Representation                |
  |------------------------------------------>|
  |                                           |
  |  2.) 200 OK                               |
  |<------------------------------------------|
The client sends a PUT request to store a representation of a Member Resource.

If the request is successful, the server responds with a status code of 200.
Can anyone spot what's wrong with this interaction? The first problem is a minor one that may prove problematic in certain cases. The problem is pointed out in the note in the documentation on Updating posts on Google Blogger via GData which states

IMPORTANT! To ensure forward compatibility, be sure that when you POST an updated entry you preserve all the XML that was present when you retrieved the entry from Blogger. Otherwise, when we implement new stuff and include <new-awesome-feature> elements in the feed, your client won't return them and your users will miss out! The Google data API client libraries all handle this correctly, so if you're using one of the libraries you're all set.

Thus each client is responsible for ensuring that it doesn't lose any XML that was in the original atom:entry element it downloaded. The second problem is more serious and should be of concern to anyone who's read Editing the Web: Detecting the Lost Update Problem Using Unreserved Checkout. The problem is that there is data loss if the entry has changed between the time the client downloaded it and when it tries to PUT its changes.

That post was negatively received by many members of the AtomPub community, including Joe Gregorio. Joe wrote a scathing response to my post, entitled In which we narrowly save Dare from inventing his own publishing protocol, where he addressed that particular issue as follows:

The second complaint is one of data loss:

The problem is that there is data loss if the entry has changed between the time the client downloaded it and when it tries to PUT its changes.

Fortunately, the only real problem is that Dare seems to have only skimmed the specification. From Section 9.3:

To avoid unintentional loss of data when editing Member Entries or Media Link Entries, Atom Protocol clients SHOULD preserve all metadata that has not been intentionally modified, including unknown foreign markup as defined in Section 6 of [RFC4287].

And further, from Section 9.5:

Implementers are advised to pay attention to cache controls, and to make use of the mechanisms available in HTTP when editing Resources, in particular entity-tags as outlined in [NOTE-detect-lost-update]. Clients are not assured to receive the most recent representations of Collection Members using GET if the server is authorizing intermediaries to cache them.

Hey look, we actually reference the lost update paper that specifies how to solve this problem, right there in the spec! And Section 9.5.1 even shows an example of just such a conditional PUT failing. Who knew? And just to make this crystal clear, you can build a server that is compliant to the APP that accepts only conditional PUTs. I did, and it performed quite well at the last APP Interop.

The bottom line of Joe's response is that he didn't think it was a real problem. My assumption is that his perspective on the problem has broadened now that he has a responsibility to the wide range of AtomPub implementations at Google, as opposed to when his design decisions were being influenced by a homegrown blogging server he wrote in his free time.
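
For readers who haven't seen the conditional PUT that Joe refers to, the following Python sketch simulates it in memory. It is an illustration under stated assumptions, not Joe's server: the MemberResource class is hypothetical, the entity tag is simply a hash of the body, and the 428 status for an unconditional PUT comes from RFC 6585, which postdates this discussion.

```python
import hashlib


class MemberResource:
    """Hypothetical in-memory stand-in for one AtomPub member resource."""

    def __init__(self, body: str):
        self.body = body

    @property
    def etag(self) -> str:
        # Assumption: derive the entity tag from the current representation.
        return '"' + hashlib.sha1(self.body.encode("utf-8")).hexdigest() + '"'

    def get(self):
        return 200, self.etag, self.body

    def put(self, body: str, if_match=None) -> int:
        # This sketch accepts only conditional PUTs.
        if if_match is None:
            return 428  # Precondition Required (RFC 6585)
        if if_match != self.etag:
            return 412  # Precondition Failed: the entry changed since the GET
        self.body = body
        return 200


entry = MemberResource("<entry><title>v1</title></entry>")

# Two clients download the same version of the entry.
_, tag_a, _ = entry.get()
_, tag_b, _ = entry.get()

# Client A's PUT succeeds; client B's PUT carries a stale entity tag.
status_a = entry.put("<entry><title>A's edit</title></entry>", if_match=tag_a)
status_b = entry.put("<entry><title>B's edit</title></entry>", if_match=tag_b)
print(status_a, status_b)
```

Client B is told that its copy is stale instead of silently overwriting client A's change. Note that this detects the lost update but does nothing to make the update itself smaller.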

The Google Solution: Embrace, Extend then Innovate

Now that Joe thinks supporting granular updates of a resource is a valid scenario, he and the folks at Google have proposed the following solution to the problem. Joe writes

Now if I wanted to update part of this entry, say the title, using the mechanisms in RFC 5023 then I would change the value of the title element and PUT the whole modified entry back to the URI http://example.org/edit/first-post.atom. Now this document isn't large, but we'll use it to demonstrate the concepts. The first thing we want to do is add a URI Template that allows us to construct a URI to PUT changes back to:
<?xml version="1.0"?>
<entry         
        xmlns="http://www.w3.org/2005/Atom"
        xmlns:t="http://blah...">
    <t:link_template ref="sub"
        href="http://example.org/edit/first-post/{-listjoin|;|id}"/>
    <title>Atom-Powered Robots Run Amok</title>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <author><name>John Doe</name></author>
    <content>Some text.</content>
    <link rel="edit"
        href="http://example.org/edit/first-post.atom"/>
</entry>
Then we need to add id's to each of the pieces of the document we wish to be able to individually update. For this we'll use the W3C xml:id specification:
<?xml version="1.0"?>
<entry         
        xmlns="http://www.w3.org/2005/Atom"
        xmlns:t="http://blah...">   
    <t:link_template ref="sub" href="http://example.org/edit/first-post/{-listjoin|;|id}"/>
    <title xml:id="X1">Atom-Powered Robots Run Amok</title>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <author xml:id="X2"><name>John Doe</name></author>
    <content xml:id="X3">Some text.</content>
    <link rel="edit"
        href="http://example.org/edit/first-post.atom"/>
</entry>
So if I wanted to update both the content and the title I would construct the partial update URI using the id's of the elements I want to update:

http://example.org/edit/first-post/X1;X3

And then I would PUT an entry to the URI with only those child elements:
PUT /edit/first-post/X1;X3
Host: example.org

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
   <title xml:id="X1">False alarm on the Atom-Powered Robots things</title>
   <content xml:id="X3">Sorry about that.</content>
</entry>
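
As a minimal sketch of the URI construction step in the example above, the Python below expands only the one template operator that appears in it. The function name expand_listjoin is made up for illustration, the regular expression covers just the {-listjoin|separator|variable} form, and values are inserted without any percent-encoding, so this should not be read as an implementation of the URI Templates draft.

```python
import re

# Matches only the {-listjoin|separator|variable} form used in the example.
LISTJOIN = re.compile(r"\{-listjoin\|([^|}]*)\|([^|}]+)\}")


def expand_listjoin(template: str, variables: dict) -> str:
    """Replace each -listjoin expression with the joined list of values."""

    def substitute(match):
        separator, name = match.group(1), match.group(2)
        # A missing variable expands to the empty string in this sketch.
        return separator.join(variables.get(name, []))

    return LISTJOIN.sub(substitute, template)


template = "http://example.org/edit/first-post/{-listjoin|;|id}"
uri = expand_listjoin(template, {"id": ["X1", "X3"]})
print(uri)  # http://example.org/edit/first-post/X1;X3
```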

The Problems with the Google Solution: Your Shipment of FAIL has Arrived

Ignoring the fact that this spec depends on specifications that are either experimental (URI Templates) or not widely supported (xml:id), there are still significant problems with how this approach (mis)uses the Atom Publishing Protocol. Sam Ruby eloquently points out the problems in his post Embrace, Extend then Innovate where he wrote

With HTTP PUT, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server. Having some servers interpret the removal of elements (such as content) as a modification, and others interpret the requests in such a way that elided elements are to be left alone is hardly uniform or self-descriptive. In fact, depending on usage, it is positively stateful.

I’m fine with a server choosing to interpret the request anyway it sees fit. As a black box, it could behave as if it updated the resource as requested and then one nanosecond later — and before it processes any other requests — fill in missing data with defaults, historical data, whatever. My concern is with clients coding with to the assumption as to how the server works. That’s called coupling.

The main problem is that it changes the expected semantics of HTTP PUT in a way that not only conflicts with how PUT is typically used in other HTTP-based protocols but also with how it is used in AtomPub. It's also weird that the existence of xml:id in an Atom document is now used to imply special semantics (i.e. this field supports direct editing). I especially don't like that, after all is said and done, the server controls which fields can be partially updated, which seems to imply a tight coupling between clients and servers (e.g. some servers will support partial updates on all fields, some may only support partial updates on atom:title + atom:category while others will support partial updates on a different set of fields). So the code for editing a title or category changes depending on which AtomPub service you are talking to.

From where I stand Joe has pretty much invented yet another diff + patch protocol for XML documents. When I worked on the XML team at Microsoft, there were quite a few floating around the company including Diffgram, UpdateGram, and Patchgrams to name three. So I've been around the block when it comes to diff + patch formats for XML and this one has its share of issues. The most eyebrow-raising issue with the diff + patch protocol is that half the semantics of the update are in the XML document (which elements to add/edit) while the other half are in the URL (if an ID exists in the URL but is not in the document then it is a delete). This means the XML isn't very self-describing, nor can it really be said that the URL is identifying a resource [more like it identifies an operation].
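
The split between URL and document is easier to see in code. The Python sketch below is my reading of how a server would have to apply such a request, with a made-up function name apply_partial_put; it is not an implementation from Joe's post. The same request body produces different results depending on which ids appear in the URL.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", ATOM)  # serialize Atom elements without a prefix


def apply_partial_put(stored_xml: str, url_ids: list, body_xml: str) -> str:
    """Apply a partial PUT: ids come from the URL, elements from the body."""
    stored = ET.fromstring(stored_xml)
    body = ET.fromstring(body_xml)
    sent = {child.get(XML_ID): child for child in body if child.get(XML_ID)}
    kept = []
    for child in stored:
        ident = child.get(XML_ID)
        if ident not in url_ids:
            kept.append(child)        # not named in the URL: left alone
        elif ident in sent:
            kept.append(sent[ident])  # named in the URL and sent: replaced
        # named in the URL but absent from the body: deleted
    stored[:] = kept
    return ET.tostring(stored, encoding="unicode")


STORED = """<entry xmlns="http://www.w3.org/2005/Atom">
    <title xml:id="X1">Atom-Powered Robots Run Amok</title>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <author xml:id="X2"><name>John Doe</name></author>
    <content xml:id="X3">Some text.</content>
</entry>"""

BODY = """<entry xmlns="http://www.w3.org/2005/Atom">
    <title xml:id="X1">False alarm</title>
</entry>"""

# PUT to .../first-post/X1;X3 with only the title in the body:
# the title is replaced and the content is deleted.
result = apply_partial_put(STORED, ["X1", "X3"], BODY)
print(result)
```

Sending the identical body to .../first-post/X1 would leave the content in place, which is exactly why the document alone doesn't describe the update.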

Actual Solution: Read the Spec

In Joe's original response to my post his suggestion was that the solution to the "problem" of lack of support for granular updates of entries in AtomPub is to read the spec. In retrospect, I agree. If a field is important enough that it needs to be identifiable and editable then it should be its own resource. If you want to make it part of another resource then use atom:link to link both resources.
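
A rough Python sketch of what that looks like, using the status message from the earlier Facebook example. Everything here is illustrative: the URIs are hypothetical, the entries are abbreviated (no atom:id or atom:updated), the choice of rel="related" is just one plausible link relation, and the dictionary stands in for a server.

```python
# All URIs below are hypothetical.
USER_URI = "http://example.org/users/8055"
STATUS_URI = "http://example.org/users/8055/status"

# The user entry links to the status, which is a resource in its own right.
resources = {
    USER_URI: (
        '<entry xmlns="http://www.w3.org/2005/Atom">'
        "<title>Dave Fetterman</title>"
        f'<link rel="related" href="{STATUS_URI}"/>'
        "</entry>"
    ),
    STATUS_URI: (
        '<entry xmlns="http://www.w3.org/2005/Atom">'
        "<content>Pirates of the Carribean was an awful movie!!!</content>"
        "</entry>"
    ),
}


def put(uri: str, representation: str) -> int:
    """Plain PUT: replace the whole representation stored at `uri`."""
    if uri not in resources:
        return 404
    resources[uri] = representation
    return 200


user_before = resources[USER_URI]
status_code = put(
    STATUS_URI,
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    "<content>Reading RFC 5023</content>"
    "</entry>",
)
print(status_code, resources[USER_URI] == user_before)
```

The status changes with an ordinary whole-representation PUT and the user entry is never touched, so no new PUT semantics are needed.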

Case closed. Problem solved.

Now Playing: Too Short - Couldn't Be a Better Player Than Me (feat. Lil Jon & The Eastside Boyz)

Categories: Syndication Technology | XML Web Services

Saturday, 16 February 2008 20:57:02 (GMT Standard Time, UTC+00:00)

You should really look into RDF, Dare. A lot of these problems go away operating with the constructs of the semantic web.

Industry is spending a lot of brain cycles on adapting technologies to do things they were never designed to do (AtomPub in this case).

-jf

Johnny Fry

Saturday, 16 February 2008 21:19:28 (GMT Standard Time, UTC+00:00)

"If a field is important enough that it needs to be identifiable and editable then it should be its own resource."

I completely agree with your conclusion, Dare, but I reached it from a different path: If something needs to be updateable, then it must be addressable. Everything that's addressable is a resource in the model.

Thinking of web service resources like bytes and URIs like virtual addresses solves a lot of "problems" in the web service space. You can't update a bit in RAM without updating the entire byte (sometimes a whole machine word). You can't update the middle of a value in a relational database or part of a spreadsheet cell without updating the entire value. Etc.

Michael Brundage

Saturday, 16 February 2008 21:49:48 (GMT Standard Time, UTC+00:00)

"Given that Joe works for Google on GData, I have assumed that Joe's post is Google's attempt to float a trial balloon before extending AtomPub in this way."

Sorry, but no, just an idea I had and wrote up on a long flight.

"This is a more community centric approach than the company has previously taken with GData, OpenSocial, etc where these protocols simply appeared out of nowhere with proprietary extensions to AtomPub with an FYI to the community after the fact."

I will keep that in mind for the future.

Joe Gregorio

Sunday, 17 February 2008 00:25:05 (GMT Standard Time, UTC+00:00)

You seem really smart Dare! Too bad you make absolutely zero fucking sense.

simone

Sunday, 17 February 2008 00:32:48 (GMT Standard Time, UTC+00:00)

Also, are you aware that you're doing Google's homework for them? Are you a total slack jawed reject? Stop being such a douche.

simone

Sunday, 17 February 2008 15:00:46 (GMT Standard Time, UTC+00:00)

Dare:

Don't you think that there is a fundamental difference between expressing the changes you made to a resource representation as part of an inter-action and the expression of what you want the resource to look like once it is updated?

Don't you think that CRUDing from the representation consumer side is hopeless?

Jean-Jacques Dubray

Sunday, 17 February 2008 15:34:01 (GMT Standard Time, UTC+00:00)

Jean-Jacques,
"Don't you think that there is a fundamental difference between expressing the changes you made to a resource representation as part of an inter-action and the expression of what you want the resource to look like once it is updated? "

Agreed. This is why trying to treat both as HTTP PUT requests rubs me the wrong way. A lot of responses to this subject [including Roy Fielding] have suggested using a PATCH method and sending around updategrams/diffgrams/patchgrams. This is pretty much what Joe's solution would be once you fixed all the suspect design decisions. See http://plasmasturm.org/log/493/ for more thinking along these lines.

However this would be a significant extension to the AtomPub protocol and it would bifurcate the world of AtomPub clients and servers. Again, this was one of the reasons I favored trying such ideas in a new protocol like Web3S versus jamming them into AtomPub. However the AtomPub community definitely made that seem like an even worse idea given the flames my posts about Web3S received. So we'll just have to muddle along with the cards we've been dealt.

Dare Obasanjo

Monday, 18 February 2008 22:11:50 (GMT Standard Time, UTC+00:00)

Dare,

You raise a very good point but I wonder if it doesn't come down simply to a matter of setting the expectation. In your original Web3S article I, among others, pointed out that AtomPub was aiming at being generic enough so that people were free to use its core and be quickly efficient for the most common operations. If I look at XMPP, they broke down the protocol into pieces that made it usable without having to implement all its extensions. This is powerful for a protocol since it allows developers to produce something usable quickly. It's also frustrating as it means you can never ensure that the other end will have implemented the part you're interested in.

Web3S has been designed with the fact that edit operations were part of the core of operations to make the protocol useful. That's fine. The AtomPub WG didn't make that decision. Of course there is a big risk that we will see proprietary or not-so-well-thought-out extensions appear. I personally believe that's a risk worth taking in that case.

So why is it a matter of expectation? Because at the end of the day edit operations are so tied to the actual semantic of the application behind that they cannot safely be made generic, well I believe.

For simple AtomPub clients this won't matter because they will surely care only about RFC 5023. Clients of richer AtomPub servers will more likely implement those extensions as long as they're well documented.

The AtomPub WG offered the community a protocol with a low barrier to entry. Its expectations are reachable by the many while providing a powerful and well-defined resource-oriented protocol.

Sylvain Hellegouarch

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson
