Last month Clemens Vasters wrote a blog post entitled "Autonomy isn't Autonomy - and a few words about Caching" where he talks about "autonomous" services and data caching. He wrote:
A question that is raised quite often in the context of "SOA" is that of how to deal with data. Specifically, people are increasingly interested in (and concerned about) appropriate caching strategies
...
By autonomous computing principles the left shape of the service is "correct". The service is fully autonomous and protects its state. That’s a model that’s strictly following the Fiefdoms/Emissaries idea that Pat Helland formulated a few years back. Very many applications look like the shape on the right. There are a number of services sticking up that share a common backend store. That’s not following autonomous computing principles. However, if you look across the top, you'll see that the endpoints (different colors, different contracts) look precisely alike from the outside for both pillars. That’s the split: Autonomous computing talks very much about how things are supposed to look behind your service boundary (which is not and should not be anyone’s business but yours) and service orientation really talks about you being able to hide any kind of such architectural decision between a loosely coupled network edge. The two ideas compose well, but they are not the same, at all.
...
However, I digress. Coming back to the data management issue, it’s clear that a stringent autonomous computing design introduces quite a few challenges in terms of data management. Data consolidation across separate stores for the purposes of reporting requires quite a bit of special consideration and so does caching of data. When the data for a system is dispersed across a variety of stores and comes together only through service channels without the ability to freely query across the data stores and those services are potentially “far” away in terms of bandwidth and latency, data management becomes considerably more difficult than in a monolithic app with a single store. However, this added complexity is a function of choosing to make the service architecture follow autonomous computing principles, not one of how to shape the service edge and whether you use service orientation principles to implement it.
...
Generally, my advice with respect to data management in distributed systems is to handle all data explicitly as part of the application code and not hide data management in some obscure interception layer. There are a lot of approaches that attempt to hide complex caching scenarios away from application programmers by introducing caching magic on the call/message path. That is a reasonable thing to do, if the goal is to optimize message traffic and the granularity that that gives you is acceptable. I had a scenario where that was just the right fit in one of my last newtelligence projects. Be that as it may, proper data management, caching included, is somewhat like the holy grail of distributed computing and unless people know what they’re doing, it’s dangerous to try to hide it away.
That said, I believe that it is worth a thought to make caching a first-class consideration in any distributed system where data flows across boundaries. If it’s known at the data source that a particular record or set of records won’t be updated until 1200h tomorrow (many banks, for instance, still do accounting batch runs just once or twice daily) then it is helpful to flow that information alongside the data to allow any receiver to determine the caching strategy for the particular data item(s).
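That last point maps quite naturally onto code. Here's a minimal sketch of the idea, in Python, with hypothetical names (AccountRecord, AccountCache, valid_until) that are mine, not Clemens'; the 12-hour window stands in for his bank batch-run example. The data item carries its own freshness hint across the service boundary, and the receiver uses that hint to decide how long to cache:

```python
import time

class AccountRecord:
    """A data item plus an explicit hint from the source about when it may next change."""
    def __init__(self, account_id, balance, valid_until):
        self.account_id = account_id
        self.balance = balance
        self.valid_until = valid_until  # epoch seconds, e.g. tomorrow's 1200h batch run

class AccountCache:
    """Receiver-side cache that honors the data source's own freshness hint."""
    def __init__(self, fetch):
        self._fetch = fetch    # callable that crosses the service boundary
        self._entries = {}     # account_id -> AccountRecord

    def get(self, account_id):
        cached = self._entries.get(account_id)
        if cached and time.time() < cached.valid_until:
            return cached      # still fresh, per the source's own statement
        record = self._fetch(account_id)
        self._entries[account_id] = record
        return record

# Usage: the service promises the balance won't change for another 12 hours.
def fetch_from_service(account_id):
    return AccountRecord(account_id, balance=100.0,
                         valid_until=time.time() + 12 * 3600)

cache = AccountCache(fetch_from_service)
print(cache.get("4711").balance)  # first call hits the service
print(cache.get("4711").balance)  # served from the cache until valid_until passes
```

The caching decision stays in application code, where the programmer can see it, which is exactly the opposite of hiding it in an interception layer.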
Service autonomy is one topic where I still have difficulty striking the right balance. In an ideal SOA world, you have a mesh of interconnected services that depend on each other to perform their set tasks. The problem with this SOA ideal is that it introduces dependencies. If you are building an online service, dependencies mean that sometimes you'll be woken up by your pager at 3 AM and it's somebody else's fault, not yours. This may encourage people who build services to shun dependencies and build self-contained web applications that reinvent the wheel instead of utilizing external services. I'm still trying to decide whether this is a bad thing or not.
As for Clemens' comments on caching and services, I find it interesting how even WS-* gurus inadvertently end up articulating the virtues of HTTP's design and the REST architectural style when talking about best practices for building services. I wonder if we will one day see WS-* equivalents of ETags and If-Modified-Since. WS-Caching anyone? :)
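For the record, here is roughly what that already looks like on the HTTP side, using nothing but the Python standard library (the URL is just a placeholder): a conditional GET that revalidates a cached copy with If-None-Match/ETag and If-Modified-Since instead of re-downloading the data.

```python
import urllib.request
import urllib.error

URL = "http://example.org/feed.xml"  # placeholder resource

# First request: fetch the resource and remember its validators.
with urllib.request.urlopen(URL) as resp:
    body = resp.read()
    etag = resp.headers.get("ETag")
    last_modified = resp.headers.get("Last-Modified")

# Later request: send the validators back; 304 means the cached copy is still good.
req = urllib.request.Request(URL)
if etag:
    req.add_header("If-None-Match", etag)
if last_modified:
    req.add_header("If-Modified-Since", last_modified)

try:
    with urllib.request.urlopen(req) as resp:
        body = resp.read()          # 200: resource changed, use the new body
except urllib.error.HTTPError as err:
    if err.code == 304:
        pass                        # 304 Not Modified: keep using the cached body
    else:
        raise
```

The origin server decides what the validators are and how long things stay fresh; the client merely honors them, which is the same "flow the caching information alongside the data" idea Clemens describes.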