Last month Clemens Vasters wrote a blog post entitled "Autonomy isn't Autonomy - and a few words about Caching" where he talks about "autonomous" services and data caching. He wrote:
A question that is raised quite often in the context of "SOA" is that of how to deal with data. Specifically, people are increasingly interested in (and concerned about) appropriate caching strategies
...
By autonomous computing principles the left shape of the service is "correct". The service is fully autonomous and protects its state. That’s a model that’s strictly following the Fiefdoms/Emissaries idea that Pat Helland formulated a few years back. Very many applications look like the shape on the right. There are a number of services sticking up that share a common backend store. That’s not following autonomous computing principles. However, if you look across the top, you'll see that the endpoints (different colors, different contracts) look precisely alike from the outside for both pillars. That’s the split: Autonomous computing talks very much about how things are supposed to look behind your service boundary (which is not and should not be anyone’s business but yours) and service orientation really talks about you being able to hide any kind of such architectural decision between a loosely coupled network edge. The two ideas compose well, but they are not the same, at all.
...
However, I digress. Coming back to the data management issue, it’s clear that a stringent autonomous computing design introduces quite a few challenges in terms of data management. Data consolidation across separate stores for the purposes of reporting requires quite a bit of special consideration and so does caching of data. When the data for a system is dispersed across a variety of stores and comes together only through service channels without the ability to freely query across the data stores and those services are potentially “far” away in terms of bandwidth and latency, data management becomes considerably more difficult than in a monolithic app with a single store. However, this added complexity is a function of choosing to make the service architecture follow autonomous computing principles, not one of how to shape the service edge and whether you use service orientation principles to implement it.
...
Generally, my advice with respect to data management in distributed systems is to handle all data explicitly as part of the application code and not hide data management in some obscure interception layer. There are a lot of approaches that attempt to hide complex caching scenarios away from application programmers by introducing caching magic on the call/message path. That is a reasonable thing to do, if the goal is to optimize message traffic and the granularity that that gives you is acceptable. I had a scenario where that was just the right fit in one of my last newtelligence projects. Be that as it may, proper data management, caching included, is somewhat like the holy grail of distributed computing and unless people know what they’re doing, it’s dangerous to try to hide it away.
That said, I believe that it is worth a thought to make caching a first-class consideration in any distributed system where data flows across boundaries. If it’s known at the data source that a particular record or set of records won’t be updated until 1200h tomorrow (many banks, for instance, still do accounting batch runs just once or twice daily) then it is helpful to flow that information alongside the data to allow any receiver to determine the caching strategy for the particular data item(s).
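That last point maps quite naturally onto code. Here's a minimal sketch of the idea, in Python, with hypothetical names (AccountRecord, AccountCache, valid_until) that are mine, not Clemens'; the 12-hour window stands in for his bank batch-run example. The data item carries its own freshness hint across the service boundary, and the receiver uses that hint to decide how long to cache:

```python
import time

class AccountRecord:
    """A data item plus an explicit hint from the source about when it may next change."""
    def __init__(self, account_id, balance, valid_until):
        self.account_id = account_id
        self.balance = balance
        self.valid_until = valid_until  # epoch seconds, e.g. tomorrow's 1200h batch run

class AccountCache:
    """Receiver-side cache that honors the data source's own freshness hint."""
    def __init__(self, fetch):
        self._fetch = fetch    # callable that crosses the service boundary
        self._entries = {}     # account_id -> AccountRecord

    def get(self, account_id):
        cached = self._entries.get(account_id)
        if cached and time.time() < cached.valid_until:
            return cached      # still fresh, per the source's own statement
        record = self._fetch(account_id)
        self._entries[account_id] = record
        return record

# Usage: the service promises the balance won't change for another 12 hours.
def fetch_from_service(account_id):
    return AccountRecord(account_id, balance=100.0,
                         valid_until=time.time() + 12 * 3600)

cache = AccountCache(fetch_from_service)
print(cache.get("4711").balance)  # first call hits the service
print(cache.get("4711").balance)  # served from the cache until valid_until passes
```

The caching decision stays in application code, where the programmer can see it, which is exactly the opposite of hiding it in an interception layer.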
Service autonomy is one topic where I still have difficulty striking the right balance. In an ideal SOA world, you have a mesh of interconnected services that depend on each other to perform their set tasks. The problem with this SOA ideal is that it introduces dependencies. If you are building an online service, dependencies mean that sometimes you'll be woken up by your pager at 3 AM and it's somebody else's fault, not yours. This may encourage people who build services to shun dependencies and build self-contained web applications that reinvent the wheel instead of utilizing external services. I'm still trying to decide whether this is a bad thing or not.
As for Clemens' comments on caching and services, I find it interesting how even WS-* gurus inadvertently end up articulating the virtues of HTTP's design and the REST architectural style when talking about best practices for building services. I wonder if we will one day see WS-* equivalents of ETags and If-Modified-Since. WS-Caching anyone? :)
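For the record, here is roughly what that already looks like on the HTTP side, using nothing but the Python standard library (the URL is just a placeholder): a conditional GET that revalidates a cached copy with If-None-Match/ETag and If-Modified-Since instead of re-downloading the data.

```python
import urllib.request
import urllib.error

URL = "http://example.org/feed.xml"  # placeholder resource

# First request: fetch the resource and remember its validators.
with urllib.request.urlopen(URL) as resp:
    body = resp.read()
    etag = resp.headers.get("ETag")
    last_modified = resp.headers.get("Last-Modified")

# Later request: send the validators back; 304 means the cached copy is still good.
req = urllib.request.Request(URL)
if etag:
    req.add_header("If-None-Match", etag)
if last_modified:
    req.add_header("If-Modified-Since", last_modified)

try:
    with urllib.request.urlopen(req) as resp:
        body = resp.read()          # 200: resource changed, use the new body
except urllib.error.HTTPError as err:
    if err.code == 304:
        pass                        # 304 Not Modified: keep using the cached body
    else:
        raise
```

The origin server decides what the validators are and how long things stay fresh; the client merely honors them, which is the same "flow the caching information alongside the data" idea Clemens describes.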