The products that the database vendors were building had less and less to do
with what the customers wanted... Google itself (and, I'd bet, Yahoo too)
has similar needs to the ones Federal Express or Morgan Stanley or Ford or
others described quite eloquently to me. So, what is this growing disconnect?
It is this. Users of databases tend to ask for three very simple things:
1) Dynamic schema so that as the business model/description of goods or
services changes and evolves, this evolution can be handled seamlessly in a
system running 24 by 7, 365 days a year. This means that Amazon can track new
things about new goods without changing the running system. It means that
Federal Express can add Federal Express Ground seamlessly to their running
tracking system and so on. In short, the database should handle unlimited
change.
2) Dynamic partitioning of data across large dynamic numbers of machines. A
lot of people track a lot of data these days. It is common to talk to
customers tracking 100,000,000 items a day and having to maintain the
information online for at least 180 days with 4K or more a pop, and that adds (or
multiplies) up to 100 TB or so. Customers tell me that this is best served up
to the 1MM users who may want it at any time by partitioning the data because, in
general, most of this data is highly partitionable by customer or product or
something. The only issue is that it needs to be dynamic so that as items are
added or get "busy" the system dynamically load balances their data across the
machines. In short, the database should handle unlimited scale with very low
latency. It can do this because the vast majority of queries will be local to a
product or a customer or something over which you can partition...
3) Modern indexing. Google has spoiled the world. Everyone has learned that
just typing in a few words should show the relevant results in a couple of
hundred milliseconds. Everyone (whether an Amazon user or a customer looking up
a check they wrote a month ago or a customer service rep looking up the history
for someone calling in to complain) expects this. This indexing, of course,
often has to include indexing through the "blobs" stored in the items such as
PDFs, spreadsheets, and PowerPoints. This is actually hard to do across all
data, but much of the need is within a partitioned data set (e.g. I want to and
should only see my checks, not yours, or my airbill status, not yours) and then it
should be trivial.
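
To make the three requests concrete, here is a minimal Python sketch of how they fit together. It is an illustration only, not any vendor's design: the names `estimate_storage_tb`, `Partition`, and `PartitionedStore` are hypothetical, items are plain dictionaries (so new fields need no schema change), a hash of the customer key picks the partition, and each partition keeps its own small word index so a search never leaves the owner's data. It assumes a fixed number of partitions; the dynamic rebalancing described in point 2 is left out.

```python
import hashlib
from collections import defaultdict


def estimate_storage_tb(items_per_day, days_online, bytes_per_item):
    """Back-of-envelope storage estimate in decimal terabytes (1 TB = 1e12 bytes)."""
    return items_per_day * days_online * bytes_per_item / 1e12


class Partition:
    """Hypothetical stand-in for one machine: schemaless items plus a local index."""

    def __init__(self):
        self.items = {}                # (owner, item_id) -> dict of arbitrary fields
        self.index = defaultdict(set)  # (owner, lowercase word) -> set of item_ids

    def put(self, owner, item_id, fields):
        # No schema: whatever fields the item carries are stored and indexed.
        self.items[(owner, item_id)] = fields
        for value in fields.values():
            for word in str(value).lower().split():
                self.index[(owner, word)].add(item_id)

    def search(self, owner, query):
        """Return ids of the owner's items that contain every word in the query."""
        words = query.lower().split()
        if not words:
            return set()
        hits = set(self.index.get((owner, words[0]), set()))
        for word in words[1:]:
            hits &= self.index.get((owner, word), set())
        return hits


class PartitionedStore:
    """Routes each owner (customer, product, ...) to one partition by hashing."""

    def __init__(self, num_partitions):
        # Assumption: a fixed partition count; no load-based rebalancing here.
        self.partitions = [Partition() for _ in range(num_partitions)]

    def _partition_for(self, owner):
        digest = hashlib.sha256(owner.encode("utf-8")).digest()
        slot = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[slot]

    def put(self, owner, item_id, fields):
        self._partition_for(owner).put(owner, item_id, fields)

    def search(self, owner, query):
        # The query touches exactly one partition: the owner's.
        return self._partition_for(owner).search(owner, query)


store = PartitionedStore(num_partitions=4)
store.put("alice", "check-1", {"payee": "Acme Hardware", "amount": 120})
# A later item carries a field the first one never had; nothing else changes.
store.put("alice", "check-2", {"payee": "Acme Books", "memo": "gift"})
store.put("bob", "check-9", {"payee": "Acme Hardware"})

print(sorted(store.search("alice", "acme")))
print(sorted(store.search("bob", "acme")))
print(estimate_storage_tb(100_000_000, 180, 4_000))
```

At exactly 4,000 bytes per item, the estimate comes to 72 TB; the "100 TB or so" figure corresponds to an average item of roughly 5.6 KB, which is consistent with "4K or more a pop". A real system would also have to move data between machines as items are added or get busy, which this sketch does not attempt.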
...
Users of databases don't believe that they are getting any of these three.
Salesforce, for example, has a lot of clever technology just to hack around the
dynamic schema problem so that 13,000 customers can have 13,000 different views
of what a prospect is.
If the database vendors ARE solving these problems, then they aren't doing a
good job of telling the rest of us.