The debate over the pros and cons of non-relational databases, typically described as “NoSQL databases”, has been heating up recently. The anti-NoSQL backlash is in full swing, from the rebuttal to one of my recent posts in Dennis Forbes’s write-up The Impact of SSDs on Database Performance and the Performance Paradox of Data Explodification (aka Fighting the NoSQL mindset) to similar thoughts expressed in typical rant-y style by Ted Dziuba in his post I Can't Wait for NoSQL to Die.

This will probably be my last post on the topic for a while given that the discussion has now veered into religious debate territory similar to vi vs. emacs or functional vs. object-oriented programming. With that said…

It would be easy to write rebuttals of what Dziuba and Forbes have written, but from what I can tell people are now talking past each other and defending entrenched positions. So instead I’ll leave this topic with an analogy. SQL databases are like automatic transmission and NoSQL databases are like manual transmission. First, once you switch to NoSQL, you become responsible for a lot of work that a relational database system takes care of automatically, much as you do when you pick manual over automatic transmission. Second, NoSQL allows you to eke more performance out of the system by moving a lot of the integrity checks done by relational databases out of the database tier. Again, this is similar to how you can get more performance out of your car by driving a manual transmission rather than an automatic.
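To make the "you become responsible for the work" half of the analogy concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: SQLite stands in for a relational database, a plain dict and list stand in for a key-value store, and the stories/votes schema and both function names are hypothetical, not taken from any of the posts discussed here.

```python
import sqlite3


def relational_insert_orphan():
    """Try to record a vote for a story that does not exist (SQLite as the RDBMS stand-in)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE stories (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE votes ("
        " id INTEGER PRIMARY KEY,"
        " story_id INTEGER NOT NULL REFERENCES stories(id))"
    )
    try:
        # Story 42 was never inserted, so the database refuses the vote.
        conn.execute("INSERT INTO votes (story_id) VALUES (42)")
    except sqlite3.IntegrityError:
        return "rejected by database"
    finally:
        conn.close()
    return "accepted"


def kv_insert_orphan(check_in_app):
    """Same write against a hypothetical key-value store (a dict and a list)."""
    stories = {}   # story_id -> story document; empty, so story 42 is missing
    votes = []     # the store itself knows nothing about stories
    vote = {"story_id": 42}
    # Any integrity check has to be written, and remembered, by the developer.
    if check_in_app and vote["story_id"] not in stories:
        return "rejected by application"
    votes.append(vote)
    return "accepted"
```

With the check left out, the key-value version happily accepts the orphaned vote. That is the extra speed, and the extra responsibility, of shifting your own gears.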

However, the most notable similarity is this: just as most of us can’t really take advantage of the benefits of a manual transmission vehicle because the majority of our driving is sitting in traffic on the way to and from work, the harsh reality is that most sites aren’t at Google’s or Facebook’s scale and thus have no need for a Bigtable or Cassandra. As I mentioned in my previous post, I believe a lot of the problems people have with relational databases at web scale can be addressed by taking a hard look at adding in-memory caching solutions like memcached to their infrastructure before deciding to throw out their relational database systems.
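For readers who haven't put a cache in front of their database before, the usual pattern looks roughly like the sketch below. Treat it as a sketch under assumptions rather than production code: a plain Python dict stands in for a memcached client, SQLite stands in for the relational database, and the `CacheAside` class, the `users` table and `make_demo_store` are hypothetical names made up for illustration.

```python
import sqlite3


class CacheAside:
    """Check the cache first; only go to the database on a miss."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}     # stand-in for a memcached client
        self.db_reads = 0   # counts reads that actually reach the database

    def get_user_name(self, user_id):
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]          # cache hit, no database I/O
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        name = row[0] if row else None
        if name is not None:
            self.cache[key] = name          # populate the cache for next time
        return name

    def rename_user(self, user_id, name):
        self.conn.execute(
            "UPDATE users SET name = ? WHERE id = ?", (name, user_id)
        )
        self.conn.commit()
        # Invalidate rather than update, so the next read reloads from the database.
        self.cache.pop(f"user:{user_id}", None)


def make_demo_store():
    """Build an in-memory database with one hypothetical user row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'dare')")
    conn.commit()
    return CacheAside(conn)
```

Repeated reads of the same row hit the database once, and a write invalidates the cached copy. The relational database stays the system of record, which is the whole point: you relieve the read load without giving up the integrity checks.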

Now Playing: Lady Gaga - Bad Romance


 

Monday, 29 March 2010 17:09:09 (GMT Daylight Time, UTC+01:00)
Hi there Dare.

While I think Ted went to extremes quite intentionally, perhaps to elicit debate, if you have a rebuttal to what I have written then please post it (I'm done with the blog-to-blog debate on this topic, so I won't reply to it). Remarkably, I am way more centrist than many have held me as being.

However, to your specific analogy, I respectfully disagree. One of the biggest benefits of NoSQL is that you eliminate all of the nuanced manual activities that are involved with an RDBMS. Instead you just toss sets of data into buckets and you're done, and everything else is automagical. If you want to append a new "Up arrow" to a bucket, you read the contents, concatenate the action, and write it back.

On the surface it sounds incredibly simple.
Monday, 29 March 2010 17:09:50 (GMT Daylight Time, UTC+01:00)
I just like the simplicity of document oriented databases and the way they interact with code.
Ryan
Monday, 29 March 2010 18:07:38 (GMT Daylight Time, UTC+01:00)
Dennis,
 If the data is highly interrelated like most social networking data is, then the developer is responsible for managing data integrity. This is the additional complexity I was referring to in my example.

 My main issue with your post is that it isn't a fair comparison since all the data fit on a single machine. The story is a lot different when your data is sharded across multiple DBs either due to storage or I/O constraints.
Monday, 29 March 2010 18:23:03 (GMT Daylight Time, UTC+01:00)
It wasn't intended to be a fair comparison. On the flip side I could have easily run my tests on a server that would have yielded an easy 10x improvement in results per second on a single machine.

A fair comparison wasn't the goal.

The problem that sites like Digg, Reddit, and so on face is overwhelmingly an I/O problem -- this is by many orders of magnitude the weakest link in the traditional computing platform. It is a very real problem that anyone with a large database has dealt with (once you're beyond the scale of memory caching, performance often goes to hell).

The most important point I try to make in my NoSQL entries is that the micro-optimization of NoSQL (Digg's solution was 99% mass denormalization) is absolutely *dwarfed* by the gain they would yield simply subbing in SSDs, at least presuming a decent RDBMS that can adapt accordingly. Most of the NoSQL advocacy is based upon an obsolete understanding of the runtime platform.

Anyways, cheers!
Monday, 29 March 2010 18:35:55 (GMT Daylight Time, UTC+01:00)
Dennis,
Agreed on I/O being the weakest link. This is why I advocate folks looking at in-memory solutions to address their scalability issues before deciding that they need to change database products. SSDs have looked attractive to us at work but there is still a cost/benefit trade off that hasn't quite crossed the threshold given our load characteristics. There's lots of good writing about this topic at http://perspectives.mvdirona.com/2008/10/15/WhenSSDsMakeSenseInServerApplications.aspx and http://perspectives.mvdirona.com/2009/04/12/WhereSSDsDontMakeSenseInServerApplications.aspx
Monday, 29 March 2010 21:45:53 (GMT Daylight Time, UTC+01:00)
Good analogy. Well said.