When to choose SQL vs NoSQL?

I’ve worked with MySQL, PostgreSQL, MongoDB, Redis, CouchBase, Cassandra. To me, this question translates to how much do you care about your data?

Do you care if your data is inconsistent? One of the things NoSQL users secretly like about NoSQL databases is its nonrigidity. It feels liberating to stop worrying about tables, column types, schema migrations, but it’s quite funny to notice this, along aside the “let’s type all the things” trend that’s eating up the dynamic programming world.

In a way NoSQL has given up on enforcing database level data integrity. Instead, schema changes, joins and types are managed on the software level, which means more code to write and maintain, made of dependencies such as mongoose, and dense if statements. They are a simplistic interpretation of the CAP theorem: give up on C (consistency), we’ll give you A (availability) and P (partitioning).

Thinking of a schema before writing code often lands simpler and cleaner abstractions. The fact that you can dump anything into a MongoDB collection is often abused to a point your data becomes a minefield of inconsistencies, ready to blow up in your face next time you click on that fancy histogram chart.

Do you care if your data is lost? Often NoSQL databases don’t actually save the data instantly. They’ll tell you it has, when in reality it’s kept in RAM and queue on its way to be saved permanently. If any hardware damage, or power loss occur, it’s permanently lost. When using NoSQL for caching or search, and many NoSQL databases excel in this area, then losing data is likely not a big deal as it can be restored. On the contrary using NoSQL as the single source of truth, for data you can’t afford to loose like customer bank account balance, you’ll be asking for trouble.

Consistency & Durability are 2 of 4 properties that make for a reliable database management system. The 2 others are Atomicity & Isolation.

What do you mean “SQL doesn’t scale”? One major selling point for NoSQL databases is how easy they “scale”. It’s true SQL databases do not scale on their own: you can easily have N servers replicating your master database, ready for scalable reads, but if you need scalable writes, or your tables are getting so big they no longer fit on a single machine, you’ll need to handle it yourself or turn to third party software. Whereas with NoSQL databases, just plug in more servers, and it’ll take care of distributing your data across, run map-reduce in parallel and output a result.

Too many people underestimate the performance capabilities of most common SQL database running on SSD, while at the same time, overestimating their needs. They think they’ll need high frequency writes and sharding, but it’s often too early to make that assesment. Database choice comes up too early in the creation process. Add to that the hunger for modernity, the marketing task force of a MongoDB or Elastic, and suddenly you have tens of thousands of startups building the next big thing on top of a NoSQL database as single source of truth, as if scalability was more important than risk of data loss, inconsistency.

The bitcoin exchange platform Flexcoin and Poloniex have learned this the hard way.

If your data is highly relational, it won’t scale. It doesn’t quite tell the full story on the effort you’d need to prevent data loss when one of those servers goes down.

Answer to the question: When to choose SQL over NoSQL? is : always start with SQL, and keep NoSQL aside your SQL database to handle specific use cases, such as caching, full text search, queuing, granted you’re well informed of the tradeoffs:

  • is the consistency of my data important?
  • is data loss a risk I’m ready to take?
  • is my data really so big I can’t fit it on a single machine, and SQL has already failed at it?

It’s not usual for a complex application to use multiple databases, an SQL database and a NoSQL database, each for their specific strength Typical NoSQL databases will require more RAM for the same thing