Document VS Relational Databases
Podcast: Play in new window | Download (58.4MB) | Embed
Subscribe: Apple Podcasts | Spotify | Email | RSS | More
Document databases are becoming increasingly common. Not only do they offer a lot of advantages over relational databases in certain contexts, but they also greatly reduce the friction of working with data when using object-oriented languages. Not only do these advantages often make it easier to work with your data, but they can make it easier to rapidly iterate on your design as you develop your software.
However, much like agile principles, all this flexibility can get you into trouble if you aren’t disciplined. Simply being able to do more things doesn’t make all those things prudent, scalable, or sustainable.
The structure of your data matters. This is still true when you are using NoSql data models. However, while many developers know how to build decent relational databases, many of us are at a loss the first time we encounter a database with a non-relational structure.
Document databases are becoming increasingly common and important parts of modern software architecture. While there are a few spots where they are tricky to use well, they can often supplement or even replace older relational models. If nothing else, the simplicity of working with them (and the avoidance of the object-relational impedence mismatch) is often worth the trouble.
Episode Breakdown
Major concerns for relational databases.
No duplication of data. Flat structures connected by joins. Focus is on entities and their relationships. Queries and indexes tend to be built after the data model is worked out.
How document databases do it.
Data structure is highly based around how it will be retrieved and structured based on how it will be read. While relationships between documents are supported to varying degrees,they aren’t treated with the importance that they are in a relational model. The data that is generally read together is generally stored together. Duplicate data is less of a concern than it would be in a relational model.
One to one relationships
In a relational database, you might model such relationships in a single table, or in two tables with a relationship between them. In general, in a document database, you would simply put it all in a single document, with nullable fields as required. This is a simple case, but gets rid of a join. Updating either record requires an update of the document.
One to (not too) many relationships.
In a relational database, this is going to be expressed as a parent table, with a child table that has a foreign key back to the parent. In the relational model, you’d retrieve the parent and children in one shot by either returning two recordsets, or by using a join to flatten the relationship. In a document database, if the children are not relevant without the parents, you’d simply have an array in the document being stored. Retrieval of this dataset would not require a join and would not require parsing, but updating a child record would require you to update the parent document.
One to (a great) many relationships.
In a relational database, this would again be expressed as two tables joined with a foreign key (and hopefully with some appropriate indices). In a document database, this typically would not all be stored in the manner previously outlined, because any retrieval of the parent brings back all the child records. Instead, you would typically store the child records as their own documents, with a reference back to the parent. That way you could grab sane recordsets without getting humongous amounts of data. This can be interesting as more child records pile up, but you could also use a hybrid approach.
Many to Many relationships
In a relational database model, this would be expressed by having two entity tables, with a join table between them. Multiple joins would be used for retrieval. This gets interesting in document databases, because the model is no longer based around entities, but around how they are read. It’s entirely possible that you might express such a relationship with two separate documents, one for each end of the relationship. Each would have its related keys stored as an array under the aggregate root. This can be interesting in terms of keeping both documents up to date.
Hierarchies
In relational databases, this will either be modeled using some sort of hierarchyid type (and related functions) or simply be a long chain of foreign keys. The latter is extremely annoying, btw., as you can end up with an unknown number of joins. In a document database, you have many options. For instance, if hierarchies don’t change frequently, you could simply store the whole tree in a single document. You might also consider storing the data for each node in its own document, and having an array with references to the parents back to the beginning, and a nested set of arrays listing the children. The approaches can be hybridized considerably based on your needs. The downside of the document db approach here is that writes may require numerous updates to be complete. However, reads, which are typically the bulk of data access, only require a single operation.
Key value pairs
In a relational database this is often done with a three column table, with one column for the related entity, a column for the key, and a column for the value. Type safety here is garbage, as in index performance on the value. In a document database, this would simply be stored in a dictionary under the entity document, provided that the data was small in size. If there was a larger amount of data, or if some parts aren’t frequently used, this data could be stored in a different document. In addition to avoiding the type safety and performance issues of the relational model, it’s also easier to store complex types in this structure, without having to serialize json or do odd things with your keys to indicate nesting.
Sparse tables
Sometimes entities have a lot of fields, few of which are actually filled in. This can be for a variety of reasons. In a relational model, you can do several approaches to this one. Key value pairs, nullable columns, or even multiple other tables for sets of values could be used. All of these approaches have problems – you just pick the problems you can stand to have. In a document database, these can just be nullable fields on the document.
Final thoughts
Storing things in a document database is somewhat similar to the way a lot of really old apps stored things in the filesystem, with rather similar structural patterns, but better querying and concurrency. A document database doesn’t have to be your entire data storage. It can be hugely effective, even if you just use it to cache the results of complex queries from the backend of a system. Concurrency problems are often alleviated by document databases, but in cases where they aren’t, event streaming can also help.
Book Club
The 17 Essential Qualities of a Team Player
John C. Maxwell
The final quality of a team player is Tenacious. “Never, never, never quit.” Maxwell starts his last chapter with a personal story about a trip to London with his wife and some friends. While he wanted to see the home of John Wesley and another of his friends wanted to visit sites related to C.S. Lewis one friend had his heart set on getting a picture at Abbey Road like on the Beatles album. {Yes, I’ve done it too…barefoot and all.} Unfortunately when they went to get the picture taken the road was closed for construction. Being their last day in London this was the only chance they had so they persisted in talking to the construction crew until they eventually allowed them to take the picture. He goes on to talk about how tenacity is crucial to success giving three ways tenacity is expressed. First it means giving all that you have, but that doesn’t mean giving more than you have to give. Next it is working with determination instead of relying on luck or fate. Finally, tenacity is not quitting until the job is done no matter how tired you become. He closes with three ways to improve your tenacity: work hard and smart, stand for something and act with integrity toward it, and make work into a competitive game.
Tricks of the Trade
Be willing to try new things and shift your thinking to fit those new things.