Toxic Mix? OOP and Databases
Podcast: Play in new window | Download (48.7MB) | Embed
Subscribe: Apple Podcasts | Spotify | Email | RSS | More
Your database is often the most important part of your application in terms of actual value to the business. The value of the database will only increase over time, as long as it is adequately maintained. Your database administrator is in charge of this maintenance, and you’ll often find that the constraints under which they are operating are very different than yours. In fact, if you don’t play your cards right, you’ll often find that the database administrator quickly gets in the way of all that object oriented programming goodness that you enjoy.
“What’s the deal with Object Oriented Programming and databases?”
Object Oriented Programming (OOP) vs Database Design has been called “The Vietnam War of Computer Science”. There are two main problems here. Calls to the database are typically over a network, whereas OOP code is typically local. OOP is heavily centered around code reuse based on inheritance, whereas database operations are structured around projections of data.
“The DBAs are the Viet Cong and we [Complete Developer Podcast] are Robin William’s character.”
An object relational mapper (ORM) maps tables and other database objects to objects in your code. It enables you to programmatically interact with the objects, rather than dealing with SQL. Typically, an ORM integrates a unit of work pattern that allows you to modify a bunch of objects and then push the changes back to the database as a unit. They tend to protect against SQL injection to varying degrees.
It’s probably better to stop thinking of your database as a data storage system and to think of it instead as a micro-service that happens to be written in SQL. By thinking of your DBA as another developer who is writing a service that you are consuming, you’ll make better decisions about how to use OOP principles in your app. While this is a subtle shift, it’s one that you need to make as a developer to make it easier to work with your DBA.
17:25 Most ORMs are built to enable CRUD against tables in the database.
Metadata about tables is usually the easiest thing to get. It also matches the way that many developers think about objects, as bags of state optimized for implementing business logic. Tables are also basically bags of state, but are optimized for storage.
“When you wrote it and you look at the source control and remember…”
The difference is subtle, but typically means that the ORM directly maps a table into an object. This usually means that the ORM has a bias towards direct table access, which often hamstrings the DBA’s efforts to optimize. To overcome this, don’t map types directly to tables. Instead, map them against projections of data (this can be views, or pre-canned sql)
21:38 ORMs often obscure the fact that calls happen across a network.
Generally when using an ORM, you’ll make a bunch of changes to an object or object graph and then commit it using the ORM. As a result, the objects in the ORM have to keep track of all changes that have occurred as they happen and then commit them to the database in a batch.
“You end up with ‘I need this value’ but it’s not blatantly apparent that I’m making a network call to get that value.”
However, certain operations like adding a new record and needing its primary key for something will require a round-trip to the database. If you aren’t careful you can cause additional network traffic without intending to do so. The source of the extra traffic can often be very difficult to pin down in your code. It may not be immediately obvious that a particular operation occurs across the network. This is best dealt with by limiting the amount of data manipulation you do in your code and passing it off to the database in one chunk instead.
“You enforce the business rules upstream.”
26:15 ORMs also have a hard time with foreign keys.
“I set Lazy Loading off as the default when I’m using an ORM.”
While a database table may have an integer id and a foreign key to another table, when dealing with ORMS, you typically want to have an object property instead. This can lead to accidental SELECT N+1 problems, where you are inadvertently making additional calls to the database due to accessing a property.
Because the ORM is working with a copy of the data from the database, it’s possible for the ORM to fail to enforce constraints (like foreign keys) until you attempt to commit the transaction. Depending on how the ORM deals with transactions (and how you deal with the ORM) this could lead to partial commits and corrupted data. When dealing with tables that have foreign keys, you need to either treat the foreign keys as integer properties or as object properties, not both. You’ll also need to be careful to load things before you need them, rather than relying on things like lazy loading to get them for you.
30:05 Because ORMs try to use object oriented principles to improve access to the database, they typically aren’t written with a single database in mind.
Databases vary widely, both in the best way to work with them and the types of data they can store. As a result, most ORMs are designed for the lowest common denominator, which may have performance implications for you.
“It’s the lowest common denominator, it’s going to do the basic work you need done.”
For instance, due to the way that your ORM works, you may not be able to do things using HeirarchyIDs and their assorted functions (SQL Server) in queries, instead being forced to rely upon self-referencing tables, which can significantly impact performance. You can probably get by with the lowest common denominator for most database operations, but you need to make sure that if tuning for a particular database is required that your ORM will support that decision by allowing you to easily wrap doing the work in the database instead.
33:58 ORMs typically make querying much easier.
Easy queries are nice, but they can lead to problems when the fields being queried are not indexed appropriately. There is typically no way to tell at the ORM level whether a particular field SHOULD be allowed to be used in a search.
“I get pulled into the DBA role probably more than I should.”
When combined with the tendency to directly work with a table, this practice can severely impact database performance if a table is being used for performance-sensitive operations. To make this work better, you’ll need to actually communicate with your DBA, or learn how to determine which fields are properly indexed yourself.
“Show me the SQL!”
It’s also best if you can get the actual SQL being executed, as many database engines have tools that can let you see what the code does when executed. You also need to be careful to avoid allowing certain kinds of searches (e.g. string contains)
41:26 ORMS are typically used to get up-to-the-minute data from the database.
This is not necessarily a characteristic of ORMs, but rather a characteristic of how the tutorials for most of them are constructed and how people use them early in a project. Up to the minute (live) database access isn’t necessarily needed for all apps.
“Our content hosting has done this to us.”
To mitigate this, you might do well to have multiple databases you hit, with different staleness levels. For instance, there are some setups with different connection strings for Live, 1 hour old, and 1 day old. This allows you to pick the stalest data possible, rather than hitting tables that are in heavy use.
Air Quality Egg
This little egg is an open source project that has been built to measure the air quality within a community. It monitors pollution levels that is collects and shares with other devices. There’s a mobile and web interface for viewing the data it collects. It measures levels of carbon monoxide, carbon dioxide, sulfur dioxide, nitrogen dioxide, and other volatile organic compounds. The eggs are wifi enabled so you can hook it up to your wifi and it will collect and share data with other eggs in your area. You can create a network of your own eggs or interact with others in the area to see data trends.
Tricks of the Trade
Make a new type for a primary key for a particular table, rather than using an integer. You can, in many OO languages, create types that wrap primitive types to use for this sort of thing. Doing so will let you use the type system to protect you against certain types of errors, rather than finding them at run time (or worse, after it has been running for a while). OOP can hurt you with your database, but don’t forget that the full power of OOP can still be leveraged to help you out considerably.