Essentials of Debugging
Debugging is an essential part of software development. No matter how good a developer you are or how meticulously you code, you will run into times when your code doesn’t do what it is expected to do. Sometimes this is an error in the actual code; other times it may be an unexpected interaction that your code has with the rest of the system.
When this happens you have a bug. Wikipedia defines a software bug as an error in the design or implementation (coding) of a program or application. Debugging is the process of locating the source of the bug and then fixing the underlying problem causing it. While that seems straightforward, bugs can be tricky because they may not present themselves in the area of code where the problem actually exists. Many times you will notice a problem in one area of the application while the code causing it is in a totally different area, or even a different layer, of the application.
While it is important to understand what a bug is, it is just as important to understand what a bug is not. Network errors, server outages, and slow connections are not software bugs. You may want to update your code to better handle these situations, but they do not constitute bugs. Knowing what counts as a bug keeps you from wasting time trying to fix something that is not in your control.
The process for tracking a bug is broken down here, though for most people it starts to become automatic once they’ve done it a few times. You will begin to notice patterns in the bugs you fix, and those patterns will tell you where in the code to look first. Use the information here to develop that sense of where to look, but remember that patterns aren’t perfect; they only provide a starting point, and you may have to go back and follow the bug from broad to narrow to determine what part of the code is causing the problem.
Before you begin, you must be able to replicate the bug consistently.
It’s not possible to fix a bug that you can’t replicate. You’ll never know if you’ve actually fixed it or if it was even a real problem to begin with. In order to track and understand what is causing an issue, that issue needs to happen on a consistent basis. For a bug to be consistently replicable, inputting the same values must produce the same results, even if those results are wrong.
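As a minimal sketch of what “consistently replicable” means in practice, the Python snippet below runs a function several times with the same recorded input and checks that every run gives the same (wrong) answer. The function average_rating, the helper is_consistent, and the input values are hypothetical and chosen only for illustration.

```python
def average_rating(ratings):
    # Hypothetical function under investigation: integer division
    # silently drops the fractional part of the average.
    return sum(ratings) // len(ratings)


def is_consistent(func, args, runs=20):
    """Return True if func(*args) gives the same result on every run."""
    results = {repr(func(*args)) for _ in range(runs)}
    return len(results) == 1


# The exact input reported with the bug, recorded so it can be replayed.
REPRO_INPUT = ([4, 5, 5],)

print(is_consistent(average_rating, REPRO_INPUT))  # True: same input, same output
print(average_rating(*REPRO_INPUT))                # 4, although about 4.67 was expected
```

If is_consistent returned False for a recorded input, that would be a hint that the result depends on something other than the input, such as timing or shared state.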
The term consistently inconsistent refers to something that happens with enough regularity to be noticed but does not happen every time. Typically, when something is consistently inconsistent it will be related to a race condition. Many times users, and even some less experienced QA, will blame the app or your code for issues with their own machine or connection. This typically stems from a lack of understanding of how computers function. A slow internet connection isn’t a bug unless you are Ops. In these cases, patience goes a long way when explaining the situation to users.
Check the documentation to make sure this is an actual bug and not an unexpected feature.
If you are using an outside service or you didn’t write the code yourself, check the documentation to make sure that you haven’t found a feature that the user didn’t know about. When you are the one building the application and QA throws something back to you, check the Acceptance Criteria, if you have them, to make sure the application is doing what it is supposed to be doing.
The idea is to understand what a bug is and what a bug is not. A bug occurs when the application is not functioning as it is supposed to function. That doesn’t mean how a particular user expects it to function…unless that user is the product owner. Sometimes a reported bug isn’t an actual problem with the application or the code, but instead a problem with understanding how something is supposed to function. This could be due to misinterpreted or overly vague acceptance criteria or instructions.
Make sure there are no service outages.
Your code may be flawless, but if there is a service outage nothing you can do via your code will get the application working again. There is no point in looking through code to figure out why you aren’t getting results from an API if the server housing the API is down. Start your inquiries about why a function or application is not working properly by checking that everything is up and running. If you have intermittent failures, check that the servers have not been down and that no one is deploying a new release.
It is likely that your code, or an area of the code you are debugging, is calling an external API or service. Don’t just check that your own servers are up and running; also check that your external services are not down. If you do find that your servers, or those of a service you are using, are down, now is a good time to look at how you are handling that in your code.
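One way to make the “is everything up?” check quick is a small script that pings each dependency before you open any code. The sketch below uses only the Python standard library. The helper names (service_is_up, find_down_services) and the example.com URLs are placeholders, not real endpoints, and the opener parameter is an assumption added so the check can be exercised without a network.

```python
import urllib.error
import urllib.request


def service_is_up(url, timeout=3.0, opener=urllib.request.urlopen):
    """Return True if the URL answers successfully, False on any failure."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError (4xx/5xx), and socket timeouts are all
        # subclasses of OSError, so one handler covers them.
        return False


def find_down_services(services, check=service_is_up):
    """Return the names of all services whose health check fails."""
    return [name for name, url in services.items() if not check(url)]


# Placeholder URLs: replace with the health endpoints of your own stack.
SERVICES = {
    "own_api": "https://api.example.com/health",
    "payment_provider": "https://payments.example.com/status",
}
```

Calling find_down_services(SERVICES) returns a list of the dependencies that did not answer. An empty list means the outage explanation can be ruled out and it is worth moving on to the code.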
Rerun, check, and rewrite your unit tests.
Once you have verified that you have an actual bug on your hands, run your unit tests to see if you can quickly find it. If they do not pass, then something has changed in the code that is causing the bug and you can quickly find out where. More likely than not, your unit tests will pass. If you don’t have unit tests, now may be a good time to start that conversation. Not all code is going to be testable as written and some may have to be rewritten, so the middle of a bug hunt is not the time to start implementing them. However, you can point out how much quicker bugs are caught with unit tests in place.
Once you know where in the code your bug originates, start by writing a unit test for that area that tests against the replicable bug you are facing. This will allow you to easily test potential fixes without having to go through the whole path to get to that part of the code. After you have found the fix for your bug, run your unit tests again, all of them. There is little worse than spending hours tracking and fixing a bug only for that fix to cause another bug.
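A regression test of this kind might look like the following sketch, written with Python’s built-in unittest module. The page_count functions and the inputs are hypothetical: the buggy version stands in for the code as found, and the fixed version for the code after the repair. The test encodes the replicated bug, so it fails against the buggy version and passes once the fix is in.

```python
import unittest


def page_count_buggy(total_items, page_size):
    # Hypothetical code as found: drops the final, partially filled page.
    return total_items // page_size


def page_count(total_items, page_size):
    # Hypothetical fix: round up so a partial page still counts.
    return (total_items + page_size - 1) // page_size


class PageCountRegressionTest(unittest.TestCase):
    def test_partial_last_page_is_counted(self):
        # The replicated bug: 21 items at 10 per page showed only 2 pages.
        self.assertEqual(page_count(21, 10), 3)

    def test_exact_multiple_is_unchanged(self):
        # Guard against the fix breaking the case that already worked.
        self.assertEqual(page_count(20, 10), 2)
```

Saved to a file, the tests can be run with python -m unittest followed by the file name. Running the whole suite the same way afterwards covers the “all of them” step.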
Understand the difference between symptoms and cause.
It can be very easy to confuse the symptoms of a bug with the actual cause of the bug. The symptoms are what you see, so usually the first thing that comes to mind involves fixing the obvious. Unfortunately, the symptoms are rarely the real issue, and sometimes they even hide it. A symptom answers the question ‘what is happening’ whereas the cause answers ‘why is this happening’. Treating symptoms does not fix the issue causing the problem; it just delays solving it.
Treating the symptoms of a bug typically involves a “hacky” solution to get around the problem. This is not always bad. For example, if you need to keep an app up and running while you diagnose the real issue, treating the symptoms can buy you valuable debugging time during which your application is not down. Something that happens more often than it should is one bug masking another, so that the second bug only becomes noticeable once you resolve the first. This is especially rough when you’ve accepted the first bug for a while and are just getting around to fixing it, only to find a much bigger problem behind it.
Find the layer of your app where the problem is occurring.
“Location, location, location.”
The key to debugging is to know where the bug is actually originating. To locate a bug, start broad and narrow your search down as you eliminate possibilities. Eliminating an entire layer of an application will significantly cut your debugging time. Standard CRUD applications have a three-layer architecture with a UI, an API, and a database. Most of your web applications will fall into this structure, with the possibility of a service layer or external calls to outside services and APIs. The three-layer model gives a good reference point and an easy way to explain eliminating layers.
Starting at one end or the other, mock up what a call to the next layer would return. For a UI calling the API, this might mean mocking up the JSON object returned by the API instead of actually making the API call. If the UI works correctly with the mocked data, then you can rule out the UI as the area where the bug is happening. If all of the layers work individually, check the connections between layers. Browser dev tools are a great way to see the network calls being made from the UI to the API without digging through code. If the API has changed there may be a missed parameter or, between the API and the database, an issue with a connection string or database password.
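The layer-elimination idea can be sketched in a few lines of Python. Here the “UI” is a function that formats a user list, and the API call is swapped for a function returning canned JSON. All names (render_user_list, fake_fetch_users) and the payload are made up for illustration; the only assumption about your own code is that the layer under test receives its data through something you can replace.

```python
import json

# Canned payload shaped like what the API is documented to return.
CANNED_RESPONSE = json.loads(
    '{"users": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]}'
)


def fake_fetch_users():
    """Stand-in for the real API call; no network involved."""
    return CANNED_RESPONSE


def render_user_list(fetch_users):
    """The layer under test: turns the API payload into display rows."""
    data = fetch_users()
    return [f"{user['id']}: {user['name']}" for user in data["users"]]


print(render_user_list(fake_fetch_users))  # ['1: Ada', '2: Grace']
```

If the rows come out right with the canned payload, this layer can be crossed off and attention moves to the API or to the call between the two.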
Follow the path the code is taking when the bug is occurring.
Once you’ve found the layer where the bug exists, follow the path that is causing the issue all the way to the end or to the next layer. You are not looking for individual lines of code yet; this is still part of the narrowing process. Start by looking at the larger organization of the code. In .NET you would start by figuring out which project is causing the problem, or in Angular which component or module is the issue. Other languages have different constructs for organizing code, but the approach is the same: start broad and narrow your way down.
Once you have a general idea of where the bug is located, start to narrow your search. Again go from broad to narrow: start with the widest net and then narrow down from class to method to line of code. A good way to narrow your search once you know the general region is to add breakpoints at the entry and exit of each class or method called. This way you can see what is being passed in and returned from each method and compare that to what should be passed in and out.
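Where setting breakpoints on every method is impractical, a logging wrapper can record the same entry and exit information. The following Python sketch is one possible approach, not a replacement for a debugger. The trace decorator, the call_log list, and the two sample functions are all hypothetical.

```python
import functools

call_log = []  # collected entry/exit records, in call order


def trace(func):
    """Record the arguments on entry and the return value on exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(f"enter {func.__name__} args={args!r} kwargs={kwargs!r}")
        result = func(*args, **kwargs)
        call_log.append(f"exit {func.__name__} returned {result!r}")
        return result
    return wrapper


@trace
def parse_quantity(text):
    return int(text.strip())


@trace
def line_total(quantity_text, unit_price):
    return parse_quantity(quantity_text) * unit_price


line_total(" 3 ", 2.5)
print("\n".join(call_log))
```

Reading the log from top to bottom shows what each method received and returned, which can then be compared against what should have been passed in and out.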
Once you have found the method causing the problem look for obvious issues.
Now that you’ve found the area where the bug is happening, take a look at the code for obvious issues that just stand out. This is where understanding code smells comes in handy, because they will be the things that stand out in an area where you are having issues. Read up on common code smells to know what to look for when you are giving the problem code a quick first glance. This may help you narrow down the issue, especially if it’s a larger method.
Hard-coded values or data that were used just for testing a method should be removed before the code goes out into the wild. However, things can slip by, and this will be one of the most obvious things you can notice when giving your code a quick glance. Another issue that may not be obvious unless you know to look for it is a call that is pointing to the wrong place (API, DB). This might not be caught in the Network tab of your browser dev tools, especially if it’s calling the wrong environment or it’s an issue with the API calling the wrong database or wrong service.
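A call pointing at the wrong environment is easy to miss by eye but simple to check mechanically. The Python sketch below compares each configured endpoint against the host suffix expected for the current environment. The function name, the ENDPOINTS dictionary, and the example.com hosts are illustrative assumptions, not a real configuration format.

```python
from urllib.parse import urlparse


def find_mismatched_endpoints(endpoints, expected_host_suffix):
    """Return names of endpoints whose host does not end with the suffix."""
    mismatched = []
    for name, url in endpoints.items():
        host = urlparse(url).hostname or ""
        if not host.endswith(expected_host_suffix):
            mismatched.append(name)
    return mismatched


# Hypothetical configuration with one leftover staging URL.
ENDPOINTS = {
    "orders_api": "https://orders.prod.example.com/v1",
    "billing_api": "https://billing.staging.example.com/v1",
}

print(find_mismatched_endpoints(ENDPOINTS, ".prod.example.com"))  # ['billing_api']
```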
Forget what it “should do” and focus on what it “actually does”.
Too many developers get stuck on “It should be doing this” and focus completely on what they expect the code to be doing while ignoring what the code is actually doing. They get it in their head that it should work a certain way and cannot conceive of a world where that is not the case. Unfortunately for those developers, the code often doesn’t work the way they expect. A lot of different things can cause a loop to exit early or a function to return a wrong value.
When debugging, the best approach is to go in with no expectations. If you can’t manage that, then go in with the expectation that nothing will work as expected. This will open you up to accept the reality of what is happening rather than living in the fantasy of what “should” be happening. Focusing on what the code is doing, even if that doesn’t make sense, will help you to track down why it is not behaving as expected. It is a matter of accepting the reality of what is happening and then making a plan to change that reality…that’s the whole point of debugging.
Tricks of the Trade
Don’t let the debugger be the crutch that breaks your leg. It’s a backstop. It cannot and should not be your first line of defense, because that’s like fixing a headache with surgery. Reaching for the debugger as the first step in fixing a problem is time-consuming to the point of being wasteful.
For bonus points, try to do your job without using your debugger for a couple of days. Can you do it? If not, you need to be addressing the reasons you can’t.