Steps to fix a bug properly

Fixing a bug in a computer program isn't always easy, but even when it seems easy there are actually a lot of steps one needs to go through to make sure it's fixed properly.

First of all, you need to make sure you have the conceptual framework to understand if there is actually a bug or not. This isn't usually a problem, but there have been a few times in my career when I've started working on a completely new and unfamiliar piece of software and wasn't sure what it was supposed to do, how it was supposed to work, or whether any given piece of behaviour was a bug or not.

Secondly, you actually need to determine if the reported problem is really a bug. While we would like it if software always followed the principle of least surprise, sometimes it's unavoidable that there are things which seem like bugs at first glance but which are really by design.

Thirdly, you need to find the defect that actually caused the problem. Just fixing up the symptoms is usually not the best way, because the defect might manifest again in a different way. Even if it doesn't, there may be performance and maintainability implications in having a problem that occurs internally and is suppressed. This is often the most difficult step to do correctly.
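To illustrate the difference between patching a symptom and fixing the defect, here is a minimal sketch in Python. The functions `last_n_defective`, `report_with_symptom_patch` and `last_n_fixed` are hypothetical names invented for this example; they don't come from any real codebase.

```python
def last_n_defective(items, n):
    # Defect: when n == 0, items[-0:] is the same as items[0:],
    # so the whole list comes back instead of an empty one.
    return items[-n:]


def report_with_symptom_patch(items, n):
    # Symptom-level patch: this one caller hides the wrong result,
    # but every other caller of last_n_defective is still exposed.
    result = last_n_defective(items, n)
    return [] if n == 0 else result


def last_n_fixed(items, n):
    # Fix at the defect itself: compute the start index explicitly,
    # so n == 0 and n > len(items) both behave sensibly.
    return items[max(0, len(items) - n):]
```

The symptom patch makes one report look right, while the defective helper keeps returning the wrong answer to anyone else who calls it with zero.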

Fourthly, you need to determine what the correct fix is. For most bugs this is pretty easy once you've found the defect - it's often just a localized typo or obvious omission. But occasionally a bug crops up for which the correct fix requires substantial rewriting or even architectural redesign. Often (especially at places like Microsoft) in such a case the correct fix will be avoided in favour of something less impactful. This isn't necessarily a criticism - just an acknowledgement that software quality sometimes must be traded off against meeting deadlines.

Fifthly, one should determine how the defect was created in the first place. This is where the programmers who just fix bugs diverge from the programmers who really improve software quality. This step is usually just a matter of spelunking in the source code history for a while, and good tools can make the difference between this being a simple operation or a search for a needle in a haystack. Unfortunately such good tools are not universal, and this use case isn't always high priority for the authors of revision control software.
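When the history is long, one way to shorten the spelunking is a binary search over revisions, which is the idea behind tools such as `git bisect`. The sketch below is only an illustration under some assumptions: the history is linear, the first revision is known good, the last is known bad, and once the defect appears it stays. The names `first_bad_revision` and `is_bad` are made up for this example.

```python
from typing import Callable, Sequence


def first_bad_revision(revisions: Sequence[str],
                       is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions[0] is good, revisions[-1] is bad, and that
    every revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # defect was introduced at mid or earlier
        else:
            lo = mid  # defect was introduced after mid
    return revisions[hi]


# Hypothetical history in which the defect first appears in "r5".
history = [f"r{i}" for i in range(8)]
culprit = first_bad_revision(history, lambda rev: int(rev[1:]) >= 5)
print(culprit)  # prints "r5"
```

In practice `is_bad` would check out the revision and run a test that reproduces the bug, so the number of builds grows with the logarithm of the history length rather than its length.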

Sixthly, one should determine if there were other defects with the same root cause. For example, if a particular programmer some time ago got the wrong idea about the right way to call a particular function, they might have made the same mistake in other places. Those places will also need to be fixed. This step is especially important for security bugs, because if an attacker sees a patch which fixes one defect, they can reverse engineer it to look for unfixed similar defects.
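If the codebase happens to be in Python, the search for similar mistakes can sometimes be automated with the standard `ast` module. The following is a rough sketch, assuming the mistake was calling a function without a required keyword argument. The function `open_connection` and its `timeout` keyword are hypothetical, chosen only to have something to search for.

```python
import ast


def find_calls_missing_keyword(source, func_name, keyword):
    """Return sorted line numbers of calls to func_name lacking keyword.

    Matches both plain calls (f(...)) and attribute calls (obj.f(...)).
    Calls that pass **kwargs are reported too, since the keyword is
    not visibly present in the source.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        callee = node.func
        if isinstance(callee, ast.Name):
            name = callee.id
        elif isinstance(callee, ast.Attribute):
            name = callee.attr
        else:
            continue
        if name == func_name and all(kw.arg != keyword for kw in node.keywords):
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = """\
conn = open_connection(host)
other = open_connection(host, timeout=5)
client.open_connection(host, retries=3)
"""

print(find_calls_missing_keyword(SAMPLE, "open_connection", "timeout"))
# prints [1, 3]
```

A plain text search would find the same call sites, but working on the syntax tree avoids false hits in comments and strings.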

Seventhly, one should actually fix any such similar defects which appear.

The eighth and final step is to close the loop by putting a process in place which prevents other defects with the same root cause. This may or may not be worth doing, depending on the cost of that process and the expected cost of a bug in the software. When lives are at stake, such as in life support systems and space shuttle control software, this step is really critical, but if you're just writing software for fun you'll probably only do it if finding and fixing those bugs is less fun than creating and following that process.
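One of the cheapest processes of this kind is a regression test that runs on every change, so the same defect can't quietly come back. Here is a minimal sketch using Python's standard `unittest` module. The function `parse_percentage` and the defect it once had (accepting values above 100) are invented for illustration.

```python
import unittest


def parse_percentage(text: str) -> int:
    # Hypothetical function under test. The imagined original defect
    # was a missing range check, so "150%" used to be accepted.
    value = int(text.strip().rstrip("%"))
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value}")
    return value


class ParsePercentageRegressionTest(unittest.TestCase):
    def test_rejects_values_above_100(self):
        # Pins the fix: the input from the bug report must now fail.
        with self.assertRaises(ValueError):
            parse_percentage("150%")

    def test_accepts_boundaries(self):
        # Guards against an over-eager fix that rejects valid input.
        self.assertEqual(parse_percentage("0%"), 0)
        self.assertEqual(parse_percentage("100%"), 100)
```

Saved in a file, these tests can be run with `python -m unittest`. Whether it's worth wiring them into an automated build depends on the same cost trade-off described above.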

One Response to “Steps to fix a bug properly”

  1. David says:

    Replace "bug" with "clinical incident", and you've got a reasonable description of the medical audit process. Replace it with "symptom", and you've accounted for a fair proportion of medical practice. As a doctor and a geek, I'm constantly surprised by how difficult a few of my colleagues find fixing computer problems (be they hardware or software), as medical diagnosis serves so well as training for this...
