Archive for the ‘computer’ Category

Steps to fix a bug properly

Wednesday, November 5th, 2008

Fixing a bug in a computer program isn’t always easy, but even when it seems easy there are actually a lot of steps one needs to go through to make sure it’s fixed properly.

First of all, you need to make sure you have the conceptual framework to understand if there is actually a bug or not. This isn’t usually a problem, but there have been a few times in my career when I’ve started working on a completely new and unfamiliar piece of software, and I’m not sure what it’s supposed to do, how it’s supposed to work or whether any given piece of behaviour is a bug or not.

Secondly, you actually need to determine if the reported problem is really a bug. While we would like it if software always followed the principle of least surprise, sometimes it’s unavoidable that there are things which seem like bugs at first glance but which are really by design.

Thirdly, you need to find the defect that actually caused the problem. Just fixing up the symptoms is usually not the best way, because the defect might manifest again in a different way. Even if it doesn’t, there may be performance and maintainability implications in having a problem that occurs internally and is suppressed. This is often the most difficult step to do correctly.

Fourthly, you need to determine what the correct fix is. For most bugs this is pretty easy once you’ve found the defect - it’s often just a localized typo or obvious omission. But occasionally a bug crops up for which the correct fix requires substantial rewriting or even architectural redesign. Often (especially at places like Microsoft) in such a case the correct fix will be avoided in favour of something less impactful. This isn’t necessarily a criticism - just an acknowledgement that software quality sometimes must be traded off against meeting deadlines.

Fifthly, one should determine how the defect was created in the first place. This is where the programmers who just fix bugs diverge from the programmers who really improve software quality. This step is usually just a matter of spelunking in the source code history for a while, and good tools can make the difference between this being a simple operation or a search for a needle in a haystack. Unfortunately such good tools are not universal, and this use case isn’t always high priority for the authors of revision control software.

Sixthly, one should determine if there were other defects with the same root cause. For example, if a particular programmer some time ago got the wrong idea about (for example) the right way to call a particular function, they might have made the same mistake in other places. Those places will also need to be fixed. This step is especially important for security bugs, because if an attacker sees a patch which fixes one defect, they can reverse engineer it to look for unfixed similar defects.

Seventhly, one should actually fix any such similar defects which appear.

The eighth and final step is to close the loop by putting a process in place which prevents other defects with the same root cause. This may or may not be worth doing, depending on the cost of that process and the expected cost of a bug in the software. When lives are at stake, such as in life support systems and space shuttle control software, this step is really critical, but if you’re just writing software for fun you’ll probably only do it if finding and fixing those bugs is less fun than creating and following that process.

Non-local control structures

Monday, November 3rd, 2008

Most of the time in computer programming, causes are linked to effects by code at the “cause” point - i.e. if A should cause B then the routine that implements A should call the routine that implements B.

Sometimes, however, there is a need for effects to happen as a result of causes which don’t know about those effects. The obvious example is COMEFROM but there are serious examples as well. Breakpoints and watchpoints in a debugger are one; database triggers are another.
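To make the database trigger case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the trigger name and the add_order function are all hypothetical, chosen only for illustration. The point is that add_order contains no reference to the audit log, yet the log is written whenever it runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE audit_log (message TEXT);
    -- The effect is attached to the cause from the outside.
    CREATE TRIGGER log_order AFTER INSERT ON orders
    BEGIN
        INSERT INTO audit_log (message) VALUES ('order added: ' || NEW.item);
    END;
""")

def add_order(item):
    # The "cause": this routine knows nothing about audit_log.
    conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))

add_order("widget")

# The "effect" happened anyway, because the trigger fired.
log = [row[0] for row in conn.execute("SELECT message FROM audit_log")]
print(log)
```

Reading add_order alone gives no hint that a second table is modified, which is exactly what makes this kind of control structure both useful and hard to follow.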

A more subtle example is the humble destructor in C++ (which I have written about before) - its effect is non-local in the sense that if you construct an automatic object in a scope, code will automatically run when control passes out of that scope. It’s still a local effect in that the cause and the effect are in the same scope, but it’s non-local in the sense that there is no explicit code at the point where the destructor is run.

Why writing web applications is fiddly

Sunday, November 2nd, 2008

When you want to add some functionality to a web application, there are many pieces you have to change. First you have to add some interface element (a button maybe) to a page. Then you have to add some client-side code to get this button to do something. Most of the time you’ll want that action to have some server-side effect as well, so you have to make that button send a request and implement that request in the server side code. The server will generally want to send something back to the client based on the result of that request, so you have to figure out what that something is and make sure the client does the right thing with it (especially fiddly if the client is AJAX-based). That response may itself contain another possible user action, so each new feature can end up creating a whole new request/response conversation.

As well as just writing this conversation, one has to consider all the things that can go wrong (both accidentally and deliberately). Requests and responses might not reach their destinations. If they do get there they might be reordered by the network along the way. Requests might be fraudulent, and so on.

Complexity metrics for computer programs

Saturday, November 1st, 2008

Trying to measure the complexity of computer programs is really difficult, because just about any metric you come up with can be gamed in some way.

Cyclomatic complexity is one possible metric but this only counts loops and branches - it doesn’t tell you anything about how complex the linear parts of your code are. Since expressions like “a ? b : c” can be rewritten as “(!!a)*b + (!a)*c” (for numeric b and c), one can also game this metric.
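The same trick can be sketched in Python, where the conditional expression “b if a else c” plays the role of “a ? b : c”. The function names here are made up for the example. The two functions agree for numeric b and c, but only the first contains a decision point for a cyclomatic complexity tool to count.

```python
def pick_branching(a, b, c):
    # One decision point: this is what cyclomatic complexity counts.
    return b if a else c

def pick_branch_free(a, b, c):
    # Same result for numeric b and c, but with no branch to count.
    # bool(a) and (not a) act as 1/0 flags, mirroring !!a and !a in C.
    return bool(a) * b + (not a) * c

for a in (0, 1, 7, -3):
    print(a, pick_branching(a, 10, 20), pick_branch_free(a, 10, 20))
```

Note that the branch-free form always evaluates both b and c, so it is only a drop-in replacement when neither has side effects.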

An often-used one is the number of lines of source code. But most languages let you arrange your source code independently of the number of lines, so you can put it all on one line or put one token on each line if you’re feeling particularly perverse.

Number of characters of source code is a little better but there is still scope for variation in spaces, comments and length of variable names.

We can eliminate those things the same way the compiler or interpreter does and just count the number of tokens - i.e. add up the total number of instances of identifiers, literals, keywords and operators.
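As a rough sketch of token counting, Python's standard tokenize module can lex Python source, and we can discard the token types that only carry layout or commentary. The count_tokens name and the choice of which token types to ignore are assumptions made for this example (in particular, ignoring indentation tokens is debatable in Python, where indentation carries structure).

```python
import io
import tokenize

# Token types that are layout or commentary rather than program content.
IGNORED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}

def count_tokens(source):
    # Lex the source and count identifiers, literals, keywords and operators.
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type not in IGNORED)

compact = "x=1+2\n"
spaced = "total   =   1 + 2   # add the numbers\n"
print(count_tokens(compact), count_tokens(spaced))
```

Both snippets produce the same count, even though they differ in spacing, comments and the length of the variable name.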

This metric can still be gamed, but only at the expense of the quality of the code itself. For instance you could manually unroll loops, or sprinkle in branches that are never taken.

An interesting refinement might be to run some kind of compression algorithm over this lexed code to eliminate redundancy. Such a compression algorithm would essentially automatically refactor the code by finding and extracting common sequences. I’m not sure if it would generally be desirable to use such an algorithm to automatically refactor one’s source code, but it would certainly be interesting to see its suggestions - I’m sure many programs have repeated sequences that their authors never spotted. If there are sections that should be identical but aren’t because there is a bug in one of them, such a tool might even help to uncover such bugs.
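A very crude way to try this idea is to feed the lexed token stream to a general-purpose compressor and use the compressed size as the metric. The sketch below uses zlib as a stand-in. It does not refactor anything or report which sequences it found, so it is only an approximation of the tool imagined above, and the function names are invented for the example.

```python
import io
import tokenize
import zlib

def lexed(source):
    # Reduce Python source to a space-separated stream of its tokens.
    skip = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return " ".join(t.string for t in toks if t.type not in skip)

def compressed_size(source):
    # Size in bytes of the token stream after redundancy is squeezed out.
    return len(zlib.compress(lexed(source).encode("utf-8"), 9))

once = "total = price * quantity + shipping\n"
repeated = once * 20  # the same statement copied and pasted 20 times

print(compressed_size(once), compressed_size(repeated))
```

The copy-and-pasted version has 20 times as many tokens, but its compressed size grows far less than that, which is the sense in which this metric resists padding by repetition.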

It’s hard to buy a wifi card that works with Linux

Friday, October 31st, 2008

I recently reorganized my home wireless network a bit, and the AP that I had been using connected to my Linux box stopped working. I wanted to replace it with an internal card but it’s annoyingly difficult to find a wifi card that works well with Linux.

Various chipsets are supported with Free drivers but the trouble is that you can’t buy a card by chipset - you have to pick a card, research it to try to figure out what the chipset is and then see if it is supported. Even then there’s no guarantee because many manufacturers make several completely different cards with different chipsets and give them the same model number (which kind of defeats the point of a model number if you ask me). And the online shopping places don’t tell you the revision number of the card you’re buying.

Eventually I gave up trying to find one with Free drivers and settled on this one which people seemed to be having success with. Indeed Ubuntu 8.04 recognized it straight away and connected to my network. Still, it’s annoying that it’s so difficult to buy a card for which Free drivers exist.

A stack of refactorings

Wednesday, October 29th, 2008

I’m not sure if this is a bad habit of mine or if other programmers do this too. Sometimes after having partially written a program I’ll decide I need to make some change which touches most of the code. So I’ll start at the top of the program and work my way down, making that change wherever I see it needed. Partway through doing this, however, I’ll notice some other similarly impactful change I want to make. Rather than adding the second refactoring to my TODO list and continuing with the first, I’ll go right back up to the top of the program and work my way down again, this time making changes wherever I see either of the refactorings. I reckon I’ve had about as many as 5 refactorings going on at once sometimes (depending on how you count them - sometimes an earlier refactoring might supersede a later one).

Keeping all these refactorings in my head at once isn’t as big a problem as it might sound, since looking at the code will jog my memory about what they are once I come across a site that needs to be changed. And all this reading of the code uncovers lots of bugs.

The downside is that I end up reading (and therefore fixing) the code at the top of my program much more than the code further down.

Language optimized for refactoring

Friday, October 24th, 2008

One property of computer languages that is important but often seems to be overlooked is how easy it is to refactor programs written in them.

The one example that springs immediately to mind is renaming a class. In C++ this is a bit more difficult than in many languages because the constructors and destructors have the same name as the class, so you have to go and change all of those too. PHP wins here for calling them __construct and __destruct respectively.

If you are in the school of thought that has C++ method definitions in a separate file (e.g. .cpp) to class declarations (.h), you have to go and change things in two different files (even if you’re just adding a method that nobody calls yet). If that class implements a COM interface defined by a .idl file then there’s yet another thing you need to change.

Python’s syntactically-significant whitespace is another winner here because if (for example) you put another statement in an “if” clause that currently only has one statement, you don’t have to add braces.

I’m sure there are many other, deeper examples.

Once you go OOP, there’s no going back

Thursday, October 23rd, 2008

Object Oriented Programming is at least as much a state of mind as a set of programming language facilities. When I learnt C++ it was a bit difficult to get used to writing object-oriented programs but now that I’ve been doing it for many years I can’t get used to thinking about my programs any other way.

I was writing some PHP code recently and (not knowing about PHP classes) started writing it in a procedural fashion. After a while I noticed that many of the functions I was writing started to fall naturally into classes (with a first parameter that gave the function context). So it was only natural to re-write it in object-oriented style once I figured out how to do so.

In the process of doing so, I found lots of bugs in my original code (which I had thought was rather nifty). Many functions became much simpler. I also found it was much easier to do various optimizations that would have been very difficult to do without classes (such as minimizing the number of database queries). My code file did become somewhat bigger, but I attribute this to the extra indentation most lines have, and the fact that PHP requires you to write “$this->” everywhere.

I also tried writing a C program (from scratch) for the first time in a very long time a while ago. I found myself using an object-oriented style and implementing vtables as structs.

JavaScript exchange site

Wednesday, October 22nd, 2008

Back in the 80s, most home computers used to boot into a dialect of BASIC. This made it very obvious how to start to learn to program - just type things in and try things out to see what works.

Modern computers are much richer in many ways but do have the disadvantage that it’s less obvious how to start programming. One could even be forgiven for assuming that the typical off-the-shelf Windows Vista machine doesn’t even come with a built-in programming language. Actually there are 3 (at least) - the Windows command shell language, VBScript and JScript (JavaScript). The Windows command shell language (the descendant of the MS-DOS batch language) is ugly, badly documented and almost impossible to debug so let’s skip that one. Between VBScript and JScript, the latter is better to learn because it’s cross-platform and VBScript is Windows only. There are two ways (at least) to run JScript in Windows - one is through the Windows Script Host (wscript.exe or cscript.exe) and the other is through the web browser. The latter is a graphically rich, interactive and familiar environment so I think that’s the way to go.

JavaScript is a much nicer language than the 8-bit BASIC dialects from the 80s but it’s still not very discoverable. The tutorials and reference guides are all out there but you have to have a text editor open in one window, one browser window for your program and at least one other browser window containing your reading material. I think that this is a problem that could be solved with a website.

I’d like to see a site which does for JavaScript what computers booting into a BASIC interpreter did for BASIC - a one-stop shop for (at least beginner-level) JavaScript development. It would allow you to type JavaScript code right into a web page and see its output right there on the page immediately (perhaps with separate divs within the page for the JavaScript code, the program’s output and tutorials).

The code editor might have syntax highlighting, intellisense, a built-in debugger - whatever can be provided to make programs as easy as possible to develop.

Once you’ve written some code you can save it on the website and access it from anywhere. You can also share it with friends. If one person defines an object someone else can use that object in their programs. In this way, a rich ecosystem of scripts can develop.

Another possible refinement would be for the web server itself to provide some abilities that scripts can use. Perhaps just storing a small amount of data per script per user so that scripts can do some persistent stuff, or perhaps allowing some server-side JavaScript as well as the client-side scripts, to enable the writing of rich AJAX web applications.

PHP could be more secure

Monday, October 20th, 2008

Given that PHP is designed to be used to write applications that run on web servers, you’d think it would have been designed rather more with security in mind.

In particular, PHP’s dynamic typing seems to be a source of security weaknesses. Dynamic typing has advantages in rapid development and code malleability but is not particularly helpful for writing secure code - security is greatly helped by being able to restrict each variable to a specific set of values and having the compiler enforce this.

Similarly with the SQL API - because the interface is all just strings instead of strongly typed objects, SQL injection vulnerabilities become all too easy to write.
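The underlying problem is not specific to PHP, so here is a small sketch of it using Python's built-in sqlite3 module. The table, the data and the function names are all hypothetical. The first function builds the query by gluing strings together. The second passes the value through a placeholder so that it stays data rather than becoming part of the SQL. A placeholder is not the strongly typed API wished for above, but it shows the kind of separation such an API would enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])

def find_unsafe(name):
    # Query built by string concatenation: the input becomes part of the SQL.
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(sql)]

def find_safe(name):
    # The placeholder keeps the input as a value, never as SQL text.
    return [row[0] for row in conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,))]

attack = "nobody' OR '1'='1"
print(find_unsafe(attack))  # the WHERE clause is now always true
print(find_safe(attack))    # no user has that literal name
```

With the crafted input, the concatenated query returns every row in the table, while the placeholder version returns nothing.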

Variable scope is another one - because there are no variable declarations it’s not obvious where variables are introduced, so one could be using variables declared earlier without realizing it (this is why register_globals changed from default-on, to default-off, to deprecated to removed).

Then there are ill-conceived features like magic quotes, and missing features like cryptographically secure random number generation.

A well-designed language for web development would be secure by default when doing the most obvious thing - one shouldn’t have to go out of one’s way to learn what all the security pitfalls are and have to write code to explicitly address each of them (and update that code when the next such pitfall is discovered).