Archive for November, 2008

Steps to fix a bug properly

Wednesday, November 5th, 2008

Fixing a bug in a computer program isn't always easy, but even when it seems easy there are actually a lot of steps one needs to go through to make sure it's fixed properly.

First of all, you need to make sure you have the conceptual framework to understand if there is actually a bug or not. This isn't usually a problem, but there have been a few times in my career when I've started working on a completely new and unfamiliar piece of software, and I'm not sure what it's supposed to do, how it's supposed to work or whether any given piece of behaviour is a bug or not.

Secondly, you actually need to determine if the reported problem is really a bug. While we would like it if software always followed the principle of least surprise, sometimes it's unavoidable that there are things which seem like bugs at first glance but which are really by design.

Thirdly, you need to find the defect that actually caused the problem. Just fixing up the symptoms is usually not the best way, because the defect might manifest again in a different way. Even if it doesn't, there may be performance and maintainability implications in having a problem that occurs internally and is suppressed. This is often the most difficult step to do correctly.

Fourthly, you need to determine what the correct fix is. For most bugs this is pretty easy once you've found the defect - it's often just a localized typo or obvious omission. But occasionally a bug crops up for which the correct fix requires substantial rewriting or even architectural redesign. Often (especially at places like Microsoft) in such a case the correct fix will be avoided in favour of something less impactful. This isn't necessarily a criticism - just an acknowledgement that software quality sometimes must be traded off against meeting deadlines.

Fifthly, one should determine how the defect was created in the first place. This is where the programmers who just fix bugs diverge from the programmers who really improve software quality. This step is usually just a matter of spelunking in the source code history for a while, and good tools can make the difference between this being a simple operation or a search for a needle in a haystack. Unfortunately such good tools are not universal, and this use case isn't always high priority for the authors of revision control software.

Sixthly, one should determine if there were other defects with the same root cause. For example, if a particular programmer some time ago got the wrong idea about the right way to call a particular function, they might have made the same mistake in other places. Those places will also need to be fixed. This step is especially important for security bugs, because if an attacker sees a patch which fixes one defect, they can reverse engineer it to look for unfixed similar defects.

Seventhly, one should actually fix any such similar defects which appear.

The eighth and final step is to close the loop by putting a process in place which prevents other defects with the same root cause. This may or may not be worth doing, depending on the cost of that process and the expected cost of a bug in the software. When lives are at stake, such as in life support systems and space shuttle control software, this step is really critical, but if you're just writing software for fun you'll probably only do it if finding and fixing those bugs is less fun than creating and following that process.

Neutral particles in E8 theory

Tuesday, November 4th, 2008

I didn't pay too much attention to "surfer dude" Garrett Lisi's Exceptionally Simple Theory of Everything when the paper first appeared, but since his TED talk about it was posted last month I have found myself fascinated.

Please excuse any errors in the following - I haven't studied physics at or beyond graduate level, so my understanding may be flawed in some respects.

Here's the game. You have 8 numbers (quantum numbers) - let's call them s, t, u, v, w, x, y and z for the sake of argument. Each of these numbers can be either -1, -1/2, 0, 1/2 or 1. So there are 5^8 = 390625 possible sets of numbers. However, there are certain rules eliminating some combinations:

  1. They must be all half-integers (-1/2, 1/2) or all integers (-1, 0, 1).
  2. They must add up to an even number.
  3. If they are integers, exactly six of them must be 0.

With these rules, there are only 240 possible sets (128 with half-integers and 112 with integers).
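The counting is easy to verify by brute force. Here's a quick sketch in C++ (nothing in it beyond the three rules above; the quantum numbers are doubled so the arithmetic stays in integers):

    #include <cstdio>

    int main()
    {
        const int values[] = {-2, -1, 0, 1, 2};  // doubled quantum numbers
        int halfIntegers = 0, integers = 0;
        for (int i = 0; i < 390625; ++i) {       // 390625 = 5^8 candidates
            int n = i, sum = 0, zeros = 0;
            bool anyOdd = false, anyEven = false;
            for (int j = 0; j < 8; ++j) {
                int v = values[n % 5];
                n /= 5;
                sum += v;
                if (v == 0)
                    ++zeros;
                if (v % 2 == 0)
                    anyEven = true;  // even doubled value = integer entry
                else
                    anyOdd = true;   // odd doubled value = half-integer entry
            }
            if (anyOdd && anyEven)
                continue;              // rule 1: no mixing
            if (sum % 4 != 0)
                continue;              // rule 2: undoubled sum must be even
            if (anyEven && zeros != 6)
                continue;              // rule 3: integers need exactly six 0s
            if (anyOdd) ++halfIntegers; else ++integers;
        }
        printf("half-integer: %d, integer: %d, total: %d\n",
               halfIntegers, integers, halfIntegers + integers);
        // Prints: half-integer: 128, integer: 112, total: 240
        return 0;
    }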

If you take these sets as the corners of an 8-dimensional shape, you get the pretty pictures that you might have seen associated with E8 theory.

Each of the 240 possible sets corresponds to a fundamental particle in a model that includes all the standard model particles and gravity.

  • 144 of the particles are quarks (up, down, strange, charm, top, bottom)×(spin up, spin down)×(red, green, blue)×(left-handed, right-handed)×(particle, antiparticle). These are the particles that make up the bulk of the mass of atoms (plus some variants).
  • 24 of the particles are charged leptons (electron, muon, tau)×(spin up, spin down)×(left-handed, right-handed)×(particle, antiparticle). These are the particles that determine the shapes, textures, colours and chemistry of things (plus some variants).
  • 24 of the particles are neutrinos (electron, muon, tau)×(spin up, spin down)×(left-handed, right-handed)×(particle, antiparticle). These don't have much of an effect - trillions of them pass right through your body each second.
  • 6 of the particles are gluons (orange, chartreuse, azure, aquamarine, indigo, violet [1]), responsible for the "strong force" that holds atomic nuclei together.
  • 2 of the particles are W bosons (particle, anti-particle). These cause the "weak force" which is the only way that neutrinos can interact with the other particles. They are quite heavy so never go very far.
  • 16 of the particles are frame Higgs bosons (2 types)×(spin up, spin down)×(left-handed, right-handed)×(particle, anti-particle). Interactions with these are what gives particles their mass.
  • 4 of the particles are spin connection bosons (spin up, spin down)×(left-handed, right-handed). These are responsible for gravity.

[1] These aren't the names that physicists usually use, but I prefer them.

That adds up to 220 - what about the other 20? Lisi's theory predicts some new particles which we haven't seen (because they're too heavy) but which might turn up in the Large Hadron Collider. These particles fall into two classes:

  • 2 of the particles are Pati-Salam W' bosons (particle, anti-particle). These cause a force similar to the weak force but even weaker.
  • 18 of the particles are coloured Higgs bosons (3 generations)×(red, green, blue)×(particle, anti-particle). These are like the other Higgs bosons except that they can also interact via the strong force.

Lisi's paper contains several tables which show the values of s, t, u, v, w, x, y and z for each of these particles.

Different combinations of the quantum numbers correspond to the conserved charges that physicists are more familiar with:

  • electric charge is (3v+x+y+z)/3.
  • colour charges are combinations of x, y and z.
  • weak and weaker isospins are (u+v)/2 and (v-u)/2.
  • weak hypercharge is (3(v-u)+2(x+y+z))/6.
  • two gravitational/spin charges are (s+t)/2 and (t-s)/2.
  • a charge corresponding to fermion generations (i.e. electron/muon/tau leptons, up/charm/top quarks and down/strange/bottom quarks) is w.

(Two of these are new charges predicted by E8 theory).
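These combinations are straightforward to transcribe into code. A minimal sketch (the quantum numbers passed in below are made up purely to exercise the formulas - the real assignments are in the tables in Lisi's paper):

    #include <cstdio>

    struct Charges
    {
        double electric, weakIsospin, weakerIsospin, hypercharge;
        double spin1, spin2, generation;
    };

    // The combinations from the list above, transcribed directly.
    Charges chargesFor(double s, double t, double u, double v,
                       double w, double x, double y, double z)
    {
        Charges c;
        c.electric = (3*v + x + y + z)/3;
        c.weakIsospin = (u + v)/2;
        c.weakerIsospin = (v - u)/2;
        c.hypercharge = (3*(v - u) + 2*(x + y + z))/6;
        c.spin1 = (s + t)/2;
        c.spin2 = (t - s)/2;
        c.generation = w;
        return c;
    }

    int main()
    {
        // A made-up half-integer set, for illustration only.
        Charges c = chargesFor(0.5, 0.5, -0.5, 0.5, 0.5, 0.5, 0.5, 0.5);
        printf("electric charge: %g\n", c.electric);  // (3*0.5 + 1.5)/3 = 1
        return 0;
    }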

As well as these 240 particles, there are 8 particles with all-zero quantum numbers. The 8 neutral bosons are (I think):

  1. The photon (electromagnetic radiation: light, radio waves etc.)
  2. The Z boson (weak force)
  3. A seventh gluon (strong force)
  4. An eighth gluon (strong force)
  5. A fifth spin connection boson (gravity)
  6. A sixth spin connection boson (gravity)
  7. The Z' boson (weaker force)
  8. The w particle (generations)

As well as describing all the particles, this theory also describes their interactions. Two particles can combine to form a third (or appear as a result of the third decaying) if and only if you can add up their corresponding quantum numbers and obtain the quantum numbers of the third particle. If one of the three is a neutral boson it must be a neutral boson corresponding to a charge which is non-zero for the other two particles.

If you know the quantum numbers for a particle, you can obtain the quantum numbers for the corresponding anti-particle by multiplying them all by -1. So, neutral bosons are their own anti-particles and a non-neutral particle can always interact with its own anti-particle to produce a neutral boson.
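Both rules are one-liners if you represent a particle as its 8-tuple of quantum numbers. A sketch (again with doubled values so half-integers stay exact; the particle in main() is just one of the integer roots, picked for illustration):

    #include <array>

    // One particle = its 8 quantum numbers (s, t, u, v, w, x, y, z), doubled.
    typedef std::array<int, 8> Particle;

    // The antiparticle rule: multiply every quantum number by -1.
    Particle antiparticle(const Particle& p)
    {
        Particle a;
        for (int i = 0; i < 8; ++i)
            a[i] = -p[i];
        return a;
    }

    // The interaction rule: a and b can combine to form c if and only if
    // their quantum numbers add up to c's, component by component.
    bool canCombine(const Particle& a, const Particle& b, const Particle& c)
    {
        for (int i = 0; i < 8; ++i)
            if (a[i] + b[i] != c[i])
                return false;
        return true;
    }

    int main()
    {
        Particle p = {2, -2, 0, 0, 0, 0, 0, 0};  // (1, -1, 0, ...), doubled
        Particle neutral = {0, 0, 0, 0, 0, 0, 0, 0};
        // p + (-p) = 0, so this always holds:
        return canCombine(p, antiparticle(p), neutral) ? 0 : 1;
    }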

E8 theory does have some problems:

  1. It seems to imply a very large cosmological constant.
  2. Some of the higher generations of fermions have the wrong charges.

But these problems may be surmountable with more work and experimental data. Even if E8 theory is not correct in its current form, the pattern is so compelling that there is surely some nugget of truth in there somewhere that will form part of the final theory. It's not the end of physics and probably not even the beginning of the end, but it might just be the beginning of the end of the beginning.

Learning about all this stuff makes me want to create an E8 screensaver that simulates a soup of E8 particles all colliding with each other and interacting as they would at very high energies where the symmetries are unbroken. I wrote something similar many years ago when I learnt about quantum electrodynamics (a soup of electrons and photons) but this one would be much more colourful and interesting.

Non-local control structures

Monday, November 3rd, 2008

Most of the time in computer programming, causes are linked to effects by code at the "cause" point - i.e. if A should cause B then the routine that implements A should call the routine that implements B.

Sometimes, however, there is a need for effects to happen as a result of causes which don't know about those effects. The obvious example is COMEFROM, but there are serious examples as well. Breakpoints and watchpoints in a debugger are one; database triggers are another.

A more subtle example is the humble destructor in C++ (which I have written about before) - its effect is non-local in the sense that if you construct an automatic object in a scope, code will automatically run when control passes out of that scope. It's still a local effect in that the cause and the effect are in the same scope, but it's non-local in the sense that there is no explicit code at the point where the destructor is run.
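Here's a minimal example of that non-locality in action - the "leaving" line is printed by code that appears nowhere near the return statements:

    #include <cstdio>

    class ScopeLogger
    {
    public:
        ScopeLogger(const char* name) : _name(name)
        {
            printf("entering %s\n", _name);
        }
        ~ScopeLogger() { printf("leaving %s\n", _name); }
    private:
        const char* _name;
    };

    int doWork(bool fail)
    {
        ScopeLogger logger("doWork");
        if (fail)
            return -1;  // "leaving doWork" is printed here...
        return 0;       // ...and here, with no explicit code for it
    }

    int main()
    {
        doWork(false);
        doWork(true);
        return 0;
    }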

Why writing web applications is fiddly

Sunday, November 2nd, 2008

When you want to add some functionality to a web application, there are many pieces you have to change. First you have to add some interface element (a button, maybe) to a page. Then you have to add some client-side code to get this button to do something. Most of the time you'll want that action to have some server-side effect as well, so you have to make that button send a request and implement that request in the server-side code. The server will generally want to send something back to the client based on the result of that request, so you have to figure out what that something is and make sure the client does the right thing with it (especially fiddly if the client is AJAX-based). That response may itself contain another possible user action, so each new feature can end up creating a whole new request/response conversation.

As well as just writing this conversation, one has to consider all the things that can go wrong (both accidentally and deliberately). Requests and responses might not reach their destinations. If they do get there they might be reordered by the network along the way. Requests might be fraudulent, and so on.

Complexity metrics for computer programs

Saturday, November 1st, 2008

Trying to measure the complexity of computer programs is really difficult, because just about any metric you come up with can be gamed in some way.

Cyclomatic complexity is one possible metric, but this only counts loops and branches - it doesn't tell you anything about how complex the linear parts of your code are. Since expressions like "a ? b : c" can be rewritten as "(!!a)*b + (!a)*c", one can also game this metric.
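To see why that rewrite works, note that !!a is 1 when a is true and 0 otherwise, so exactly one of the two products survives:

    #include <cassert>

    // Same value as a ? b : c for numeric operands, but no branch for a
    // cyclomatic-complexity tool to count. (Not quite equivalent in
    // general: this form always evaluates both b and c, while ?: only
    // evaluates one of them.)
    int pick(bool a, int b, int c)
    {
        return (!!a)*b + (!a)*c;
    }

    int main()
    {
        assert(pick(true, 5, 7) == 5);
        assert(pick(false, 5, 7) == 7);
        return 0;
    }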

An often-used one is the number of lines of source code. But most languages let you arrange your source code independently of the number of lines, so you can put it all on one line or put one token on each line if you're feeling particularly perverse.

Number of characters of source code is a little better but there is still scope for variation in spaces, comments and length of variable names.

We can eliminate those things the same way the compiler or interpreter does and just count the number of tokens - i.e. add up the total number of instances of identifiers, literals, keywords and operators.
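A token counter is only a little more work than a character counter. Here's a rough sketch for a C-like language (it ignores comments, strings and multi-character operators, which a real lexer would have to handle):

    #include <cctype>
    #include <cstdio>
    #include <string>

    int countTokens(const std::string& source)
    {
        int tokens = 0;
        size_t i = 0;
        while (i < source.size()) {
            char c = source[i];
            if (isspace(c))
                ++i;                                // whitespace: skip
            else if (isalpha(c) || c == '_') {      // identifier or keyword
                while (i < source.size() &&
                       (isalnum(source[i]) || source[i] == '_'))
                    ++i;
                ++tokens;
            }
            else if (isdigit(c)) {                  // numeric literal
                while (i < source.size() && isalnum(source[i]))
                    ++i;
                ++tokens;
            }
            else {                                  // operator or punctuation
                ++i;
                ++tokens;
            }
        }
        return tokens;
    }

    int main()
    {
        printf("%d\n", countTokens("a = b + 42;"));  // prints 6
        return 0;
    }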

This metric can still be gamed, but only at the expense of the quality of the code itself. For instance you could manually unroll loops, or sprinkle in branches that are never taken.

An interesting refinement might be to run some kind of compression algorithm over this lexed code to eliminate redundancy. Such a compression algorithm would essentially automatically refactor the code by finding and extracting common sequences. I'm not sure if it would generally be desirable to use such an algorithm to automatically refactor one's source code, but it would certainly be interesting to see its suggestions - I'm sure many programs have repeated sequences that their authors never spotted. If there are sections that should be identical but aren't because there is a bug in one of them, such a tool might even help to uncover such bugs.
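A crude version of that tool is easy to prototype: slide a fixed-size window over the token stream and look for repeats. (A real tool would try many window sizes and weigh sequence length against frequency, the way a dictionary compressor does.)

    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    int main()
    {
        // Tokens as a lexer might produce them (hand-lexed here).
        std::vector<std::string> tokens = {
            "x", "=", "x", "+", "1", ";",
            "y", "=", "y", "+", "1", ";",
            "x", "=", "x", "+", "1", ";"};
        const size_t window = 4;

        // Count every window-sized subsequence of tokens.
        std::map<std::vector<std::string>, int> counts;
        for (size_t i = 0; i + window <= tokens.size(); ++i)
            ++counts[std::vector<std::string>(tokens.begin() + i,
                                              tokens.begin() + i + window)];

        // Report one of the most frequently repeated sequences.
        std::vector<std::string> best;
        int bestCount = 1;
        for (const auto& entry : counts)
            if (entry.second > bestCount) {
                bestCount = entry.second;
                best = entry.first;
            }
        for (const auto& token : best)
            printf("%s ", token.c_str());
        printf("(seen %d times)\n", bestCount);
        return 0;
    }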