Complexity metrics for computer programs

Trying to measure the complexity of computer programs is really difficult, because just about any metric you come up with can be gamed in some way.

Cyclomatic complexity is one possible metric but this only counts loops and branches - it doesn't tell you anything about how complex the linear parts of your code are. Since expressions like "a ? b : c" can be rewritten "(!!a)*b + (!a)*c" one can also game this metric.

An often-used one is the number of lines of source code. But most languages let you arrange your source code independently of the number lines, so you can put it all on one line or put one token on each line if you're feeling particularly perverse.

Number of characters of source code is a little better but there is still scope for variation in spaces, comments and length of variable names.

We can eliminate those things the same way the compiler or interpreter does and just count the number of tokens - i.e. add up the total number of instances of identifiers, literals, keywords and operators.

This metric can still be gamed, but only at the expense of the quality of the code itself. For instance you could manually unroll loops, or sprinkle in branches that are never taken.

An interesting refinement might be to run some kind of compression algorithm over this lexed code to eliminate redundancy. Such a compression algorithm would essentially automatically refactor the code by finding and extracting common sequences. I'm not sure if it would generally be desirable to use such an algorithm to automatically refactor one's source code, but it would certainly be interesting to see its suggestions - I'm sure many programs have repeated sequences that their authors never spotted. If there are sections that should be identical but aren't because there is a bug in one of them, such a tool might even help to uncover such bugs.

One Response to “Complexity metrics for computer programs”

  1. [...] text editor that shows (along with information like the line and column number the cursor is on) a metric for how complex the code in the buffer is. Then refactoring would give immediate feedback – a [...]

Leave a Reply