Complexity metric for identifiers

As far as I know, every code complexity metric ever devised treats all identifiers equally. After all, they're all just opaque strings to the computer, right?

But complexity metrics are not designed for the benefit of computers - they're designed for people, to try to make the code less complex and easier for people to understand (otherwise we could just measure the length of the generated binary code).

And there is certainly something less complex about a variable called "cost" than a variable called "amountPaidPerItemInDollars", for example - a program using the second should surely be considered more complex than an otherwise identical program using the first, right?

On the other hand, one doesn't necessarily want to count all the letters in an identifier to measure its complexity - that would just lead to very cryptic or meaningless 1 and 2 letter variable names.

I think the answer is to divide identifiers up into words (in case-sensitive languages by starting each word with a capital letter, and in non-case-sensitive languages by separating words with underscores). Each word counts for one point and should be a real word in the English language, or be defined elsewhere in the program as a sequence of other words (perhaps with a special comment that the complexity measurer can understand). So, for example, instead of having a lot of variables containing the characters hyperTextMarkupLanguage, one would have a glossary section in one's program saying "html == hyper text markup language" and then treat "html" itself as a word.
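A minimal sketch of this metric in Python, under some assumptions not spelled out in the post: the glossary is just a dictionary mapping defined terms to their expansions, and the English dictionary is a toy word set standing in for a real one. The names `split_words` and `complexity` are mine.

```python
import re

# Hypothetical glossary: program-defined terms expanding to real words,
# as in the post's "html == hyper text markup language" example.
GLOSSARY = {"html": "hyper text markup language"}

# Toy stand-in for a real English dictionary.
ENGLISH = {"cost", "amount", "paid", "per", "item", "in", "dollars", "parser"}

def split_words(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    return [w.lower()
            for chunk in identifier.split("_") if chunk
            for w in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk)]

def complexity(identifier, glossary=GLOSSARY, dictionary=ENGLISH):
    """One point per word; a glossary-defined term counts as a single word.

    Returns (score, unknown) where unknown lists words that are neither
    English nor defined in the glossary - candidates for a glossary entry.
    """
    words = split_words(identifier)
    unknown = [w for w in words if w not in dictionary and w not in glossary]
    return len(words), unknown
```

On this scoring, "cost" costs 1 point while "amountPaidPerItemInDollars" costs 6, and "htmlParser" costs only 2 because "html" is glossary-defined rather than being spelled out.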

Making up terminology is an important part of programming, and one that I think is often overlooked. Giving a decent (appropriate, one word) name to each of the important concepts in your program from the get-go, and giving each of these terms a precise meaning (even if some details of those meanings change as the program evolves) causes one to be able to think about these concepts more clearly. It also leads to easier and more consistent naming of variables and types.

2 Responses to “Complexity metric for identifiers”

  1. Derek Jones says:

    I don't think complexity is a useful metric for measuring identifiers. An identifier name serves a purpose, one of which is often to provide information intended to be useful to a reader of the code. The benefit of this information to subsequent readers needs to be balanced against the probability that they will not make the necessary semantic association, or will even make an incorrect one (here length helps, in that more 'clues' can be provided). There are a whole raft of factors that need to be considered when performing a cost/benefit analysis of identifiers, most of which cannot be reliably measured at the moment.

    More details than you can shake a stick at in http://www.knosof.co.uk/cbook/sent792.pdf

  2. admin says:

    Oh sure, there's a lot more to choosing a good identifier than just counting the number of words in it. Complexity metrics are crude tools, but that doesn't mean that they don't have their uses (provided one is aware of their limits). We are a long way off having AI systems that can figure out ideal names for all your variables.

    I still maintain that if you have a multi-word phrase that appears in many identifiers in a program (or in a single identifier used in many places in a program) it's often (but not always) a worthwhile refactoring to define a single word to mean the same thing as that phrase, and to use this new word to simplify those identifiers. It's not very different from factoring out a repeated expression into a function.

    Thanks for the link to your book - lots of fascinating information there.
