Archive for May, 2007

More keywords

Thursday, May 3rd, 2007

I think the C and C++ computer languages don't have enough keywords. The current set may have been appropriate for the earliest C compilers but there are many keywords whose absence I often curse. Unfortunately it's very difficult to add new keywords to C++ because doing so tends to break a lot of existing programs. This is a curious difference between human languages and computer languages - although it seems like computer languages move much faster than human languages, the opposite is in fact true because you can add words to human languages without changing the meaning of existing documents.

Anyway, here are some of the keywords and constructs I'd like to be able to use in my programs:

Two part loops:

Often I find myself wanting a loop with the test in the middle. I usually end up writing these as an infinite loop with a "break" statement:

do {
    x;
    if (y)
        break;
    z;
} while (true);

I'd prefer to write it like this:

do {
    x;
} while (!y) {
    z;
}

[I realized sometime after writing this that I actually got this idea from here.]

until and unless:

I'd also like to be able to use "until" instead of "while" to reverse the sense of the test, like this:

do {
    x;
} until (y) {
    z;
}

Similarly, I'd like to be able to use "unless (x)" instead of "if (!x)". While this doesn't really make programs any shorter, I think being able to eliminate the "!" makes programs easier to understand and helps with the "say what you mean" principle.
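
For what it's worth, something close to this can be approximated today with the preprocessor. The following is only a sketch: the macros "unless" and "until" and the function halvings() are made up for illustration, they are not part of C or C++, and they only cover the simple one-part forms (not the two-part loop above).

```cpp
#include <cassert>

// Hypothetical macros, not part of C or C++: they approximate the proposed
// keywords simply by negating the condition.
#define unless(cond) if (!(cond))
#define until(cond) while (!(cond))

// Illustrative function: counts how many times a positive n can be halved
// (integer division) before it reaches zero.
int halvings(int n) {
    int steps = 0;
    unless (n > 0)
        return 0;          // expands to: if (!(n > 0)) return 0;
    do {
        n /= 2;
        ++steps;
    } until (n == 0);      // expands to: while (!(n == 0));
    return steps;
}

// Small self-check of the sketch.
void check_halvings() {
    assert(halvings(8) == 4);   // 8 -> 4 -> 2 -> 1 -> 0
    assert(halvings(0) == 0);
}
```

Macros like these ignore scope and can clash with other identifiers, which is one reason real keywords would be preferable.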

Grouping if statements:

As an alternative to using the "switch" statement, I sometimes write:

if (a)
    b;
else
    if (c)
        d;
    else
        if (e)
            f;
        else
            g;

But this has an unfortunate tendency to lean to the right. I'd prefer to write:

if (a)
    b;
elseif (c)
    d;
elseif (e)
    f;
else
    g;

and leave the "else if" construct for situations like:

if (a)
    if (b)
        c;
    else
        d;
else
    if (b)
        e;
    else
        f;

Again, it doesn't make a big difference syntactically but does tend to shed more light on the programmer's intention, which is always a good thing. Similarly there should be an "elseunless" keyword meaning elseif with the sense reversed.

[These ideas came from here, though his "unless" is a bit different to mine, and I've never missed "when".]

Done clauses:

Sometimes I'm using a loop for a search and want to do something in particular if the thing I was searching for was not found. Normally I'd have to write something like this:

bool found = false;
Item current;  // declared outside the loop so the loop condition can see it
do {
    current = get_next_item();
    if (current == target) {
        process(current);
        found = true;
        break;
    }
} while (current != last);
if (!found)
    fail(target);

I'd prefer it if loops had something analogous to an "else" clause which is called only if the loop condition fails. I call this the "done" clause. This would make the example look like this:

Item current;  // declared outside the loop so the loop condition can see it
do {
    current = get_next_item();
    if (current == target) {
        process(current);
        break;
    }
} while (current != last); done
    fail(target);

Much neater.
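
In current C++ the closest approximation I know of is to move the loop into its own function, so that "return" plays the role of "break" and the code after the loop plays the role of the "done" clause. This is a sketch under assumptions: search() is a made-up name, the items are plain ints in a std::vector rather than coming from get_next_item(), and process() and fail() are stand-ins that return strings so the result is visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the process() and fail() used in the example above.
std::string process(int item) { return "processed " + std::to_string(item); }
std::string fail(int target) { return "failed to find " + std::to_string(target); }

// Illustrative helper: the loop lives in its own function.
std::string search(const std::vector<int>& items, int target) {
    for (int current : items) {
        if (current == target)
            return process(current);  // leaves the loop and skips the code below
    }
    // Reached only when the loop ran to completion: the "done" clause.
    return fail(target);
}

// Small self-check of the sketch.
void check_search() {
    std::vector<int> items = {1, 2, 3};
    assert(search(items, 2) == "processed 2");
    assert(search(items, 9) == "failed to find 9");
}
```

The "found" flag disappears, at the cost of having to pull the loop out into a separate function.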

Multiple break:

I'd like to break out of multiple loops at once. Suppose I'm searching a 2D array:

for (int x = 0; x < 10; ++x) {
    int y = 0;
    for (; y < 10; ++y)
        if (array[y][x] == target) {
            foobar(x, y);
            break;
        }
    if (y < 10)
        break;
}

I'd much rather write it as:

for (int x = 0; x < 10; ++x)
    for (int y = 0; y < 10; ++y)
        if (array[y][x] == target) {
            foobar(x, y);
            break break;
        }

The syntax "break continue" could be employed to mean "break out of the innermost loop and then do a 'continue' on the next loop out".
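
Without "break break", the usual workaround is to put the nested loops in a function of their own and use "return" to leave both at once. A minimal sketch, assuming a fixed 10x10 int array; find_in_grid(), kSize and the output parameters are names invented for this illustration.

```cpp
#include <cassert>

const int kSize = 10;  // assumed array dimension, matching the example above

// Illustrative helper: searches column by column, as in the example above.
// On success it stores the position in outX/outY and returns true.
bool find_in_grid(const int (&grid)[kSize][kSize], int target,
                  int& outX, int& outY) {
    for (int x = 0; x < kSize; ++x)
        for (int y = 0; y < kSize; ++y)
            if (grid[y][x] == target) {
                outX = x;
                outY = y;
                return true;  // exits both loops at once
            }
    return false;             // neither loop found the target
}

// Small self-check of the sketch.
void check_find_in_grid() {
    int grid[kSize][kSize] = {};
    grid[3][7] = 42;
    int x = -1, y = -1;
    assert(find_in_grid(grid, 42, x, y));
    assert(x == 7 && y == 3);
}
```

This works, but it forces a function boundary where the program logic doesn't otherwise need one, and it has no equivalent of "break continue".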

[I also posted this idea on comp.std.c++ here but it was shot down. I still think it's a good idea, though. That entire thread is a gold mine of good (and some not so good) programming language ideas.]

Exponentiation operator:

I'd like to change the meaning of the ^ operator to be exponentiation and have binary ~ mean XOR (giving a nice symmetry between unary/binary - (negative/minus) and unary/binary ~ (NOT/XOR)).
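
To illustrate the current state of affairs: in C++ as it stands, ^ is bitwise XOR and exponentiation has to go through a function call. The sketch below uses std::pow from the standard library plus ipow(), a hypothetical integer helper written for this example (it does not check for overflow).

```cpp
#include <cassert>
#include <cmath>

// What ^ means today: bitwise XOR, so 2 ^ 3 is 1 rather than 8.
int xor_of_two_and_three() { return 2 ^ 3; }

// Hypothetical helper: integer exponentiation by repeated squaring.
long long ipow(long long base, unsigned exponent) {
    long long result = 1;
    while (exponent != 0) {
        if (exponent & 1u)
            result *= base;   // this bit of the exponent is set
        base *= base;
        exponent >>= 1;
    }
    return result;
}

// Small self-check of the sketch.
void check_pow() {
    assert(xor_of_two_and_three() == 1);
    assert(ipow(2, 3) == 8);
    assert(std::pow(2.0, 3.0) == 8.0);
}
```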

bool argument to if and while:

It is surprising how many of the criticisms levelled at C and C++ can be traced back to a simple misfeature - allowing conditional statements ("if" and "while") to have arguments of any integral type. These statements should only accept bool values, thus sidestepping the whole "confusing = and ==" business altogether (well, at least until you want to compare two bool variables). The usual boolean operators !, &&, ||, ==, !=, <, >, <= and >= should all return bool values and !, && and || should also require their arguments to be bool. This would break a lot of programs, but they would be easy to fix (just sprinkle "!= 0" everywhere you see a "bool argument required" error).
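
The language can't be changed from inside a program, but the flavour of the rule can be imitated for function arguments. The following sketch uses a hypothetical wrapper type, StrictBool, that accepts a bool and rejects everything else at compile time; choose() is likewise made up for the example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical wrapper: constructible from bool and nothing else.
struct StrictBool {
    bool value;
    StrictBool(bool b) : value(b) {}
    // Any other argument type is an exact match for this deleted template,
    // so it produces a compile error instead of a silent conversion to bool.
    template <typename T> StrictBool(T) = delete;
};

// Illustrative function that demands a real bool for its condition.
int choose(StrictBool condition, int ifTrue, int ifFalse) {
    return condition.value ? ifTrue : ifFalse;
}

static_assert(std::is_constructible<StrictBool, bool>::value,
              "bool is accepted");
static_assert(!std::is_constructible<StrictBool, int>::value,
              "int is rejected");
static_assert(!std::is_constructible<StrictBool, int*>::value,
              "pointers are rejected");

// Small self-check of the sketch.
void check_choose() {
    int count = 5;
    assert(choose(count != 0, 1, 2) == 1);  // the "!= 0" makes it a bool
    // choose(count, 1, 2);                 // would not compile
}
```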

Assignment not an operator:

I think that having the assignment operator return a value is a mistake. I try to avoid using constructs like "if (a = b)" and "a = b = c;" in my code because they are overly terse and make refactoring more difficult. I would also eliminate the post increment and decrement operators (no more "c++"!) and make all assignments, increments and decrements statements instead of expressions. I'm not sure I would go so far as to separate functions into pure functions returning values which can be used in expressions and procedures which have no return values but can have side effects, but it is tempting.

Compile-time checked exceptions

Wednesday, May 2nd, 2007

C++ offers the ability to specify which exceptions might be thrown by a given function or method. Unfortunately, because of its terrible implementation (the exception specification is only checked at runtime), it is rarely used. I (and many others) think that this would have been much more useful as a compile-time feature.

Here is how I think it could work. Every function has an exception specification which can be explicit or implicit. If it is implicit it is calculated by creating the union of the exception specifications of all called functions and methods and the set of exceptions the function actually throws itself, and removing from that set any of the exceptions that that function actually catches. That way, exception specifications are optional but are used if they are there.
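
The calculation of an implicit specification can be sketched as simple set arithmetic. This is only an illustration of the rule as stated above, not a compiler implementation: exceptions are represented by name, and FunctionInfo, ExceptionSet and implicit_spec() are invented for the example. It also assumes each catch covers the whole function body and ignores inheritance between exception types.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> ExceptionSet;

// Hypothetical summary of what a compiler would know about one function.
struct FunctionInfo {
    std::vector<ExceptionSet> calleeSpecs;  // specifications of called functions
    ExceptionSet thrownDirectly;            // exceptions the function throws itself
    ExceptionSet caught;                    // exceptions the function catches
};

// Union of everything that can arrive, minus everything that is caught.
ExceptionSet implicit_spec(const FunctionInfo& f) {
    ExceptionSet result = f.thrownDirectly;
    for (const ExceptionSet& spec : f.calleeSpecs)
        result.insert(spec.begin(), spec.end());
    for (const std::string& name : f.caught)
        result.erase(name);
    return result;
}

// Small self-check of the sketch.
void check_implicit_spec() {
    FunctionInfo f;
    f.thrownDirectly = ExceptionSet{"ParseError"};
    f.calleeSpecs.push_back(ExceptionSet{"IoError", "OutOfMemory"});
    f.calleeSpecs.push_back(ExceptionSet{"IoError"});
    f.caught = ExceptionSet{"IoError"};
    assert((implicit_spec(f) == ExceptionSet{"OutOfMemory", "ParseError"}));
}
```

A function with an explicit specification would then be an error whenever this calculated set is not a subset of the declared one.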

Then, certain functions (like the main() function, functions callable from C code, destructors and functions implemented in C code) can have an implicit exception specification of the empty set, eliminating an entire class of bugs.

Negative-overhead exceptions

Tuesday, May 1st, 2007

There are two schools of thought about how to handle errors in computer programs. By "errors" here I mean things that are out of the programmer's control, like "out of memory" or "file not found" rather than mistakes in the actual program (which is a whole other topic). Ideally we want programs which don't just crash when they hit such unexpected circumstances but rather tell the user exactly what the problem is and give them the opportunity to fix it and try again or do something different instead.

The first school of thought is that functions which fail should return some kind of "status code" (which may be "success" or one of several other values indicating different types of error). This is how most of the functions comprising the Windows API signal failure.

The problems with status codes are:

  1. Whenever you call a function you need to check its status code or risk missing an error (which usually makes things worse). It's tedious and mistake-prone to write the code to do this, and sometimes programmers forget.
  2. The information about an error you can return is quite limited - you can say "file not found" but you can't say which file wasn't found or what the program was trying to do at the time (you have to determine this from context).
  3. If you're using the return value of your function for a status code you can't use it for something more natural. For example, if you're writing a square root function that needs to return an error code when given an input that is negative, it can't also return the answer in a natural way.
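
The square root example from the list above might look something like this with status codes. It is a sketch only: the Status enum and the functions checked_sqrt() and root_of_sum() are invented for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical status codes for this example.
enum Status { STATUS_OK, STATUS_NEGATIVE_INPUT };

// The return value is taken up by the status, so the answer has to come
// back through an output parameter instead.
Status checked_sqrt(double input, double* result) {
    if (input < 0.0)
        return STATUS_NEGATIVE_INPUT;
    *result = std::sqrt(input);
    return STATUS_OK;
}

// Every caller has to test the status and pass it on by hand.
Status root_of_sum(double a, double b, double* result) {
    Status status = checked_sqrt(a + b, result);
    if (status != STATUS_OK)
        return status;
    return STATUS_OK;
}

// Small self-check of the sketch.
void check_status_codes() {
    double r = 0.0;
    assert(root_of_sum(4.0, 5.0, &r) == STATUS_OK);
    assert(r == 3.0);
    assert(root_of_sum(-4.0, 1.0, &r) == STATUS_NEGATIVE_INPUT);
}
```

Note that the failure code says nothing about which input was negative or what the caller was trying to compute.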

The second school of thought is that failures should be indicated by "throwing an exception", which is technically rather more complicated. This involves looking at the function that called the failing function, and the function that called that one and so up the stack until you find a function that says "hey, I know about that exception - pass control to me and I'll deal with it". This eliminates the need for all the functions in between to know about the possible failure at all (other than cleaning up their resources).
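
Here is a small sketch of the same idea in C++. The function names (read_config, load_settings, start_up) and the file names are hypothetical; the point is that load_settings() sits between the failure and the handler without containing any error handling code, and that the exception can carry details such as the file name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical failing function: pretends only "default.ini" exists.
std::string read_config(const std::string& path) {
    if (path != "default.ini")
        throw std::runtime_error("file not found: " + path);
    return "contents of " + path;
}

// The function in between knows nothing about the possible failure.
std::string load_settings(const std::string& path) {
    return read_config(path);
}

// The function that says "I know about that exception".
std::string start_up(const std::string& path) {
    try {
        return load_settings(path);
    } catch (const std::runtime_error& e) {
        return std::string("could not start: ") + e.what();
    }
}

// Small self-check of the sketch.
void check_exceptions() {
    assert(start_up("default.ini") == "contents of default.ini");
    assert(start_up("settings.ini") ==
           "could not start: file not found: settings.ini");
}
```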

Exceptions are new and spiffy and are generally considered to be the "right way" of doing things in new programs. Unfortunately, exceptions also have drawbacks:

  1. Your program is less "explicit" about what it actually does - certain control paths are "hidden away" in the exception mechanism. Proponents of exceptions argue that this is a good thing.
  2. Certain programming styles cannot be used in the presence of exceptions. For example "do X, then do Y, then do Z, then do W if Y failed." Proponents of exceptions argue that these styles are mistake-prone (it's all too easy to forget to do Z) and should be avoided anyway.
  3. While many programming languages in widespread use support exceptions, not everything does. In particular, certain "glue" for interfacing between different bits of code written in different languages generally doesn't support exceptions because it needs to be compatible with languages which don't support exceptions (the old "lowest common denominator" problem).

Another criticism often made of exceptions is that they are slow. This may have been true in the past, but in practice it turns out that exceptions are no slower than status codes in the success case (they are slower in the error case, but that doesn't matter too much because errors should be pretty rare).

I think that it is possible to make programs that use exceptions even faster than programs that use status codes. With the right set of tables, the error case code can be completely separated from the normal case code. No code that deals with or tests for errors even needs to be loaded into memory until an exception is actually thrown. Then control passes to a "throw" function which loads the tables and error handlers into memory from disk and examines the stack to determine which functions the exception passes through and which function will actually handle the exception. The code is faster than the "status code" version because it doesn't have to have all those "if failed" tests sprinkled throughout.

I haven't tried implementing it, but as far as I can tell there are very few places where this scheme would have any overhead at all. One is this situation - State0::RunState() and State1::RunState() can't be folded together with my scheme because they have different stack unwinding information.

The other limitation is that the stack layout and the set of constructed objects have to be completely determined by the instruction pointer in any given function. This is usually the case but does mean that functions can't allocate dynamically sized arrays on the stack. These don't tend to be used in practice because they have the serious problem that it is very easy to overflow the stack. If you do know in advance a maximum possible size for such an array you may as well just allocate the maximum size.