Archive for the ‘language’ Category

The ALFE type system

Friday, October 12th, 2012

I want to write a series of posts about the rich type system I want to implement for ALFE. This post will serve as an index to them.

I recently tried doing a bit of programming in Haskell, and although I didn't produce anything worth showing off, I did get a bit of a feel for the language and how it works. One of my favorite things about Haskell is the rich type system, and I want to use some of its ideas in ALFE. Some of the ALFE types below are inspired by Haskell types.

Given types Foo, Bar and Baz I would like ALFE to have the following types:

  • Foo* - a pointer type.
  • Foo(Bar) (aka Foo -> Bar) - a function type.
  • Foo | Bar - a "sum" type.
  • Foo? - a "Maybe" type.
  • (Foo, Bar) - a "product" type.
  • [Foo] - a "sequence" type.
  • Foo[2] - a fixed length array type.
  • Bottom - the "bottom" type.
  • Variant - the "variant" type.
  • Label - the "label" type.
  • Complex<Foo> - a complex number type.
  • Rational<Foo> - a rational type.
  • String - the string type (usually represented in memory as UTF-8 in some form).
  • CodePoint - a type capable of holding any Unicode code point.
  • Fixed<Foo, Bar> - a fixed point fractional type.
  • Boolean - the boolean type.
  • Int1, Int2, Int4, Int8, Byte, Int16, Int32, Int64, Int128, Int256, Int512, UInt1, UInt2, UInt4, UInt8, UByte, UInt16, UInt32, UInt64, UInt128, UInt256, UInt512, NInt1, NInt2, NInt4, NInt8, NByte NInt16, NInt32, NInt64, NInt128, NInt256, NInt512 - fixed-width integer types.
  • Word, Int, HalfWord, QuarterWord, DoubleWord, QuadWord, UWord, UInt, UHalfWord, UQuarterWord, UDoubleWord, UQuadWord, NWord, NInt, NHalfWord, NQuarterWord, NDoubleWord, NQuadWord, FastInt1, FastInt2, FastInt4, FastInt8, FastByte, FastInt16, FastInt32, FastInt64, FastInt128, FastInt256, FastInt512, FastUInt1, FastUInt2, FastUInt4, FastUInt8, FastUByte, FastUInt16, FastUInt32, FastUInt64, FastUInt128, FastUInt256, FastUInt512, FastNInt1, FastNInt2, FastNInt4, FastNInt8, FastNByte FastNInt16, FastNInt32, FastNInt64, FastNInt128, FastNInt256, FastNInt512 - machine-dependent integer types.
  • WordString, Integer, Unsigned - arbitrary-precision integer types.
  • Float16, Float32, Float64, Float128 - fixed-length floating-point types.
  • Float - machine-dependent floating point types.
  • Floating - arbitrary-precision floating point type.
  • Concrete (possibly also specializations like Length, Time, Area, MagneticFluxDensity etc.)

There's still a lot about this that isn't finalized. For example, I might use some templates to avoid having all those fixed-width and machine-dependent types.

Recompiling only the code that has changed

Tuesday, September 18th, 2012

Designing an entirely new computer language is a ridiculously ambitious thing to attempt as a personal side project, but even so it apparently isn't ambitious enough on its own for me - oh no, I want to write a complete compiler for it as well.

Why would I go to all the trouble of doing that rather than just writing a GCC front end, starting with LLVM or even writing it as a translator to another language?

Well, the short answer is that I want to explore some parts of the design space of both compiler design and languages that I feel have so far gone under-explored. In other words, play around and have some fun.

I do have some specific ideas under that umbrella that I want to try out. Some of these ideas rely on having the entire compiler for my language written in the language itself.

Here is one specific example I've been vaguely thinking about. Suppose we have a compiler that translates ALFE code into an executable binary program. One problem with it is likely to be compilation speed - I'm eliminating separate compilation which is what makes incremental builds possible in C and C++. But I'd also like to have the edit-compile-test cycle be as quick as possible, even for large programs. For large programs, even C/C++ incremental builds are slow because the entire link step has to be done from scratch.

So how can ALFE, which doesn't even have separate compilation, have incremental build times even close to that of C/C++?

What I really need seems to be a daemonic compiler - one that hangs around in memory all the time and keeps all the state from the previous build so that instead of accepting a complete input file it accepts a diff between the input currently in memory and the desired input. The output could be a binary diff but for most of these edit-compile-test cycles we won't even care about the output binary - what we'll really want is a diff in the list of which testcases passed and which failed. It would be great if the compiler could figure out which testcases could possibly be affected by an input diff and only rerun those testcases. It might even be possible to make it sufficiently fast that that it could (asynchronously) recompile with each keystroke in a text editor, so you'll know instantly if your fix makes the broken test pass and if it breaks any other tests. That would be a fantastic boon to productivity - with the right set of tests, updates can be nearly instantaneous once the problem is found!

Such a system would improve reliability of software as well - if the edit-compile-test cycle is that much shorter then any manual testing would be a much greater fraction of it, so programmers would have a much greater incentive to make automated testcases for every bug fix, and test coverage would rapidly improve.

Writing a compiler is hard enough on its own, but taking a normal compiler and modifying it to transform diffs to diffs seems to be task of similarly daunting magnitude. But that's not necessarily the case if we have a compiler that is written in the language that it understands. Modifying the compiler to make it output a diff-to-diff compiler should, I think, be much easier than modifying the compiler to be a diff-to-diff compiler, since I wouldn't have to modify every part of the compiler - it would be a pass on the intermediate form that transformed transformations from f(x)->y form to f(x+dx)->y+dy form. Of course, that's still much easier said than done, but I think it takes it from the realm of the impossible to the realm of the merely very difficult.

Using one scrolling window inside another is annoying

Monday, September 17th, 2012

I'm very impressed with WordPress as a piece of software in general, and particularly at how it's managed to implement a text editor inside a web browser. However, as a text editor it's kind of annoying. It doesn't work terribly well in Internet Explorer for one thing. Even on Firefox and Chrome, it suffers from "one scrolling window inside another" syndrome which is common on many complicated "web applications". Once you've got a got more than a screenful of text in the edit box, you get an inner scrollbar to scroll it up and down. Being a keyboard-oriented sort of person, I usually use the PgUp and PgDn keys for this. PgDn works fine on the inner edit box until you get to the bottom, at which point it stops swallowing the PgDn keypresses and passes them on to the outer window, which then scrolls the top of the edit box off the window. I can't undo this with PgUp because that goes to the inner edit box! This behavior breaks the idea that PgUp and PgDn should do opposite things.

One way to fix this would be to make the edit box expand to fit the text instead of scrolling. There's probably a plugin for this (I've been able to solve just about every other WordPress problem I've had by installing a plugin) but I have no idea how to find it - the obvious searches didn't help. So most of the time I write blog posts in my preferred editor and paste them in. However, I still end up doing a pass over the text in WordPress to fix up the line breaks.

C++ features ALFE doesn't have

Sunday, September 16th, 2012

ALFE's closest ancestor is probably C++ (or possibly D, though I don't have much experience of that language). Though it also draws inspiration from such disparate corners as C#, PHP and Haskell. But there are several C++ features I'm planning on leaving out of ALFE:

  • C compatibility - this is the cause of a great many of the complexities of C++, though is quite probably also the only reason why C++ has been as successful as it has.
  • Separate compilation - another huge source of complexities in C++. The ALFE compiler will have a built in linker, so it will be able to generate an executable binary in a single instantiation. Actually I'm not entirely getting rid of separate compilation - you'll still be able to create and consume object files, but doing so will require explicitly specifying the ABI to use. The problems that are solved by separate compilation I'm planning to solve in different ways, which I'll write about in future posts.
  • Declarations. Declaring something separately from its definition is useful if you have separate compilation, or if you're writing a compiler that can compile programs too big to fit in RAM. RAM is pretty cheap nowadays so this isn't a particularly important use case - certainly not important enough to make programmers duplicate information and have an extra place to modify things when they change. The only things which look like declarations in ALFE will be for things like calling a function declared in an external binary, such as for platform ABI calls. I go to great lengths to minimize the number of declarations in my C++ programs.
  • The preprocessor. Well, macros specifically (anything you might want to use macros for in C or C++ will be done better with features of the core language in ALFE). Header files will still exist but will have rather different semantics. In particular, they won't be included until the code in them is needed - if you put an include statement inside a class definition and then no instance of that class is ever used, the file won't even be opened. That should drastically improve the time it takes to compile files while making it very simple to use the standard library (whose classes will all be included this way by default).
  • The post-increment and post-decrement operators. These do have some merit in C - you can write things like "while (*p != 0) *q++ = *p++;" much more concisely than would otherwise be possible. The first chapter of Beautiful Code has a nice example. However, I don't think these bits of code should be the ones we're making as concise as possible - such a piece of code should be written once, wrapped up as a highly optimized library function and thoroughly documented.
  • Digraphs and trigraphs. You'll need the full ASCII character set to write ALFE, which is probably not much of a hardship any more.
  • Unexpected exceptions (and run-time exception specifications). Exception specifications are a great idea as a compile time check but they make no sense as a run-time check, which is why most coding standards recommend against using them. But the reason that C++ needs exception specifications to be a run-time check instead of a compile-time check is because of separate compilation - there's no way to tell what what exceptions will actually be propagated from any particular function. With compile time exception specifications it's trivial to force the exception specification for destructors to be "nothrow" and thereby make propagating an error from a destructor be a compiler error, which is a much better solution than the "unexpected exception" mechanism in C++.
  • Multiple inheritance. I gave this one a lot of thought before deciding on it. Multiple implementation inheritance is something that seems to get used very little in proportion to the amount by which is complicates the language. I've only seen one even slightly compelling example (the BBwindow etc. classes in chapter 12 of The C++ Programming Language by Stroustoup) and the rationale for that goes away when you have control over all the parts of the program (i.e. you're not consuming code that you can't modify). ALFE actually goes one step further and disallows multiple interface implementation as well on the grounds that if you have a single class implementing more than one interface, it's a violation of the separation of concerns. From an implementation point of view, inheritance is a special case of composition - the parent object is just in the first position. That suggests a good way to simulate multiple interface inheritance - compose the class out of inner classes which implement the various interfaces - pointers to these inner classes are retrieved via method calls rather than casts. I might even add some syntax to streamline this if it gets too verbose. And I might still change my mind and introduce multiple (interface) inheritance if my plan doesn't work.
  • The main() function. Instead the statements in the file passed to the compiler are executed in order from the top - i.e. there is an implicit "main()" wrapped around the entire compilation unit. Command line arguments and environment variables are accessed via library functions (or possibly global variables), which means they don't have to be passed through the call chain from main() to whatever piece of code interprets the command line arguments. That greatly reduces the amount of boilerplate code.
  • public/private/protected/friend. Instead I'm planning to implement "accessibility sections", which are kind of like public/private/protected but specify which classes are allowed to access the members in the section. That way you don't have to give all your friends access to all your private parts, which is possibly the most innuendo-laden complaint about C++.
  • int to bool conversion. The condition in "while" and "if" statements must have boolean type. If you want C semantics you have to add "!= 0" explicitly. I've been doing this anyway in my own C/C++ code and I don't find it to be a source of excess verbosity - if anything it clarifies my intents.
  • Assignments (and increment/decrement) as expressions. This solves the "if (a=b)" problem that C and C++ have, in a far more satisfactory way than the "if (1==b)" abomination that some recommend. I've also been avoiding using assigments as expressions in my C and C++ code and I don't miss them - I think the code I write is clearer without them.
  • C strings (responsible for so many security bugs). ALFE will have a first-class String type, possibly consisting of a pointer to a reference counted buffer, an offset into that buffer and a length, but I'll probably try a few things to see what performs best, and the compiler might even decide to use different representations in different places if there's a performance advantage in doing so.
  • References. I can see why these were necessary in C++ but I think for ALFE they will be unnecessary because of the lack of a fixed ABI, and the "as if" rule (the compiler is allowed to use whatever implementation strategy it likes as long as the semantics don't change). So the rule for deciding how to pass an argument is simple: pass by value if it's strictly an input, pass by pointer if it the function might need to change the output. The compiler will decide whether to actually pass by value or pointer depending on what is fastest on the target architecture for the kind of object being passed.
  • Array to pointer decay. In ALFE, fixed-length arrays are first class types, so passing one to a function declared as (e.g.) "Int foo(Int[4] x)" passes by value, i.e. copies the entire array. Getting a pointer to the first element requires explicitly taking its address: "Int* p = &x[0];"

ALFE design philosophies

Saturday, September 15th, 2012

Every computer language has its design philosophies. Here are ALFEs:

  • Imperative rather than purely functional. This, this and this have some good points about why functional programming isn't a panacea. However, I'm hoping to implement some features which help to keep the purely functional parts of a program separate from the parts with side effects - the compiler will be able to keep track of which functions are supposed to be pure and prevent them from calling impure functions (with appropriate escapes for things like debugging and encapsulating high performance libraries behind functional interfaces).
  • Statically typed rather than dynamically typed. A type annotation is like a compiler-enforced comment and a unit test rolled into one, at an almost trivial cost. In fact, being used to static typing I now find that when I'm programming in a dynamic language I spend more time wondering what values a variable is supposed to have and tracking down errors that the type system would have caught than I do typing and modifying type annotations in static languages. Plus they're not even compulsory - I'm planning to give ALFE a Variant type which can hold any value (although most of the standard library functions will probably be statically typed so dynamic typing won't have first-class status). Some form of Hindley-Milner style type inference is also something I'd like to do.
  • Strict rather than lazy evaluation. ALFE is a high performance language and general performance characteristics are considered to be an observable behavior. Of course, lazy evaluation as a strategy for particular pieces of code is still possible - I remember when I first understood lazy evaluation reading about strategies for implementing continued fraction arithmetic in HAKMEM and thought it was tremendously clever, though unfortunately I've never had a good reason for implementing it.
  • No garbage collection. Also, the space of garbage collected languages has also been pretty thoroughly explored.
  • Rich in features at the expense of simplicity and ease of learning. In fact, if this language is so complicated that I'm the only one who can use it, that would not in itself cause it to fail to achieve what I'd like to achieve with it.
  • It's primarily a compiled language for maximum speed. Though I am currently planning on having a built in interpreter for the build system. A REPL might come in handy too.
  • The core language is not minimalist - even if some feature could be added as a library, it may still get language support if that would improve it in some way.
  • Text is for input and output, not processing.

I might come back and add some more things to this list as I think of them.

Unity language renamed to ALFE

Friday, September 14th, 2012

As regular readers of this blog know, I have a long-term on-and-off project to design my own computer language in which I can be super productive at all parts of the development cycle. I have occasionally referred to this language as Unity (given my hope that it will be useful for many purposes), but since picking that name I have learned of several other software projects by that name (including a another programming language), so I've renamed my project to ALFE. I've updated all the references I could find (including the URLs, which I didn't think I'd be able to do - WordPress has a plugin for everything). If you spot any more lurking in the dark corners of the blog please let me know.

ALFE stands for "A Language For Everything". Which has more than one meaning. One is that ALFE itself can be used for any purpose. Another is that (via the built in parser generator and code generator) you can use it to create powerful domain-specific languages.

ALFE is also the name of the standard library that is included with the language, though when it's referring to the library, it stands for "A Library For Everything".

Strings and localization in ALFE

Tuesday, October 11th, 2011

PHP is a pretty bad and ugly language in many ways, but it does have some advantages - because it's so ubiquitous for web development, a lot of effort has been put into getting it to run fast and including many useful libraries. Another advantage is its string manipulation, which is what I want to talk about today.

PHP has a nice feature whereby if you have a variable called $foo (all scalars in php start with the "$" sigil) and you want to output a string containing that value, you can just write:

echo "The current foo value is $foo units.";

In other words, you don't have to close the string to insert a variable into it. If this feature didn't exist you'd have to write:

echo "The current foo value is " . $foo . " units.";

as in C++:

cout << "The current foo value is " << foo << " units.";

which doesn't seem like much more typing, but when you're writing web pages (which are full of these inserts) it really adds up and makes the result much more pleasant to use. I was skeptical at first but after having written some PHP code I want to steal this feature for my own language. Being a statically typed language, UnityALFE also inserts .toString() calls when the inserted variable is not of String type.

Inserting more complex expressions is also possible. Though I think PHP's syntax for this is a bit confusing:

"<div class='logged_in_as'>Logged in as: {$user->link()}</div>"

Much better to have a single insertion character and allow what follows to be either a variable name or a parenthesized expression:

"<div class='logged_in_as'>Logged in as: $(user.link())</div>"

The trouble with including literal strings like this in your program, though, is that sooner or later you're going to want to translate it into other (human) languages. Current GNU C/C++ best practices for internationalizable software suggest enclosing UI strings in a gettext macro (_("foo")) which adds that string to a table and replaces the string itself with an index into that table - then localization is just a matter of replacing the table. At Microsoft we avoided putting UI strings in source code altogether, instead putting them in an .rc file and making up a macro name to refer to the string's index. That was painful, because to add a string we had to change three files - the source file (relating code to macro name), the header file (relating macro name to index) and the resource file (relating macro name to English text). I suppose there might have been economic forces resisting change to this system - Microsoft software is translated into so many languages that adding a UI string is quite expensive - you have to pay to have it translated a lot of times. Similarly with changing a UI string.

Both these methods also constrain how you need to phrase your UI strings. The canonical example is that you can't write:

printf(_("%i file%s copied"), n, n==1 ? "" : "s");

because the rules for localization vary from language to language. Translating would involve changing the code as well as the contents of the string. Thus such pieces of UI usually get phrased as:

printf(_("Files copied: %i"), n);

instead.

I want to include a localization system right in the UnityALFE compiler which solves all these problems. The high level aims:

  • It should be easy to add a string with inserts - as easy as it is in PHP. Modifying strings should be similarly easy - no editing multiple files, no _(), no making up names for the strings or finding the next available ID number.
  • The system should keep track of strings between compiles so that they can be consistently associated with their translations.
  • Translators should be able to permute, duplicate, add, remove and modify inserts as appropriate for the target language.
  • Software designers should not have to compromise their designs in the original language to make it possible to translate them into other languages - in other words there is no need to explicitly internationalize strings - it happens automatically (of course, there are non-string internationalization issues which are not in scope here, such as staying far, far away from maps.

To accomplish this, I think the best way is to have a tool (or perhaps a special mode of the compiler) which modifies the source files it processes. This is generally considered a bad thing but in this case I think it's the best solution. The one and only change it makes is to add an identifying number to the start of each string. The insert system gives a handy way to do this, so it might change:

"<div class='logged_in_as'>Logged in as: $(user.link())</div>"

to:

"$(/*12345*/)<div class='logged_in_as'>Logged in as: $(user.link())</div>"

If you're a programmer changing this string, you now have a choice - you can delete the "$(/*12345*/)" part, causing the translators to consider this a brand new string to be translated from scratch, or you can leave it alone, causing the translators to consider it a modification rather than a deletion and addition. The IDs are only ever added by this tool - no programmer should ever have to try to figure out which is the next available number by hand. It might be a good idea for the version control system to be made aware of this system so that if two strings are added with the same number on different branches and the branches are then merged, the merge system knows how to change one of the IDs automatically (everywhere that it appears). Similar empty inserts with comments can be used to leave notes for translators, such as telling them the context in which a particular string is used.

For non-UI strings, the programmer and/or translator can change the ID to a dummy so that it's clear to all concerned that the string does not need to be translated:

"$(/**/)<html>"

The interface for translators should be something like this. The tool creates one file (the translations file) for each generated binary for each language the software is translated into. This file contains one line per string in the original source, and that line contains the original string (including ID and comments), the translated string, and any notes from the translator. The compiler then reads this generated file back in to generate translation files for each language/binary combination. The translator can either edit this file directly or use some kind of tool, but either way it just gets checked back into the version control system.

The tool also checks for differences between the translations file and the source, and tells the translator which strings have been added and changed (including where the string appears in the original source, so the translator can look up the context). Because the translation file contains the inserts as source (the variable name or expression after the $) the translator is free to change this code to make sense for other languages. To handle the plural case, the string might be written as:

print("$(/*120*/)$n file$(n == 1 ? "" : "s") copied");

and the translator can then change anything inside the outermost double-quotes to make the string make sense for the target language. That might even involve calling out to external code to handle complicated cases:

"$(/*120*/)$n file$(n == 1 ? "" : "s") copied", "$n $(locale.plural(n, "plik")) kopiowane"

An important difference here is that the compiled locale file can now contain code as well as data. This may introduce testing difficulties, but it's necessary to provide this "no compromise to end user experience" goal. If necessary, site policy could dictate the software internationalization rules we usually use now, or some middle ground like ensuring all locale methods are pure functions returning in bounded time, so they can't adversely affect other parts of the software.

Templates and Kinds in ALFE

Monday, October 10th, 2011

Programmers work with values like 4 and "hydroxylic". We also generally work with variables like count and acid. Each variable holds precisely one value. But in many languages, not every variable can hold any value. In a statically typed language, count might be able to hold 4 and acid might be able to hold "hydroxylic", but not the other way around. That quality of variables and values that determines which objects can fit into which boxes is called type. So count might have a type of Integer and acid might have a type of String. Then if you have a function called square which multiplies its argument by itself and returns the result, it makes sense to say square(count) but not to say square(acid) - the latter is an error which is detected and flagged as quickly as possible (when the program is compiled, if not when it's typed in in the first place). On this blog, as a notational convention, types (and more generally type constructors, see below) start with a capital letter and variables and values don't. I think this is a useful convention, and some languages (including the language I am writing) enforce it.

A type constructor is a generalization of a type. It's either a type itself (which can be used to declare variables) or a template type constructor, a "function" of sorts which, when given another type constructor, yields a third type constructor. C++, C# and Java programmers are familiar with these "functions" - they're called templates or generics. A good example of a template type constructor is Array. You give it a type like String and you get back Array<String> (i.e. an array of strings). You can also have Array<Array<String>> if you're feeling adventurous. A type constructor can have multiple arguments, so you might have Pair<Integer, String> which (through the magic of something called currying) is the same thing as Pair<Integer><String>.

What if you have a template type constructor with a parameter that is not a type but is itself a template type constructor? Well, you can do that too - say you have a type constructor called FooCollection and you pass it Array (note, not Array<Integer> or Array<String>, which are types, just plain Array) and you get back FooCollection<Array> which might be implemented in terms of Array<Foo>. Similarly, FooCollection<List> might be implemented in terms of List<Foo>.

So if the type of the argument of square is Integer, how do we talk about the argument of FooCollection. We can't say "the type of the argument of FooCollection" because what it accepts isn't a type - a type constructor can't have a type, type is a kind of type constructor. And in fact that is exactly the word we use instead: "the kind of the argument of FooCollection is <>." Other examples of kinds are (nothing, i.e. the empty string) which is the kind of types like Integer and String (no angle brackets follow them), <><> aka <,> (which is the kind of Pair - i.e. you need to give it two types to get back a type) and <<>> (which is the kind of FooCollection - i.e. you need to give it a type constructor of kind <> to get back a type). The wikipedia article on kinds uses a different notation further removed from the C++-inspired one I use here, but the two are equivalent and the transformation from one to the other is purely mechanical.

Incidentally, you can refer to kinds in C++ as well - instead of the empty string for the kind of types, C++ uses "class" (a rather confusing overload of the word - distinct from "a structured type with default accessibility private") or "typename" which is a bit better. Instead of <>, C++ uses "template<typename>", instead of <><>, C++ uses "template<typename, typename>" (C++ has no currying) and instead of <<>> C++ uses "template<template<typename>>" (a so-called "template template parameter" - one of the more obscure areas of C++).

So we've gone from values to types to kinds, what comes next in the sequence? ("Sort?" that already has another meaning in computer science though.) Suppose you had a function which accepts a kind and returns another kind. My angle brackets here have that effect - they take a list of kinds and return a kind which is the kind of a type constructor that accepts type constructors of the listed kinds and returns a type. What a confusing sentence! Fortunately nobody to my knowledge has found a non-trivial use for anything one level up from kind. I say "fortunately" not just because it's really confusing but because we've run out of notation! If you start with a lower case letter to signify a value argument and an upper case letter to signify a type constructor argument, how do you signify a kind argument? Also, if we'd need to use different brackets (I almost wrote "a different kind of brackets" which would further confuse things - writing about this stuff is hard!) to distinguish kind functions from value functions and type constructor functions, and all the ASCII brackets already have other meanings in my language.

Anyway, this is the sort of thing that you bump your head against when you write compilers for languages which support templates. Fortunately, if you're just writing normal applications in this language you'll probably never need to be concerned with it. Until you do. At which point just doing the most obvious thing should work fine without any fuss. I think it's fun to think about though.

No exceptions

Thursday, September 29th, 2011

Recently I was out riding my bike and I saw a car sporting this bumper sticker:
God Bless Everyone - No Exceptions!
All I could think was that the owner of that car must really like to use return codes for their error handling.

Getting rid of GOTOs

Monday, September 26th, 2011

It's well known that it's possible to change a program that uses GOTOs into one that uses loops and conditionals if you're allowed to duplicate code or add more variables. One very easy way is just to change the program into a state machine, with one state for each label and a big loop around the whole thing. But under what conditions can you eliminate GOTOs if you can't do either?

The "Multiple break" example in the "More keywords" blog post is a piece of code that can't be de-GOTOized in C. I'm wondering if there are any examples of pieces of code that can't be de-GOTOized without multiple break, extra variables or code duplication. If I understand the abstract of Peterson 1973 correctly I think there aren't - that multiple break is enough to make GOTO redundant (but I can't find a non-paywalled copy of that article to check, and I don't care enough to spend $15 to find out).