Bootstrapping a compiler from nothing

Two posts today 'cos I missed yesterday due to being disorganized.

Recently I've been working on bootstrapping a compiler from nothing. Just for fun. I know it's been done before but I wanted to learn about parsing and optimizing and how compilers are constructed.

The first stage of my compiler is a pretty clever hack, even if I do say so myself. I didn't want to use any external tools to get my compiler started, but that left me with a problem - how do I generate the first executable file? Well, one way to generate an arbitrary file from Windows is to just use an "echo" statement in the Windows command prompt and redirect the output to a file. But that only works reliably for ASCII characters (and not even all of those). This poses a problem, because the opcodes for even simple "MOV" instructions are all non-ASCII characters. But it turns out that the "constrained machine code" for x86 consisting of only ASCII bytes is actually Turing-complete and can be used to do useful things (non-ASCII opcodes such as the one for "INT" can be constructed using self-modifying code). So I put together an ASCII program that takes two characters from the command line, combines them into a byte and outputs the resulting byte (which can then be redirected to a file). Calls to this program can be strung together to make (almost) arbitrary binary files, which can be used to compile more complex languages.

In this way (13 iterations later) I have built up a simple but effective 2-pass 16-bit DOS assembler which outputs .com files. I have also written a recursive descent parser for simple infix expressions on unsigned 16-bit integers, and am working on writing a code generator which can output binary code for these expressions.

Eventually I hope to evolve this into a fast and powerful language to rival C++. It will be a language which combines very low-level and very high-level concepts, and will therefore be an ideal language for writing compilers (such as itself) in. I could then use it for writing all sorts of other fun things - maybe I'll tackle an OS when I've finished the language. But for now I'm just having fun learning about things.

7 Responses to “Bootstrapping a compiler from nothing”

  1. [...] language I eventually want to write will have a parser generator (probably generating packrat parsers from Parsing Expression Grammars) [...]

  2. [...] – a compiled language I’m playing about with. This name is a bit overloaded, I might call it ALFE (A Language For [...]

  3. Shinkarom says:

    "But it turns out that the "constrained machine code" for x86 consisting of only ASCII bytes is actually Turing-complete and can be used to do useful things (non-ASCII opcodes such as the one for "INT" can be constructed using self-modifying code). So I put together an ASCII program that takes two characters from the command line, combines them into a byte and outputs the resulting byte (which can then be redirected to a file). Calls to this program can be strung together to make (almost) arbitrary binary files, which can be used to compile more complex languages."
    Do you have any links to that? I want to learn more.

    • Andrew says:

      I've added my little Bootstrap compiler to github. You can find the first stage here: https://github.com/reenigne/reenigne/blob/master/Bootstrap/build1.bat . Unfortunately 64-bit Windows can't run .com files natively so a 32-bit version is needed to do the bootstrapping.

      • Shinkarom says:

        Fortunately, I have a 32-bit computer.
        I was asking for the links to some blog posts or articles that also write about this constrained ASCII machine code.
        If I get PCem to recognize floppy drives, I will also try to bootstrap something in MS-DOS.

        • Andrew says:

          I'm afraid that this post and the comments in the code is all I've written about it. Other people have done things with ASCII-constrained x86, although if I recall correctly they used a big loop rather than self-modifying code to get to Turing completeness.

Leave a Reply