Recompiling only the code that has changed

Designing an entirely new computer language is a ridiculously ambitious thing to attempt as a personal side project, but even so it apparently isn't ambitious enough on its own for me - oh no, I want to write a complete compiler for it as well.

Why would I go to all the trouble of doing that rather than just writing a GCC front end, starting with LLVM or even writing it as a translator to another language?

Well, the short answer is that I want to explore some parts of the design space of both compiler design and languages that I feel have so far gone under-explored. In other words, play around and have some fun.

I do have some specific ideas under that umbrella that I want to try out. Some of these ideas rely on having the entire compiler for my language written in the language itself.

Here is one specific example I've been vaguely thinking about. Suppose we have a compiler that translates ALFE code into an executable binary program. One problem with it is likely to be compilation speed - I'm eliminating separate compilation which is what makes incremental builds possible in C and C++. But I'd also like to have the edit-compile-test cycle be as quick as possible, even for large programs. For large programs, even C/C++ incremental builds are slow because the entire link step has to be done from scratch.

So how can ALFE, which doesn't even have separate compilation, have incremental build times even close to that of C/C++?

What I really need seems to be a daemonic compiler - one that hangs around in memory all the time and keeps all the state from the previous build so that instead of accepting a complete input file it accepts a diff between the input currently in memory and the desired input. The output could be a binary diff but for most of these edit-compile-test cycles we won't even care about the output binary - what we'll really want is a diff in the list of which testcases passed and which failed. It would be great if the compiler could figure out which testcases could possibly be affected by an input diff and only rerun those testcases. It might even be possible to make it sufficiently fast that that it could (asynchronously) recompile with each keystroke in a text editor, so you'll know instantly if your fix makes the broken test pass and if it breaks any other tests. That would be a fantastic boon to productivity - with the right set of tests, updates can be nearly instantaneous once the problem is found!

Such a system would improve reliability of software as well - if the edit-compile-test cycle is that much shorter then any manual testing would be a much greater fraction of it, so programmers would have a much greater incentive to make automated testcases for every bug fix, and test coverage would rapidly improve.

Writing a compiler is hard enough on its own, but taking a normal compiler and modifying it to transform diffs to diffs seems to be task of similarly daunting magnitude. But that's not necessarily the case if we have a compiler that is written in the language that it understands. Modifying the compiler to make it output a diff-to-diff compiler should, I think, be much easier than modifying the compiler to be a diff-to-diff compiler, since I wouldn't have to modify every part of the compiler - it would be a pass on the intermediate form that transformed transformations from f(x)->y form to f(x+dx)->y+dy form. Of course, that's still much easier said than done, but I think it takes it from the realm of the impossible to the realm of the merely very difficult.

6 Responses to “Recompiling only the code that has changed”

  1. […] an incremental compiler sufficiently well integrated with the debugger, interpretation is unnecessary - the debugger can […]

  2. Jim Leonard says:

    I'm reminded of the first time I saw this in action, using QuickBasic 4.0 in 1987. It was a p-code (https://en.wikipedia.org/wiki/P-code_machine) compiler, and changing only one line would quickly recompile just that one line. (This had a secondary advantage of syntax-checking just that one line as you moved the cursor off of it, which gave you immediate feedback.) When you ran the program, it performed a quick "binding" step (their term for linking, I'm assuming) which was a lot faster than compiling from dead zero.

Leave a Reply