Archive for the ‘language’ Category

Integers as types

Friday, February 19th, 2016

In C++ there are two kinds of things you can use as template arguments. One is the obvious one, types, but less well known is that integers (and some other values) can also be used. Early on in the design of ALFE's type system I decided to simplify things by having template arguments all be type constructors. If you really need to use an integer (or indeed any other value) as a template argument you can wrap it in a type.

However, I recently thought of another feature I'd like ALFE fixed-length arrays to have - the ability to be indexed by a type other than Integer. For example, we might want to have an array indexed by a Boolean:

Boolean baz;
...
Foo bar = barData[baz];

Now we could just convert the Boolean to an Integer when indexing the array:

Foo bar = barData[baz ? 1 : 0];

But that seems ugly especially if we're indexing barData in a lot of places. We could turn it into a helper method but then it's less obvious to callers that it's an array access. Also this is rather more unwieldy if our array is indexed by an Enumeration type.

What is the name of the type of an array that takes Boolean and returns Foo? By analogy with the function type Foo(Boolean) we would expect it to be Foo[Boolean] which is quite nice (though makes the sequence type [Boolean] even more irregular - maybe that should be renamed as Boolean[]?)

So if Foo[Boolean] is a array of Foo indexed by Boolean then Foo[2] should be an array of Foo indexed by a type called "2", which must then be an integral type with values 0 and 1. So perhaps integers are types after all. Integer literal type names can't be used everywhere that normal type names can occur, though - it would lead to too many parsing ambiguities. So we'd need another name for the type "2", perhaps "LessThan<Class{Int n=2;}>", "()[2].Indexer" or "Zero.Successor.Successor".

Of course, if you want that you'll almost certainly also want ranges. We can kill both those birds (and many others) while still keeping the same meaning for Foo[2] by creating difference types: "A-B" is a type whose values are those that are in A but not in B. So the type "3-1" would allow the values 1 and 2. Unfortunately 3-1 is not the same as 2 (though the two types do have the same cardinality).

Alternatively, the idea that the type "2" should be a type with a single value (the value 2) also has some beauty especially when combined with sum types so that you could create a type that can contain some arbitrary subset of integers and have that checked at compile time.

Hardware support for ravioli

Wednesday, February 17th, 2016

Ravioli memory is something that could benefit significantly from hardware support. Page operations and inter-thread communication are some specific things that could be accelerated, but there are others as well:

  • Dereferencing through reverse-ravioli trees (and forward-ravioli trees to find parent objects).
  • Allocating and deallocating/moving. The latter requires fixing up any pointers than happen to be in registers, so the set of registers that hold pointers needs to be found somehow (either by using tag bits or having a subset of registers dedicated to pointers).
  • Prefetching the memory for the ravioli pointed to by registers, the ravioli pointed to by those ravioli and so on. Since there are a relatively small number of memory locations that could be accessed at any moment in time. This is in contrast to current CPU architectures, where pretty much any address could be generated at any point in time.

Registers and other ravioli are two places where pointers can be found and which need to be fixed up. Are there any others? What about the stack? Well, searching the entire stack for pointers to the raviolo that's getting moved is an O(n) operation, which we want to avoid. So we can't be storing pointers on the stack.

One thing about stacks as implemented in modern OSes - you have to decide how big they can get in advance. With virtual memory systems you don't have to commit storage for the entire stack, but you do have set some kind of limit on it so that heaps, dynamically-loaded libraries and stacks belonging to other threads can go below them (assuming a downward growing stack). If you choose a limit that is too small you'll get a stack overflow, and choosing a limit that is too large is wasting address space.

Both of these problems could be solved by not having a separate stack at all, and storing activation frames in the ravioli. This eliminates the distinction between stack and heap data, and allows for trivial implementation of Cactus Stacks.

What other chunks of memory can we shoehorn into ravioli? What about the actual code that the CPU runs? Code loading and unloading less often causes fragmentation than heap allocation/deallocation, but it could happen (and might be more likely with dynamic code generation). Putting code into ravioli has some other nice effects as well - it means that ravioli addresses can be embedded directly into code, and those addresses can be updated automatically when those ravioli move around - it's just like any other reference from one raviolo to another. Unless your code raviolo ends with an instruction like "jump to the address in this register" or "return from subroutine" or "halt the CPU", it will need a successor raviolo to jump to when it reaches the end. So all such code ravioli must end with a pointer to the next code raviolo to run, in the tradition of the Royal McBee RPC-4000. It could also end with two addresses for a conditional jump instruction. Some instructions will need a third address as data (e.g. load an immediate address into an address register). Code ravioli are sort of like basic blocks except for all being the same size. They may contain several actual CPU instructions.

Along with code come the vtable pointers previously mentioned. These are a bit like code (in that they are normally generated when the program is compiled) but are not executable. My thinking was that each raviolo would have a pointer's worth of metadata which tells both the CPU and the program what it's looking at. However, the vtables (if not stored as ravioli) are another potential source of fragmentation. Worse, if every single "just data" raviolo has to dedicate 1/4 of its space to a vtable pointer it's quite wasteful. Getting rid of the vtable pointer takes the limiting-case overhead for straight arrays down from 300% to 167% (binary trees), 100% (ternary trees) or 31% (7-child trees with ravioli that are 8 pointers long). Vtable pointers can still be implemented by compilers in the normal way (as a normal pointer field in a structure).

The CPU will still need some metadata about each raviolo to know where the pointers are and whether it's a reference tree raviolo that should be skipped for dereferencing. This is only a few bits worth, however - I've counted 12 different types of ravioli that are the size of 4 pointers:

  • 1 incoming pointer
  • 1 incoming pointer, 1 outgoing pointer
  • 2 incoming pointers, 1 outgoing pointer
  • 1 incoming pointer, 2 outgoing pointers
  • 2 incoming pointers
  • 3 incoming pointers
  • 4 incoming pointers
  • 3 incoming pointers, 1 outgoing pointer
  • 2 incoming pointers, 2 outgoing pointers
  • 1 incoming pointer, 3 outgoing pointers
  • 1 incoming pointer, 3 outgoing pointers - reference only (reverse tree)
  • 3 incoming pointers, 1 outgoing pointer - reference only (forward tree)

Suppose our pointers are 64 bits, so our ravioli are 256 bits and aligned to 32 byte boundaries. Then we know that the low 5 bits of our incoming pointer will all be zero, so we can reuse them for these type bits. The low 5 bits of outgoing pointers might still be useful for pointing to particular bytes inside ravioli (especially for string handling). Of course, for finding matching pointers for the purposes of moving ravioli, these low bits would be ignored.

There we have it - everything raviolized. Except for the CPU registers - there's no possibility of fragmentation there. In fact, one could think of the register file as one extra large and special raviolo that is unique in not having an incoming pointer (it's location is implicit) and whose outgoing pointers don't need to be reference-enumerated (since they're automatically updated when the ravioli they point to move anyway).

Another advantage that a machine implementing ravioli in hardware might have is the possibility of isolating threads from each other even if they share an address space. The CPU knows where all the pointers are and can control access to the places where they are so that only a valid pointer can be placed in a pointer slot. Thus a thread can't get access to data unless it already has a pointer to that data somewhere. In particular, a thread can't access another thread's data. So we can use the same address space for all processes, and we no longer need to distinguish between threads in the same process and threads in another process. The OS keeps a mapping from (non-pointer) thread ID numbers to the initial page of each thread, so if a thread needs to refer to another thread it can do so via this thread ID.

Threading and ravioli

Tuesday, February 16th, 2016

One thing that's been bothering me about ravioli memory is the possibility that it doesn't interact very well with threading. Trying to have multiple threads manipulating a single ravioli heap is a non-starter: whenever one thread needs to allocate or deallocate a raviolo (a very common operation) all other threads would need to be suspended and (in the deallocation case) have any pointers in their registers potentially adjusted.

So rather than having a single heap for all threads, we should instead have a heap per thread. But one thread may not refer to another thread's heap! If it did, then again we've have to synchronize those threads whenever a deallocation occurred.

So essentially when using ravioli, every thread must have a separate address space (logically, even if multiple threads share the same address space in the traditional sense). But that kind of defeats the point of having separate threads in the first place! The whole point of threads is that they share an address space, so that multiple threads can party on the same data without explicitly transferring it from one thread to another.

Threads are the standard way to build interactive applications that remain responsive when the doing some background processing (or network access, disk access, etc.). They've also become increasingly important as the exponential rise in single-core performance has given way to a rise in the number of hardware threads per CPU. They have low overhead (switching threads can be very quick since the address space doesn't need to change) and are reasonably easy for operating systems to implement.

But that doesn't necessarily mean they're the best way to structure applications to take best advantage of a CPU's resources. There is a serious hidden penalty to threads that applies even when using the most sophisticated lock-free techniques and latest multi-core CPUs. That penalty is this: sharing memory between cores is not free. If a piece of memory is written to from one core and read from another, the caches that make these memory accesses fast must be flushed. So memory accesses become a great deal slower when there is contention between cores for the cache lines in question.

So, to get best performance out of multi-threaded code, the threads should work as much as possible with memory that is private to them and share as little memory between threads as possible, in other words they should avoid being "chatty".

With ravioli, communication between threads essentially amounts to message passing. One thread passes a message consisting of some data structure to another thread. To accomplish that, the ravioli comprising that data structure are moved into some shared-memory buffer, ownership of that buffer is transferred to the other thread and then the marshalling ravioli are moved on to the other thread's heap. That sounds like a lot of extra work but the extra transfers each happen within a single thread and seem likely to be quick compared to all the cache-flushing involved in the actual moving of information from one core to another (though I should qualify that I haven't actually measured it). In fact, the localization of this movement (in space and time) may even have some performance benefits over the ad-hoc chattiness of most multi-threaded programs.

How can multiple threads share a single address space in the traditional sense? One possible way would be to divide up the address space into pages of (say) 4KB (we might want to experiment with different values to see what gives the best tradeoff of performance to per-thread memory overhead, but there are likely to be advantages in choosing a size which corresponds to the natural page size of the machine's MMU). Whenever a thread needs to allocate a new raviolo, it uses space from the current page. If there is no more space left on the current page, it requests a new page (which involves coordination with the other threads). If a thread has two entire pages that are completely empty, one of them is returned to the inter-thread pool of empty pages (which again requires coordination). This hysteresis prevents excess inter-thread coordination when a thread's memory usage is close to a whole number of pages. A thread's pages are linked together using the first raviolo from each page (which is reserved for that purpose), as are the pages in the free list. So a thread's pages do not have to be contiguous in the address space and fragmentation is not a problem.

These pages also form a natural mechanism for the "shared memory buffers" used for inter-thread communication, alluded to above. It's pretty straightforward to move a page from one thread to another (just remove it from one thread's list and place it on another's). The one tricky part is making sure that only the ravioli that actually need to be moved to another thread are placed in the page to be moved. So a thread will need a way of allocating a page separate from it's normal heap-top. This page doesn't get its ravioli compacted and isn't automatically freed if there is too much unused space. A shared memory buffer is almost like the address space of a different thread altogether (except that there isn't a corresponding thread of CPU execution). Once the buffer is placed on the other thread, a compacting operation is needed to restore the "no gaps" invariant - ravioli from the end of the new buffer are moved into any previously free space, after which the previously shared buffer is just normal memory.

It occurred to me that these special buffers may also be useful for serializing to or deserializing from disk, network, or the address spaces of other processes. In these cases the high bits of the pointers (the ones corresponding to the page address) need to be fixed up to point to the real pages on deserialization. But perhaps this method of serialization would be overly wasteful - for all but the largest regions of data, most of those pointer bits are useless to the deserializer. So perhaps it would make more sense to compress the data before sending it to remove this redundant information.

Modifications to a programming language for ravioli

Monday, February 15th, 2016

I'd like to do some experiments with the ravioli memory I've been writing about in the past few blog posts. To give it a fair test I'd need a decent body of code (like a compiler) written in a language that I have control over. This is another reason I want to make my own compiler.

Some language modifications would be necessary to make good use of ravioli. While ravioli could be used with any non-garbage-collected language, to minimize the overhead it would be a good idea to add some language features which ensure that there is no duplication between the "low level" information used by the ravioli themselves and the "high level" information stored in the ravioli. For example, you would not want to implement reference counting on top of ravioli since the ravioli themselves have something even better (reference enumeration). Also you'd want to make sure that the vtable pointers used by the ravioli to determine where the pointers are in a given raviolo (and, if reference trees are used, whether it is normal or reverse) are also used for the high-level atoms or vtables.

I'm not sure quite what modifications to ALFE could be made to help with ravioli. I suspect most of it is just re-implementing low-level data structures, but I guess I'll find out what else can be done when I implement it.

Reference trees for ravioli

Sunday, February 14th, 2016

One thing that always bothered me about ravioli memory is that deleting a node can take O(n) time because the raviolo that was moved into the new gap might have a large number of pointers pointing to it. While the behavior will probably be O(1) amortized, it's quite possible to have a pathological case when deleting a large chunk of memory, leading to O(n2) behavior. Though that could probably be mitigated by not moving any ravioli until we're done deleting (i.e. having separate deallocation and move steps).

Or we could turn the move operation from O(n) to O(1) by limiting the number of pointers which can point at any raviolo. We can do this by having two types of ravioli:

  1. The normal type, which is pointed to by exactly one pointer. Each normal raviolo has a "parent" entry which points to that pointer.
  2. The reverse type, which can only point to one other raviolo (the "target") but which has two ravioli pointing to it. Each reverse raviolo has two "parent" entries which point to those pointers.

So, if you dereference a pointer expecting to find a normal raviolo but instead find a reverse raviolo, you need to (repeatedly) follow the "target" pointer of the reverse raviolo until you find a normal one. This may have to happen O(log(n)) times, so there is an additional cost to dereferencing (although array and object dereferencing is already O(log(n)) with the previous ravioli.) In return, moving (and hence deleting) becomes O(1) as there are at most 2 pointers to update. So this type of ravioli might be better for applications where the data is changed a lot, whereas the earlier type might be better for applications where the data mostly doesn't change but is queried a lot.

With this scheme, each raviolo only needs 4 pointer-sized entries: one for the vtable, one for the parent pointer (or target pointer for a reverse raviolo) and two for normal pointers (parent pointers for a reverse raviolo). Though (possibly depending on the type of data involved) it might be worth using 8-pointer ravioli to make the trees shallower and reduce overhead. 8-pointer reference tree ravioli could also interoperate with linked-list ravioli.

For an array of simple data, 4-pointer ravioli has an overhead of 300%. However, such arrays are growable/shrinkable for no extra cost, and you can even insert and delete elements in the middle without copying the entire array. 8-pointer ravioli has an overhead of 60%.

Modifying a pointer also involves O(log(n)) operations, as it may involve rebalancing the reference tree (the tree of reverse ravioli pointing to the old and new targets of the pointer). It could actually take much longer than that if you're deleting the last reference to a large data structure, since you'd have to delete all those ravioli. That's true for all reference counting systems, though, and there's no scope for pathological behavior since any given raviolo can only be deleted once.

The set of ravioli in a program at any given point in time form a graph. It would be interesting to visualize these graphs for a complicated program and see what patterns they make and how they change with time. This would also be a useful debugging tool - especially if you can delve into the actual values in each raviolo.

2-forward, 1-back list for ravioli

Saturday, February 13th, 2016

It turns out that having three "low-level" pointers per "high-level" pointer is not actually necessary in ravioli memory - two is enough.

The reason for this is that all the entries in the linked list are actually the same, so we don't need to store the pointer value in every element of the list. What we can instead do is to have two types of node:

  1. A node containing the pointer to the target and a pointer to the next node in the linked list.
  2. A node containing a pointer to the next node in the linked list and a pointer that points back in the linked list either 2 or 3 entries.

To find the target pointer if you have a node that doesn't contain it, just advance through the linked list until you do. To add a pointer to the linked list, add a forward pointer if that would cause the next backward pointer to point back 3 entries, otherwise add new backwards pointer so there are two "back 2" entries and the invariants are maintained. To remove a pointer, do the opposite to adding it.

If we want to make each raviolo a power of 2 bytes in size, we can then have 3 "high-level" pointers per raviolo (it doesn't allow us to make the ravioli smaller, since we still need a pointer for the vtable and to the first entry in that node's reference list).

However, I've since moved on to a more conceptually clean version of ravioli, which I'll talk about tomorrow.

The MuT music tool

Thursday, February 11th, 2016

For 8088 MPH I wrote a tool to convert Amiga MOD (module) files to the format required for playback with the 4.77MHz 8088 PC speaker 4 channel playback routine. The MOD file solution never felt quite ideal to me because the playback routine has some possibilities (like SID-style ring modulation) which can't be expressed in a MOD file and there are also a lot of things you can do in a MOD that won't really work with my player, so if I make it easy to try arbitrary MODs with the player, people are likely to try MODs that don't come out very well and conclude that the player is rubbish.

What I really wanted was to write my own tracker designed specifically for the player routines I had written (and some variants that I might want to write). But writing a whole tracker is a big project - particularly the GUI (GUIs take ages and aren't the most interesting things to program).

So I started thinking: what's the simplest piece of software I could write that a musician could use to compose music for this player? What about a command line tool - a sort of compiler which accepts text files as input and generates binary music data in the appropriate format (or indeed various formats) as output? This isn't entirely unprecedented - there have been various tools for processing text files into music such as Grigasoft's "Polyphonic Music" and John Worley's "Clockwork Pianola".

This is the design I came up with - I'm calling it "MuT" ("music tool", pronounced "mute") for now - it can't seem to decide if it's a tracker, a musical instrument or a programming language.

Inside the text files we would probably want to have something similar to a more traditional tracker's pattern grid, with different channels arranged horizontally and time going vertically. Rather than using semantically-significant spaces and newlines (which cause all sorts of trouble) I think a nice way to do it would be for the musician to lay out the grid using the "&" character to separate voices (think "C & E" means a C and an E playing at the same time) and the "|" character to indicate a new time division (think "bar is a measure of time" though the "|" interval would usually be shorter than a musical bar, obviously). So an empty grid would look something like:

output =
     &     &     &     |
     &     &     &     |
     &     &     &     |
     &     &     &     ;

The spaces could then be filled in with notes:

output = sine@(
 C4 & E4 & G4 & C5  |
 C4 & E4 & A4 & C5  |
 C4 & F4 & A4 & C5  |
 D4 & F4 & A4 & D5  |
 D4 & F4 & B4 & D5  |
 D4 & G4 & B4 & D5  |
 E4 & G4 & B4 & E5  |
 E4 & G4 & C5 & E5  );

Leaving a space blank causes the note in the same voice in the previous division to continue, so this could also be written:

output = sine@(
 C4 & E4 & G4 & C5  |
    &    & A4 &     |
    & F4 &    &     |
 D4 &    &    & D5  |
    &    & B4 &     |
    & G4 &    &     |
 E4 &    &    & E5  |
    &    & C5 &     );

If you want to silence a voice, put a 0 in there instead of a blank:

output = sine@(
 C4 & E4 & G4 & C5  |
 0  & 0  & A4 & 0   |
    & F4 & 0  &     |
 D4 & 0  &    & D5  |
 0  &    & B4 & 0   |
    & G4 & 0  &     |
 E4 & 0  &    & E5  |
 0  &    & C5 & 0   );

One can also put different instruments into the grid:

output =
 sine@C4 & square@E4 & triangle@G4 & bell@C5  |
 sine@C4 & square@E4 & triangle@A4 & bell@C5  |
 sine@C4 & square@F4 & triangle@A4 & bell@C5  |
 sine@D4 & square@F4 & triangle@A4 & bell@D5  |
 sine@D4 & square@F4 & triangle@B4 & bell@D5  |
 sine@D4 & square@G4 & triangle@B4 & bell@D5  |
 sine@E4 & square@G4 & triangle@B4 & bell@E5  |
 sine@E4 & square@G4 & triangle@C5 & bell@E5  ;

Instrument names are resolved per voice, and "." is (by convention) a sort of default instrument name, so this can also be written:

output =
 { . = sine; } .@C4 &
      { . = square; } .@E4 &
           { . = triangle; } .@G4 &
                      { . = bell; } .@C5  |
               .@C4 & .@E4 & .@A4 & .@C5  |
               .@C4 & .@F4 & .@A4 & .@C5  |
               .@D4 & .@F4 & .@A4 & .@D5  |
               .@D4 & .@F4 & .@B4 & .@D5  |
               .@D4 & .@G4 & .@B4 & .@D5  |
               .@E4 & .@G4 & .@B4 & .@E5  |
               .@E4 & .@G4 & .@C5 & .@E5  ;

One can make instruments in all sorts of ways. A few simple (chiptune-esque) waveforms are built into the program, or you can load them from a file:

bell = load("BELL.WAV")@s/440;

The MuT syntax is extremely flexible and rich enough to do just about any kind of audio processing, and hopefully it's also intuitive enough that it's not too difficult to figure out how to do just that.

What follows is a more technical and in-depth explanation, so if you're not a programmer or mathematician you might want to stop reading here.

Basically most MuT objects are functions from the real numbers to the complex numbers. Both the range and domain of these functions have units (some integer power of seconds - s) which are tracked by MuT so that it can distinguish between time-domain functions like sine@C4 and pure functions like sine.

There are various operators for acting on MuT objects:

+ - pointwise addition: output = sine@C4 + square@E4; sounds the same as output = sine@C4 & square@E4; but it has some different behavior in other ways - you can't do the "empty space means continue playing the same instrument in that voice" trick if you use + instead of & in your grid, and you can't use a + b as an l-value.

- - pointwise subtraction: same as + but the function on the right has its phase inverted.

* - pointwise multiplication (aka modulation). Also useful for volume envelopes.

/ and ^ - pointwise division and power respectively. Mostly for completeness sake, though they do have some uses especially for scalars.

[] indexes into a function, just like an array in C or Pascal. So sine[1] = 0.841.... When you put a function instead of a scalar inside the brackets, you (naturally) get function composition. So a[b][x] = a[b[x]].

{} allows execution of statements during the evaluation of an expression. This is handy for redefining instruments while inside the grid (see example above), or changing the tempo, amongst other things. The tempo (i.e. the amount of time covered by one "|" is set by the special variable division. So if MuT sees { division /= 2; } inside an expression, the following grid rows will be played at twice the speed (only durations are affected, not frequencies). The scope of any changes inside {} is the remainder of the statement in which it occurs.

@ - basis scale. This is where it gets really interesting. This is an operator only applicable to functions, not scalars (so unlike +, -, *, / and ^ it isn't found on calculators). Suppose you have a function f and a scalar x. Then (f@k)[x] = f[k*x]. So if f is a waveform then f@2 is the same waveform played twice as fast (and an octave higher). The @ operator also adjusts domain units if the right-hand side has a value which is not dimensionless. So, if f has a dimensionless domain (e.g. sine) then f@110*Hz (or f@110/s) will be a normal time-indexed waveform (i.e. a sine wave with a frequency of 110Hz). As I'm sure you can see this is a very useful operator for an audio program!

It gets better, though. Suppose we have a complex unit i = sqrt(-1); and we make f@i compute the Fourier transform of f. Surprisingly, I discovered (after defining it that way) that the mathematics of doing so work out very neatly - time scaling and Fourier transformations are closely related, mathematically - they are both Linear Canonical Transformations, and there's a nice mapping from complex numbers to LCTs which gives an (efficiently computable) meaning to f@z for any complex number z. Using f@i in combination with * allows us to do convolutions and therefore any linear filters you care to describe (high-pass, low-pass, band-pass, notch, echos, resonances, you name it). Using other complex numbers on the right-hand side of @ gives us inverse Fourier transforms and fractional Fourier transforms (the musical utility of which I know not, but I'm sure inventive musicians will find some interesting uses for it).

One can also use a function on the right-hand side of @, which will result in a different basis scaling for each sample in the output - i.e. for scalar x and functions f and w, (f@w)[x] = (f@(w[x]))[x]. That's how the first non-trivial example above works.

& - next voice operator. This just puts moves to the next voice, as we've seen above. It can also be used on the left-hand side of an assignment: (a & b) = (c & d); does the same thing as a = c; b = d;. This is useful for grouping values together into a compound object.

| - time sequence operator. c = a | b yields a for division followed by b for division. Functions in MuT repeat once they are complete, so if you evaluate c in a time sequence with a sufficiently long division, it'll sound like a | b | a | b | .... Similarly you can loop a division-long section of just one function by doing c = a | ;.

As with &, | can be used on the left-hand side of an assignment: (a | b) = c; assigns the part of c between 0 and division to a and the part of c from division to division*2 to b. So by using just assignment, the special division variable and the | operator you can make whatever edits to a waveform you like.

The comparison operators ==, !=, <, <=, >, >= work the same way as their C equivalents, yielding (in general) a function whose values are boolean (true, false) elements rather than numbers. These can be used with the ?: operator to combine two functions via a predicate.

The % operator has a couple of uses (which I might change to use different characters). On its own it gives the output from the previous voice in a voice set, which is useful for ring-modulation style effects (such as those that can be done by the SID chip and by my MOD player routine). If it comes immediately after a sequence, it yields a function which is the time-to-index mapping of that sequence. So (a|b|c)% is a function that is 0 from 0 to division, 1 from division to division*2 and 2 from division*2 to division*3. You can also change the time-to-index mapping of an l-value by assigning to this function. If the time-to-index mapping function takes a non-integral value at any point then the corresponding two elements of the sequence are mixed, so you can create fades and glissandos. Time-scaling the time-to-index mapping function will speed up or slow down the playing of that sequence without affecting the frequencies.

If you combine (via multiplication or basis-scaling) two functions which don't have the same domain units you end up with a compound object. When this compound object is basis-scaled, it'll move the dimensions of the compound object closer and leave other elements unchanged. So you can create an instrument by combining a waveform (dimensionless domain) and a volume envelope (time domain), and when this is time-scaled the scaling will change the frequency of the waveform part without speeding up or slowing down the volume envelope.

I'm contemplating a built-in function that will convert a waveform into a sequence so that it can be time-scaled without changing the pitch (or vice-versa) in the manner of a phase vocoder, but I haven't quite got all the details ironed out yet.

Arbitrary functions can be defined: e.g. major(x) = x + x@5/4 + x@3/2; will turn any waveform or function into a major chord using its input as the base note. If we allow functions to be recursive, then (by the Church-Turing thesis) MuT becomes a Turing-complete programming language (for better or for worse).

It's handy to be able to have libraries of predefined functions, instruments, values (note frequencies, intervals) etc. Executing an include statement will read in a file and insert its contents into the program at that point (much like C's #include directive). Conventionally, a MuT input file will start with include "standard.mut"; to include the predefined variables which come with the program, or include "pc_speaker.mut"; which does the same but also sets up special variables for the PC speaker output mode.

There's some more design documents (which are somewhat out of date but cover a few more dark corners) and the beginnings of an implementation on my github. Being a command-line tool, it's not terribly user-friendly but it probably wouldn't be hard for someone more GUI-oriented than me to slap a user interface on top of it. One fairly simple user-interface for this might just be a special text editor, which plays whatever object is under the cursor whenever a particular key combination is pressed (and which highlights the parts of the MuT program which are being used to compute the currently-playing notes). An interesting enhancement would be a little knob that appears whenever the mouse hovers over a numeric value; dragging this knob changes that numeric value in real-time, changing the currently playing sound if that value is involved in it.

A tale of two debuggers

Tuesday, February 9th, 2016

Every non-trivial compiler to have an intermediate representation - that is, a form of code that is neither the input source code nor the output binary or assembly code. Apart from the fact that neither of these are good representations on which to do many optimizations, it's also good for a compiler to be able to accept multiple different input languages and target multiple different output architectures.

A particularly important part of writing a compiler is figuring out a good intermediate representation. And one particularly important part of doing that is deciding on how to store debugging information. I'm not planning to write an ALFE debugger immanently but I expect that one would be useful eventually so I want to make sure that the infrastructure is sufficient to do so when I do.

I think that some desirable properties to have in a debugger are consistency, predictability and flexibility even in the face of highly optimized code - ideally it shouldn't even be necessary to have separate "debug" and "release" configurations - you'd debug on the same build of your program that ended up on the user's machine. Though one might still make a separate build with some extra checking, logging or other debugging infrastructure in order to investigate a particular bug.

There's two ways to approach writing a consistent, predictable debugger. One is to approach it from the machine code side: the places the debugger can stop are the spaces between machine instructions (let's call these red locations), and when it's stopped you can examine and modify register contents and memory locations. In other words, an old-style low-level debugger like DEBUG in MS-DOS, that understands nothing about the language that the program was written in.

The other way is to work from the source level. and is kind of a holy grail of debuggers. The places the debugger can stop are the spaces between statements or expressions with side effects (let's call these blue locations). Any source-level expression can be evaluated, any statement can be executed and any lvalue modified. What's more, the order in which things happen is predictable.

To understand what this means, consider the following ALFE program:

Int d = 0;
Int e = 0;
Int a() { ++d; return 2; }
Int b() { ++e; return 3; }
Int c = a() + b();

For the object code that results from the compilation of the last line, the compiler is entitled to generate the code that evaluates a() and b() in either order (since the order doesn't make any difference to the output of the program) or even both at once if that's possible in the target architecture's instruction set. However, when debugging we would like the order to be predictable - i.e. we would want the debugger to act as if a() is evaluated before b(). That's obviously a trivial example, but in real code optimizations can cause debugging to be unpredictable in far more severe ways.

One way this could be done is to ignore the binary program completely and interpret the source program (or, equivalently, rebuild the program with no optimizations and debug that). That would make the program run much slower under the debugger than when running normally (which may make it too slow to be practical - I've written programs where trying to debug using the non-optimized version was just too slow, which made debugging very tricky).

So, most debuggers don't try to be that predictable - they implement some hybrid approach. For example, executing things in machine code order rather than trying to simulate anything, and storing in the debug information an instruction address for each source code line (so that, unless explicitly stepping through instructions rather then the source code) you can only stop on instructions corresponding to source code lines.

I think a better approach (from the point of view of predictability) might be for the debugger to run the normal, optimized binary code until a breakpoint is hit, at which point it switches to interpretation, reconstructing all the information it needs about the program state from the stopped program's registers and memory. That might mean that on hitting a breakpoint in b() you might see d unincremented in the above example, but stepping through the same line of code would give a different result. I'm not sure if that causes more confusion than it solves.

Seeing e incremented before d should have no observable consequences (outside of the debugger) to the execution of the program - otherwise the compiler would be incorrect in swapping the calls. Checking that the compiler and debugger are correct is a specific non-goal of the debugger - we really have to work on the assumption that they are correct, otherwise we'd be second-guessing ourselves.

With an incremental compiler sufficiently well integrated with the debugger, interpretation is unnecessary - the debugger can invoke the compiler to add the breakpoint as a function call and recompile. The breakpoint then becomes an observable of the program, so that only the specific instances of the specific optimizations that need to be are disabled. Care needs to be taken to ensure that the format of in-memory data structures doesn't change in order that the program doesn't have to be restarted, but that shouldn't be too difficult.

Anyway, what does this mean for debug information? We need to somehow relate items in the object code to items in the source code and vice-versa. What does this mean in practice?

For one thing, we'll need to find a relationship between red locations and blue locations. One way to represent this would be a linked list of locations which may be either red or blue or purple (for the case where a red location perfectly coincides with a blue one). Given a location and type, you can obtain the next red location, the next blue location, the next purple location or the next location of any colour. This isn't sufficient for all optimization transformations (for example in a nested loop swapping the inner loop with the outer one so that the former becomes outer and the latter becomes inner) but it's a start.

The other thing we need is enough information to reconstruct the value of any variable given the register values and memory contents (and vice versa, for going back to the "running" state after interpreting for a while). The naive way of doing that would be to have a big matrix with red locations horizontally and variables vertically, and in each cell an algorithm for computing that variable and another for modifying the machine state when that variable is modified. Obviously that would take far too much memory for all but the most trivial programs, but it might be possible to store it in a highly compressed form, such as just storing changes to this table for each instruction.

Going back to the intermediate form, I think all this means is that I want to be able to store two things:

  • Some "pseudo-instructions" (which don't do anything to the output object code itself, but mark the position of a blue location, and have pointers to the previous and next blue locations, which may not be the same as the previous and next blue locations in the object code code). Some of these pseudo-instructions may also coincide with an actual instruction (when the blue location is between two consecutive red locations).
  • For each instruction, a list of changes to the "how to compute a high level variable" table.

Variadic templates in ALFE

Wednesday, October 24th, 2012

Some of the types in ALFE are templates which can accept different numbers of parameters - in other words, variadic templates. How does this fit in with the Kind system I wrote about before?

As well as the Kind constructors "<>" (template) and "" (type), we need a third Kind constructor which represents zero or more Kinds in a sequence suitable for template arguments. Let's call this Kind constructor "...". As with C++ templates, we should allow the members of the sequence to have arbitrary kind. So <...> is the kind of a template that takes zero or more arguments whose kinds are type, while <<>...> is the kind of a template that takes zero or more arguments whose kinds are <> (a single argument template whose argument is of kind type). The kind of Function is <, ...> (the comma is required because Function has one or more arguments rather than zero or more). More complicated kinds like <<...>, ...> are also possible.

To actually create a variadic template, we need to implement "recursive case" and "base case" templates and have the compiler choose between them by pattern matching, just as in C++. So the definition of Tuple might look something like:

Tuple<@T, ...> = Structure { T first; Tuple<...> rest; };
Tuple<> = Structure { };

Is that enough? Well, I believe this gives ALFE's kind system parity with C++'s. However, there is one more possible Kind constructor that I can imagine - the Kind constructor which can represent any Kind at all - let's call it $. A template with an argument of this kind (i.e. has kind <$>) can be instantiated with an argument of any kind. So you could have:

Foo<@T> = { ... };
Foo<@T<>> = { ... };
Foo<@T<@R<$>>> = { ... };

The first template is used when Foo is instantiated with a type, the second is used when it's instantiated with a template of kind <> and the third is used when it's instantiated with any other single-argument template, no matter what the kind of the argument is. The third template could then pass R as an argument to another instantiation of Foo and thereby "peel apart" the kind of the argument (as long as none of the kinds involved have multiple arguments).

By combining $ with ... I think we could potentially peel apart completely arbitrary kinds. However, I'm not sure if this is worth implementing since I can't think of any use for such a thing. Still, it's good to have some idea about how to proceed if I do eventually come across such a use!

Edited 31st January 2016: Since writing the above, I have realized that the concept of the $ Kind specifier has a fatal flaw. Just because our Foo template has been instantiated with a template of a particular kind, does not mean that we have any way of obtaining type constructors of suitable kinds for instantiating that template. We can't pass R as an argument to another instantiation of Foo because we have no way to find such a thing to pass - all we have is a template that we know takes such a thing. There might be no suitable type constructors in the entire program! So there is really nothing we can do with our T here at all - we can't instantiate it (even partially) since we don't know the kind of its first argument. We could pass it to another template, but that just moves the problem around without making any progress on it.

Classes vs structures in ALFE

Tuesday, October 23rd, 2012

A lot of the time in my C++ code I find myself writing class hierarchies. However, because of the way that inheritance works in C++, you need to use a pointer to point to an object which is of a particular type or some subtype of that type. In my code these pointers tend to be reference counted smart pointers to alleviate thinking about lifetime issues but the principle is the same.

Then, because I don't want to have to keep typing "Reference foo" everywhere, I encapsulate these smart pointers into value classes which provide functions that call the (virtual) functions in the implementation class. So I end up with code like this:

class Foo
{
public:
    Foo() : _implementation(new Implementation) { }
    void frob() { _implementation->frob(); }
protected:
    Foo(Implementation* implementation) : _implementation(implementation) { }
    class Implementation : public ReferenceCounted
    {
    public:
        virtual void frob() { ... }
    };
private:
    Reference<Implementation> _implementation;
};
 
class Bar : public Foo
{
public:
    Bar() : Foo(new Implementation) { }
protected:
    class Implementation : public Foo::Implementation
    {
    public:
        virtual void frob() { ... }
    private:
        ...
    };
};

That's really more boilerplate than I want to write for every class. I'd really much rather write code that looks more like C# or Java:

class Foo
{
    public Foo() { }
    public virtual void frob() { ... }
};
 
class Bar
{
    public Bar() { }
    public override void frob() { ... }
    ...
};

So I'm thinking that in ALFE I should have some syntax that looks close to the latter and behaves like the former. I'm leaning towards making "Struct" behave like C++ struct and class (i.e. a value type), and "Class" behave like a C# class (reference semantics). So:

Foo = Structure : Reference<Implementation> {
    Foo() : base(new Implementation) { }
    Void frob() { referent()->frob(); }
access<Foo>:
    Foo(Implementation* implementation) : base(implementation) { }
    Implementation = Structure : ReferenceCounted {
        virtual Void frob() { ... }
    };
};
 
Bar = Structure : Foo {
    Bar() : Foo(new Implementation) { }
access<Bar>:
    Implementation = Structure : Foo.Implementation {
        virtual Void frob() { ... }
    access<>:
        ...
    };
};

is the same as:

Foo = Class {
    Foo() { }
    virtual Void frob() { ... }
};
 
Bar = Class : Foo {
    Bar() { }
    virtual Void frob() { ... }
access<>:
    ...
};

There is a comparison to be made here between POD (Plain Old Data) and non-POD classes in C++ (though it's not quite the same because the Implementation classes would be non-POD in C++).

Obviously there's a lot of details here which would need to be fleshed out but I think something along these lines would be useful.