It would be really interesting to try to make a waveform that is fractal in nature, and actually sounds interesting and musical at whatever frequency you play it at. As you speed it up, some frequencies will go so high they become inaudible and new frequencies move into the audible range at the low end. In normal music, the pitches, effects, tempos and structures all happen at different timescales but in fractal music they would all interact with each other in interesting ways.
Fractal waveform
September 27th, 2011Getting rid of GOTOs
September 26th, 2011It's well known that it's possible to change a program that uses GOTOs into one that uses loops and conditionals if you're allowed to duplicate code or add more variables. One very easy way is just to change the program into a state machine, with one state for each label and a big loop around the whole thing. But under what conditions can you eliminate GOTOs if you can't do either?
The "Multiple break" example in the "More keywords" blog post is a piece of code that can't be de-GOTOized in C. I'm wondering if there are any examples of pieces of code that can't be de-GOTOized without multiple break, extra variables or code duplication. If I understand the abstract of Peterson 1973 correctly I think there aren't - that multiple break is enough to make GOTO redundant (but I can't find a non-paywalled copy of that article to check, and I don't care enough to spend $15 to find out).
Using templates when you don't really need to
September 25th, 2011Templated C++ classes are a bit nicer to use than non-templated classes, because one can define the entire class within its declaration without having to make sure that the types it uses are defined first (this check is done at instantiation time for template classes, but at parse time for non-template classes). I have found myself making classes templates when they don't really need to be just so that I don't have to define member functions outside the declaration - i.e. when CTemplate
Is Haskell too abstract?
September 24th, 2011The Haskell programming language implements a lot of very cool ideas (many of which I want to liberate for my programming language). However, Haskell seems to have a reputation for being a very difficult language to learn. The IO monad seems to be one particular sticking point, but this didn't seem to be particularly difficult to me - it's just a clever little hack (I just read lots of "How I came to understand the IO monad" accounts until I got it).
Another sticking point seems to be there's a lot of stuff written about Haskell in the form of academic papers (with all the academic jargon that entails). That shouldn't be a surprise (since it's a language with origins in academia which seems to be only reluctantly escaping to industry), but it is kind of interesting that there's a sort of language barrier between academia and industry - what the former calls "deforestation hylomorphism" might be called "tree flattening" by the latter, for example.
But I think the real problem is something shared by other functional languages - it's just a different level of abstraction than we're used to thinking about. In imperative programming languages programmers can think "okay, what do I want the machine to do here?" and write that code. In functional programming languages one instead has to think about what the inputs and outputs are, write functions to perform those transformations and trust that the compiler will optimize it well (to a first approximation). It's much more like doing mathematics than imperative programming. Compilers will do all sorts of crazy things to turn those pure functions into high-performance imperative code. So when things are inevitably too slow, it's not obvious why (since the relationship between the generated code and the source code is very complicated) and it's difficult to understand how to make it faster.
Functional programming languages do have some interesting advantages over imperative languages, though - it's much easier to reason about pure functions than about imperative programs with all their attendant state, which means lots more interesting automatic optimizations are possible and (increasingly importantly) the code can be automatically parallelized so it takes maximum advantage of modern multi-core CPUs.
I'm not sure which side to place my money on. I suspect both imperative and functional paradigms will continue to exist for the foreseeable future. On the imperative side: the low-level OS and runtime components must can't be written using functional languages, and optimizations and multi-core abstractions for imperative languages will continue to improve. On the functional side, compilers will require less and less hand-holding to generate good code and will generate better code than imperative compilers for more and more situations.
Why GOTOs are bad
September 23rd, 2011It seems to me that a lot of the complaints about GOTOs turning programs into spaghetti are really about jumping into a block - jumping out of blocks is much easier to reason about, and in fact with the minor modification to the break keyword I mentioned before it's trivial to turn any such GOTO into structured code without even rearranging anything.
I do have some C++ code (written for my Mandelbrot plotter program) which uses "goto". I originally worked out the algorithm as a state machine, so writing it in that way was natural. I tried rewriting it to use loops and conditionals instead but it just obscured what the program was really trying to do. I could have used a switch statement inside a loop as well but I think that's really just a more verbose way of saying the same thing.
When is your program compiled?
September 22nd, 2011When we're all using languages which can compile code at runtime it's going to be more important to think about when your code is being compiled. Take a fractal plotter for example - one in which users can supply a formula to be iterated. Obviously we'd like to do some compilation here instead of just interpreting the formula, as the latter would be very slow. But how often should we recompile it? The more often we recompile, the more information we have available to us and the more optimizations we will be able to use. For example, if we just compile when the formula changes, we would have to use arithmetic operators which can work with any precision, but if we recompile whenever the precision changes we can unroll those arithmetic operators and make the code much faster (especially for low precisions). On the other hand, recompiling does have some overhead so we probably wouldn't want to recompile for each pixel. Though for some formulae that might actually be helpful - if we can hoist a test from per-iteration loop to the per-pixel loop and the iteration count is high it might be worth it.
One possibility might be to give the code-generation library the freedom to compile whenever it likes, so it can try various things and run with what works best.
A weird thing in Haskell
September 21st, 2011Here's something odd I noticed while playing around with the Haskell programming language. Sometimes, a==b does not imply f(a)==f(b). Look:
> 1/0 Infinity > 1/(-0) -Infinity > 0==(-0) True > (1/0)==(1/(-0)) False
Language in which assignment is reflective
September 20th, 2011Sometimes I like to think about the most fundamental assumptions I can about how things work and wonder "what if this were not true"? Here is an example of that sort of crazy thinking.
In most programming languages the statement "a=b" assigns the value of "b" to the variable named "a" - the two things "a" and "b" play completely different roles (although after the statement has executed they both have the same value). What if assignment instead were reflective, so that "a=b" meant the same thing as "b=a"?
In that case, the information about whether this statement says something about "a" or something about "b" must come from elsewhere. For example, if earlier we had said "a=1" then both "a=b" and "b=a" would mean "b=1". So each variable would have some information associated with it (let's call it the "direction") that says whether it accepts values from other variables or passes its value to other variables when assigned.
If the direction of variables can change over the course of a program's run, it won't generally be possible to determine statically that two output variables are never assigned to each other, so that would have to become a run-time error. Unless we have the ability to change the direction of two connected variables at the same time, we'd have to allow two input variables to be assigned to each other (and give them both some default value).
Perhaps surprisingly, you can end up with a Turing-complete language by going down this route - all you need are two other elements: one is a way of connecting three variables together (obviously no two of them could be outputs at the same time) and a NAND or NOR operator with two inputs and an output. Then you can build any function like a logic circuit.
What's the use of this? Well, it's handy for describing electronic circuits (where assignment means connection). Also, when I write my modular emulator it will have a language like this built in for describing how things are connected up. For that the "two outputs connected together" isn't just a theoretical problem any more - the Commodore PET computer had a Killer poke which connected two outputs together, damaging circuitry. An emulator could have some fun with how it handled this - I'm thinking an "explosion" graphical effect might be a nice easter egg.
Interactive IFS
September 19th, 2011I want to write a program that allows you to explore Iterated Function System (IFS) fractals interactively by moving points around with the mouse.
There's a few different ways to do this, depending on what set of transformations you use.
IFS fractals usually use affine transformations, which encompass translations, rotations, dilations and shears. This can be done with 3 points per transformation - if one considers the points as complex numbers we have the transformation . However, rather than controlling
,
and
directly I think it would work better to move the images of some interesting points like
,
and
(i.e. move
,
and
). Then the geometric interpretation of the control points would be easy to understand - they would just be the corners of a transformed rectangle.
However, there are other possible transformations. We could reduce the number of control points to 2 and disallow non-orthogonal transformations, giving and control points
and
mapping to
and
.
We could move to quadratics and move
,
and
, and with 4 points we can do cubics
(in which case we would probably use control points
,
,
and
).
We can even go all the way to (6 control points) if we wanted to go really crazy - that might be a bit unwieldy though.
I'd also like to be able to associate a colour with each transformation so that the colour of a point in the final image depends on the sequence of transformations that led to that point. Perhaps for some
.
Patches
September 18th, 2011One of the reasons the state of desktop computer security is in such a mess is that the average desktop machine has a lot of different pieces of software with attackable surface area but they all have different update mechanisms, some more reliable than others, some more painful than others. On Windows machines, the Microsoft patches are pretty good - they come out the second Tuesday of every month (occasionally sooner if a flaw is being actively exploited). Everything updates at once through a single mechanism. Usually there's a reboot required, but rebooting once a month isn't too bad. The most annoying gripe is the massive amount of CPU time spent to re-NGEN everything when there's a .NET update.
Then there's the Adobe patches which seem to get discovered whenever the machine is rebooted, and then themselves require a subsequent reboot (grr). Is that just Reader or does Flash update through the same mechanism? I'm not even sure. Firefox updates itself almost every time I start it if I haven't used it for a while, requiring me to have to start it again. I know the Google applications update themselves since I see googleupdater.exe in my task manager but they're very quiet about it - I don't even know when they update.
It would be far better if everything was updated through a centralized mechanism like my Ubuntu box does. Unfortunately, that update mechanism is very intrusive - it seems like every other day that it finds some new updates and installs them (though it rarely requires a reboot, which is nice). Also unfortunately, I suspect that it only works with software installed through the built-in software installation mechanism - programs I installed manually like the flash plugin and XMMS won't be updated.
To ensure everybody's desktop machines are secure, here is how I think things should work:
- There should be a centralized mechanism for notifying client machines of new versions of software. Actually there should be several - if I don't trust X's server to be up-to-date, I should be able to use Y's instead.
- The server I'm using should keep track of which pieces of software I have on my machines and their version numbers. The update checking process should be trivial in the "no updates available" case: my client machine just asks the server "any new updates for me" and the server goes "nope". Client machines should also know what they have installed so that if I change servers, the client just uploads the list of software to the server.
- Some servers might only track a subset of software - that's okay, I should be able to specify multiple servers.
- Software updates come in two sorts - new versions and security bug fixes. New versions I want to be notified about (but don't necessarily want to install immediately). Security bug fixes I want installed as soon as possible. The defaults should reflect this (though it should be possible to opt out of automatic installation of security bug fixes - some administrators want to be able to check out new fixes before installing them to make sure they won't break anything.
- When a security bug fix is downloaded, it should be installed automatically, the application (or even system) restarted if necessary, and everything put back in its previous state - i.e. I should never lose anything due to a restart. This is the really tricky bit.
Application updates are the easiest types of restart - developers generally know if the application they are writing has security surface area, so (when it does) they can tell it to listen for messages from the updater application and, when an update is available, persist all its state, close down and restore all that state on the next startup.
The next most difficult thing is an update in a shared component. Some of the applications which use this component might be update-aware, and can be restarted just as if they were being updated themselves. Others might not be - consider the case of a compiler, which is designed to do what it is asked and then to exit - it's not supposed to be running for an extended period of time. These can generally be left alone to pick up the update on their next restart, or trusted not to care if a component they are using is insecure. This means that multiple versions of a shared component might be in use at any given time - the system should be able to cope with this. Of course, if these shared components are separate processes this ceases to be a problem.
Inevitably (even with the ability to patch running kernels) there will be times when a system reboot is necessary. Perhaps a driver puts a piece of hardware into an insecure state and it physically can't be put back into the secure state without a hardware reset. So the OS needs to be able to freeze a process, serialize it to disk and then restore it to the exact same state after a reboot. This means that all applications (even those with no security surface area) need to be written to be able to cope with the clock suddenly jumping forward, or hardware devices they are using suddenly being reset. In modern pre-emptive multitasking systems this is usually the case anyway.
On some operating systems, a reboot implies a modification of the address space layout of processes - in particular, Vista's Address Space Layout Randomization (ASLR) rebases system DLLs on system startup so that they are not in predictable places. I think it doesn't particularly affect security if a "forced update" causes the system DLLs to keep the same address across that particular reboot - as long as the address space layout isn't predictable across a large fraction of the machines running that software, the defense mechanism retains its effectiveness.