First I wondered how we could sense the relative position of each bar in space. I fairly quickly decided that putting a tiny microcontroller in each of the bars would probably be the best way. The root box queries the bar it's connected to, which then sends back its length, its orientation (i.e. which of the 4 connectors it is being queried through) and whether its switch is being touched or not. This bar then performs the same query on the bars connected to its three other connectors, sends this information back to the root and so on, walking the tree. A program to transmit a few bits of data, walk one node of a ternary tree and then pass back another string of bits sounds very simple, but as we'll see there are a lot of complications.
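As a rough illustration of that tree walk (a Python sketch, not the PIC code itself), the query can be modelled as a recursive function. The `Bar` class, the `connect` helper and the dictionary layout of the report are all made up for this example; the real bars exchange a string of bits rather than Python objects.

```python
class Bar:
    """Hypothetical model of one bar: a length, a touch switch and 4 connectors."""

    def __init__(self, length, touched=False):
        self.length = length
        self.touched = touched
        # One slot per connector (0-3). Each slot is None if nothing is
        # plugged in, or a (neighbour, neighbour_port) pair otherwise.
        self.ports = [None, None, None, None]


def connect(bar_a, port_a, bar_b, port_b):
    """Plug connector port_a of bar_a into connector port_b of bar_b."""
    bar_a.ports[port_a] = (bar_b, port_b)
    bar_b.ports[port_b] = (bar_a, port_a)


def query(bar, via_port):
    """Report on this bar, then walk the bars on its three other connectors."""
    children = []
    for port in range(4):
        if port == via_port:
            continue  # this connector leads back towards the root
        link = bar.ports[port]
        if link is None:
            children.append(None)  # nothing plugged in here
        else:
            child, child_port = link
            children.append(query(child, child_port))
    return {
        "length": bar.length,
        "orientation": via_port,  # which connector the query arrived on
        "touched": bar.touched,
        "children": children,
    }


# The root box is plugged into connector 0 of bar a; bar b hangs off
# connector 2 of bar a, attached by its own connector 1.
a = Bar(length=3)
b = Bar(length=2, touched=True)
connect(a, 2, b, 1)
report = query(a, via_port=0)
```

Each bar only ever needs to know about itself and its three possible children, which is why the per-node program sounds so small.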
The first microcontroller I looked at was the cheapest one I could find, the PIC10F200. This thing is seriously cheap and seriously tiny (both physically and software-wise). It has 16 *bytes* of memory (same as a single SSE register), runs at 4MHz and can run programs that are up to 256 instructions long. It also costs just 30 cents in large quantities. I thought it would be a fun programming challenge to fit such a simple program into such a tiny space.
Unfortunately, I failed in this endeavour because the PIC10F200 only has 4 IO pins (6 pins altogether - the other two are power and ground). Only 3 of them can be used for output. I needed at least 5 pins - one for each directional connector and one for the switch. I tried to figure out some way of addressing each direction with a unique pair of IO pins but couldn't figure out a scheme that would actually work.
So then I decided to go for the next PIC up, the PIC12F508. This has 6 IO pins (perfect), 512 instructions, 25 bytes of RAM and costs $0.41 in large quantities. I bought a pile of them. The PIC16F54 is even cheaper ($0.39), has 12 IO pins and runs up to 20MHz but has no internal oscillator.
I originally figured that I would drive all the PICs with a single external clock to make it easy to keep them in sync. However, I discovered a major problem with this approach - these microcontrollers don't have an external clock oscillator mode. You can use the internal oscillator, an external crystal or an external RC timebase. It might be possible to use an external clock in EXTRC or XT mode but it's out of spec, might damage the microcontroller, might be unstable, might degrade the clock signal and might cause one of the other IO pins to be unavailable. Also, even with cycle-exact code there is complexity in the timing because each instruction takes 4 or 8 clock cycles, and you can't control which of the 4 phases you get.
So I decided to use the internal oscillator. It's factory calibrated to +/- 1% so I should be able to synchronize two PICs just by adding appropriate delay loops, making sure that signal pulses are long enough to cover all the possible times when they might be read, and that the reader waits until the pulse is certain to have started before attempting the read. Programming in cycle-exact assembler is difficult enough when there's only one CPU to worry about, but when you're writing code that's going to run in sync on two CPUs and all timing is done by cycle counting and the CPUs have slightly different clock rates it's a nightmare!
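To get a feel for how much slack that leaves, here is a back-of-the-envelope sketch in Python. It assumes both PICs start counting instruction cycles from a shared sync point, and that each clock's frequency is off by at most the tolerance. It ignores the extra uncertainty in detecting the sync point itself. The function names and the example cycle counts are invented for illustration.

```python
def event_window(nominal_cycles, tolerance=0.01):
    """Earliest and latest real time of an event scheduled after
    `nominal_cycles` instruction cycles, measured in nominal cycle lengths,
    for a clock whose frequency is within +/- `tolerance` of nominal."""
    earliest = nominal_cycles / (1 + tolerance)  # clock running fast
    latest = nominal_cycles / (1 - tolerance)    # clock running slow
    return earliest, latest


def read_is_safe(pulse_start, pulse_end, read_at, tolerance=0.01):
    """True if a read scheduled at cycle `read_at` on one PIC is guaranteed
    to land inside a pulse driven from `pulse_start` to `pulse_end` on another."""
    _, start_latest = event_window(pulse_start, tolerance)
    end_earliest, _ = event_window(pulse_end, tolerance)
    read_earliest, read_latest = event_window(read_at, tolerance)
    return start_latest < read_earliest and read_latest < end_earliest


# 100 cycles after the sync point a 10-cycle pulse is comfortably wide enough...
near = read_is_safe(pulse_start=100, pulse_end=110, read_at=105)
# ...but 1000 cycles after it, the accumulated drift is wider than the pulse.
far = read_is_safe(pulse_start=1000, pulse_end=1010, read_at=1005)
```

The drift grows with the number of cycles since the two PICs last agreed on the time, so pulses either have to get longer or the PICs have to resynchronize often.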
The first version of my program took too much space (everything was unrolled and I had different code paths for each orientation). I got it to fit by moving some of the code into subroutines (you have to choose which code you put in subroutines carefully, since the stack only has two levels) but didn't have much space for delay loops.
So I rewrote it to avoid storing the "which port is the parent" and "which port is the child" information in the program counter. Now it only checks this information when reading from or writing to a pin, or checking all the pins to see which direction a signal will come from first. The new version takes less than half of the available space - brilliant.
The next problem is that only two microcontrollers can be communicating at once (one talking, one listening). If three are trying to communicate at once, eventually the middle one is going to face a time when it has to talk to both of the other two within a certain amount of time, and won't be able to satisfy the conflicting demands. So when a microcontroller is getting a bit of data from a child, it has to tell the parent to wait and we can't have a "bucket brigade" of bits. This means that we get quadratically slower as we get further from the root. Counting the cycles, I realized that (even before all the delay loops had been added) we were outside of our cycle budget to get a reasonably small latency.
To get the "bucket brigade" back, I realized that we had to have a way to synchronize all the microcontrollers at once. We can't use an external cycle clock for this but what about a much slower "heartbeat" signal shared between all the microcontrollers, coming in on the spare pin?
The idea is this: each microcontroller has a state variable. On each heartbeat, each microcontroller jumps to a subroutine corresponding to its state. This subroutine reads the pins set in the previous cycle, sets output pins for the next cycle and updates the state variable for the next cycle before going back to waiting for the next heartbeat. All the waiting is done at once, and we no longer have the precise timing difficulties we had before - as long as we have rules like "how late you can read the inputs", "how early you can set the outputs" and "how long the heartbeat can take" everything should work out just right. I think we can get much better performance with this system.
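A toy Python simulation shows why this brings the "bucket brigade" back. It is a deliberate simplification: every node sits in a single "relay" state, so the per-state dispatch is left out. The chain layout and function names are assumptions made for the example. On each heartbeat every node reads the pin its upstream neighbour set on the previous beat and sets its own output for the next one.

```python
def heartbeat(outputs, source_bit):
    """Advance the whole chain by one heartbeat, in place.

    outputs[i] is the pin value node i is currently driving. Every node reads
    what its upstream neighbour set on the *previous* beat, so the old values
    are snapshotted before anything is overwritten."""
    previous = list(outputs)
    outputs[0] = source_bit  # the first node reads from the data source
    for i in range(1, len(outputs)):
        outputs[i] = previous[i - 1]


def run(bits, chain_length):
    """Feed `bits` into one end of a chain and record what the far end drives."""
    outputs = [0] * chain_length
    received = []
    # Pad with zeros so the last real bits have time to drain out of the chain.
    for bit in bits + [0] * chain_length:
        heartbeat(outputs, bit)
        received.append(outputs[-1])
    return received


bits = [1, 0, 1, 1]
chain_length = 3
received = run(bits, chain_length)
```

After an initial delay of `chain_length - 1` beats, one bit arrives at the far end on every heartbeat, no matter how long the chain is. Nobody ever has to tell a neighbour to wait.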
The musical toy is really about melody and harmony, but under the covers it's rhythm that makes it work.