Archive for October, 2012

Multiplying faster with squares

Tuesday, October 30th, 2012

The 8088 has a hardware multiplier, but it's quite slow:

MUL with byte arguments: 69 cycles plus 1 for each set bit in AL plus 1 if the high byte of the result is 0
MUL with word arguments: 123 cycles plus 1 for each set bit in AX plus 1 if the high word of the result is 0

Signed multiplies are even slower (taking at least 80 cycles for a byte, 134 cycles for a word), and depend on the number of set bits in the absolute value of the accumulator, the signs of the operands and whether or not the explicit operand is -0x80 (-0x8000 for word multiplies). I also measured some word IMULs apparently taking a half-integer number of cycles to run, suggesting that there's either some very weird behavior going on with the 8088's multiplier or that there's a bug in my timing program (possibly both).

Can we beat the 8088's hardware multiplier with a software routine? There's a clever trick which can sometimes be used to speed up multiplications, based on the following:

(a+b)2 = a2 + b2 + 2ab
(a-b)2 = a2 + b2 - 2ab

Subtracting these gives:
4ab = (a+b)2 - (a-b)2

Or:
ab = (a+b)2/4 - (a-b)2/4

So if we keep in memory a table of x2/4 we can do a multiply with an add, two subtractions and two table lookups.

Does this still work if the entries in the table are fractional? Yes, it does, and here's why. x2/4 is an integer for even x, and for odd x it will always be an integer plus a quarter, as a quick induction will show. (a+b) and (a-b) are always both odd or both even (since they differ by 2b which is always even) so the quarters always cancel and we can just ignore them.

In 8088 assembly code, this can be written like this (assuming the operands are in ax and cx and that we don't care about the high word of the result):

  xchg ax,bx
  mov si,offset table
  shl bx,1
  shl cx,1
  add bx,cx
  mov ax,[si+bx]
  sub bx,cx
  sub bx,cx
  sub ax,[si+bx]

That works out to about 88 cycles (93 with DRAM refresh) - already quite an improvement over the hardware multiplier. We can do even better, though - if we place our table of squares at DS:0 and assume that our operands are in si and bx we get:

  shl bx,1
  shl si,1
  mov ax,[si+bx]
  neg bx
  sub ax,[si+bx]

Which is just 56 cycles (59 with DRAM refresh) - more than twice as fast as the hardware multiplier!

For byte-sized multiplies we can do something similar in just 34 cycles (36 with DRAM refresh):

  mov al,[si+bx]
  neg bx
  sub al,[si+bx]

However, it's less important for byte-sized multiplies since we can fit an entire 8-bit multiply table in memory and do:

  xchg bh,bl
  mov al,[si+bx]

Which is only about 20 cycles (21 with DRAM refresh). There's obviously a bit of a memory-usage vs time tradeoff there though.

There are some more nice things about these algorithms:

  1. There's more choices about which registers to use for the operands and results.
  2. If you're doing a lot of multiplies and want the results of them all scaled by a constant factor, you can build this into the table.
  3. If you're summing the results of several multiplies (e.g. computing a dot product) you can do it right in the accumulator (just change the "mov" to an "add".

The downsides (apart from the memory usage and time to initialize the table) are that you don't get the high byte/word for free and that it only works for signed multiplies.

How to get away with disabling DRAM refresh

Monday, October 29th, 2012

On the original IBM PC 5150 (and it's mostly electrically-equivalent derivatives, the 5155 and 5160) the operation of the bus (the data channel between the CPU and it's memory and peripherals) is interrupted is interrupted 2,187,500 times every 33 seconds (a rate of about 66KHz) for 11/13,125,000 of a second each time (i.e. 4 out of every 72 CPU cycles). During that time, very little of the machine can operate - no RAM can be read or written and no peripherals can be accessed (the CPU might be able to continue doing it's thing if it's in the middle of a long calculation, and the peripherals will continue to operate - it's just that nothing can communicate with each other).

Why does this happen? Well, most computers (including the one this blog post is about) use DRAM (Dynamic RAM) chips for their main memory, as it's fast and much cheaper than the slightly faster SRAM (Static RAM) chips. That's because each DRAM bit consists of a single capacitor and transistor as opposed to the 4 or more transistors that make up a bit of SRAM. That capacitor saves a lot of hardware but it has a big disadvantage - it discharges with time. So DRAM cells have to be "refreshed" periodically (every 2ms for the 16 kbit 4116 DRAMs in the original 5150) to maintain their contents. Reading a bit of DRAM involves recharging the capacitor if it's discharged, which refreshes it.

But a computer system won't generally read every bit of RAM in any given interval. In fact, if it's sitting in a tight idle loop it might very well not access most of the memory for many minutes at a time. But we would be justified in complaining if our computers forgot everything in their memories whenever they were left idle! So all general-purpose computers using DRAM will have some kind of circuitry for automatically accessing each memory location periodically to refresh it. In the 5150, this is done by having channel 1 of the 8253 Programmable Interval Timer (PIT) hooked up to the Direct Memory Access (DMA) controller's channel 0. The BIOS ROM programs the PIT for the 66KHz frequency mentioned above, and programs the DMA controller to read a byte each time it's triggered on channel 0. The address it reads counts up from 0 to 65,535 for each access, then goes back down to 0 again.

If the DRAM needs to be refreshed every 2ms why does the refresh circuit run at 66KHz and not 500Hz, or for that matter 8.192MHz? To answer that question, we need to know something about how the memory is organized. The original 5150 had banks of 8 chips (plus a 9th for parity checking). Each chip is 16 kbit, so a bank is 16KBytes. If you had a full 640KB of RAM organized this way, that would be 40 banks or 360 separate chips! (By the time that much memory become common, we were mostly using 64 kbit chips though.) Within each chip, the 16 kbits are organized in a grid of 128 "rows" and 128 "columns". To read a bit, you input the "row" address, then the "column" address, then read back the result (hence the chips could have just 16 pins each, as each address pin corresponds to both a "row" bit and a "column" bit). Happily, whenever a row is accessed, all the DRAM cells on that row are refreshed no matter what column address is ultimately accessed. Also, the low 7 bits of the physical byte address correspond to rows and the next 7 bits correspond to columns (the remaining 6 bits correspond to the bank address). So actually you could quite happily get away with just refreshing addresses 0-127 instead of addresses 0-65,535 on this machine (though there was a good reason for doing so, as we'll see later).

To ensure that they meet tolerances, electronic components (including DRAM chips) are manufactured with certain margins of error, which means that often one could get away with reprogramming the PIT to reduce the DRAM refresh rate a bit in order to squeeze a little bit more performance out of these old machines - it was a common hack to try, and I remember trying it on the family computer (an Amstrad PC1512) after reading a little bit about DRAM refresh in a computer magazine. I seem to recall that I got it up from the standard 1/18 to maybe 1/19 or 1/20 before it became unstable, but the performance improvement was too small to notice, so the little .COM file I made with DEBUG never made it as far as my AUTOEXEC.BAT.

For many of the timing experiments and tight loops I've been playing with on my XT, I've been disabling DRAM refresh altogether. This squeezes out a bit more performance which is nice but more importantly it makes the timings much more consistent (which is essential for anything involving lockstep). However, whenever I've told people about this the reaction is "doesn't that make the machine crash?" The answer is "no, it doesn't - if you're careful". If you turn off the refresh circuitry altogether you have to be sure that the program you're running accesses each DRAM row itself, which happens automatically if you're scanning through consecutive areas of RAM at a rate of more than 66KB/s, or for that matter if you've done enough loop unrolling that your inner loop covers more than 127 consecutive bytes of code. Since these old machines don't have caches as such, unrolled loops are almost always faster than rolled up ones anyway, so that's not such a great hardship.

Not all of the machines I'm tinkering with use 4116 DRAM chips. Later (64KB-256KB) 5150 motherboards and XTs use 4164 (64 kbit) chips, and modified machines (and possibly also clones) use 41256 (256 kbit) chips. The principles are exactly the same except these denser chips are arranged as 256x256 and 512x512 bits respectively, which means that there are 8 or 9 row bits, which means that instead of accessing 128 consecutive bytes every 2ms you have to access 256 consecutive bytes every 4ms or 512 consecutive bytes every 8ms respectively (the PIT and DMA settings were kept the same for maximum compatibility - fortunately the higher density DRAMs decay more slowly so this is possible). So when disabling DRAM refresh, one should be sure to access 512 consecutive bytes every 8ms since this will work for all 3 DRAM types.

The cycle-exact emulator I'm writing will be able to keep track of how long it's been since each DRAM row has been refreshed and will emit a warning if a row is unrefreshed for too long and decays. That will catch DRAM refresh problems that are missed due to the margins of error in real hardware, and also problems affecting only 41256 chips (my machine uses 4164s).

Modern PCs still use DRAM, and still have refresh cycles, though the overhead has gone down by an order of magnitude and the exact mechanisms have changed a few times over the years.

What I want from an online store

Sunday, October 28th, 2012

Amazon's great - I buy all sorts of stuff from there. But their searching interface still seems to be optimized for the books, CDs and DVDs that they started out with. It works fine if you know what you're looking for and just need the search results to decide if you want the hardback or paperback edition. But if you need a waste paper basket that fits into a particular space, a digital alarm clock with a red LED display or a shaver light/socket with an 8" spacing between the mounting holes (all things I've tried to find there) - you're limited to sorting a single department's search results by price or score and then going to each of the thousands of results one by one to see if any have sufficiently detailed information on the product page (most don't).

What I really want is an interface like Mouser's - also known as a parametric search, whereby you can narrow down by (multiple values of) all kinds of different parameters and then sort the results by various parameters as well (Mouser's great by the way - I've bought lots of bits from there and it's got the best combination of prices, interface and selection of all the places I've tried). I suspect that the reasons that this isn't done by Amazon already are:

  1. Electronic component manufacturers often make ranges of components with many parameters that you can choose independently (for example, these resistors have 7 independent parameters mentioned in the datasheet).
  2. Electronic component manufacturers (by necessity) provide much more detailed information about their product than manufacturers of consumer goods (which hitherto have mostly just had to look nice on a store shelf).
  3. Amazon sells a lot more different products than Mouser (I make it 3,489,158 for Mouser and 166,367,811 for Amazon at the time of writing).

This seems like it ought to be surmountable, if enough people cared. It would be great to see an open, crowdsourced, wiki-like worldwide database of SKUs with as much technical information as possible about each product that Amazon, eBay and anyone else who wanted to could plug in to. It wouldn't need to be perfect to be useful (Mouser's isn't) - in fact it's usefulness probably increases roughly linearly with the number of products in it.

A new protected class: things you've said

Saturday, October 27th, 2012

Some years ago, when this site was much smaller than it was today, it was suggested to me that I might want to be careful about what I wrote here lest it get read by a prospective employer who might find a reason in it to decline me employment. Having my website be nothing more than a online resume would be very boring, though, so I declined - in rather more polite terms than I really felt. Besides which, any employer who would do such a thing would clearly not be a good employer to work for. I'm lucky in that I have a pretty desirable skillset, though - not everyone is so fortunate.

I bring this up now because of this horrifying story that I read this morning. The very suggestion that such a thing might be done will have a massively chilling effect on participation in publicly archived discussions. Blogging is already hard enough knowing that everything I say is really part of my permanent record without imagining that it will be data-mined to discover all sorts of things about me that I didn't want to share in the first place!

We talk a lot about free speech in the western world, and take very seriously any possibility that government might limit that speech. But I think we don't take seriously enough threats to our free speech from public sector. Knowing that we can't get arrested for stuff we say online isn't terribly useful if that same stuff can make us unemployable.

So, I'd like to see some kind of legal framework that would prevent employers from discriminating against prospective hires based on things they've said. Such a framework wouldn't be completely unprecedented - there are already several pieces of information that are technically available to employers which they can't use in employment decisions. I propose that we just expand that to make "stuff you've said" a protected class. Naturally, that would also make it illegal to fire someone over something that they said (though exceptions would probably have to be made for things directly related to their job - it should still be possible to fire someone for violating an NDA, for example).

Companies don't like to have employees who say terrible things on the internet, because it reflects badly on them (and their hiring practices). But it only does so because they have the power to do something about employees who say terrible things on the internet. If they didn't have that power, they can just say "it's not work related - it's nothing to do with us". Essentially, because it's not prohibited it's essentially compulsory. So companies ought to be clamouring for this legislation - it would ensure they could concentrate on their core business and not have to go googling for dirt on their employees. It would also mean that they could choose the best person for the job without having to take into account stuff that fundamentally doesn't matter to them. And it would make it less likely that they would be left short-handed due to an ill-advised comment.

Practical considerations of a consumption tax

Friday, October 26th, 2012

I enjoy listening to NPR's Planet Money podcasts and I have learned a lot about economics from them.

One recent show that was particularly interesting and that has had me thinking is the one about the no brainer economic platform. That has made me reconsider several of my own economic opinions and throw several sacred cows on the barbecue (to mix up some metaphors).

In particular, I had always thought that income taxes were the best kind of taxes, being progressive and cheap to collect. However, as Planet Money points out - taxing something discourages that behavior and we don't want to discourage income and employment! A consumption tax makes much more sense in terms of incentives - discouraging consumption is environmentally friendly and, while it isn't naturally the most progressive form of taxation, can be tweaked to make it reasonably progressive. How to make it practical to collect is a different matter, though. Taxing something is always extremely invasive, as the government will require lots of information and documentation about that thing in order to make sure that you're not cheating on your taxes.

A carbon emission tax seems like one of the best kinds of consumption tax. We're pretty universally agreed that putting carbon dioxide in the atmosphere is an undesirable thing. Most of that net carbon comes from fossil fuels which tend to come from a few big mining operations, so it's natural and easy to tax these operations per tonne of carbon they sell. They will then pass these costs on to their customers and ultimately to the consumer, discouraging the consumption by means of increased fuel costs (of course, there are political problems with raising fuel prices but that's a bit out of the scope of this discussion - after all, Planet Money does point out that their economic platform would be political suicide!)

Fossil fuels aren't the only way carbon gets into the atmosphere, though. Should we also charge dairy farmers for the methane emissions from cows? That has a big effect on climate change too, although the carbon involved came out of the atmosphere much more recently than that from fossil fuels. The fairer we try to be, the more invasive the tax becomes. For example, consider someone who lives completely off the grid, growing trees and felling them for firewood. He's completely self-sufficient in terms of carbon cycles (while burning the trees puts out carbon dioxide, it's completely cancelled by growing them in the first place). So it doesn't seem like he should be paying any carbon tax. So we'd have to offset the emission tax against carbon absorption - in other words pay a credit to individuals and organizations causing a net reduction in the amount of carbon in the atmosphere. Growing trees is one such activity. What about landfill site operators, who sequester enormous amounts of carbon in the ground? That's not generally considered to be a hugely "green" activity, but by this measure it would be.

Of course, as we inevitably move away from fossil fuels, that form of tax becomes increasingly useless. We can tax other forms of "digging up stuff from the ground and using it up" consumption, but I suspect that getting all the tax money we need that way would have some really undesirable consequences (like pricing a lot of useful things like electronics right out of the range of what most people can afford).

What other things are consumed that we could tax? Well, there's a lot of energy falling on the earth in terms of sunlight that we can collect and reuse in various forms (and indeed I expect that's where much of our energy will come from in a few decades). But it's hard to think of that as "consumption" when it's so plentiful and so much of it just goes to waste. Harnessing solar energy is also something we don't really have any interest in discouraging.

There is one resource that we each have a finite amount of and have to be very careful how we use it. That is our time. Could we tax consumption of time? That sounds like a very regressive poll tax on the face of it. But what if we tax only wasted time? I subscribe to the view that time is only wasted if you're spending it doing something that you don't truly enjoy. There's an easy way to tell (from a taxation point of view) if you're doing something that you don't really enjoy - somebody has to pay you to do it. If you do it without being paid, then you're probably doing it because you want to and therefore not wasting your time.

So that brings us right back around to the income tax again. There's even a justification here for making the income tax progressive - if you need to be paid more to do some piece of work, obviously it's more unpleasant and therefore should be taxed at a higher rate (yes, I know that people don't really get paid in proportion to the unpleasantness of their jobs but it could be a useful legal fiction).

There's one aspect to income taxes that I don't really like, which is casual labour. Suppose I wanted to hire the kid next door to trim my hedges - if I had to fill in a big pile of tax paperwork in order to do so I probably wouldn't bother - I'd just do it myself instead. Lots of "under the table" work goes on and many blind eyes are turned to it which is a sad state of affairs - our laws ought to reflect actual practice rather than making huge swathes of our population (technically) outlaws. How do we draw the line between which jobs should be taxed and which shouldn't?

I haven't thought through all the ramifications of this, but I have an idea that perhaps income tax ought be linked to limited liability. We give this great gift to corporations, limiting the liability of investors to only the amount of money they invest - if the corporation does something which costs society more than that, society absorbs the remainder of that cost. The idea is that by doing this we encourage entrepreneurship, which is all very well but it seems like there should be a cost to limiting liability, so perhaps income tax should only be raised when the employer is a limited liability organization. The deal ends up being the same as it currently is for such organizations, but if you don't need to limit your liability, then your employees don't need to pay income tax either - I think it's quite a neat way to delineate. On the small business end of things, there will be a population of companies with limited liability and a population without, they will compete with each other and market forces will determine the size of business at which limiting liability becomes worth it. Perhaps there could be some kind of sliding scale of liability (and therefore taxation) to ease the transition for companies growing across that boundary. I suspect I don't have the economic chops to have a good sense of how well that would work out in the real world though.

Intervals: clock and power on the same line

Thursday, October 25th, 2012

One of the main design considerations of Interval bars was trying to minimize the number of connections between adjacent bars. The current design has three connections - two for power and one which is a bidirectional, self-synchronizing data signal. The logic to synchronize everything ended up being a fairly complicated bit of code. It would be much simpler and faster if all the microcontrollers were driven from a single clock signal, so that all the signals could be synchronous. However, that would end up using 4 pins instead of 3.

What if instead of combining the clock signal with data, we combined it with power? Well, it's only worth doing this if we can do it without too much external hardware. A simple resistor/capacitor filter would be acceptable, but that's not enough to properly restore both power and clock signals. The DC part (across the capacitor) would be okay but then across the resistor we would get an AC signal, when what we really want is a square wave that oscillates between 0 and 5V, so we'd need some additional components to fix up the clock signal.

The worse problem, though, is that with each interval bar added to the circuit, we'd be adding an additional capacitor (in series with a resistor) between ground and (power+clock), so the clock signal is going to get degraded (it's amplitude will be reduced) with each bar. We can reduce this problem by increasing the resistance of the resistor, but then we'd have to use a higher voltage for the power line, increasing the complexity of the root box.

In the end I decided that a solution that makes the hardware simpler but the software more complicated was preferable, since the software only needs to be written once, but who knows how many times an interval bar will get built?

Variadic templates in ALFE

Wednesday, October 24th, 2012

Some of the types in ALFE are templates which can accept different numbers of parameters - in other words, variadic templates. How does this fit in with the Kind system I wrote about before?

As well as the Kind constructors "<>" (template) and "" (type), we need a third Kind constructor which represents zero or more Kinds in a sequence suitable for template arguments. Let's call this Kind constructor "...". As with C++ templates, we should allow the members of the sequence to have arbitrary kind. So <...> is the kind of a template that takes zero or more arguments whose kinds are type, while <<>...> is the kind of a template that takes zero or more arguments whose kinds are <> (a single argument template whose argument is of kind type). The kind of Function is <, ...> (the comma is required because Function has one or more arguments rather than zero or more). More complicated kinds like <<...>, ...> are also possible.

To actually create a variadic template, we need to implement "recursive case" and "base case" templates and have the compiler choose between them by pattern matching, just as in C++. So the definition of Tuple might look something like:

Tuple<@T, ...> = Structure { T first; Tuple<...> rest; };
Tuple<> = Structure { };

Is that enough? Well, I believe this gives ALFE's kind system parity with C++'s. However, there is one more possible Kind constructor that I can imagine - the Kind constructor which can represent any Kind at all - let's call it $. A template with an argument of this kind (i.e. has kind <$>) can be instantiated with an argument of any kind. So you could have:

Foo<@T> = { ... };
Foo<@T<>> = { ... };
Foo<@T<@R<$>>> = { ... };

The first template is used when Foo is instantiated with a type, the second is used when it's instantiated with a template of kind <> and the third is used when it's instantiated with any other single-argument template, no matter what the kind of the argument is. The third template could then pass R as an argument to another instantiation of Foo and thereby "peel apart" the kind of the argument (as long as none of the kinds involved have multiple arguments).

By combining $ with ... I think we could potentially peel apart completely arbitrary kinds. However, I'm not sure if this is worth implementing since I can't think of any use for such a thing. Still, it's good to have some idea about how to proceed if I do eventually come across such a use!

Classes vs structures in ALFE

Tuesday, October 23rd, 2012

A lot of the time in my C++ code I find myself writing class hierarchies. However, because of the way that inheritance works in C++, you need to use a pointer to point to an object which is of a particular type or some subtype of that type. In my code these pointers tend to be reference counted smart pointers to alleviate thinking about lifetime issues but the principle is the same.

Then, because I don't want to have to keep typing "Reference foo" everywhere, I encapsulate these smart pointers into value classes which provide functions that call the (virtual) functions in the implementation class. So I end up with code like this:

class Foo
{
public:
    Foo() : _implementation(new Implementation) { }
    void frob() { _implementation->frob(); }
protected:
    Foo(Implementation* implementation) : _implementation(implementation) { }
    class Implementation : public ReferenceCounted
    {
    public:
        virtual void frob() { ... }
    };
private:
    Reference<Implementation> _implementation;
};
 
class Bar : public Foo
{
public:
    Bar() : Foo(new Implementation) { }
protected:
    class Implementation : public Foo::Implementation
    {
    public:
        virtual void frob() { ... }
    private:
        ...
    };
};

That's really more boilerplate than I want to write for every class. I'd really much rather write code that looks more like C# or Java:

class Foo
{
    public Foo() { }
    public virtual void frob() { ... }
};
 
class Bar
{
    public Bar() { }
    public override void frob() { ... }
    ...
};

So I'm thinking that in ALFE I should have some syntax that looks close to the latter and behaves like the former. I'm leaning towards making "Struct" behave like C++ struct and class (i.e. a value type), and "Class" behave like a C# class (reference semantics). So:

Foo = Structure : Reference<Implementation> {
    Foo() : base(new Implementation) { }
    Void frob() { referent()->frob(); }
access<Foo>:
    Foo(Implementation* implementation) : base(implementation) { }
    Implementation = Structure : ReferenceCounted {
        virtual Void frob() { ... }
    };
};
 
Bar = Structure : Foo {
    Bar() : Foo(new Implementation) { }
access<Bar>:
    Implementation = Structure : Foo.Implementation {
        virtual Void frob() { ... }
    access<>:
        ...
    };
};

is the same as:

Foo = Class {
    Foo() { }
    virtual Void frob() { ... }
};
 
Bar = Class : Foo {
    Bar() { }
    virtual Void frob() { ... }
access<>:
    ...
};

There is a comparison to be made here between POD (Plain Old Data) and non-POD classes in C++ (though it's not quite the same because the Implementation classes would be non-POD in C++).

Obviously there's a lot of details here which would need to be fleshed out but I think something along these lines would be useful.

The Label type in ALFE

Monday, October 22nd, 2012

This is part of the ALFE types series.

The Label type holds an address which can be passed to the goto statement, and the possible values are defined by the actual labels that are in scope. It's somewhat like a function pointer, except that there are no arguments and no return value. It's based on the GCC labels as values extension, except there's a proper type for it instead of void*.

Labels in ALFE are local, which means that the scope of a label is the block in which it is defined. So you can't use goto to jump into a block, only out of one or within the current block. Since defining a variable creates a new block with the scope of that variable, this eliminates the problem of goto skipping variable constructors. Unless you stash a label into a Label variable and then execute the goto from outside the label's scope. That's just as evil as taking the address of a local variable and then accessing it via the pointer when the object is out of scope, though - don't do that.

I think these restrictions eliminate the most heinous abuses of goto, while still allowing the useful cases (which are themselves pretty rare).

The Variant type in ALFE

Sunday, October 21st, 2012

This is part of the ALFE types series.

Variant is a type that can hold any value. It's the type of all variables in a dynamically typed language. It's a bit like a sum type consisting of all other types summed together.

In practice it's probably not very useful, since to do anything with a Variant variable you have to cast it to some type, and since there's a finite number of casts that you can write in one program, you might as well just make it a sum type of those. Still, it might come in handy for translating some programs written in other languages into ALFE, or even if you just don't want to have to think too much about what the possible types are.