8086 microcode disassembled

September 3rd, 2020

Recently I realised that, as part of his 8086 reverse-engineering series, Ken Shirriff had posted online a high resolution photograph of the 8086 die with the metal layer removed. This was something I had been looking for for some time, in order to extract and disassemble the 8086 microcode. I had previously found very high resolution photos of the die with the metal layer intact, but only half of the bits of the microcode ROM were readable. Ken also posted a high resolution photograph of the microcode ROM of the 8088, which is very similar but not identical. I was very curious to know what the differences were.

I used bitract to extract the bits from the two main microcode ROMs, and also from the translation ROM which maps opcode bit patterns onto positions within the main microcode ROM.

The microcode is partially documented in US patent 4363091. In particular, that patent has source listings for several microcode routines. Within these, there are certain patterns of parts of instructions which I was able to find in the ROM dump. This allowed me to figure out how the bit patterns in the ROM correspond to the operands and opcodes of the microcode instruction set, in a manner similar to cracking a monoalphabetic substitution cipher. My resulting disassembly of the microcode ROM can be found here and the code for my disassembler is on github.
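The matching itself is easy to mechanise. As a rough illustration of the "substitution cipher" style of search (not my actual tool - the choice of field bits below is made up for the example), you can reduce a routine from the patent to its pattern of repeated operands and then look for a window of ROM rows whose candidate operand field repeats in the same places:

#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Reduce a sequence of symbols to its pattern of repeats, e.g. {IK, M, IND, IK}
// and {7, 3, 12, 7} both become {0, 1, 2, 0}.
template<typename T> std::vector<int> repeatPattern(const std::vector<T>& s)
{
    std::map<T, int> seen;
    std::vector<int> pattern;
    for (const T& x : s) {
        auto i = seen.find(x);
        if (i == seen.end())
            i = seen.insert({x, static_cast<int>(seen.size())}).first;
        pattern.push_back(i->second);
    }
    return pattern;
}

int main()
{
    // Source operands of one routine as listed in the patent (names only).
    std::vector<std::string> known = {"IK", "M", "IND", "SIGMA", "tmpc"};

    // Raw microcode words from the ROM dump (left empty in this sketch).
    std::vector<std::uint32_t> rom;

    // Try each row as a starting point, pull out a candidate source field
    // (bits 0-4 here, purely as an example) and compare repeat patterns.
    std::vector<int> want = repeatPattern(known);
    for (std::size_t start = 0; start + known.size() <= rom.size(); ++start) {
        std::vector<std::uint32_t> window;
        for (std::size_t i = 0; i < known.size(); ++i)
            window.push_back(rom[start + i] & 0x1f);
        if (repeatPattern(window) == want)
            std::cout << "possible match at row " << start << "\n";
    }
    return 0;
}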

This disassembly has answered many questions I had about the 8088 and 8086. The remainder of this post contains the answers to these questions and other interesting things I found in the microcode.

What are the microcode differences between the 8086 and the 8088?

The differences are in the interrupt handling code. I think it comes down to the fact that the 8086 does two special bus accesses to acknowledge an interrupt (one to tell the PIC that it is ready to service the interrupt, the second to fetch the interrupt number for the IRQ that needs to be serviced). These are word-sized accesses for some reason, so the 8088 would break them into four accesses instead of two. This would confuse the PIC, so the 8088 does a single access instead and relies on the BIU to split the access into two. The other changes seem to be fallout related to that.

Are the microcode listings in US patent 4363091 accurate?

Mostly. There are differences, however (which added some complexity to the deciphering process). The differences are in the string instructions. For example, the "STS" (STOSB/STOSW) instruction in the patent is:

CR  S      D      Type  a     b     F
-------------------------------------
0   IK     IND    7     F1    1
1   (M)    OPR    6     w     DA,BL
2   IND    IK     0     F1    0
3                 4     none  RNI

In the actual CPU, this has become:

0   IK    -> IND       7   F1    RPTS
1   M     -> OPR       6   w     DA,BL
2   IND   -> IK        0   NF1      5
3   SIGMA -> tmpc      5   INT   RPTI
4   tmpc  -> BC        0   NZ       1
5                      4   none  RNI

The arrow isn't a difference - I just put that in my disassembly to emphasize the direction of data movement in the "move" part of the microcode instructions. Likewise, the "F1 1" in the patent listing is the same as the "F1 RPTS" in my disassembly - I have replaced subroutine numbers with names to make it easier to read.

The version in the patent does a check for pending interrupts in the "RPTS" routine, before it processes any iterations of the string. This means that if there is a continuous "storm" of interrupts, the string instruction will make no progress. The version in the CPU corrects this, and checks for interrupts on line 3, after it has done the store, allowing it to progress. This was probably not a situation that was expected to occur in normal operation (in fact, I seem to recall crashing my 8088 and 8086 machines by having interrupts happen too rapidly to be serviced). The change was most likely done to accommodate debugging with the trap flag (which essentially means that there is always an interrupt pending when the trap flag is set). Without this change, code that used the repeated string instructions would not have progressed under the debugger.

How many different instructions does the 8086 have, according to the microcode? What are they?

The CPU has 60 instructions, and they're in a fairly logical sort of order:

(Numbers are: number of opcodes handled, size of top-level microcode routine.)

MOV rm<->r     4  3
LEA            1  1
alu rm<->r    32  4
alu rm,i       4  5
MOV rm,i       2  4
alu r,i       16  4
MOV r,i       16  3
PUSH rw        8  4
PUSH sr        4  4
PUSHF          1  4
POP rw         8  3
POP sr         4  3
POPF           1  3
POP rmw        1  6
CBW            1  2
CWD            1  7
MOV A,[i]      2  4
MOV [i],A      2  4
CALL cd        1  4
CALL cw        1  8
XCHG AX,rw     8  3
rot rm,1       2  3
rot rm,CL      2  8
TEST rm,r      2  3
TEST A,i       2  4
SALC           1  3
XCHG rm,r      2  5
IN A,ib        2  4
OUT ib,A       2  4
IN A,DX        2  2
OUT DX,A       2  2
RET            2  4
RETF           2  2
IRET           1  4
RET/RETF iw    4  4
JMP cw/JMP cb  2  6
JMP cd         1  7
Jcond         32  3
MOV rmw<->sr   2  2
LES            1  4
LDS            1  4
WAIT           1  9 (discontinuous)
SAHF           1  4
LAHF           1  2
ESC            8  1
XLAT           1  5
STOS           2  6 (discontinuous)
CMPS/SCAS      4 13 (discontinuous)
MOVS/LODS      4 11 (discontinuous)
JCXZ           1  5 (discontinuous)
LOOPNE/LOOPE   2  5
LOOP           1  4
DAA/DAS        2  4
AAA/AAS        2  8
AAD            1  4
AAM            1  6
INC/DEC rw    16  2
INT ib         1  2
INTO           1  4
INT 3          1  3

The discontinuous instructions were most likely broken up because they had bug fixes making them too long for their original slots. Similarly "POP rmw" appears to have been shortened by at least 3 instructions as there is a gap after it. Moving code around after it's been written (and updating all the far jump/call locations) would probably have been tricky.

Which instructions, if any, are not handled by the microcode?

There is no microcode for the segment override prefixes (CS:, SS:, DS: and ES:). Nor for the other prefixes (REP, REPNE and LOCK), nor the instructions CLC, STC, CLI, STI, CLD, STD, CMC, and HLT. The "group" opcodes 0xf6, 0xf7, 0xfe and 0xff do not have top level microcode instructions. So none of the instructions with 0xf in the high nybble of the opcode are initially handled by the microcode. Most of these instructions are very simple and probably better done by random logic. HLT is a little surprising - I really thought I'd find a microcode loop for that one since it only seems to check for interrupts every other cycle.

The group instructions are decoded slightly differently but the microcode routines handling them break down as follows:

INC/DEC rm        3
PUSH rm           4
NOT rm            3
NEG rm            3
CALL FAR rm       8
CALL rm           8
TEST rm,i         4
JMP rm            2
JMP FAR rm        4
IMUL/MUL rmb      8
IMUL/MUL rmw      8
IDIV/DIV rmb      8
IDIV/DIV rmw      8

Then there are various subroutines and tail calls (listed in translation.txt). Highlights:

  • interrupt handling (16 microinstructions)
  • sign handling for multiply and divide, flags for multiply (32)
  • effective address computation (16)
  • reset routine (sets CS=0xffff, DS=ES=SS=FLAGS=PC=0) (6)

Does the microcode contain any "junk code" that doesn't do anything?

It seems to! While most of the unused parts of the ROM (64 instructions) are filled with zeroes, there are a few parts which aren't. The following instructions appear right at the end of the ROM:

A     -> tmpa      5   INT   FARCALL2      011100011.0110
[  5] -> [ a]      5   UNC   INTR     F    011100011.0111

There doesn't appear to be any way for execution to reach these instructions. This code saves AL to tmpa (which doesn't appear to then be used at all) and then does either an interrupt or (if an interrupt is pending) a far call. In the interrupt case it also does a move between a source and a destination that aren't used anywhere else (and hence I have no idea what they are). This makes me wonder if there was at one point a plan for something like an "INT AL" instruction. With the x86 instruction set we ended up with, such a thing has to be done using self-modifying code, a table of INT instructions, or faking the operation of INT in software.

The following code is also inaccessible and appears to do something with the low byte of the last offset read from or written to, and the carry flag:

IND   -> tmpaL     1   LRCY  tmpc     F      01010?10?.1010

No idea what that could be for - nothing else in the microcode treats the IND register as two separate bytes.

Are there any parts of the microcode that are still not understood?

When the WAIT instruction finishes in the non-interrupt case (i.e. by the -TEST pin going active to signal that the 8087 has completed an instruction) the microcode routine ends with this sequence:

                   4   [ 1]  none
                   4   none  RNI

I don't know what the "[ 1]" does - it isn't used anywhere else.

There is also a bit (shown as "Q" in the listings) which does not have an obvious function for "type 6" (bus IO) operations. This Q bit is only set for write operations; in the listings, writes that have it are shown as "W" and writes without it as "w". There seems to be no pattern as to which writes use this bit. The string move instructions use it, as does the stack push for the flags when an interrupt occurs, and the push of the segment for a far call or interrupt (but not the offset). It would make sense if this bit was used to distinguish between memory and port IO bus accesses, but the CPU seems to have another mechanism for this (most likely the group decode ROM, which I have not decoded as there are too many unknowns about what its inputs and outputs are).

Are there any places where the microcode could have been improved to speed up the CPU?

Despite many of the instructions seeming to execute quite ponderously by the standards of later CPUs, the microcode appears to be very tightly written and I didn't find many opportunities for improvement. If the MOVS/LODS opcode was split up into separate microcode routines for LODS and MOVS, the LODS routine could avoid a conditional jump and execute 1 cycle faster. But there is only room for that because of the "POP rmw" shortening, which may have happened quite late in the development cycle (especially if it was a functional bug fix rather than an optimisation - optimisations might not have met the bar at that point).

There may be places where prefetching could be suspended earlier before a jump, but it's not quite so obvious that that would be an optimisation. Especially if the "suspend" operation is synchronous, and waits for the BIU to complete the current prefetch cycle before continuing the microcode program. And especially if that would make the microcode routine longer.

It would of course be possible to make improvements if the random logic is changed as well. The NEC V20 and V30 implement the same instructions at a generally lower number of cycles per instruction, but they have 63,000 transistors instead of 29,000 so probably have a much larger proportion of random logic to microcode.

Does the microcode have any hidden features, opcodes or easter eggs that have not yet been documented?

It does! Using the REP or REPNE prefix with a MUL or IMUL instruction negates the product. Using the REP or REPNE prefix with an IDIV instruction negates the quotient. As far as I know, nobody has discovered these before (or at least documented them).

Signed multiplication and division work by negating negative inputs and then negating the output if exactly one of the inputs was negative. That means that the CPU needs to remember one bit of state (whether or not to negate the output) across the multiplication and division algorithms. But these algorithms use all three temporary registers, the internal counter, and the ALU (so the bit can't be put in the internal carry flag, for example). I was scratching my head about where that bit might be kept. I was also scratching my head about why the multiplication and division algorithms check the F1 ("do we have a REP prefix?") flag. Then I realised that these puzzles cancel each other out - the CPU flips the F1 flag for each negative sign in the multiply/divide inputs! There's already a microcode instruction to check for that, so the 8086's designers just needed to add an instruction to flip it.
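Here is a minimal model of that logic (an illustration only, not the actual microcode): each negative input flips F1, and F1 decides whether the result is negated at the end, so an F1 that starts out set because of a REP prefix ends up negating the product.

#include <cstdint>
#include <iostream>

// Sign handling as described above: negate negative inputs, do an unsigned
// multiply, then negate the result if F1 ended up set. F1 starts out set
// when the instruction carries a REP/REPNE prefix, which is where the
// undocumented "REP negates the product" behaviour comes from.
std::uint16_t imul8(std::uint8_t a, std::uint8_t b, bool repPrefix)
{
    bool f1 = repPrefix;                              // F1 set by REP/REPNE
    std::uint8_t ua = a, ub = b;
    if (static_cast<std::int8_t>(a) < 0) { ua = -ua; f1 = !f1; }
    if (static_cast<std::int8_t>(b) < 0) { ub = -ub; f1 = !f1; }
    std::uint16_t product = static_cast<std::uint16_t>(ua * ub);  // unsigned core
    if (f1)
        product = -product;                           // negate if F1 is set
    return product;                                   // what would land in AX
}

int main()
{
    std::cout << std::hex;
    std::cout << imul8(0xfe, 3, false) << "\n";       // -2 * 3: prints fffa
    std::cout << imul8(0xfe, 3, true) << "\n";        // with REP: prints 6
    return 0;
}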

I was thinking the microcode instruction might set the F1 flag instead of flipping it - that would mean that you could get a (probably negated) "absolute value" operation (almost) for free with a multiply. But an almost-free negation is pretty good too - REP is a byte cheaper than "NEG AX", and with 16-bit multiplies the savings are even greater (it eliminates a NEG AX / ADC DX, 0 / NEG DX sequence). Still small compared to the multiply, but a savings nonetheless.

I contemplated using this in a demoscene production as another "we break all your emulators" moment, but multiplication and division on the 8086 and 8088 CPUs is sufficiently slow to be of limited use for demos.

The F1ZZ microcode instruction (which controls whether the REPE/REPNE SCAS/CMPS sequences terminate early) is also used in the LOOPE and LOOPNE instructions. Which made me wonder if one of the REP prefixes would also reverse the sense of the test. However, neither prefix seems to have any effect on these instructions.

Update 2nd January 2023

I've made a new version of the disassembly here incorporating some changes from the comments below. I have transcribed the group ROM, got rid of "NWB", added the RNI flag to W microinstructions, and changed XZC to ADC.

Letter to the headmistress of the school my children attend

November 29th, 2016

Dear Miss N....,

Thank you for your letter of the 28th of November regarding the Department for Education census. We are writing to let you know that we are declining to provide the countries of birth and nationalities of our children, and we have told our children to decline to provide this information to members of staff if asked. Please fill out this information as “Declined to provide” on the form, and feel free to pass on this letter by way of further explanation if this would be helpful. If our refusal to provide this information causes a penalty for the school (financial or otherwise) please let us know.

Our understanding is that, although schools are obliged to ask for this information, parents and pupils are under no obligation to provide it. Your letter suggested otherwise – if there is now a requirement on parents please let us know the details.

We understand that the Department for Education has given assurances that pupil nationality information will not be passed to the home office. However, given the recent resurgence of far-right politics in this country, we are concerned that this data could in the future be used to co-opt schools into becoming border enforcement units. It is our opinion that the surest method of avoiding this is for everyone to decline to provide this information.

On general principle, we believe that institutions in the public sector (including schools) should not have access to personal information except in cases where that information is necessary for the institution to accomplish its mission. We do not believe that any educational purpose is furthered by the school or the Department for Education having access to nationality or country of birth information. The Department for Education has stated that this information is useful for tracking progress of pupils for whom English is a second language. To this end, we have agreed that the Department for Education may be informed that English is the first language of both of our children.

Thank you for your understanding.

Yours sincerely,

Andrew & Genevieve Jenner

The mission of government

February 23rd, 2016

National governments are some of the most powerful entities in the world today. But what are their missions (or what should they be)? What are they optimizing for? Democratic governments at least are supposed to represent the interests of the voting public, but this just means that the voters get to choose between one of several parties, each of which have their own mission.

The mission of parties on the right of the political spectrum seems to be maximization of GDP (or GDP per capita). On the left, things are a bit more complicated. The Labour Party's ideology is roughly social democracy, but what does that actually mean? Bouncing around on Wikipedia for a bit suggests that this ideology promotes social justice, of which the key tenet seems to be reducing inequality. So if we simplify the right down to "GDP maximization" then the corresponding simplification on the left is "equality maximization". Obviously these are gross caricatures of infinitely more nuanced policy sets, but bear with me.

Suppose we are considering two independent policy changes, A and B. Change A makes the poorest person in the country poorer by £1 and makes the richest person in the country richer by £2 (a net GDP increase of £1). Change B makes the poorest person in the country richer by £1 and makes the richest person in the country richer by £2 (a net GDP increase of £3). Both of these changes increase inequality. All else being equal, it seems that conservative policy would be to support both changes (since they both increase GDP) and labour policy would be to reject both (since they both increase inequality). I think that change A is bad and change B is good, and I think a lot of people feel the same way. I would prefer maximizing a metric which is increased by change B but decreased by change A.

What's an example of such a metric? Well, one example would be "how rich is the poorest person in the country?" It seems like quite a dumb metric at first - how can you judge the performance of an entire country on the outcome for a single person? But the more I've thought about it, the more sense it seems to make. The poorest are almost by definition those who need the most effort expended to help them. It doesn't take very much money to stop that person being the poorest, at which point you switch to the new poorest person. Then you need to raise up the poorest two people to the level of the initially third-poorest and so on.

Where do you get the money to help these poor people? Well, you can take it from the rich - the "poorest person" metric doesn't care about them one way or another to a first approximation (and rightly so - the rich don't need help from the government, they can help themselves). Now, the logical extrapolation of that is that we should take all the money and redistribute it evenly - give everybody N/M where N is the total amount of wealth and M is the number of people. That is a way to make a society that is extremely egalitarian but utterly impoverished. Without any inequality at all, there is practically no incentive for people to work hard to create wealth (if you did you'd only get an infinitesimal 1/M of it). Whenever this kind of thing has been tried in the past, that is the result.

So distributing all the money evenly does not maximize the wealth of the poorest - we can improve things for them even more by allowing some inequality and redistributing some of the wealth (but leaving enough to create incentives). So applying this metric does enforce some compromise between the capitalist and socialist sides.

As written above, the anti-poverty metric is very underspecified. Over what timescale should you maximize the wealth of the poorest? If you choose a very short timescale then that implies that you should just redistribute all the wealth evenly very soon (which is very bad in the longer term) and if you choose a very long timescale then that implies that you should aim to maximize GDP and then redistribute all the wealth evenly at some very far off point in the future (which doesn't actually do anything to help people that are hungry now). Perhaps the difference between left and right politics is that they really want the same things, but just disagree on the timescales that should be involved. As for me, I suspect the ideal timescale would be something between that of a typical political term and that of a typical human lifespan, but the exact length may depend on the specific policy under consideration.

There's more to what governments do than just the redistribution of wealth (or lack thereof). How can the other functions that we'd like governments to perform be justified under this metric? Well, we can do so by taking proper care about how "wealth" and "poorest" are defined. There's more to wealth than just money. Somebody who is in need of medical treatment is (all else being equal) "poorer" than somebody who isn't, so governments should provide healthcare. Somebody who is a victim of crime (or who is living in fear of becoming one due to crime going unsolved) is also impoverished. Even a certain amount of military expenditure can be justified on the grounds that getting invaded by another country would be impoverishing. The more things we include in our definition of "wealth", the more government intervention is justified by the metric.

The web with no ads

February 22nd, 2016

These days it seems like the vast majority of websites are stuffed with adverts. I'm old enough to remember a time when very few websites had ads, and when they started to appear it was a shocking and shameful development. With the rise of ads has come the rise of ad-blockers, and the rise of advertisers complaining about ad-blockers and using anti-blocking countermeasures (which always make me angry when I encounter a site that uses them). I'm definitely on the anti-ad side of the argument. In fact, I think that web browsers should act on behalf of their users rather than on behalf of website operators (on the general second law principle that computers should do what their owners tell them). So if a user wants some kind of filtering applied to the content before they look at it, the browser should comply with that and the website operator should not even get to find out about it! So a browser that's blocking ads should make exactly the same HTTP requests as a non-blocking browser would and all JavaScript that could potentially leak information back to the website operator should act as if no blocking is in place (even if that means running every script twice - once to report back to the host and once to actually render things).

If this came to pass and everyone started using it, wouldn't we lose many of the sites that make the web great? I don't think so. The web was great before ads were common and it'll still be great once they've retreated. There are two kinds of content on the web: content that people create and share because they have something to say and they want it to be heard, and content that people create in order to have something to put next to their adverts and make money. If all of the latter disappeared, I don't think I would miss it much. There would still be journalism, because there would still be stories that people want told (though perhaps without advertising to fund it, journalism as a career would become less common and those stories would be told directly by the people who want them told).

Arguably the twitters and facebooks of this world are more accessible than finding a hosting provider and installing WordPress or writing raw HTML. But even in the earliest days of the time I've been online I don't think there was ever a shortage of places that would host your content for free, especially if you had something interesting to say.

Even the best ad-blockers aren't perfect, though. Rather than packing up and going home, I think ad-supported content on the web will move to using adverts that computers can't tell are adverts - something more like product placement. Rather than being separate files, adverts will get integrated right into the desirable content, with no obvious computer-readable markers to demarcate them. Text, images, audio and video are all susceptible to this technique, and it is starting to show up in the wild. At least these techniques don't lend themselves so well to "ad tech" - tons of scripts that bring the browser to a crawl, track all sorts of information about you and auction parts of your screen to the highest bidder. About all they'll be able to do with inline ads is tell that you've downloaded the media with the adverts in, and perhaps correlate that with a web search for the advertised product some time later - they won't be able to tell for sure that you saw the ads.

If these "inline" adverts start to become obnoxious then people will find a way to block these too - perhaps with audio fingerprinting or shared lists of timecode pairs that can be edited out. Editing is a bit more difficult for streaming content - if it's delivered "just in time" then removing it would leave an annoying gap. This could be solved the same way TiVo solved it for broadcast TV - you record for a while before you start playing your stream, then you can edit out adverts by just skipping forward in the recording (at least until you've caught up to real-time).

Ultimately I think advertising will have to be entertaining to survive as well as being non-obvious and inline. A good example I saw recently was in this episode of Comedians in Cars getting Coffee - at 4:30 Seinfeld is driving around looking for his product placement. Some of the other episodes have similar gags, and I can't see anybody going to the trouble of editing those out - they're too intrinsic to the show, too entertaining and not at all obnoxious.

On positive and negative reputation

February 21st, 2016

A nice feature of most places on the internet is that people can easily create a new identity (you might have to solve a captcha but that's about it). This wouldn't work so well in real life as it does on the internet - in real life if someone commits a crime they need to be held accountable for that, so it's important that we each have a real life identity that we can't just replace. Similarly, social safety-net programs need to ensure that any given person does not collect more money than they are entitled to, so they also need to use real-life identities.

Social media websites should not need to know peoples' real-life identities. But if identities can be discarded and replaced, how can we deal with the online equivalent of crimes (i.e. spam, abuse and malware)? I think the answer is just to ignore them with extreme prejudice. To decide if some message (whether it's an email, an RSS feed item, a link from an aggregator or whatever) is worth reading, we should ideally be able to look at the reputation of its originator. Completely new identities, not vouched for by any identity with actual positive reputation, would have no reputation and the messages they send should be consigned to the deepest levels of the spam filter.

Unlike in real life, there's no point in internet identities having negative reputation overall - if one did, the owner of that identity would have nothing to lose by abandoning it and spinning up a new clean one. Blacklists won't work, we'll have to use whitelists.

If you somehow grew up under a rock or something and were new to the internet, how could you build up reputation if nobody's reading your messages? Well, presumably you know some people in real life, and some of those may have some internet reputation. Contact them (offline if need be) and get them to vouch for you until you have enough reputation that strangers can read your messages.

Reputation scores should be subjective, not a single global score. So user A may have a different idea than user B about what user C's reputation is. The centralization of a global score would cause problems, and could be gamed (earning reputation from people who give it away easily and spending it where it is more lucrative). My value of a reputation score for user C should be influenced by whether I have liked their posts, and by the reputation scores for user C according to other people who have high reputation scores according to me. It's sort of like Google's PageRank algorithm but for users instead of websites.
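As a rough sketch of what I mean (made-up data structures and constants, not a worked-out design): my score for a user is my own direct impression of them, plus a damped contribution from the scores given by the users I already trust, iterated until it settles.

#include <iostream>
#include <map>
#include <string>

using Scores = std::map<std::string, double>;

// "likes[a][c]" is how much user a has directly liked user c's posts (0..1).
Scores myReputation(const std::map<std::string, Scores>& likes,
                    const std::string& me, int iterations = 20)
{
    Scores score = likes.at(me);          // start from my own direct impressions
    for (int i = 0; i < iterations; ++i) {
        Scores next = likes.at(me);
        for (const auto& [other, weight] : score) {
            auto l = likes.find(other);
            if (l == likes.end())
                continue;
            for (const auto& [c, s] : l->second)
                next[c] += 0.5 * weight * s;   // trust propagates, damped
        }
        score = next;
    }
    return score;
}

int main()
{
    std::map<std::string, Scores> likes = {
        {"alice", {{"bob", 1.0}}},
        {"bob", {{"carol", 0.8}}},
        {"carol", {}},
    };
    // Alice has never seen carol's posts, but trusts bob, who has.
    for (const auto& [user, s] : myReputation(likes, "alice"))
        std::cout << user << ": " << s << "\n";
    return 0;
}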

Abusive messages are mostly sent anonymously and would therefore generally have an extremely low reputation score; senders who weren't anonymous would stand to quickly lose whatever reputation they had. So abuse is solved the same way as spam (and ends up in the same bucket).

Credit reporting agencies like Experian and Equifax keep reputation scores on our real life identities for non-crime purposes, like determining if it would be wise to lend us money. I sometimes think it would be a good idea if those companies were not allowed to use our real-life identities, so that "bad credit" could be escaped just by creating a new "credit identity". Then nobody would ever lend more money to someone than they had spent building up their credit reputation. The current system allows "no credit" young people to build up huge unsecured debts which they are then shackled with for an extremely long time. Student loan debts in the US cannot be discharged in bankruptcy, on the theory that the benefits obtained by attending college can't be given up, but this system can have some devastating consequences for those who ended up paying more for their degrees than those degrees were worth.

Cost of housing

February 20th, 2016

The high cost of housing is a big problem for a lot of people. Younger adults in particular often can't afford to own their own homes like their parents did, despite earning similar wages corrected for (non-housing) inflation. Many of those who do own have huge mortgages that will take the majority of their careers to pay off, and face having to move should they suffer a significant drop in income. Many of those who can't own are priced out of the cities and regions they'd prefer to live in by spiraling rents and may face long commutes to work in those places.

What could be done about this? Well, one idea that keeps bouncing around my head is eliminating a situation which is quite rare and very unfortunate when it does occur - the foreclosure. If mortgage lenders lose the ability to take away the homes of borrowers who stop paying their mortgages, they'll stop lending people money to buy homes. This wouldn't be like a medieval anti-usury law - people would still be able to borrow and lend money with interest, it's just that people would no longer be able to secure those loans with their homes (because they need those homes for living in).

Wait a minute, though, doesn't that sound like it would make homes less affordable rather than more? Well, perhaps at first. But why are house prices so high in the first place? They're as high as they are because that's what people are prepared to pay. And they're only able to pay that much for houses because that's what lenders will lend them. So, in the presence of mortgage lenders (and absent any other restrictions) house prices will trend upwards until the cost of servicing a mortgage is equal to all the money that can be earned by the people living in that house minus what it costs them to live (food and utility bills). Rents will rise similarly, as otherwise it would make more sense for those who would buy property to rent to instead buy mortgage notes.

If this were to come to pass, what would the consequences be? Who would be the winners and who would be the losers? Well, anyone who owns their home or a rental property free and clear (or who owes less on their mortgage than the value of the property would decrease by) would suffer a sudden drop in overall wealth, as would anyone who has invested in mortgage notes. However, most property owners with an income would suddenly find that a lot more of it would be disposable (no point making any more mortgage payments!) so there would be a massive influx of cash into the economy. Pretty much everyone but the aforementioned owners and investors would be better off. In other words, it would be a massive redistribution of wealth from the rich to the poor.

What about those who do not own their own property but instead rent? Most such people probably don't have large amounts of savings, so wouldn't be able to buy a house outright even at the new lower prices (if they could they would probably have enough for a mortgage down-payment under the current rules). So they will have to continue to rent. Since there's no money in mortgage notes anymore, buying rental properties becomes a relatively more lucrative investment. So those with the money to do so buy up all the (newly cheap) properties for rental. This (combined with the lack of mortgages) prices would-be homeowners out of the market again, forcing them to continue renting. New builds become less lucrative (since they can't be sold for as much) so the supply of housing goes down, which drives up rents to the limits of what the market can bear. So while getting rid of mortgages would help most current homeowners, everyone else would be screwed in the slightly longer term. Our little "tweak" has actually had the opposite of the desired effect! Not only is housing as expensive as ever, even more people are renting instead of owning.

We can fix this with another small adjustment to the rules - instead of just getting rid of foreclosures, get rid of evictions altogether. Everyone who is renting their home becomes the de-facto owner of that property. Sorry to the rental property owners, you're screwed right along with the mortgage note owners.

How would this work in practice? You can't evict someone from their home, but how do you define what "home" is for the purpose of this law? What if someone has multiple houses? I think my preferred implementation would probably be something like this: each person would be permitted to enter into a particular database a single address, which is the place that they are not allowed to be evicted from (their home address). People who don't like being in government databases don't have to be in this one, but would lose the benefit of having a home that they can't be evicted from. At the point of implementation of the law, you must have some claim on an address (i.e. ownership, rental agreement, or be a dependent of someone with an ownership or rental agreement, or have some reasonable evidence of occupancy) in order to make it your home. There would probably have to be quite a bit of wrangling and untangling to determine who should really have the right to live where. However, once the implementation is complete things become much simpler. From that point on, you'd only be permitted to change your home address to a new one with the permission (by power of attorney if necessary) of all living people who also claim that new address as home. This means that an inhabitable property that doesn't have anybody claiming it as their home could be claimed by anyone - a sort of "squatter's right" for this new regime.

Buying and selling property would be fine, as would any other contracts involving a change of home, with the limitation that at every point in the execution of a contract, every person has a home. So a new rental agreement or mortgage agreement would not be possible because there is a point in the execution of those contracts (i.e. non-payment of rent or mortgage) in which someone ends up without a home. Now in this form there is something very similar to a mortgage contract, namely "if you don't pay your mortgage, you have to change your home to this hovel in the middle of nowhere". It's a de facto eviction if not a de jure one. So, I think contracts involving home changes would have to not extend over time - they would have to specify a single point in time at which point all the home changes happen, and a home for every person involved afterwards. If one of those addresses turns out not to exist, then the entire contract would be void and everything would have to be put back the way it was at the start. If a contract did extend over time, the "hovel in the middle of nowhere" might turn out not to exist, and if too much time has passed then restoring the conditions at the start of the contract may also be impossible.

So in this world the concept of a chain becomes more important when buying and selling property. Every non-circular chain must begin with the splitting of a household or the destruction of an address, and end with the joining of households into a single address or the construction of a new address. "Construction" and "destruction" here do not necessarily mean physical housebuilding and demolition - it could be doing up a previously uninhabitable property or a previously habitable property falling into disrepair - both with the consequent changes to the "home" database.

The existence of "uninhabitable" property not listed in the "home" database presents a potential problem - a "shadow economy" of people living in "uninhabitable" properties with mortgages or rental agreements and all the accompanying baggage, with their "home" addresses (if any) pointing at some undesirable location. Perhaps this could be solved by forbidding people from living in addresses not in the "home" database and/or forcing property that is found to actually be inhabitable into the database. Forcing real estate to be in a database seems less onerous than forcing people who don't want to be in a database to be in one, but people not in the database may find it difficult to find housing if non-database housing is off limits. They'd have to live with somebody who is in the database.

Sometimes houses burn down, or the people living in them have a baby that there isn't room for, or a housemate turns out to be abusive, or any one of many other circumstances in which the supply of housing is insufficient for the people needing it. For these circumstances I think a basic minimum standard of social housing would be a necessary fallback when there's nowhere else to go. As a general rule people would not be expected to live in this high-density, no-frills accommodation indefinitely - it's just a stopgap measure until they can accumulate enough cash to buy their own property. It needs to be sufficient so that people living there can hold down a job (good night's sleep, sufficient facilities to remain clean and fed, no worrying about security) but doesn't need to be fancier than that.

A little care would need to be taken in the areas of hotel rooms, holiday lets and other short-term accommodation. Perhaps these things can be handled by saying something like "if someone stays somewhere for more than 183 nights in a given calendar year, that place is deemed inhabitable and can be claimed as home by that person", but we'd still need some way to allow people to go on holiday without making it possible for property owners to exploit the poor by making them rent and move every 4 months. There are other rough edges as well (what about people moving internationally?) but I'm confident these could be solved.

This all seems so sensible to me. It's too bad there's no apparent way to get from the grossly unjust system we have at the moment to this one - there are just too many powerful people who would stand to lose so much money that they would do everything they could to oppose it.

Integers as types

February 19th, 2016

In C++ there are two kinds of things you can use as template arguments. One is the obvious one, types, but less well known is that integers (and some other values) can also be used. Early on in the design of ALFE's type system I decided to simplify things by having template arguments all be type constructors. If you really need to use an integer (or indeed any other value) as a template argument you can wrap it in a type.

However, I recently thought of another feature I'd like ALFE fixed-length arrays to have - the ability to be indexed by a type other than Integer. For example, we might want to have an array indexed by a Boolean:

Boolean baz;
...
Foo bar = barData[baz];

Now we could just convert the Boolean to an Integer when indexing the array:

Foo bar = barData[baz ? 1 : 0];

But that seems ugly especially if we're indexing barData in a lot of places. We could turn it into a helper method but then it's less obvious to callers that it's an array access. Also this is rather more unwieldy if our array is indexed by an Enumeration type.
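For comparison, here's roughly the effect I want, sketched in C++ rather than ALFE (hypothetical names, just to illustrate an array whose index type is something other than an integer):

#include <array>
#include <iostream>

// Foo stands in for whatever element type the ALFE example uses.
struct Foo { int value; };

// An array whose operator[] takes the index type directly, so callers never
// write "baz ? 1 : 0" and the access still looks like an array access.
template<typename T, typename Index, std::size_t N>
struct TypedArray
{
    std::array<T, N> data;
    T& operator[](Index i) { return data[static_cast<std::size_t>(i)]; }
};

enum class Direction { Left, Right, Up, Down };

int main()
{
    TypedArray<Foo, bool, 2> barData{};        // roughly Foo[Boolean]
    bool baz = true;
    Foo bar = barData[baz];                    // indexed by a Boolean

    TypedArray<Foo, Direction, 4> byDirection{};
    byDirection[Direction::Up].value = 3;      // indexed by an enumeration

    std::cout << bar.value + byDirection[Direction::Up].value << "\n";
    return 0;
}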

What is the name of the type of an array that takes Boolean and returns Foo? By analogy with the function type Foo(Boolean) we would expect it to be Foo[Boolean] which is quite nice (though makes the sequence type [Boolean] even more irregular - maybe that should be renamed as Boolean[]?)

So if Foo[Boolean] is an array of Foo indexed by Boolean then Foo[2] should be an array of Foo indexed by a type called "2", which must then be an integral type with values 0 and 1. So perhaps integers are types after all. Integer literal type names can't be used everywhere that normal type names can occur, though - it would lead to too many parsing ambiguities. So we'd need another name for the type "2", perhaps "LessThan<Class{Int n=2;}>", "()[2].Indexer" or "Zero.Successor.Successor".

Of course, if you want that you'll almost certainly also want ranges. We can kill both those birds (and many others) while still keeping the same meaning for Foo[2] by creating difference types: "A-B" is a type whose values are those that are in A but not in B. So the type "3-1" would allow the values 1 and 2. Unfortunately 3-1 is not the same as 2 (though the two types do have the same cardinality).

Alternatively, the idea that the type "2" should be a type with a single value (the value 2) also has some beauty especially when combined with sum types so that you could create a type that can contain some arbitrary subset of integers and have that checked at compile time.

Nobody at the controls

February 18th, 2016

Several very smart people are worried about artificial intelligences becoming smarter than people, taking over the world, and not necessarily having our best interests at heart. A specific concern is that some form of paperclip maximizer might optimize the world for something that is harmful to us if the optimization in question is taken too far.

I once tried to explain this to my grandma, who replied "if a computer is taking over the world, why can't you just unplug it?" A very sensible plan, if the computer in question isn't already running a lot of critical functions, without which people would die and society would devolve into chaos (and if it doesn't have batteries). Obviously we'd need to put a stop to something like that before it got started.

Except we can't - it's already too late. We already have a "paperclip maximizer" running the world. It's not optimizing for paperclips, though - it's optimizing for profit: GDP (on the level of nations), shareholder value (on the level of public companies) and personal income (on the level of individuals, especially those with bills to pay). No artificial intelligence is required - the "machine" uses the intelligence of the people that comprise it to do its thinking. But all of those people answer to someone else - employees answer to their managers, CEOs answer to their customers (usually) or investors, most investors are just trying to get the best returns so that they can retire as quickly and comfortably as possible, and consumers are just looking for the best performance/price ratios. Elected representatives answer to voters in their constituencies or lobbyists for various special-interest groups, mostly industries. Voters all too often vote for who the media tells them to, or who seems more likely to ensure that they can find work or keep their jobs. There's nobody in charge of the whole thing, putting the brakes on to ensure that profits don't come before the freedom and welfare of individual humans or humanity as a whole. There's no "plug to be taken out" short of a massive revolution which would also dismantle the systems that keep us fed, clothed, warm, clean, secure, healthy and entertained.

Taken to its logical conclusion, what does a world fully optimized for profit look like? Well, predictions become a whole lot more difficult once super-intelligent AIs are in the picture, but here's what I think it would be if such things turned out not to be possible. As there are things that humans can do that machines can't, most human effort would go into profit-generating activities (in other words, we'd all be wage-slaves). If someone is rich enough not to have to work then they probably wouldn't, so the generated profits can't go to making a lot of people rich - this implies that almost all the generated wealth would be concentrated in the hands of a very small number of individuals. Most workers would probably have quite a lot of debt, since if they could accumulate capital unchecked they could get rich enough not to have to work. Social safety-nets for any but those absolutely unable to do any kind of profit-generating work would be dismantled since the more dire the consequences for not working the more people will work. Education of children would focus on those skills needed in the workplace, and pure research would only be funded to the extent that it could eventually lead to more profit. Retirement would probably not be a thing (or if it is, you would not expect to have a lot of quality post-retirement time).

It's not all bad, though. Unemployment would be very low (since if someone could do useful work, there's no value in letting them go idle). Amongst the employed, extreme poverty would be nonexistent as people who are starving and/or overly stressed about lack of money aren't able to work as effectively. The standard of healthcare would be good (since getting sick makes you unavailable to work and dying means money spent training your replacement) but may focus more on expensive and continual management of disease rather than cures (since this can be paid for by the worker as another incentive to work). War, crime, and political instability would be non-existent as overall these things destroy wealth rather than creating it. As most of the work that can't be done by machines is intellectual in nature, workers are likely to be well-educated, working conditions are likely to be very good and workers would get as much vacation and leisure time as needed to keep them from burning out and to maximize their overall productivity. Time spent commuting is wasted, so people will tend to live close to their workplaces. Violent revolution would be bad for business, so the general standard of living would be good enough that people would not expect to be able to improve it by revolting. Entrepreneurship would be encouraged (and funded by the wealthy) as long as it is profitable overall.

On getting to that point, the amount of wealth being generated will surely far exceed that needed to meet the basic needs of workers and keep them productive. The excess goes to the super-wealthy, but what would they do with it? They would be the few who have the luxury of being motivated by concerns other than profit, so depending on their interests they might invest in the arts, space travel, curing diseases and other ills of the world that are left unaddressed by profitability concerns, or blue skies research. Or perhaps they will just build gigantic monuments to their own egos that will be enjoyed by subsequent generations of tourists.

That actually sounds less awful than I thought it would be when I started writing this essay. It's still all rather unjust, though - ideally the excess profits would be distributed far more equitably so that more people have a say in what the priorities of the human race should be. I hope we can find a way to keep the good parts of this scenario while replacing the bad parts. In a future post I'll look more into how this might be achieved.

Hardware support for ravioli

February 17th, 2016

Ravioli memory is something that could benefit significantly from hardware support. Page operations and inter-thread communication are some specific things that could be accelerated, but there are others as well:

  • Dereferencing through reverse-ravioli trees (and forward-ravioli trees to find parent objects).
  • Allocating and deallocating/moving. The latter requires fixing up any pointers that happen to be in registers, so the set of registers that hold pointers needs to be found somehow (either by using tag bits or having a subset of registers dedicated to pointers).
  • Prefetching the memory for the ravioli pointed to by registers, the ravioli pointed to by those ravioli and so on, since there are a relatively small number of memory locations that could be accessed at any moment in time. This is in contrast to current CPU architectures, where pretty much any address could be generated at any point in time.

Registers and other ravioli are two places where pointers can be found and which need to be fixed up. Are there any others? What about the stack? Well, searching the entire stack for pointers to the raviolo that's getting moved is an O(n) operation, which we want to avoid. So we can't be storing pointers on the stack.

One thing about stacks as implemented in modern OSes - you have to decide how big they can get in advance. With virtual memory systems you don't have to commit storage for the entire stack, but you do have to set some kind of limit on it so that heaps, dynamically-loaded libraries and stacks belonging to other threads can go below it (assuming a downward growing stack). If you choose a limit that is too small you'll get a stack overflow, and choosing a limit that is too large is wasting address space.

Both of these problems could be solved by not having a separate stack at all, and storing activation frames in the ravioli. This eliminates the distinction between stack and heap data, and allows for trivial implementation of Cactus Stacks.
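For illustration (just a sketch of the shape of the thing, not ravioli code), an activation frame that lives on the heap and points back at its caller's frame gives you a cactus stack almost for free - two callees can share the same caller frame and neither needs contiguous stack space reserved in advance:

#include <iostream>
#include <memory>

// A heap-allocated activation frame. In ravioli the caller link would be an
// ordinary raviolo reference rather than a shared_ptr.
struct Frame
{
    std::shared_ptr<Frame> caller;   // parent frame (null for the root)
    int local;                       // this frame's locals
};

int main()
{
    auto mainFrame = std::make_shared<Frame>(Frame{nullptr, 0});
    // Two calls made from main share the same caller frame - two branches
    // of the cactus, neither needing its own reserved stack region.
    auto callA = std::make_shared<Frame>(Frame{mainFrame, 1});
    auto callB = std::make_shared<Frame>(Frame{mainFrame, 2});
    std::cout << callA->caller->local + callB->local << "\n";
    return 0;
}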

What other chunks of memory can we shoehorn into ravioli? What about the actual code that the CPU runs? Code loading and unloading causes fragmentation less often than heap allocation/deallocation does, but it could happen (and might be more likely with dynamic code generation). Putting code into ravioli has some other nice effects as well - it means that ravioli addresses can be embedded directly into code, and those addresses can be updated automatically when those ravioli move around - it's just like any other reference from one raviolo to another. Unless your code raviolo ends with an instruction like "jump to the address in this register" or "return from subroutine" or "halt the CPU", it will need a successor raviolo to jump to when it reaches the end. So all such code ravioli must end with a pointer to the next code raviolo to run, in the tradition of the Royal McBee RPC-4000. It could also end with two addresses for a conditional jump instruction. Some instructions will need a third address as data (e.g. load an immediate address into an address register). Code ravioli are sort of like basic blocks except for all being the same size. They may contain several actual CPU instructions.

Along with code come the vtable pointers previously mentioned. These are a bit like code (in that they are normally generated when the program is compiled) but are not executable. My thinking was that each raviolo would have a pointer's worth of metadata which tells both the CPU and the program what it's looking at. However, the vtables (if not stored as ravioli) are another potential source of fragmentation. Worse, if every single "just data" raviolo has to dedicate 1/4 of its space to a vtable pointer it's quite wasteful. Getting rid of the vtable pointer takes the limiting-case overhead for straight arrays down from 300% to 167% (binary trees), 100% (ternary trees) or 31% (7-child trees with ravioli that are 8 pointers long). Vtable pointers can still be implemented by compilers in the normal way (as a normal pointer field in a structure).

The CPU will still need some metadata about each raviolo to know where the pointers are and whether it's a reference tree raviolo that should be skipped for dereferencing. This is only a few bits worth, however - I've counted 12 different types of ravioli that are the size of 4 pointers:

  • 1 incoming pointer
  • 1 incoming pointer, 1 outgoing pointer
  • 2 incoming pointers, 1 outgoing pointer
  • 1 incoming pointer, 2 outgoing pointers
  • 2 incoming pointers
  • 3 incoming pointers
  • 4 incoming pointers
  • 3 incoming pointers, 1 outgoing pointer
  • 2 incoming pointers, 2 outgoing pointers
  • 1 incoming pointer, 3 outgoing pointers
  • 1 incoming pointer, 3 outgoing pointers - reference only (reverse tree)
  • 3 incoming pointers, 1 outgoing pointer - reference only (forward tree)

Suppose our pointers are 64 bits, so our ravioli are 256 bits and aligned to 32 byte boundaries. Then we know that the low 5 bits of our incoming pointer will all be zero, so we can reuse them for these type bits. The low 5 bits of outgoing pointers might still be useful for pointing to particular bytes inside ravioli (especially for string handling). Of course, for finding matching pointers for the purposes of moving ravioli, these low bits would be ignored.
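A minimal sketch of that packing (illustrative only - the type numbering here is arbitrary and the real encoding would be whatever the hardware finds convenient):

#include <cassert>
#include <cstdint>
#include <iostream>

// The raviolo type lives in the low 5 bits of the incoming pointer, which are
// otherwise always zero because ravioli are 32-byte aligned.
constexpr std::uintptr_t typeMask = 0x1f;

std::uintptr_t packIncoming(void* raviolo, unsigned type)
{
    std::uintptr_t p = reinterpret_cast<std::uintptr_t>(raviolo);
    assert((p & typeMask) == 0 && type <= typeMask);   // 32-byte aligned
    return p | type;
}

void* pointerPart(std::uintptr_t packed)
{
    return reinterpret_cast<void*>(packed & ~typeMask);  // mask off type bits
}

unsigned typePart(std::uintptr_t packed)
{
    return static_cast<unsigned>(packed & typeMask);
}

int main()
{
    alignas(32) static unsigned char storage[32];        // one 4-pointer raviolo
    std::uintptr_t incoming = packIncoming(storage, 9);  // one of the 12 types
    std::cout << typePart(incoming) << "\n";                                 // 9
    std::cout << (pointerPart(incoming) == static_cast<void*>(storage)) << "\n";  // 1
    return 0;
}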

There we have it - everything raviolized. Except for the CPU registers - there's no possibility of fragmentation there. In fact, one could think of the register file as one extra large and special raviolo that is unique in not having an incoming pointer (its location is implicit) and whose outgoing pointers don't need to be reference-enumerated (since they're automatically updated when the ravioli they point to move anyway).

Another advantage that a machine implementing ravioli in hardware might have is the possibility of isolating threads from each other even if they share an address space. The CPU knows where all the pointers are and can control access to the places where they are so that only a valid pointer can be placed in a pointer slot. Thus a thread can't get access to data unless it already has a pointer to that data somewhere. In particular, a thread can't access another thread's data. So we can use the same address space for all processes, and we no longer need to distinguish between threads in the same process and threads in another process. The OS keeps a mapping from (non-pointer) thread ID numbers to the initial page of each thread, so if a thread needs to refer to another thread it can do so via this thread ID.

Threading and ravioli

February 16th, 2016

One thing that's been bothering me about ravioli memory is the possibility that it doesn't interact very well with threading. Trying to have multiple threads manipulating a single ravioli heap is a non-starter: whenever one thread needs to allocate or deallocate a raviolo (a very common operation) all other threads would need to be suspended and (in the deallocation case) have any pointers in their registers potentially adjusted.

So rather than having a single heap for all threads, we should instead have a heap per thread. But one thread may not refer to another thread's heap! If it did, then again we'd have to synchronize those threads whenever a deallocation occurred.

So essentially when using ravioli, every thread must have a separate address space (logically, even if multiple threads share the same address space in the traditional sense). But that kind of defeats the point of having separate threads in the first place! The whole point of threads is that they share an address space, so that multiple threads can party on the same data without explicitly transferring it from one thread to another.

Threads are the standard way to build interactive applications that remain responsive while doing some background processing (or network access, disk access, etc.). They've also become increasingly important as the exponential rise in single-core performance has given way to a rise in the number of hardware threads per CPU. They have low overhead (switching threads can be very quick since the address space doesn't need to change) and are reasonably easy for operating systems to implement.

But that doesn't necessarily mean they're the best way to structure applications to take best advantage of a CPU's resources. There is a serious hidden penalty to threads that applies even when using the most sophisticated lock-free techniques and latest multi-core CPUs. That penalty is this: sharing memory between cores is not free. If a piece of memory is written to from one core and read from another, the caches that make these memory accesses fast must be flushed. So memory accesses become a great deal slower when there is contention between cores for the cache lines in question.

So, to get best performance out of multi-threaded code, the threads should work as much as possible with memory that is private to them and share as little memory between threads as possible, in other words they should avoid being "chatty".

With ravioli, communication between threads essentially amounts to message passing. One thread passes a message consisting of some data structure to another thread. To accomplish that, the ravioli comprising that data structure are moved into some shared-memory buffer, ownership of that buffer is transferred to the other thread and then the marshalling ravioli are moved on to the other thread's heap. That sounds like a lot of extra work but the extra transfers each happen within a single thread and seem likely to be quick compared to all the cache-flushing involved in the actual moving of information from one core to another (though I should qualify that I haven't actually measured it). In fact, the localization of this movement (in space and time) may even have some performance benefits over the ad-hoc chattiness of most multi-threaded programs.

How can multiple threads share a single address space in the traditional sense? One possible way would be to divide up the address space into pages of (say) 4KB (we might want to experiment with different values to see what gives the best tradeoff of performance to per-thread memory overhead, but there are likely to be advantages in choosing a size which corresponds to the natural page size of the machine's MMU). Whenever a thread needs to allocate a new raviolo, it uses space from the current page. If there is no more space left on the current page, it requests a new page (which involves coordination with the other threads). If a thread has two entire pages that are completely empty, one of them is returned to the inter-thread pool of empty pages (which again requires coordination). This hysteresis prevents excess inter-thread coordination when a thread's memory usage is close to a whole number of pages. A thread's pages are linked together using the first raviolo from each page (which is reserved for that purpose), as are the pages in the free list. So a thread's pages do not have to be contiguous in the address space and fragmentation is not a problem.
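Here's a sketch of that per-thread page bookkeeping (hypothetical types and sizes, with the inter-thread pool reduced to a mutex-protected list, and unlinking an empty page from the thread's own list omitted):

#include <cstddef>
#include <mutex>
#include <vector>

constexpr std::size_t pageSize = 4096;          // a size to experiment with
constexpr std::size_t ravioloSize = 32;         // 4 pointers of 8 bytes
constexpr std::size_t raviolosPerPage = pageSize / ravioloSize;

// The first raviolo of each page is reserved for linking pages together; the
// "next" and "used" fields here stand in for it.
struct Page
{
    Page* next = nullptr;
    std::size_t used = 1;                       // ravioli in use, including the link
    alignas(ravioloSize) unsigned char storage[pageSize - ravioloSize];
};

std::vector<Page*> freePool;                    // inter-thread pool of empty pages
std::mutex poolMutex;                           // the "coordination" in the text

struct ThreadHeap
{
    Page* pages = nullptr;                      // this thread's own pages
    Page* emptyPage = nullptr;                  // at most one empty page kept back

    void* allocateRaviolo()
    {
        if (pages == nullptr || pages->used == raviolosPerPage)
            pushPage(takePage());
        return pages->storage + (pages->used++ - 1) * ravioloSize;
    }

    // Called by the compactor when one of this thread's pages becomes empty.
    void pageBecameEmpty(Page* p)
    {
        p->used = 1;                            // only the reserved link remains
        if (emptyPage == nullptr) {             // hysteresis: keep one spare
            emptyPage = p;
            return;
        }
        std::lock_guard<std::mutex> lock(poolMutex);
        freePool.push_back(p);                  // a second empty page goes back
    }

private:
    void pushPage(Page* p) { p->next = pages; pages = p; }

    Page* takePage()
    {
        if (emptyPage != nullptr) {             // reuse our own spare first
            Page* p = emptyPage;
            emptyPage = nullptr;
            return p;
        }
        std::lock_guard<std::mutex> lock(poolMutex);
        if (!freePool.empty()) {
            Page* p = freePool.back();
            freePool.pop_back();
            return p;
        }
        return new Page;                        // grow the address space
    }
};

int main()
{
    ThreadHeap heap;                            // one of these per thread
    void* r = heap.allocateRaviolo();
    return r != nullptr ? 0 : 1;
}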

These pages also form a natural mechanism for the "shared memory buffers" used for inter-thread communication, alluded to above. It's pretty straightforward to move a page from one thread to another (just remove it from one thread's list and place it on another's). The one tricky part is making sure that only the ravioli that actually need to be moved to another thread are placed in the page to be moved. So a thread will need a way of allocating a page separate from its normal heap-top. This page doesn't get its ravioli compacted and isn't automatically freed if there is too much unused space. A shared memory buffer is almost like the address space of a different thread altogether (except that there isn't a corresponding thread of CPU execution). Once the buffer is placed on the other thread, a compacting operation is needed to restore the "no gaps" invariant - ravioli from the end of the new buffer are moved into any previously free space, after which the previously shared buffer is just normal memory.

It occurred to me that these special buffers may also be useful for serializing to or deserializing from disk, network, or the address spaces of other processes. In these cases the high bits of the pointers (the ones corresponding to the page address) need to be fixed up to point to the real pages on deserialization. But perhaps this method of serialization would be overly wasteful - for all but the largest regions of data, most of those pointer bits are useless to the deserializer. So perhaps it would make more sense to compress the data before sending it to remove this redundant information.