8086 microcode disassembled

Recently I realised that, as part of his 8086 reverse-engineering series, Ken Shirriff had posted online a high resolution photograph of the 8086 die with the metal layer removed. This was something I had been looking for for some time, in order to extract and disassemble the 8086 microcode. I had previously found very high resolution photos of the die with the metal layer intact, but only half of the bits of the microcode ROM were readable in those. Ken also posted a high resolution photograph of the microcode ROM of the 8088, which is very similar but not identical. I was very curious to know what the differences were.

I used bitract to extract the bits from the two main microcode ROMs, and also from the translation ROM which maps opcode bit patterns onto positions within the main microcode ROM.

The microcode is partially documented in US patent 4363091. In particular, that patent has source listings for several microcode routines. Within these, there are certain patterns of parts of instructions which I was able to find in the ROM dump. This allowed me to figure out how the bit patterns in the ROM correspond to the operands and opcodes of the microcode instruction set, in a manner similar to cracking a monoalphabetic substitution cipher. My resulting disassembly of the microcode ROM can be found here and the code for my disassembler is on github.

This disassembly has answered many questions I had about the 8088 and 8086. The remainder of this post contains the answers to these questions and other interesting things I found in the microcode.

What are the microcode differences between the 8086 and the 8088?

The differences are in the interrupt handling code. I think it comes down to the fact that the 8086 does two special bus accesses to acknowledge an interrupt (one to tell the PIC that it is ready to service the interrupt, the second to fetch the interrupt number for the IRQ that needs to be serviced). These are word-sized accesses for some reason, so the 8088 would break them into four accesses instead of two. This would confuse the PIC, so the 8088 does a single access instead and relies on the BIU to split the access into two. The other changes seem to be fallout related to that.

Are the microcode listings in US4363091 accurate?

Mostly. There are differences, however (which added some complexity to the deciphering process). The differences are in the string instructions. For example, the "STS" (STOSB/STOSW) instruction in the patent is:

CR  S      D      Type  a     b     F
-------------------------------------
0   IK     IND    7     F1    1
1   (M)    OPR    6     w     DA,BL
2   IND    IK     0     F1    0
3                 4     none  RNI

In the actual CPU, this has become:

0   IK    -> IND       7   F1    RPTS
1   M     -> OPR       6   w     DA,BL
2   IND   -> IK        0   NF1      5
3   SIGMA -> tmpc      5   INT   RPTI
4   tmpc  -> BC        0   NZ       1
5                      4   none  RNI

The arrow isn't a difference - I just put that in my disassembly to emphasize the direction of data movement in the "move" part of the microcode instructions. Likewise, the "F1 1" in the patent listing is the same as the "F1 RPTS" in my disassembly - I have replaced subroutine numbers with names to make it easier to read.

The version in the patent does a check for pending interrupts in the "RPTS" routine, before it processes any iterations of the string. This means that if there is a continuous "storm" of interrupts, the string instruction will make no progress. The version in the CPU corrects this, and checks for interrupts on line 3, after it has done the store, allowing it to progress. This was probably not a situation that was expected to occur in normal operation (in fact, I seem to recall crashing my 8088 and 8086 machines by having interrupts happen too rapidly to be serviced). The change was most likely done to accommodate debugging with the trap flag (which essentially means that there is always an interrupt pending when the trap flag is set). Without this change, code that used the repeated string instructions would not have progressed under the debugger.
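The difference can be illustrated with a small Python model. This is only a sketch under simplifying assumptions: the names `rep_stos_attempt` and `attempts_to_finish` are made up for illustration, one "attempt" stands for one entry into the string instruction (for example after returning from an interrupt handler), and the trap-flag case is modelled as an interrupt that is always pending.

```python
def rep_stos_attempt(count, interrupt_pending, check_before_store):
    """Run one attempt at a REP STOS and return the remaining count.

    check_before_store=True models the patent listing (interrupt check
    before each iteration); False models the shipped microcode (interrupt
    check after the store).
    """
    while count > 0:
        if check_before_store and interrupt_pending():
            break                      # interrupted with no store done
        count -= 1                     # one store, one count decrement
        if not check_before_store and interrupt_pending():
            break                      # interrupted after the store
    return count


def attempts_to_finish(count, check_before_store, limit=1000):
    """Attempts needed to finish with an interrupt always pending, or None."""
    def always_pending():
        return True

    for attempt in range(1, limit + 1):
        count = rep_stos_attempt(count, always_pending, check_before_store)
        if count == 0:
            return attempt
    return None                        # no progress within the limit


print(attempts_to_finish(3, check_before_store=False))
print(attempts_to_finish(3, check_before_store=True))
```

In this model the shipped ordering completes one iteration per attempt, while the patent ordering never gets past the check.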

How many different instructions does the 8086 have, according to the microcode? What are they?

The CPU has 60 instructions, and they're in a fairly logical sort of order:

(Numbers are: number of opcodes handled, size of top-level microcode routine.)

MOV rm<->r     4  3
LEA            1  1
alu rm<->r    32  4
alu rm,i       4  5
MOV rm,i       2  4
alu r,i       16  4
MOV r,i       16  3
PUSH rw        8  4
PUSH sr        4  4
PUSHF          1  4
POP rw         8  3
POP sr         4  3
POPF           1  3
POP rmw        1  6
CBW            1  2
CWD            1  7
MOV A,[i]      2  4
MOV [i],A      2  4
CALL cd        1  4
CALL cw        1  8
XCHG AX,rw     8  3
rot rm,1       2  3
rot rm,CL      2  8
TEST rm,r      2  3
TEST A,i       2  4
SALC           1  3
XCHG rm,r      2  5
IN A,ib        2  4
OUT ib,A       2  4
IN A,DX        2  2
OUT DX,A       2  2
RET            2  4
RETF           2  2
IRET           1  4
RET/RETF iw    4  4
JMP cw/JMP cb  2  6
JMP cd         1  7
Jcond         32  3
MOV rmw<->sr   2  2
LES            1  4
LDS            1  4
WAIT           1  9 (discontinuous)
SAHF           1  4
LAHF           1  2
ESC            8  1
XLAT           1  5
STOS           2  6 (discontinuous)
CMPS/SCAS      4 13 (discontinuous)
MOVS/LODS      4 11 (discontinuous)
JCXZ           1  5 (discontinuous)
LOOPNE/LOOPE   2  5
LOOP           1  4
DAA/DAS        2  4
AAA/AAS        2  8
AAD            1  4
AAM            1  6
INC/DEC rw    16  2
INT ib         1  2
INTO           1  4
INT 3          1  3

The discontinuous instructions were most likely broken up because they had bug fixes making them too long for their original slots. Similarly "POP rmw" appears to have been shortened by at least 3 instructions as there is a gap after it. Moving code around after it's been written (and updating all the far jump/call locations) would probably have been tricky.

Which instructions, if any, are not handled by the microcode?

There is no microcode for the segment override prefixes (CS:, SS:, DS: and ES:). Nor for the other prefixes (REP, REPNE and LOCK), nor the instructions CLC, STC, CLI, STI, CLD, STD, CMC, and HLT. The "group" opcodes 0xf6, 0xf7, 0xfe and 0xff do not have top level microcode instructions. So none of the instructions with 0xf in the high nybble of the opcode are initially handled by the microcode. Most of these instructions are very simple and probably better done by random logic. HLT is a little surprising - I really thought I'd find a microcode loop for that one since it only seems to check for interrupts every other cycle.
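As a cross-check of the "0xf in the high nybble" observation, here is a minimal Python sketch that tabulates the opcodes named above using the standard 8086 opcode assignments. The table and function names are invented for this example, and treating 0xF1 as an alias of LOCK is an assumption that the text above does not state.

```python
# Opcodes with no top-level microcode routine, per the paragraph above.
SEGMENT_OVERRIDES = {0x26: "ES:", 0x2E: "CS:", 0x36: "SS:", 0x3E: "DS:"}

HIGH_NYBBLE_F = {
    0xF0: "LOCK", 0xF1: "LOCK (alias, assumed)", 0xF2: "REPNE", 0xF3: "REP",
    0xF4: "HLT", 0xF5: "CMC", 0xF6: "group", 0xF7: "group",
    0xF8: "CLC", 0xF9: "STC", 0xFA: "CLI", 0xFB: "STI",
    0xFC: "CLD", 0xFD: "STD", 0xFE: "group", 0xFF: "group",
}

NO_TOP_LEVEL_MICROCODE = {**SEGMENT_OVERRIDES, **HIGH_NYBBLE_F}


def has_top_level_microcode(opcode):
    """True if the first opcode byte starts a top-level microcode routine."""
    return opcode not in NO_TOP_LEVEL_MICROCODE


# Every opcode 0xF0-0xFF is in the table, so none of them qualifies.
print(any(has_top_level_microcode(op) for op in range(0xF0, 0x100)))
```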

The group instructions are decoded slightly differently but the microcode routines handling them break down as follows:

INC/DEC rm        3
PUSH rm           4
NOT rm            3
NEG rm            3
CALL FAR rm       8
CALL rm           8
TEST rm,i         4
JMP rm            2
JMP FAR rm        4
IMUL/MUL rmb      8
IMUL/MUL rmw      8
IDIV/DIV rmb      8
IDIV/DIV rmw      8

Then there are various subroutines and tail calls (listed in translation.txt). Highlights:

  • interrupt handling (16 microinstructions)
  • sign handling for multiply and divide, flags for multiply (32)
  • effective address computation (16)
  • reset routine (sets CS=0xffff, DS=ES=SS=FLAGS=PC=0) (6)
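The register values set by the reset routine determine where the first instruction is fetched from. A minimal sketch, using the standard 8086 address calculation (segment shifted left by 4, plus offset, truncated to 20 bits); the function and constant names are chosen here for illustration only:

```python
def physical_address(segment, offset):
    """20-bit physical address formed from a segment:offset pair."""
    return ((segment << 4) + offset) & 0xFFFFF


RESET_CS = 0xFFFF   # value the reset routine loads into CS
RESET_PC = 0x0000   # value the reset routine loads into PC

print(hex(physical_address(RESET_CS, RESET_PC)))
```

With CS=0xffff and PC=0 this gives 0xffff0, sixteen bytes below the top of the 1MB address space.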

Does the microcode contain any "junk code" that doesn't do anything?

It seems to! While most of the unused parts of the ROM (64 instructions) are filled with zeroes, there are a few parts which aren't. The following instructions appear right at the end of the ROM:

A     -> tmpa      5   INT   FARCALL2      011100011.0110
[  5] -> [ a]      5   UNC   INTR     F    011100011.0111

There doesn't appear to be any way for execution to reach these instructions. This code saves AL to tmpa (which doesn't appear to then be used at all) and then does either an interrupt or (if an interrupt is pending) a far call. In the interrupt case it also does a move between a source and a destination that aren't used anywhere else (and hence I have no idea what they are). This makes me wonder if there was at one point a plan for something like an "INT AL" instruction. With the x86 instruction set we ended up with, such a thing has to be done using self-modifying code, a table of INT instructions, or faking the operation of INT in software.

The following code is also inaccessible and appears to do something with the low byte of the last offset read from or written to, and the carry flag:

IND   -> tmpaL     1   LRCY  tmpc     F      01010?10?.1010

No idea what that could be for - nothing else in the microcode treats the IND register as two separate bytes.

Are there any parts of the microcode that are still not understood?

When the WAIT instruction finishes in the non-interrupt case (i.e. by the -TEST pin going active to signal that the 8087 has completed an instruction) the microcode sequence finishes using this sequence:

                   4   [ 1]  none
                   4   none  RNI

I don't know what the "[ 1]" does - it isn't used anywhere else.

There is also a bit (shown as "Q" in the listings) which does not have an obvious function for "type 6" (bus IO) operations. This Q bit is only set for "W" (write) operations; in the listing, writes without the bit are distinguished by being shown in lower case ("w"). There seems to be no pattern as to which writes use this bit. The string move instructions use it, as does the stack push for the flags when an interrupt occurs, and the push of the segment for a far call or interrupt (but not the offset). It would make sense if this bit was used to distinguish between memory and port IO bus accesses, but the CPU seems to have another mechanism for this (most likely the group decode ROM, which I have not decoded as there are too many unknowns about what its inputs and outputs are).

Are there any places where the microcode could have been improved to speed up the CPU?

Despite many of the instructions seeming to execute quite ponderously by the standards of later CPUs, the microcode appears to be very tightly written and I didn't find many opportunities for improvement. If the MOVS/LODS opcode was split up into separate microcode routines for LODS and MOVS, the LODS routine could avoid a conditional jump and execute 1 cycle faster. But there is only room for that because of the "POP rmw" shortening, which may have happened quite late in the development cycle (especially if it was a functional bug fix rather than an optimisation - optimisations might not have met the bar at that point).

There may be places where prefetching could be suspended earlier before a jump, but it's not quite so obvious that that would be an optimisation. Especially if the "suspend" operation is synchronous, and waits for the BIU to complete the current prefetch cycle before continuing the microcode program. And especially if that would make the microcode routine longer.

It would of course be possible to make improvements if the random logic is changed as well. The NEC V20 and V30 implement the same instructions at a generally lower number of cycles per instruction, but they have 63,000 transistors instead of 29,000 so probably have a much larger proportion of random logic to microcode.

Does the microcode have any hidden features, opcodes or easter eggs that have not yet been documented?

It does! Using the REP or REPNE prefix with a MUL or IMUL instruction negates the product. Using the REP or REPNE prefix with an IDIV instruction negates the quotient. As far as I know, nobody has discovered these before (or at least documented them).

Signed multiplication and division works by negating negative inputs and then negating the output if exactly one of the inputs was negative. That means that the CPU needs to remember one bit of state (whether or not to negate the output) across the multiplication and division algorithms. But these algorithms use all three temporary registers, and the internal counter, and the ALU (so the bit can't be put in the internal carry flag for example). I was scratching my head about where that bit might be kept. I was also scratching my head about why the multiplication and division algorithms check the F1 ("do we have a REP prefix?") flag. Then I realised that these puzzles cancel each other out - the CPU flips the F1 flag for each negative sign in the multiply/divide inputs! There's already a microcode instruction to check for that, so the 8086's designers just needed to add an instruction to flip it.
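The sign handling just described can be modelled in a few lines of Python. This is a behavioural sketch of 16-bit IMUL only, not a transcription of the microcode: the function names are made up, and the only state carried across the multiply is a single boolean standing in for the F1 flag.

```python
def to_signed16(value):
    """Interpret a 16-bit register value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def imul16(ax, operand, rep_prefix=False):
    """Model of 16-bit IMUL; returns DX:AX as one 32-bit unsigned value."""
    f1 = rep_prefix                # F1 starts out set if REP/REPNE was seen
    a = to_signed16(ax)
    b = to_signed16(operand)
    if a < 0:                      # negate a negative input, flip F1
        a = -a
        f1 = not f1
    if b < 0:
        b = -b
        f1 = not f1
    product = a * b                # unsigned multiply of the magnitudes
    if f1:                         # negate the output if F1 ended up set
        product = -product
    return product & 0xFFFFFFFF


print(hex(imul16(3, 4)))
print(hex(imul16(3, 4, rep_prefix=True)))
```

Without a prefix, F1 ends up set exactly when one input was negative, which gives the usual signed product. With a prefix the parity is inverted, so the product comes out negated.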

I was thinking the microcode instruction might set the F1 flag instead of flipping it - that would mean that you could get a (probably negated) "absolute value" operation (almost) for free with a multiply. But an almost-free negation is pretty good too - REP is a byte cheaper than "NEG AX", and with 16-bit multiplies the savings are even greater (it eliminates a NEG AX / ADC DX, 0 / NEG DX sequence). Still small compared to the multiply, but a savings nonetheless.
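For readers wondering why that three-instruction sequence negates the 32-bit value in DX:AX, here is a small Python check. It assumes only the documented flag behaviour of NEG (carry is set unless the operand was zero) and ADC (adds the carry in); the helper names are illustrative.

```python
def neg16(value):
    """NEG on a 16-bit register: returns (result, carry flag)."""
    result = (-value) & 0xFFFF
    carry = 1 if value != 0 else 0     # NEG clears CF only for a zero operand
    return result, carry


def negate_dx_ax(dx, ax):
    """Apply NEG AX / ADC DX, 0 / NEG DX and return the new (DX, AX)."""
    ax, carry = neg16(ax)              # NEG AX
    dx = (dx + carry) & 0xFFFF         # ADC DX, 0
    dx, _ = neg16(dx)                  # NEG DX
    return dx, ax


# Compare against a direct 32-bit negation for a few values.
for value in (0, 1, 12, 0x10000, 0x12345678, 0xFFFFFFFF):
    dx, ax = negate_dx_ax(value >> 16, value & 0xFFFF)
    assert (dx << 16) | ax == (-value) & 0xFFFFFFFF
```

When AX is non-zero, the carry turns the final NEG DX into a bitwise NOT of the original DX, which is the high word of the two's complement. When AX is zero the carry is clear and DX is simply negated.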

I contemplated using this in a demoscene production as another "we break all your emulators" moment, but multiplication and division on the 8086 and 8088 CPUs is sufficiently slow to be of limited use for demos.

The F1ZZ microcode instruction (which controls whether the REPE/REPNE SCAS/CMPS sequences terminate early) is also used in the LOOPE and LOOPNE instructions. Which made me wonder if one of the REP prefixes would also reverse the sense of the test. However, neither prefix seems to have any effect on these instructions.

Update 2nd January 2023

I've made a new version of the disassembly here incorporating some changes from the comments below. I have transcribed the group ROM, got rid of "NWB", added the RNI flag to W microinstructions, and changed XZC to ADC.

54 Responses to “8086 microcode disassembled”

    • Ryan says:

      Hi, I know this is quite a bit later, but I'm writing an emulator for the 8088 that interprets the microcode itself. However, I seem to have hit a wall for the shift rm8,cl instructions. How does the CPU determine where in the microcode to jump for these instructions? I vaguely understand the bottom three bits of the address are set to modrm.reg, but that would jump to the middle of a microcode routine, wouldn't it?

      Thanks,
      Ryan

  1. David says:

    Hey Reenigne!

    Thank you so much! I'm not sure how to submit corrections to the ZIP documentation, but I have two suggestions: the unknown difference between w and W micro-instructions could be that the lower case w ops do not terminate the instruction and W does terminate. The unknown bit seems to indicate a hidden RNI. I'm not 100% sure, but reading through the microcode this seems to hold up. The other suggestion is key.txt line 50 col 78, p could be renamed i to match the bitfield description.

    • Nick says:

      Hi David. Your suggestion for the use of lower/upper case 'w' seems to indeed make the most sense.

      There is another thing I don't quite understand and that is how bus write is done when writing only a byte vs a word. Is it perhaps opcode dependent?

      • David says:

        Hey Nick,

        For most opcodes with a modregrm byte, the W bit in the opcode determines whether the instruction uses a byte or word. (For other opcodes I've hard-coded an override to the W bit as needed. See the 8086 Family User's Manual, page 4-19, figure 4-20, for the W opcode bit.)

        The microcode then uses the internal "W" flag to determine whether or not to jump during the L8 instruction, how many bits to set in the MAXC instruction, and the size of read/write requests to the BIU. In my emulator the BIU has an identical W flag copied from the EU when a bus request is made or a word-aligned fetch can happen.

        One interesting thing (not about bus ops) is it appears that when tmpbL is loaded it must be sign extended to tmpb for many of the ALU ops to work. Still researching this...

  2. Zir Blazer says:

    There is a thing that, for science, should be extremely interesting to analyze: How does the Microcode compare between different 8086 Steppings/Revisions? I recall reading somewhere (Don't really remember where) that the original 8086 had a major Interrupt bug that was quickly fixed in newer versions. Since what you're analyzing is a 8086-2 that according to Wikipedia was launched a year and half later, and the bug is supposed to affect only the earliest 8086 units, chances are that it was fixed by then.
    That bug may be why you mentioned that the Microcode in the patent doesn't match the Processor, it may be possible that in the earliest 8086s it is actually identical and the Interrupt bug was related to that Interrupt storm you mention. I don't remember more details than those to help you google in case you want to confirm that for yourself, but I'm not satisfied in that specific topic.

    Sadly, early 8086s are supposed to be collector items and I don't think a lot of people are willing to do a destructive test with those...

  3. Michael K says:

    This is amazing work. Thank you for publishing it. It's contributing to my thought on a project I've been kicking around for a while.

    I have a question about some of the labels.

    1) In the EA section, there are unconditional jumps to EAOFFSET and EAFINISH, which aren't defined in the file. But then there are two labels, :EALOAD (0x1e1) and :EADONE (0x1e3), that have no calls pointing to them. Is there an error in these labels and their use? Or is this a place where there's something else going on behind the scenes that drives microcode execution?

    2) Is the :INTR label on the right line? When I follow the sequence of instructions, it makes more sense to me to jump to 0x19e rather than 0x19d. (The write of tempC to AX makes no sense in the course of execution of an INT or INTO instruction).

    Again, thanks so much for your effort on this. It's a great resource for the emulation and simulation community.

    • Andrew says:

      I'm glad you like it!

      There are indeed some things going on behind the scenes to determine where certain jumps end up. EAOFFSET can end up at [i], EALOAD or EADONE (lines 0x1de, 0x1e1 and 0x1e3 respectively) depending on whether or not an offset is present and whether the instruction loads from the resulting EA or not. Similarly, EAFINISH can end up at EALOAD or EADONE depending on whether the instruction loads from the resulting EA or not.

      I think you're right about INTR being 0x19e on the 8088 - 0x19d is the right value for the 8086 but I didn't redo the translation table for 8088 and the code does make a lot more sense with 0x19e there. This could also account for some unexpected timings in my cycle-exact emulation experiments, so thank you for pointing it out!

      I have some other improvements to make based on feedback other people have given me so I will recheck that translation table and make an update soon.

      I'd be interested to hear more about your project!

  4. […] Intel 8086 and its cousin the 8088 launched the PC revolution in the early 1980s. The Reenigne blog has posted work on decoding the 8086 microcode – the encoded instructions within the chip itself that make […]

  5. rasz_pl says:

    Have you investigated second source/clone 8088/8086 chips for the existence of REP MOV artifact, chips like NEC V20 or К1810ВМ86?

    • Andrew says:

      I haven't - I don't have any machines with those chips to run programs on, nor die photographs to disassemble. Given all the differences in the V20, it would be surprising if REP MUL worked the same way there. I have no idea how similar К1810ВМ86 is to the Intel 8088/8086.

  6. Nick says:

    Thanks for making this information public. I'm having trouble understanding the contents of translation.txt, and also what each of the microinstructions included in translation.txt do. I'm also a bit confused about the UNC jump destinations, e.g. UNC 5, or L8 2. Are these offsets from the start of the instruction?

    I also don't completely follow how an instruction like MOV reg, mem (10001001) works. The microcode moves R -> tmpb, where R is the register specified by modrm, and writes back to EA (I'm not sure which EA is understood here), and then moves tmpb -> M. How does this constitute a move from memory to reg? Or do you have to change something depending on the direction bit?

    I unfortunately cannot get this information from the patent or other documents like the 8086 iAPX manual. I'd appreciate it if you have answers to my questions or if you can point me to a source of information.

    • Andrew says:

      The translation.txt file contains the decoded contents of the translation ROM, which takes subroutine numbers such as those that appear in a "5 UNC" instruction and translates them into positions in the microcode program. Those substitutions have already been made in the disassembly listing, so you see "5 UNC NEARCALL" instead of "5 UNC 1" for example.

      "0 L8 2" is a short jump, conditional on the "L8" (immediate value is 1 byte) condition. As it's a short jump, the destination (2) isn't looked up in the translation table but the destination is formed by swapping out the low bits of the microcode position. So the "0 L8 2" on line 00c jumps to local address .0010 if the condition is true, which takes us to 00e (skipping line 00d which fetches a second immediate byte from the queue and places it into tmpbH). So yes, offset from the start of the instruction (or subroutine) is a good way to think about it (although not all subroutines start at a .0000).

      0100010?? is quite a complicated one, with lots of aspects controlled by random logic instead of microcode. The R (source operand) on line 000 is either the register specified by the r field, or the location specified by the rm field, depending on bit 1 of the instruction. Line 000 moves this into tmpb, and line 001 moves tmpb into M which is whichever register/location R wasn't. If it's a "MOV mem, reg" then line 002 is also executed which starts the bus operation to perform the memory write.

      So some of these instructions you kind of have to read keeping in mind what the instruction actually does, so you can see which parts are performed by the microcode and which aren't.
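      The short-jump rule described in this reply (replace the low bits of the current microcode position with the destination) can be sketched in Python. This assumes a microcode address is a 13-bit value made of a 9-bit part and a 4-bit local part, written the way the listings write it ("011100011.0110"), and that a short jump replaces exactly the 4 local bits. The function names are hypothetical and the sample address is used only to show the notation.

```python
def short_jump_target(address, destination):
    """Keep the upper 9 bits of the address, swap in a new 4-bit local part."""
    return (address & ~0xF) | (destination & 0xF)


def format_address(address):
    """Render a 13-bit microcode address in the listing's 9.4 bit notation."""
    return f"{address >> 4:09b}.{address & 0xF:04b}"


here = 0b011100011_0110                 # sample address in 9.4 form
print(format_address(here))
print(format_address(short_jump_target(here, 2)))
```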

  7. Dan Tang says:

    This is very awesome work. I am planning to develop an 8086/8088 emulator based on this reverse engineering work. The information about the microcode ROMs (stage1 and stage2) is enough. Would it be convenient to provide the bit content of the Translation ROM and Group ROM, and especially the definition of each field in the Translation.txt file?

    • Andrew says:

      Thanks!

      I didn't transcribe the group ROM. The bit content of the translation ROM is the first 4 columns in translation.txt. The second and third columns of this file are the "address" in the microcode ROM to jump to for that line (as a 9 bit "opcode" and a 4 bit "sub-opcode"). The last 3 columns show what that address actually corresponds to (matching decode address, line number, meaning). The first column is the decoder for selecting which line of the translation ROM to use:

      Bit 0 corresponds to the "5" or "7" instruction type (0 for a jump, 1 for a call).
      Bits 1-4 correspond to the subroutine number (0-9).
      Bit 5 is only used for choosing between EALOAD and EADONE (so must come from logic which determines whether this EA instruction is a load or a store/LEA).
      Bit 6 is 1 for EA byte decoding and 0 otherwise. In the EA byte decoding case, bits 0-4 come from the mod and R/M fields of the EA byte.
      Bit 7 is used to choose whether to load additional offset bytes or not, so must come from logic like "mod == 0".

      The 4th column is 0 for [SI], [DI], [BX] and [iw] EAs and 1 for [BP] EAs so must be how the CPU decides whether to default to the data or stack segment if there is no segment override.
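      For emulator writers, the default-segment rule that this column appears to encode can be written directly from the documented 8086 mod/rm encoding. This is a sketch of the architectural behaviour, not of the translation ROM contents; the function name is made up.

```python
def default_segment(mod, rm):
    """Default segment for a memory operand when no override prefix is used.

    mod is 0, 1 or 2 (mod 3 selects a register, so no segment applies).
    """
    if mod not in (0, 1, 2):
        raise ValueError("mod 3 is a register operand")
    if rm in (2, 3):                 # [BP+SI], [BP+DI]
        return "SS"
    if rm == 6 and mod != 0:         # [BP+disp]; mod 0, rm 6 is [iw]
        return "SS"
    return "DS"                      # [BX...], [SI], [DI], [iw]


print(default_segment(0, 6), default_segment(1, 6))
```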

    • Andrew says:

      I'd be really interested to see your emulator once you finish it - there have been many times when I'd have found a cycle exact 8088 emulator useful but my own efforts in this area remain unfinished (although accurate in most cases).

      • Dan Tang says:

        Thank you very much!

      • Dan Tang says:

        I plan to use verilog to implement this emulator, which can better reflect the actual state of the hardware, and is considering whether to verify it on real hardware. For the Stage1 ROM, there is a question. In this link (https://forum.vcfed.org/index.php?threads/8088-8086-microcode-disassembly.77933/) it is mentioned that "The incoming opcode is compared to each of these (simultaneously) and the correct starting position is selected". That is to say, each 8-bit opcode will be compared with the 128 data in the Stage1 ROM to obtain the coordinates of the opcode in the Stage2 ROM. This solution cannot be implemented in hardware. Because one clock cycle, the data of the ROM can only be read once. To obtain the coordinates of the Stage2 ROM, 128 comparisons are required, which requires 128 clock cycles, which is unrealistic. Normally, the 8-bit opcode should be used to directly index the Stage1 ROM to obtain the coordinates of the Stage2 ROM without comparing. I don't know if you have adjusted the order of the data in the Stage1 ROM? At least now it seems that there is no way to do direct indexing.

        • Andrew says:

          I've not really done a lot in Verilog, but I understand it's very good at doing things in parallel. Can't you make 128 copies of the incoming opcode and compare them to all 128 selectors in parallel? That's essentially what the original 8088 and 8086 CPUs do. I haven't adjusted the order of the data. And yes, direct indexing doesn't work because some instructions are more complicated than others: lines 008-00b cover 32 opcodes while opcodes 0xFE-0xFF are covered by lines 020-022, 024-027, 04c-053, 068-06a, 074-076, 098-09b, 0d8-0df and 150-16f (not including subroutines).

        • phire says:

          Just implement the Stage1 ROM as an autogenerated 128 entry casez statement with wildcards.

          Despite how bloated such code looks, it should compile down pretty small on a FPGA. If I've done my math right, it should compile to just 28 4-LUTs
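          For a software emulator, the same "compare against every selector" idea can be sketched in Python with a mask and value per selector. The two selectors below are illustrative assumptions: "0100010??" is the pattern quoted earlier in these comments for MOV rm<->r starting at line 000, while "000???0??" for the ALU rm<->r routine at line 008 is derived here from the opcode encoding rather than read from the ROM. The 9-bit key is assumed to be a leading 0 bit followed by the opcode byte.

```python
def compile_pattern(pattern):
    """Turn a string of 0/1/? into a (mask, value) pair; ? is a wildcard."""
    mask = 0
    value = 0
    for bit in pattern:
        mask <<= 1
        value <<= 1
        if bit != "?":
            mask |= 1
            value |= int(bit)
    return mask, value


# (selector pattern, starting microcode line) - illustrative entries only.
SELECTORS = [
    ("0100010??", 0x000),   # MOV rm<->r
    ("000???0??", 0x008),   # alu rm<->r (assumed pattern)
]
COMPILED = [(compile_pattern(pattern), line) for pattern, line in SELECTORS]


def start_line(key):
    """Return the starting line for a 9-bit key, or None if nothing matches."""
    for (mask, value), line in COMPILED:
        if (key & mask) == value:
            return line
    return None


print(start_line(0x089))    # key for opcode 10001001
```

Hardware does all the comparisons at once; the loop here is just the sequential equivalent, and a full table would need all 128 selectors.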

  8. Gianluca says:

    Hi!
    First of all, thank you for your awesome work, it is being very useful for my project, other than interesting by itself.

    In short, my project is to design and build an entire, fully functioning computer around the 8086, including motherboard, VGA card, I/O management, OS/software, etc.
    I already did some experiments with my 8086, and the disassembled microcode finally answered one of my questions:

    Q: Why, once started, does the 8086 take 6 clock cycles before fetching the first instruction?
    A: Because, assuming that the motherboard design is correct, the first thing the 8086 does is running the RESET routine, which takes exactly 6 clock cycles.

    I am currently abroad, but I should be back home in about a month.
    Meanwhile, I started developing an 8086 emulator trying to be as accurate as possible, both to help me with the design of the motherboard and to do something while I'm abroad. I think I understood most of the microcode in microcode_8086.txt, but I still have some questions.

    - By studying the microcode along with an example of execution in the 8086 family user manual (Figure 4-22, page 4-38), it seems that the difference between W and w is that, as David pointed out in the first comment, W includes a hidden RNI while w does not.
    This also means that the "x" in the "bus operation" instruction type 110dixssuu (file "key.txt") may be the RNI flag. Taking a closer look at the microcode and trying to decipher the letters A...U, it also seems that the bit "x" is only used in write operations, meaning that no other bus operation terminates the instruction (which kind of makes sense).
    What do you think about it?

    - Something I am struggling to understand is the following: does F1 refer to bit 1 (or its negation) in the FLAGS register or is it something else?
    It would (somehow) match the fact that, during RESET, FLAGS is initialized with zeroes, but an instruction like LAHF would load into AH something with bit 1 set to 1. But I could also be completely wrong and F1 refers to some other hidden register.
    If my idea is correct, what would you expect from something like REP LAHF (or REP PUSHF)? Or, even further, SAHF MUL when AH & 0x02 == 0x00?
    Otherwise, how is FLAGS mapped? In other words, why is bit 1 set to 1 if the RESET routine initializes FLAGS to be all zeroes?
    When I'll be back home I could do some other experiment as well, and maybe I could also test the "REP MUL" feature on both my chips (yes, I have two of them, and one of them looks different/newer).

    - Finally, I am assuming that the "group decode ROM" is the one responsible for running the random logic and/or loading the correct routine on the microcode.
    Namely, in my mind, what happens is that on FC (the "First Clock" in the 8086 patent) the first byte of the instruction is loaded into the "group decode ROM" and on SC (the "Second Clock" in the 8086 patent) either the random logic is executed or some microcode routine is run.
    Is this correct (or, at least, a good approximation)?

    I may have other questions as well (actually, it is some speculation about the location mapping given by [ABCDE] and [FGHIJ] in the microcode), but they are not important for the emulator (at least for now).

    • Andrew says:

      Yes, I think you're right about "x" in the "bus operation" instruction type meaning RNI. I'll update the zip file with this addition.

      It's a nice idea that F1 (and the internal flags for the segment override prefixes) would be kept in the unused bits of the FLAGS register, but it seems to not be the case - I just tried all the combinations with PUSHF and LAHF and there was no effect. I'm guessing the unused bits of FLAGS are just hard-wired, and that there is no flip-flop there to be set one way or the other.

      I haven't tried it, but I don't think SAHF would change the output of MUL because that combination is actually a valid pair of instructions so it would be a CPU bug if it did! Also SAHF isn't a prefix instruction so the F1 flags are cleared at the end of it.

      The group decode ROM sets 23 (I think) internal flags based on which instruction is being executed. I have now mostly understood it. It doesn't determine where in the microcode execution starts (that is the job of the stage 1 decoder).

      According to the patent, FC is the cycle when the first opcode byte is removed from the prefetch queue and SC is the cycle afterwards (when the EA byte is removed if there is one).

      I'd be interested to hear your other questions/speculation.

  9. Sabre says:

    Thanks for your work. This helps me a lot to understand the instruction clocks of 8086.

    But there's one thing that confuses me. The PUSH REG, PUSH SEG, PUSHF are all doing similar things and similar in microcodes. In "MCS-86 Assembly Language Reference Guide" (Oct. 1978, 9800749-1), these three instructions all cost 10 clocks, which seems reasonable to me. But later in "The 8086 Family User's Manual" (Oct. 1979 9800722-03), the PUSH REG costs 11 clocks while the other two remain 10 clocks. How can this happen? In my opinion, access REG as operand won't cost more clocks than SEG.

  10. […] 8086’s microcode was disassembled by Andrew Jenner (link) from my die photos, so we can see exactly what micro-instructions the 8086 is running for each […]

  11. 黄禄轩 says:

    I guess the unknown micro-code in the WAIT instruction may be related to temporarily disabling interrupts.
    https://twitter.com/HuangLuxuanCNSN/status/1618277628611563520

    • Andrew says:

      Ah, that makes perfect sense - thank you! I guess the same signal is triggered by the other prefixes: segment override, REP/REPNE, and LOCK.

      • 黄禄轩 says:

        Yes, one of the group decoder outputs, triggered by prefixes, goes into a four-input OR gate, along with this signal and the decoder outputs for the last two slots of the group decoder.

      • 黄禄轩 says:

        By the way, what would you like to name the signal?

        • Andrew says:

          Maybe "suspend interrupts". Though I'm now curious if this (and the "POP sr"/"MOV sr" instructions) act like prefixes in other ways, not clearing the segment override, REP/REPNE and LOCK flags. If it does I might be inclined to name the signal "prefix".

          • 黄禄轩 says:

            I guess not. A signal from the group decoder output goes to the four-input OR gate to block a signal, which is tested by uC.op=1001110yyy (INT), from letting the CPU respond to an interrupt instead of loading the next instruction. It also goes to another two-input not-and gate. Normally, when IR and the other registers load their contents, that gate outputs a positive pulse, but it stays low if the signal is high. The signal is Q[11], counting from 0 at the bottom of the group decoder to the top in Ken Shirriff's image. In my project, IR.DI[8] is loaded into IR[8] if IR_W is high. IR.DI[8]=INT_PENDING&&!SUSP_INT. SUSP_INT=(IR==0x08E||(IR&0x1E7)==0x007)||(GROUP.Q[11]||uC.SUSP_INT). uC.SUSP_INT is a flip-flop, set to 1 if uC.OP==1001000yyy or to 0 if IR_W!=0 at the trailing edge of phi2, and set to 0 asynchronously on RST. The flip-flop output doesn't go anywhere except the four-input OR gate.
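
            As a rough illustration, the combinational part of this logic can be sketched in Python. The function names are hypothetical, `ir` is assumed to be the 9-bit IR value, and the uC.SUSP_INT flip-flop is simply passed in as a boolean rather than modelled. This is one reading of the expressions above, not a description of the actual circuit.

```python
# Sketch of the SUSP_INT expressions quoted above (names are illustrative).

def susp_int(ir: int, group_q11: bool, uc_susp_int: bool) -> bool:
    """SUSP_INT = (IR==0x08E || (IR&0x1E7)==0x007) || (GROUP.Q[11] || uC.SUSP_INT)."""
    is_mov_sr = ir == 0x08E            # MOV sr opcode
    is_pop_sr = (ir & 0x1E7) == 0x007  # POP sr opcodes 0x07, 0x0F, 0x17, 0x1F
    return is_mov_sr or is_pop_sr or group_q11 or uc_susp_int


def ir_di8(int_pending: bool, ir: int, group_q11: bool, uc_susp_int: bool) -> bool:
    """IR.DI[8] = INT_PENDING && !SUSP_INT (the value loaded into IR[8] when IR_W is high)."""
    return int_pending and not susp_int(ir, group_q11, uc_susp_int)


if __name__ == "__main__":
    # With an interrupt pending, POP sr and MOV sr suppress it; NOP (0x90) does not.
    for opcode in (0x07, 0x8E, 0x90):
        print(hex(opcode), ir_di8(True, opcode, False, False))
```

            The explicit parentheses around `ir & 0x1E7` matter: in Python, as in C, `==` binds more tightly than `&`.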

          • 黄禄轩 says:

            First, I confirmed that the two-input not-and gate is used to reset the prefix flags; it resets F1.
            Second, there are two F1 flags. One of them is tested by NF1 and F1, and the other is tested by F1ZZ.
            F1 is set when (IR&0x1FE)==REPNZ&&IR_W, reset when the two-input not-and gate outputs high, and flipped when uC.OP==CF1. uC.JF1 and uC.JNF1 test it.
            The other F1 is set to IR[0] when (IR&0x1FE)==REPNZ&&IR_W||uC.OP==CF1 and holds its contents under all other conditions, no matter what happens. That flag XOR ZF is what uC.JF1ZZ tests.
            I'm also curious why you named F1 "F1". And what would you like to name the other F1?

            • Andrew says:

              The name "F1" for the REP flag comes from the patent describing the internals of the 8086 (US4363091). Maybe the second one should be called F1Z because it's XORed with Z (ZF) to form F1ZZ.

  12. 黄禄轩 says:

    Another thing I noticed is that NWB, WB and NX in an ALU operation do not affect RNI skipping. It's likely that RNI is always skipped if mmm refers to memory. The only difference between NWB, WB and NX in an ALU operation is that NWB always causes the loader to prepare for the next instruction, whereas the other two issue NXT only when certain conditions are met. NXT=NWB||((WB||ALU_NX)&&(cond1||cond2||cond3))

    • Andrew says:

      Yes, that's what I've noticed too, and my emulator code reflects this: https://github.com/reenigne/reenigne/blob/6ee16df0d974ff41e8591294b9349701fc01233d/8088/xtce/xtce_microcode.h#L2764 . I also changed "NWB,NX" to just "NX" in the most recent update of the microcode disassembly earlier this month, since the behaviour of this operation seems to be exactly the same as the NX bit in an ALU-priming microinstruction.

      • 黄禄轩 says:

        No, from what I see in the chip, WB,NX has the same behavior as ALU NX. So I think it's better to get rid of WB,NX, because it seems that every RNI is skipped if the destination is mmm and mmm refers to memory. The skip logic is RNI=uc.op.RNI&&(uc.dst!=10010(MMM)||cond2||cond1). cond1 and cond2 in the RNI expression are the same signals as cond1 and cond2 in the NXT expression. I can give you my project if you like, so you can see what I mean.
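
        For reference, the two expressions quoted in this thread can be written out as a small Python sketch. The function and parameter names are hypothetical, and cond1 to cond3 are the unnamed conditions from the comments, treated here as opaque inputs because the thread does not say what they are.

```python
# Sketch of the NXT and RNI expressions quoted in this thread.
# cond1..cond3 are unspecified in the comments and are plain inputs here.

DST_MMM = 0b10010  # destination code for M, as quoted in the RNI expression


def nxt(nwb: bool, wb: bool, alu_nx: bool,
        cond1: bool, cond2: bool, cond3: bool) -> bool:
    """NXT = NWB || ((WB || ALU_NX) && (cond1 || cond2 || cond3))."""
    return nwb or ((wb or alu_nx) and (cond1 or cond2 or cond3))


def rni(op_rni: bool, dst: int, cond1: bool, cond2: bool) -> bool:
    """RNI = uc.op.RNI && (uc.dst != MMM || cond2 || cond1)."""
    return op_rni and (dst != DST_MMM or cond2 or cond1)


if __name__ == "__main__":
    # NWB issues NXT unconditionally; WB needs at least one condition.
    print(nxt(True, False, False, False, False, False))
    print(nxt(False, True, False, False, False, False))
    # RNI is suppressed when the destination is M and neither condition holds.
    print(rni(True, DST_MMM, False, False))
```

        Written this way, WB and ALU NX are interchangeable inputs to NXT, which is the point being made about merging them.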

        • Andrew says:

          Oh, I see what you mean now. Yes, the implementation of ALU NX ("if (_mIsM && _useMemory && _alu != 7 && (_group & groupEffectiveAddress) != 0) _nx = false;" in my emulator) is much more similar to the implementation of WB,NX ("if (!_mIsM || !_useMemory || _alu == 7) _nx = true;") than it is to NWB,NX ("_nx = true;"). It might be possible to merge the first two - I haven't checked. I'd be very interested to see your project, though!

  13. 黄禄轩 says:

    A new discovery: there is another ZF hidden in the chip, which I call ΣZF or SZF. It is tested by NZ, while Z tests ZF. Detailed logic of ZF and SZF: https://twitter.com/HuangLuxuanCNSN/status/1624386607389503489

  14. […] 8086's microcode was disassembled by Andrew Jenner (link) from my die photos, so the microcode listings are based on his […]

  15. […] microcode listings are based on Andrew Jenner's disassembly. I have made some modifications to (hopefully) make it easier to […]

  16. Ryan says:

    Hi, I know this is quite a bit later, but I'm writing an emulator for the 8088 that interprets the microcode itself. However, I seem to have hit a wall for the shift rm8,cl instructions. How does the CPU determine where in the microcode to jump for these instructions? I vaguely understand the bottom three bits of the address are set to modrm.reg, but that would jump to the middle of a microcode routine, wouldn't it?

    Thanks, Ryan

    • Andrew says:

      These instructions are at lines 08c-093 of the microcode listing. The instruction bits for line 08c are 01101001?.00 corresponding to opcodes 0xd2 and 0xd3, with the counting register going from 0 to 7. So the microcode ROM is essentially comparing the 9-bit opcode and top two bits of the counting register to all 128 sets of instruction bits at once, and running the microcode corresponding to the activated set. Ken Shirriff's blog posts go into much more detail about how the microcode works. You can also take a look at my attempt at writing an 8088 emulator that interprets the microcode at https://github.com/reenigne/reenigne/blob/master/8088/xtce/xtce_microcode.h (not finished yet as there may be some issues with the trap flag, undefined flags, and timing of interrupts between instructions).
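
      The matching described here can be sketched in Python as a pattern test with don't-care bits. This is only an illustration: the function name is hypothetical, and it assumes the counting register is 4 bits wide, so that its top two bits are `counter >> 2`. It makes no claim about how the ROM is physically organised.

```python
# Sketch: test an (opcode, counter) pair against one set of instruction bits
# written as in the listing, e.g. "01101001?.00" ('?' is a don't-care bit).

def pattern_matches(pattern: str, opcode9: int, counter: int) -> bool:
    """Return True if the 9-bit opcode and the top two counter bits fit the pattern.

    Assumes a 4-bit counter, so the top two bits are (counter >> 2) & 0b11.
    """
    opcode_part, counter_part = pattern.split(".")
    wanted = opcode_part + counter_part
    actual = format(opcode9 & 0x1FF, "09b") + format((counter >> 2) & 0b11, "02b")
    # Each pattern bit must either be a don't-care or equal the actual bit.
    return len(wanted) == len(actual) and all(
        w in ("?", a) for w, a in zip(wanted, actual)
    )


if __name__ == "__main__":
    line_08c = "01101001?.00"
    for opcode in (0x0D0, 0x0D2, 0x0D3, 0x0D4):
        print(hex(opcode), pattern_matches(line_08c, opcode, 0))
```

      With the pattern for line 08c, the 9-bit opcode values 0x0D2 and 0x0D3 match while the counter is between 0 and 3, and stop matching once the counter reaches 4, at which point a different set of instruction bits would have to take over.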

  17. […] microcode listings are based on Andrew Jenner's disassembly. I have made some modifications to (hopefully) make it easier to […]

  18. […] My microcode analysis is based on Andrew Jenner's 8086 microcode disassembly. ↩ […]

  19. […] My microcode analysis is based on Andrew Jenner's 8086 microcode disassembly. ↩ […]

  20. […] The REP prefix is used with string operations to cause the operation to be repeated across a block of memory. However, if you use this prefix with an IMUL or IDIV instruction, it has the unexpected behavior of negating the product or the quotient (source). […]
