1K colours on CGA: How it's done

[Update: VileR's writeup of the 1K colour mode is now up. His has fewer technical details but is much easier to understand than mine as it has pictures!]

When displaying graphics on an original IBM Color Graphics Adapter (CGA), normally only 4 colours (from a palette of 16) are possible at once. A few games written for such systems took advantage of the artifacting on the card's NTSC composite output to get 16 colours at once. On Saturday, a team of people including myself, Trixter, Scali and VileR released a demo ("8088 MPH") which smashed this limit and won first place in the "Oldskool Demo" compo at the Revision 2015 demoparty in Saarbrücken, Germany. Some commenters have suggested that the production is a fake and that what we claimed to have done is impossible. Others have suggested it's dithered or flickered to get more colours. But it is none of these things. Here is how we did it.

First of all, what defines a colour on the composite output? There's only one signal line on a composite connection (plus a ground return path) so you can't have separate red, green and blue analog levels like you have on a VGA card (or separate red, green, blue and intensity lines like you have on an RGBI connection). Instead, a composite signal effectively sequences the red, green and blue signals in time. A composite colour is a signal which repeats at a frequency of 3.57MHz (one period lasts half of the width of a text character in 80-column mode). Given such a signal, you can compute its DC component (average voltage), oscillation amplitude (about this average) and phase (relative to the color burst pulse at the start of each scanline). These three parameters directly correspond to brightness (luminance), saturation and hue respectively. Higher frequencies (2nd and greater multiples of 3.57MHz) are not involved in colour decoding and would normally have been filtered out by the decoding circuitry in the composite monitors CGA cards would have been connected to in 1981.

The most common CGA composite mode works by putting the card in 1 bit-per-pixel (1bpp) mode - i.e. each pixel is either off (black) or on (white, generally, though this could be changed via the palette register). A single period of color carrier oscillation contains 4 pixels in this mode (the pixel rate is 14.318MHz), so there are 16 possible waveforms you can make with patterns of lit and unlit pixels and hence 16 artifact colours.
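As a rough illustration of the decomposition described above, here is a small Python sketch that takes one carrier period of samples and extracts the DC component, the amplitude and the phase, then applies it to the 16 possible 4-pixel patterns. The names `decode_period` and `ARTIFACTS` are made up for this example, the pixels are idealised as 0 or 1, and the hue is measured against the first sample rather than against a real colour burst, so the absolute angles are not those of actual hardware.

```python
import cmath
import math

def decode_period(samples):
    """Return (luma, saturation, hue_degrees) for one carrier period."""
    n = len(samples)
    luma = sum(samples) / n  # DC component (average level)
    # Fourier coefficient at the carrier frequency (first harmonic).
    c = sum(s * cmath.exp(-2j * math.pi * k / n)
            for k, s in enumerate(samples)) / n
    saturation = 2 * abs(c)  # oscillation amplitude about the average
    # Phase relative to sample 0 (an assumption, not a real burst reference).
    hue = math.degrees(cmath.phase(c)) % 360
    return luma, saturation, hue

# One entry per 4-bit pattern; the leftmost pixel is the most significant bit.
ARTIFACTS = {
    pattern: decode_period([(pattern >> (3 - k)) & 1 for k in range(4)])
    for pattern in range(16)
}

if __name__ == "__main__":
    for pattern, (y, sat, hue) in ARTIFACTS.items():
        print(f"{pattern:04b}  luma={y:.2f}  sat={sat:.2f}  hue={hue:6.1f}")
```

In this idealised model the patterns 0101 and 1010 come out with zero saturation, because their energy is all at twice the carrier frequency.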

Separately to artifact colour, the CGA card has 16 "direct" colours (the ones that are available in text modes). These are just the 16 possible RGBI bit patterns on an RGBI output, but how does the card generate these colours on the composite output? It does so by generating 3.57MHz waveforms on the card at 6 different phases using flip-flops. These are the colours blue, green, cyan, red, magenta and yellow. Including the constant digital signals 0 and 1 (GND and +5V) gives 8 basic colours. To get the intense versions of these, an additional DC offset is applied when the digital signal is turned into an analogue one at the output.

The 1K colour trick hinges on noticing that the direct colours are not the same as the artifact colours. In 1bpp mode you can change the palette register to get different sets of artifact colours. Suppose you change the palette register to blue - then any black pixel will "turn off" the corresponding part of the "blue" waveform. These "chopped up" colours are different yet again from the 16 direct colours and the 16 normal artifact colours. So you could get 256 colours that way (though you can't put them wherever you like because there are limits to how often you can change the value in the palette register).

Suppose that in 1bpp mode you had a second palette register so that you could change the colour corresponding to the 0 bit as well as the one corresponding to the 1 bit. Then using the same techniques you could generate 2K colours (16 foreground colours, 16 background colours and 16 bit patterns for choosing which colour goes where - but swapping foreground and background and inverting the bit pattern yields the same colour). Here we come to the crucial part of the trick: in text mode you can kind of do that - the attribute byte for a character (when flashing is disabled) lets you choose the foreground and background colours independently. Unfortunately you don't get to choose the bit patterns you want - those are defined by the bits in the CGA's character ROM, which can't be changed from software.
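The 2K figure can be checked by brute force. The sketch below (my own helper names, not anything from the demo's code) enumerates every foreground/background/pattern triple and folds together the pairs related by the swap-and-invert symmetry. It only accounts for that one symmetry; other coincidences, such as the pattern being irrelevant when foreground and background are equal, are ignored here just as they are in the count above.

```python
def canonical(fg, bg, pattern):
    """Pick one representative of the pair related by swap-and-invert."""
    swapped = (bg, fg, pattern ^ 0xF)  # swap colours, invert the 4-bit pattern
    return min((fg, bg, pattern), swapped)

CLASSES = {
    canonical(fg, bg, pattern)
    for fg in range(16)
    for bg in range(16)
    for pattern in range(16)
}

if __name__ == "__main__":
    print(len(CLASSES))
```

Because inverting a 4-bit pattern never leaves it unchanged, every triple has a distinct partner, so the 4096 triples fall into exactly half as many classes.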

VileR is the one who deserves credit for the next observation. He pointed out to me that the characters 'U' (capital letter U, 0x55) and '‼' (double exclamation mark, 0x13) both have bit patterns in their top two rows (11001100 and 01100110 respectively) which are the same for the left nybble as for the right nybble, and the same in both rows. Therefore, if we change the number of scanlines per character row to 2 (as is done in a number of other CGA games to get a 160x100x16 mode using a "vertical half solid" character - 0xDD or 0xDE) we should be able to get ~500 colours (2 useful characters times 16 foreground colours times 16 background colours) at a resolution of 80x100.

In order to get from there to 1024 colours we need to find some more characters with the same properties as 0x55 and 0x13. It would be fantastic if there happened to be for every nybble value X a character with that bit pattern in its top 4 nybbles, but unfortunately only the nybble patterns 1100 and 0110 are obtainable that way. However, if we consider just the top scanline instead of the top two, we find two more characters with the right property - '░' (light shade, 0xb0, bit pattern 00100010) and '▒' (medium shade, 0xb1, bit pattern 01010101). Unfortunately the second scanlines of these characters don't play ball, and if we tried to use them with 2 scanlines per row we'd get horizontal stripes instead of solid-coloured pixels.
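The property being hunted for is easy to state in code. This sketch uses only the four top-scanline bit patterns quoted above (it does not contain the rest of the character ROM), and the names `TOP_SCANLINE` and `is_solid` are invented for the example.

```python
# Top scanline of each useful character, as quoted in the text.
TOP_SCANLINE = {
    0x55: 0b11001100,  # 'U'
    0x13: 0b01100110,  # double exclamation mark
    0xB0: 0b00100010,  # light shade
    0xB1: 0b01010101,  # medium shade
}

def is_solid(row):
    """True if the left and right nybbles of an 8-pixel row are identical."""
    return (row >> 4) == (row & 0xF)

USEFUL = [ch for ch, row in TOP_SCANLINE.items() if is_solid(row)]
# Each useful character can be combined with 16 foreground and 16 background colours.
COMBINATIONS = len(USEFUL) * 16 * 16

if __name__ == "__main__":
    print([hex(ch) for ch in USEFUL], COMBINATIONS)
```

A row such as 11110000 fails the test, since its two halves would decode as two different colours rather than one solid 80x100 pixel.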

So to get those extra colours we need to use 1 scanline per row. However, there are several complications in doing so. One is that the CRTC on the CGA card (Motorola MC6845) cannot generate more than 128 rows (plus up to 32 extra scanlines) per frame and we need to generate 262 scanlines per frame in order to maintain the correct ratio of hsync pulses to vsync pulses that the monitor requires to generate a stable picture.

It is possible to do this, though, by generating multiple CRTC frames per CRT frame (and suppressing the vsync pulse for all but one of them). This is how we generated the wide picture before the credits part (the one with our faces) - in that image there's a 100 scanline frame with 1 scanline per row and immediately below it a 162 scanline frame with 2 scanlines per row.

But there were several 1K colour images in the demo that filled the entire screen - how did we do those? The answer is very similar but instead of having one frame 100 scanlines high, we have 100 frames that are 2 scanlines high (all with 1 scanline per row). In the middle of each of these frames the memory address is advanced by one row by the CRTC. In each frame we advance the CRTC start address register by one row's worth of characters, so that the top row of one frame is the same as the bottom row of the frame above it. So each frame straddles two pixel rows and each 2-scanline-high "pixel" straddles two CRTC frames.
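The overlapping-frame arrangement can be hard to picture, so here is a small Python model of it. It is only bookkeeping, not the demo's actual code: `frame_schedule` is a made-up name, the start address is assumed to be counted in character cells with 80 cells per row, and the remaining scanlines of the 262-line CRT frame are not modelled.

```python
COLUMNS = 80  # character cells per row in 80-column text mode

def frame_schedule(num_frames=100, base=0):
    """List (crtc_frame, start_address, memory_row) for each visible scanline."""
    scanlines = []
    for frame in range(num_frames):
        start = base + frame * COLUMNS  # start address advances one row per frame
        for row_in_frame in range(2):   # 2 scanlines per frame, 1 scanline per row
            scanlines.append((frame, start, frame + row_in_frame))
    return scanlines

if __name__ == "__main__":
    for line, entry in enumerate(frame_schedule()[:6]):
        print(line, entry)
```

Printing the first few entries shows memory rows 0, 1, 1, 2, 2, 3: every row after the first appears on two consecutive scanlines, the first at the bottom of one CRTC frame and the second at the top of the next.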

So we're done, right? That's all there is to the trick? Well, not quite - there are more complications. If you do the obvious thing and set 80-column text mode, colour burst enabled via the BIOS, you will see either no colours at all on your composite display or colours that flash in and out and change hue (on monitors that don't have a properly functioning colour-killer circuit). The reason for this is that the CGA card was never designed to be used in 80-column text mode with composite colour display (the text doesn't have enough horizontal resolution to be readable) and there's a hardware bug that prevents it from working properly anyway.

The bug is that the CGA card takes the horizontal sync (hsync) signal from the CRTC (which just goes high and low once per scanline) and uses it to trigger a more complicated composite pulse signal consisting of front porch, sync, breezeway, color burst and back porch. The whole process takes 10 character periods in modes other than 80-column text (-HRES modes) so the BIOS programs the CGA's hsync width register to 10. But in +HRES (80-column text) mode these 10 characters are half the width, so the hsync process gets interrupted half way through leading to a truncated sync pulse and no burst at all.

This is well-known and the usual way of dealing with the problem is to set the border colour (palette register) to 6 (dark yellow - not brown as it is on the 5153 RGBI TTL monitor) so that the monitor picks up its color burst from the border instead. However, on our hardware we found that doing this made +HRES modes significantly darker than -HRES modes. This is because monitors and capture devices calibrate their gain to normalize the amplitude of the burst pulse, and colour 6 is brighter than the normal burst pulse. Not all of the demo uses +HRES mode and we found that we could not use a single set of calibration settings for both -HRES and +HRES parts - if we tried then either the +HRES parts were too dark or the -HRES parts were washed out, leaving colours 9-15 barely distinguishable shades of white. We didn't really want to have to edit our capture to brighten up just some parts of the demo. Another problem was that both of the capture devices we had brought with us to the party were giving a shimmery picture (unstable horizontal sync) with this fix.

Instead what we ended up doing is leaving the border colour as black but increasing the horizontal sync width to its maximum value of 16 characters (programmed as zero, which looks wrong, but it's a 4-bit register and the compare is done after the increment - at least on the MC6845 CRTCs on the CGA cards we were using). This gives a burst of either half or three-quarters the standard width (depending on whether the character it starts on corresponds to a rising or falling edge of the CGA's internal +LCLK signal that is used to time the hsync sequence). I think we managed to arrange it so that it's always three-quarters but there may be bugs in that part of the code.
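To put numbers on the hsync widths involved, the following sketch converts a width in characters to microseconds. It assumes a character is 8 cycles of the 14.318MHz clock in +HRES mode and 16 cycles otherwise (consistent with the half-width characters mentioned above); the function names are invented, and the alignment with +LCLK that decides the exact burst length is not modelled.

```python
PIXEL_CLOCK_HZ = 14_318_180  # 4 times the 3.579545MHz colour carrier

def hsync_duration_us(width_chars, hres):
    """Duration of the CRTC hsync pulse in microseconds."""
    clocks_per_char = 8 if hres else 16  # +HRES characters are half the width
    return width_chars * clocks_per_char / PIXEL_CLOCK_HZ * 1e6

def register_value(width_chars):
    """Value for the 4-bit sync width field; 16 characters wraps around to 0."""
    return width_chars & 0xF

if __name__ == "__main__":
    print(f"-HRES, width 10: {hsync_duration_us(10, False):.2f} us")
    print(f"+HRES, width 10: {hsync_duration_us(10, True):.2f} us")
    print(f"+HRES, width 16: {hsync_duration_us(16, True):.2f} us "
          f"(register value {register_value(16)})")
```

Even at the maximum of 16 characters, the +HRES pulse is shorter than the 10-character pulse of the other modes, which fits with the burst being shortened rather than full width.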

That fixes the brightness problem but unfortunately some capture devices (including the one that Trixter used to do some test/failsafe captures before the party) cope less well with this than with the border colour 6 change. If we release a "final version" (with a few minor improvements and bug fixes) we might include a "calibration screen" that people can use to choose the border colour, hsync width and phase that works best with their output device.

Yet another complication is that there were multiple revisions of the IBM CGA card. They had (to a good approximation) the same standard composite artifact colours but different direct colours. On the older CGA cards, colours 1-6 were all the same brightness, as were colours 9-14. This made them indistinguishable on monochrome composite monitors, so for the second revision of the CGA card, IBM added some more resistors to the output DAC in order to make different colours different brightnesses. They also removed the -BLANK signal so that the burst pulse is the same amplitude no matter whether it comes from border colour 6 or from the hsync burst (the truncated burst problem is still present, though).

Different direct colours mean that our 1K colour mode displays a different set of colours on old CGA cards than on new CGA cards. We debated a bit about whether we should target old or new CGA cards for our demo, but in the end we decided to go for old CGA cards, mainly because the set of colours you can get from an old CGA card is more useful (artistically speaking) than the set from a new CGA card.

In order to make the hand-drawn 1K colour pictures, VileR and I made some test captures of the old CGA card's output with all the useful combinations of attribute and character, which he then used as a palette to paint his pictures. Happily he was able to find in there some close correspondences to the 16 RGBI colours and the 16 colours of the Commodore 64's palette.

For the pictures that were converted from photographs, I wanted to be able to use more characters than just 0x13, 0x55, 0xB0 and 0xB1 - I wanted to be able to try all different characters (even those that have different left and right nybbles in their top scanline) to get a closer match to the source image. However, getting calibration images for all 65536 combinations (let alone the 4 billion artifacts that can be generated from adjacent characters) was impractical. To make that work, I really needed to have a mathematical model of the CGA's composite output stage that I could use to generate the right colours. Ideally I would be able to generalize this to new CGA as well.

My first attempt at this was the one I used for the Hydra image - I assumed that the direct colours had hue/phase angles that were multiples of exactly 45 degrees, and that the CGA's pixel colour multiplexer chip was able to switch instantaneously between them. However, the hydra didn't come out looking how I expected on real hardware. Much later, I learnt that the main reason for this is that the TTL logic chips used on the CGA card don't switch instantaneously - there are logic delays between a signal coming in to an input pin and the corresponding change happening on the output pin. When your color carrier period is 279ns, a delay of just 7ns causes a noticeable phase shift of 9 degrees.
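The arithmetic behind that last figure is just the delay expressed as a fraction of a carrier cycle. A minimal sketch, using the standard NTSC carrier frequency and a function name of my own choosing:

```python
CARRIER_HZ = 3_579_545  # NTSC colour carrier frequency

def phase_shift_degrees(delay_ns):
    """Hue error caused by a logic delay, as a fraction of one carrier cycle."""
    period_ns = 1e9 / CARRIER_HZ  # roughly 279ns
    return 360.0 * delay_ns / period_ns

if __name__ == "__main__":
    print(f"{phase_shift_degrees(7):.1f} degrees")
```

Since several chips sit in series on each signal path, their individual delays add up, and the resulting hue errors differ from one path to another.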

There are several logic chips on the various signal paths of interest here, all with their own logic delays. My second attempt at modelling the CGA involved looking up the data sheets for all these chips, finding typical values for the logic delays (most of them were listed as a range) and generating an accurate model that way. This worked excellently for 1bpp mode, reasonably well for 2bpp mode, and not so well at all for +HRES mode. This is the implementation that is in the current SVN versions of DOSBox. I kept adding more and more parameters to my model and attempted to tune them to match my captured calibration images but I could not get good results that way. The trouble seemed to be in the guts of the multiplexer chip itself - the output signal depends in a complicated and mysterious way on all of the input signals, so the number of parameters required to describe its behavior quickly becomes impractical.

The final breakthrough came when I realized that I didn't need to model the composite signal *exactly* - I just needed to model it well enough to describe the observed colours. All the relevant colour information is at frequencies below 7.16MHz. By the sampling theorem, if we can reconstruct a version of the signal sampled at 14.318MHz, it'll be exactly correct not including frequencies at or above 7.16MHz (which we don't care about). The key insight is that we don't care about what happens to the signal *in between* those samples - it can bounce around, transition as slow or fast as you like as long as we know where it ends up when we measure the sample - all that extra freedom just manifests in the frequencies we don't care about.

The multiplexer takes a while to transition from one colour to another - on the order of 70ns (one 1bpp pixel time). So there isn't a place in the signal that we can sample and be sure that the previous transition has stopped and the next transition has not yet started. I theorized that at any given time there will not be more than one transition taking place. So a transition (and hence a sample) can be completely described by 1024 parameters - one for each combination of left colour (16 possibilities), right colour (16 possibilities) and phase within the color carrier cycle (4 possibilities).
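In code, a model of this shape is little more than a table lookup. The sketch below shows one possible layout; the index order, the `simulate` function and the placeholder table contents are all assumptions for illustration, and the real parameter values would have to come from fitting against captures.

```python
def table_index(left, right, phase):
    """Flatten (left colour, right colour, carrier phase) into 0..1023."""
    return (left * 16 + right) * 4 + phase

def simulate(colours, table, start_phase=0):
    """One output sample per transition between adjacent hi-res pixels."""
    samples = []
    for i in range(1, len(colours)):
        phase = (start_phase + i) % 4  # 4 samples per colour carrier cycle
        samples.append(table[table_index(colours[i - 1], colours[i], phase)])
    return samples

# Placeholder table: each entry just holds its own index, not a measured level.
PLACEHOLDER_TABLE = list(range(16 * 16 * 4))

if __name__ == "__main__":
    print(simulate([1, 2, 3], PLACEHOLDER_TABLE))
```

A run of identical pixels uses the entries where the left and right colours are equal, so the "no transition" case needs no special handling.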

I made a test pattern which makes a very good attempt at getting swatches of all 4096 foreground/background/pattern combinations in just a couple of screenfuls (some can only be obtained for a short stretch, as transitions). This was quite a feat in itself - I needed an area of screen consisting of scanline 3 of several characters repeating vertically, necessitating having four CRTC scanlines within a single CRT scanline - a hairier CRTC manipulation trick than any that we actually used in 8088 MPH itself!

I set up a model with these parameters and tried to match it to my captures. I initially tried to use a gradient-ascent hill-climbing algorithm to search the parameter space but before I could get it to work I realized that most of the parameters affected very few of the test swatches - any transition between two different colours X and Y can only affect colours with X and Y as foreground and background colours (32 of the 4096). If the left colour and the right colour are the same then any swatch with that colour as either foreground or background can be affected (496 of the 4096). That observation made a more naive hillclimbing algorithm much more practical, and it just takes a few minutes to find a set of 1024 parameters that match the measured values to within a few percent.
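The swatch counts above, and the kind of naive search they make practical, can be sketched as follows. This is not the fitting code actually used: `affected` and `hill_climb` are illustrative names, and the error function is left to the caller, the idea being that it only needs to re-evaluate the swatches a given parameter can influence.

```python
from itertools import product

# Every (foreground, background, pattern) swatch: 16 * 16 * 16 = 4096.
SWATCHES = list(product(range(16), range(16), range(16)))

def affected(left, right):
    """Swatches that a transition from colour `left` to `right` can influence."""
    if left != right:
        return [s for s in SWATCHES if {s[0], s[1]} == {left, right}]
    return [s for s in SWATCHES if left in (s[0], s[1])]

def hill_climb(params, error_for, step=0.25, min_step=1e-4):
    """Nudge one parameter at a time, keeping only changes that reduce the
    local error error_for(i, params); halve the step when nothing improves."""
    params = list(params)
    while step >= min_step:
        improved = False
        for i in range(len(params)):
            best = error_for(i, params)
            for delta in (step, -step):
                params[i] += delta
                err = error_for(i, params)
                if err < best:
                    best, improved = err, True
                else:
                    params[i] -= delta  # undo a change that did not help
        if not improved:
            step /= 2
    return params

if __name__ == "__main__":
    print(len(affected(1, 2)), len(affected(3, 3)))
```

Because each parameter only touches a small slice of the data, evaluating a local error per parameter is far cheaper than recomputing the error over all 4096 swatches for every trial step.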

I wanted to model the new CGA as well as the old CGA, but didn't have an easy way to get good captures for that. However, the difficult bit of the old CGA to model (the multiplexer) is identical in the new CGA. So instead of applying the technique described above to the final output, I applied it to the multiplexer output and the intensity bit output separately. This yields a 256-element table and a 16-element table respectively. The outputs from these are summed to get the final output. There was a small amount of degradation from the 1024-element table version but it's too small to notice directly. To generate new CGA output, I just duplicated the intensity bit logic and applied it to the R, G and B bits (with appropriate scaling based on the resistor values in IBM's schematics). I haven't yet tested if this really matches new CGA output, but I don't currently know of any reason why it wouldn't.
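The split into two tables might look something like this. It assumes the usual RGBI layout of a colour number (intensity in bit 3, red, green and blue in bits 2 to 0); the function name, index order and table contents are hypothetical, and the extra scaled R, G and B terms for new CGA are not included.

```python
def split_sample(left, right, phase, mux_table, intensity_table):
    """Sum of a multiplexer term (8 x 8 x 4 entries) and an intensity term
    (2 x 2 x 4 entries) for one transition at one carrier phase."""
    left_rgb, right_rgb = left & 7, right & 7  # low three bits: R, G, B
    left_i, right_i = left >> 3, right >> 3    # bit 3: intensity
    mux = mux_table[(left_rgb * 8 + right_rgb) * 4 + phase]
    intensity = intensity_table[(left_i * 2 + right_i) * 4 + phase]
    return mux + intensity

# Placeholder contents so the lookups can be checked; not measured values.
MUX_TABLE = list(range(8 * 8 * 4))
INTENSITY_TABLE = [100 * k for k in range(2 * 2 * 4)]

if __name__ == "__main__":
    print(split_sample(0b1010, 0b0101, 3, MUX_TABLE, INTENSITY_TABLE))
```

The attraction of this form is that 272 numbers replace 1024, and the intensity table is simple enough to reuse for other bits with different scaling.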

This CGA simulation algorithm is implemented in a program I made called CGA2NTSC which has two main functions. One is to act as a CGA composite output stage emulator and NTSC decoder (taking a picture such as one might find on an RGBI monitor and showing what it would look like on the composite output of an old or new CGA). The other is to take a (24-bit colour) input image and try to find a set of data which, when loaded into the CGA's video memory, will best reproduce that image on the composite output. This is what we used to make the faces picture in the demo. It supports 1bpp and 2bpp modes as well as both text modes (though we only used it in +HRES mode for 8088 MPH). The program uses error diffusion (which can be turned down or off). I've had a couple of requests to make it use ordered dithering instead. That's possible for 1bpp and 2bpp modes but doesn't really make sense for text modes where you don't get an arbitrary choice of bit patterns. The program is a bit unpolished but should be reasonably usable.
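For readers unfamiliar with error diffusion, here is a deliberately tiny one-dimensional version with a strength control. It is not CGA2NTSC's algorithm (which has to choose among character/attribute combinations and compare decoded colours); it only shows the general idea of carrying the quantisation error forward, using scalar levels and invented names.

```python
def diffuse_row(targets, levels, strength=1.0):
    """Pick the nearest available level for each target value, pushing the
    leftover error onto the next one. strength=0 turns diffusion off."""
    chosen_levels = []
    error = 0.0
    for target in targets:
        wanted = target + error * strength
        chosen = min(levels, key=lambda level: abs(level - wanted))
        chosen_levels.append(chosen)
        error = wanted - chosen  # what could not be represented here
    return chosen_levels

if __name__ == "__main__":
    print(diffuse_row([0.5] * 4, [0.0, 1.0]))
    print(diffuse_row([0.5] * 4, [0.0, 1.0], strength=0.0))
```

With full strength the mid-grey row alternates between the two levels so that the average is preserved; with the strength at zero every pixel independently snaps to the same level.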

Next time: how I played a 4 channel MOD at a sample rate of 16.6kHz through the PC speaker on a 4.77MHz 8088 CPU.

38 Responses to “1K colours on CGA: How it's done”

  1. Pup says:

    This is a really nice write-up, thank you! Looking forward to the MOD-playback entry :-).

  2. […] 1K colours on CGA: H… on 8088 MPH: We Break All Your… […]

  3. don bright says:

    absolute 100% amazing. standing ovation.

  4. stavs says:

    Congratulations. You richly deserve that 1st place. Thanks for taking the time to explain the techniques used.

  5. […] I will discuss some of it in more detail at a later time. Trixter has already done a global write-up of the demo. And reenigne has done a piece on the 1024-colour tweak. […]

  6. […] 1K colours on CGA: How it’s done « Reenigne blog – […]

  7. vesabios says:

    This is so hardcore/badass. This is the most insane thing I've ever seen. I grew up with cyan, magenta, and white and can't believe what you guys were able to hack. Wow.

  8. Richard Kirk says:

    Can such things be? I remember wrestling with a CGA card, and hardly crediting how broken it was. You have passed clean through the realm "having too much time on your hands" and come out the other side into this unbelievably strange land where a CGA works like a C64. And I can understand how it was done, too. Amazing.

  9. llogiq says:

    I remember having a 22k mod player with beeper output on my 386sx25, so 16k output on a 4'77mhz CPU doesn't sound too outlandish if you can get the DMA to write the combined sound to the speaker port.

    • Andrew says:

      DMA would make it much easier (in fact then the Galaxy Player code could be used). The PC speaker/PIT isn't connected to DMA, though.

    • Scali says:

      Erm wait... 22 KHz on a 386sx25 means 16 Khz on a 4.77 8088 is not outlandish?
      You do realize that the 22:25 is a ratio close to 1, where 16:4.77 is a ratio closer to 4.
      Not to mention that a 386sx has a 16-bit bus and is a far, far faster and more advanced CPU than the 8088 is.
      Even a 386sx at 4.77 MHz would easily be 4 times as fast as an 8088.

  10. cTrix says:

    Congrats on the 1st place. Very much deserved! I used to stare at scopes while lining up SP machines (back when they were still a thing) so I hugely appreciate your insight into the actual signal breakdown. Especially dealing with the black / colour burst. Looking forward to attempting to play this demo at the next Flashback demoparty (!).

  11. 0wing says:

    Can we actually do games with such colorful gfx in CGA?

    • Andrew says:

      Yes! In fact, I have some ideas for a CGA game engine that I'd like to try out very soon.

      • 0wing says:

        Wow, HOW? If CPU is busy with rendering and outputting gfx and sfx stuff already?

        Would be awesome to see action-type games with neat colors in CGA.

        • Andrew says:

          I think that for a game, instead of the full 1K colour mode I would rather use a variant of the ~500 colour mode that VileR originally discovered, as that doesn't require any CPU usage to just display a static screen. Doing fast, 4-way, full-screen scrolling with moving sprites while avoiding CGA snow and doing 2 channels of sound in a way that doesn't depend on a particular CPU frequency is tricky enough as it is!

          • 0wing says:

            4-way scrolling for most action games is overkill, so 2-way is enough. 500 colors too, I think.

            • Andrew says:

              This game won't be like most action games. For a start, most action games require more than a CGA. Also, 4-way scrolling isn't that much more difficult than 2-way scrolling on this platform, but is definitely more impressive. And overkill is better than underkill :-)

              • 0wing says:

                Underkill is not using sceners' achievements and still be with default CGA palette. Most games were like this, but they're developed in dark 80s...

          • VileR says:

            An amusing 'artistic' challenge I've had in mind - how difficult would it be to design graphics for a game targeting two different CGA options:

            - 512-color "ultra low-res" for composite monitors (pseudo-80x100)

            - Macrocom-style "ANSI from hell" for RGB monitors (pseudo-640x200)

            Both use the same mode technically (100-line 80c text mode), but in terms of the level of detail, they're the two opposite extremes we can have on CGA.

            • Andrew says:

              I was hoping that you might be up for such a challenge! It would be good if the game I'm planning had the ability to run on RGBI monitors as well as old-style composite (and maybe new-style composite as well). Programming-wise it would be no problem at all to be able to switch between two different sets of graphics, but of course that means the artist has to draw all the graphics twice (or more). Another option might be to design 80x100 graphics with an arbitrary palette and then use a look up table to convert to either 0x55/0x13 combinations for composite, or to 0xb0/0xb1 combinations for RGBI. Wouldn't be ideal for either, but much less work for the artist.

              • VileR says:

                The real challenge would be the first option - coming up with two entirely different sets of graphics drawn separately. If we postulate that everything has to be the same size on screen in both cases (to keep both code and game mechanics identical), it should be interesting to try and make them both look good, when one has 32 times as many colors but roughly 1/16th of the detail!

                • Andrew says:

                  Yeah, on one hand they don't need to look the same at all (just as long as each set of graphics is consistent within itself) but on the other hand the collision masks should be the same, which means that the basic shapes should be the same. Movement and transparency masking is all at the character cell level (i.e. 80x100) so it's really just about the details (or lack thereof) within each sprite/tile.

  12. 0wing says:

    Now I'm even more confused, if early games with VGA support like Mean Streets can be converted to CGA with identical or no less vibrant palette.

    • Andrew says:

      CGA composite resolution is much lower than VGA's, though (especially with so many colours). So even though they might be as colourful, CGA images won't look as good as those designed for VGA. VGA machines were mostly 386 or so as well, so VGA games tended to have higher CPU requirements. Despite that, I'm sure there are some VGA games which could be done very well on 8088+CGA. However, for a maximally impressive game I think the best route is to design a game around the machine's strengths.

      • 0wing says:

        Mean Streets came in late 80s and had CGA support with horrible palette and horribly dithered gfx... Was 386 and VGA that common in late 80s-early 90s?

        • Andrew says:

          Common enough to make it worthwhile for games producers to create VGA graphics rather than just CGA graphics, but not so common as to make it not worthwhile for games producers to make CGA graphics at all (even if it was just dithering the VGA graphics to a CGA palette).

  13. […] that gets better. Here and here there is the detailed explanation of one demonic invocation in particular: how to make 1024 […]

  14. […] Seen on Hacker News Daily, a 1024-colour CGA picture: "1K colours on CGA: How it's done". The easier one to understand is "CGA in 1024 Colors - a New Mode: the Illustrated […]

  15. […] they have squeezed out no fewer than 1024 colours and polyphonic sound. For anyone interested, there is a very detailed technical post about how this feat was achieved. There is also another one explained graphically. The use of […]

  16. DLM64 says:

    Fascinating. I applaud your creativity, intelligence and ingenuity. I love seeing people take old hardware to places the designers never even imagined.

  17. […] Read more here at 8088mph; also you could begin instead with this post at Reenigne’s blog. […]

  18. AresUII says:

    Is it possible to use other characters to bump the number of colors to 4K and double the resolution? If not, why not?

    • Andrew says:

      The theoretical maximum number of colours is 2K, not 4K. Although there are 16 possibilities for each of foreground colour, background colour and bit pattern, there's a symmetry (swap foreground and background colours while inverting bit pattern) which halves the number of possible colours. Unfortunately none of the other characters in the CGA's character ROM have the right bit patterns in the top scanline to give the other 1K colours. Swapping out the ROM chip would give us more colours, but we wanted the demo to be able to run on stock machines.

      As for doubling the resolution, that would mean being able to independently choose the colours of the left half and the right half of a single character. There are 3 problems with this:
      1) The characters don't have the right bit patterns in the top scanline (again).
      2) You only get one foreground attribute nybble and one background attribute nybble per character, so there would be an "attribute clash" to account for.
      3) There is only 16kB of VRAM so even if the other two problems could be solved, there's not enough for more than 256 colours at a resolution of 160x100.

      However, it is possible to do a little better than 80x100x1K. By allowing a free choice of characters, you can improve resolution and number of colours at the expense of losing the ability to have a free choice of colours in each pixel of a regular grid. The "faces" effect just before the credits sequence in 8088MPH does this.

      Since 8088MPH debuted, I have not been idle - I am actively working on ways to get even better graphics out of the old CGA card and I have several new techniques in the pipeline that I'm hoping to eventually use in a sequel demo.

  19. […] on screen, being the first program to achieve this on CGA. For anyone interested, there is a very detailed technical post about how this feat was achieved. There is also another one explained graphically. One can […]
