Archive for the ‘video’ Category

Adventures in CRTC lockstep

Monday, October 1st, 2012

Once I had achieved CGA lockstep, I tried some test programs. This image was made by cycling through the possible palette registers as quickly as possible (i.e. it's running a big unrolled loop of "INC AX; OUT DX,AL" to the palette register):

That worked great, except that in making it I noticed that the pattern wasn't always starting the same way - half the time the first visible scanline had a different set of colours. Somehow a bit of state was leaking through my lockstep routine!

After a while I figured out that it was due to the way I was getting the CRTC into lockstep with the CGA and CPU. The smallest frame that the 6845 CRTC can do is two character clocks (1 character by 2 scanlines - a 1 scanline high frame doesn't work with that CRTC). I thought I could get around this by going into high-res mode - then 1 character clock is 1 hchar so a frame would be 1 lchar and we'd be in a known place in the frame once we were in a known place in the CGA cycle.

Have you spotted the problem yet? The problem is that I don't know what the phase relationship is between the CGA clock and the CRTC clock - the first hchar of the frame could be the left or the right hchar within the lchar! And in fact, which it turned out to be was decided at random on startup.

With a bit of fiddling I eventually came up with a way to get the CRTC into lockstep as well. The trick hinges on the fact that if we set up the CRTC parameters so that one of the scanlines is displaying a normal visible image and one is overscan, we can tell which scanline is which by reading the display enable bit of the CGA status register. Then we delay an odd number of lchars if the display enable bit is set one way and an even number of lchars if it's set the other way (it doesn't matter which is which). Because we want to keep the CGA and CPU in lockstep as well, the difference in the codepath lengths must also be a multiple of 3 lchars, so delaying for X lchars one way and X+3 the other works fine.
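The "multiple of 3 lchars" constraint falls out of the clock ratios: an lchar is 16 hdots and a CPU cycle is 3 hdots, so 3 lchars (48 hdots, 16 CPU cycles) is the shortest whole number of lchars that is also a whole number of CPU cycles. A small Python sketch of that arithmetic follows; the names are made up for illustration and X = 5 is an arbitrary example delay, not a value from the real routine.

```python
from math import gcd

HDOTS_PER_CCYCLE = 3    # one CPU cycle
HDOTS_PER_LCHAR = 16    # one 40-column character time

# Shortest span that is a whole number of both lchars and CPU cycles.
common_hdots = HDOTS_PER_LCHAR * HDOTS_PER_CCYCLE // gcd(HDOTS_PER_LCHAR, HDOTS_PER_CCYCLE)
lchars_per_step = common_hdots // HDOTS_PER_LCHAR
ccycles_per_step = common_hdots // HDOTS_PER_CCYCLE

def delays_are_compatible(a_lchars, b_lchars):
    """True if the two delays differ in parity but leave the CPU phase unchanged."""
    diff_hdots = abs(a_lchars - b_lchars) * HDOTS_PER_LCHAR
    opposite_parity = (a_lchars - b_lchars) % 2 == 1
    same_cpu_phase = diff_hdots % HDOTS_PER_CCYCLE == 0
    return opposite_parity and same_cpu_phase

X = 5  # arbitrary example delay in lchars
print(lchars_per_step, ccycles_per_step)
print(delays_are_compatible(X, X + 3))
print(delays_are_compatible(X, X + 1))
```

A difference of 1 lchar has the right parity but shifts the CPU phase, and a difference of 6 lchars keeps the CPU phase but has the wrong parity, so 3 is the smallest difference that satisfies both conditions.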

That's about all there is to it. The full lockstep routine is on GitHub. Once lockstep is entered it'll persist until you wait for a time that depends on an external event (such as reading from disk/serial/parallel/ethernet/joystick or waiting for a keystroke). That doesn't mean that lockstep mode games and trackmos are impossible, though. The keyboard can be read by polling (pretty much all PC software directly or indirectly uses an interrupt for keyboard access but it isn't compulsory and I've done it by polling a few times). You just have to make sure the code paths are the same length no matter whether a key was pressed or not and no matter which key was pressed if there is one, which can be done by adding suitable delays. Disk access is a bit more difficult, since there's going to be a DMA bus access at some unpredictable point, and after it's happened you'll be out of lockstep. I think the solution is to HLT after the disk access is complete and restart execution on a timer interrupt. In the event that lockstep between CGA and PIT isn't possible, regaining lockstep once the timer interrupt has occurred should be possible by delaying for N ccycles for some N between 0 and 15 and a CGA memory access. Another possible way is to make sure the CPU is running code that is either:

  1. BIU-bound with no wait states, or
  2. EU-bound and never exhausts the prefetch queue

for the entire time that the accesses might be happening. That way the time taken to run the code doesn't depend on exactly when the accesses occur.

The CGA wait states

Saturday, September 29th, 2012

As part of my project to emulate an IBM PC or XT with cycle accuracy, I also wanted to emulate the CGA card with cycle accuracy. That meant figuring out exactly what the wait states are when accessing CGA memory. Here's what I found out.

When talking about this stuff it helps to have a common terminology to talk about the several units of timing involved. This is the terminology I use:

  • 1 hdot = ~70ns = 1/14.318MHz = 1 pixel time in 640-pixel mode
  • 1 ldot = 2 hdots = ~140ns = 1/7.159MHz = 1 pixel time in 320-pixel mode
  • 1 ccycle = 3 hdots = ~210ns = 1/4.77MHz = 1 CPU cycle
  • 1 cycle = 4 hdots = ~279ns = 1/3.58MHz = 1 NTSC color burst cycle
  • 1 hchar = 8 hdots = ~559ns = 1/1.79MHz = 1 character time in 80-column text mode
  • 1 lchar = 16 hdots = ~1.12us = 1/895KHz = 1 character time in 40-column text mode
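
All of these units are whole multiples of the hdot, so the table can be regenerated from a single clock value. The sketch below assumes the hdot clock is exactly 315/22 MHz (the usual NTSC-derived figure for 14.318MHz, which is not stated above); the dictionary and function names are illustrative only.

```python
from fractions import Fraction

# Assumption: 14.318MHz is exactly 315/22 MHz (four times the NTSC colour subcarrier).
HDOT_MHZ = Fraction(315, 22)

# Length of each unit in hdots, as listed above.
UNITS = {"hdot": 1, "ldot": 2, "ccycle": 3, "cycle": 4, "hchar": 8, "lchar": 16}

def duration_ns(unit):
    """Duration of one unit in nanoseconds."""
    return float(UNITS[unit] * 1000 / HDOT_MHZ)

def frequency_mhz(unit):
    """Rate at which the unit repeats, in MHz."""
    return float(HDOT_MHZ / UNITS[unit])

for name in UNITS:
    print(f"{name:7} {UNITS[name]:2} hdots {duration_ns(name):8.1f} ns {frequency_mhz(name):7.3f} MHz")
```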

The wait state algorithm for the original IBM CGA is basically "wait 1 hchar, then wait for the next lchar, then wait for the next ccycle". That works out at between 3 and 8 ccycles depending on the relative phase of the CPU and CGA clocks. There are actually 16 possible relative phases (one for each of the hdots within the lchar at which the CPU cycle starts).

One relative phase has a 3 ccycle wait state and there are 3 relative phases for each of the other 5 possible wait state lengths (4, 5, 6, 7 and 8 ccycles respectively). 1+3+3+3+3+3=16. So the average wait state is (3+4*3+5*3+6*3+7*3+8*3)/16 = 5.8125 ccycles, but you might measure a different average depending on how your piece of code ends up synchronizing with the CGA clock.
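
The distribution above can be reproduced with a few lines of Python. This is only one reading of the "wait 1 hchar, then the next lchar, then the next ccycle" rule: it assumes the lchar boundary must fall strictly after the 1 hchar wait, and that the access completes on the first CPU cycle boundary at or after that lchar boundary. Those boundary conventions are assumptions chosen because they match the figures quoted here, not something measured.

```python
from collections import Counter

HDOTS_PER_CCYCLE = 3
HDOTS_PER_HCHAR = 8
HDOTS_PER_LCHAR = 16

def wait_ccycles(phase):
    """Wait in CPU cycles when the access starts `phase` hdots into an lchar."""
    after_hchar = phase + HDOTS_PER_HCHAR
    # Next lchar boundary strictly after the 1 hchar wait (assumed convention).
    lchar_edge = (after_hchar // HDOTS_PER_LCHAR + 1) * HDOTS_PER_LCHAR
    # Round up to the next CPU cycle boundary, counted from the start of the access.
    return -(-(lchar_edge - phase) // HDOTS_PER_CCYCLE)

waits = [wait_ccycles(p) for p in range(HDOTS_PER_LCHAR)]
distribution = Counter(waits)
average = sum(waits) / len(waits)

print(sorted(distribution.items()))
print(average)
```

Under these assumptions the 3 ccycle case is the single phase where the access starts 7 hdots into the lchar.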

In a way it's rather unfortunate because with a slight hardware modification I think the 1 hchar wait state could have been eliminated, making the average wait state about 3 ccycles shorter and roughly doubling the average speed of the CGA memory access.

Also unfortunately, "rep stosw" gives almost the worst possible wait state behavior. I haven't tried it yet, but I suspect that it would be possible to write CGA code that self-synchronizes to get the best possible wait states (though of course that would probably only improve performance on machines that were cycle exact with the machine that it was tuned for).

A third unfortunate thing is that the wait states are the same wherever the raster is on the screen - they aren't disabled during the retrace interval or anything like that. There's a good reason for that though - the CRTC continues to strobe through the CGA RAM throughout the overscan/retrace areas for dynamic RAM refresh - allowing the CPU access to the full memory bandwidth could result in loss of video RAM data, since the CGA doesn't participate in the system DRAM refresh cycles (which is a good thing, because otherwise all those wait states would propagate to the entire memory system).

Scan doubler reverse engineered

Saturday, September 22nd, 2012

My XT came with an unusual and interesting ISA card - a PGS scan doubler II. From the name, connections and the chips on it I determined that it was supposed to convert CGA RGBI TTL signals from 15.7KHz horizontal line rate to 31.4KHz, which would make it (timing wise) compatible with standard VGA monitors (it'd still need a DAC to convert from the TTL digital signals to RGB analogue ones).

Soon after I got it, I tried to make it work with my CGA card, but couldn't get anything to display on my VGA monitor. I didn't have an oscilloscope then so there wasn't really much I could do in the way of diagnosis (I do have one now, but I still haven't diagnosed the problem due to my XT being en route from Seattle to the UK). For debugging purposes (and just because I was really curious about how it works) I decided to reverse engineer the card to a schematic. Here is the resulting schematic.

Interestingly, it only uses half of its 2Kb of RAM. There are four 1024x4 bit NMC2148HN-3 SRAM chips, but address line A9 of each chip is grounded, so only the first half of each chip is ever actually read from or written to. One might be inclined to wonder why they didn't use half the number of RAM chips. The answer is memory bandwidth: for each CGA pixel (i.e. at a rate of 14.318MHz) the card has to write a pixel to RAM and read back two. Each pixel is 4 bits, so that's an access rate of 229 megabits per second, which would be too fast for two such chips by a factor of two. So the solution is to increase the bandwidth by parallelization - it turns out that accessing 16 bits at each cycle is enough, but that means having four chips instead of two.

Most of the rest of the card is pretty straightforward - just sequencing the read and write operations in the right order to the different chips, detecting the hsync pulses for genlocking and parallelizing the input pixels. There is one bit which involves logic gates coupled by capacitors - this seems to be a clever hack to double the 14.318MHz clock to generate a 28.636MHz VGA pixel clock (I haven't simulated it because I can't quite read the capacitor values - I think I'll need to unsolder them to measure them). Technically such a clock doubling probably isn't necessary, since the left pixels could be emitted on the low half of the clock and the right pixels on the high (or possibly vice-versa) but maybe the logic delays cause the pixels to interfere, or maybe it was just easier this way.

What is the CGA aspect ratio exactly?

Monday, October 3rd, 2011

Somebody asked on the Vintage Computer Forums about what the CGA aspect ratio is supposed to be. The answer is usually given as 4:3 (pixel aspect ratio of 5:6), but I was inspired to find out what the relevant standards say it ought to be, exactly.

The relevant standard in question is SMPTE 170M - composite analogue video signal (upon which CGA is based). This gives an aspect ratio of 4:3, but that is for the full composite picture, which has 242.5 lines rather than the CGA's 200. The width is given in terms of timings - 63.556 microseconds per scanline total minus 1.5+9.2 microseconds for the blanking period, with a tolerance of +0.3/-0.2 microseconds, so between 52.556 and 53.056 microseconds altogether. Since the full horizontal period consists of 455 CGA low-res pixels horizontally, the full NTSC active area is the equivalent of (376.25-379.83)x242.5 CGA pixels. Re-arranging, that gives us a screen aspect ratio for CGA of between 1.362 and 1.375 - slightly wider than the usually quoted value.
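
The re-arrangement can be spelled out as a short calculation. The sketch below takes the figures quoted above as given and assumes a 320x200 CGA image sitting inside a 4:3 active area; the function and constant names are illustrative.

```python
LINE_US = 63.556         # total scanline time in microseconds
LDOTS_PER_LINE = 455     # CGA low-res pixels in one full scanline
ACTIVE_LINES = 242.5     # active lines in the full composite picture
FULL_ASPECT = 4 / 3      # aspect ratio of the full active area

def cga_aspect(active_us, width=320, height=200):
    """Aspect ratio of a width x height CGA image for a given active line time."""
    active_ldots = active_us / LINE_US * LDOTS_PER_LINE
    # The image covers width/active_ldots of the picture width and
    # height/ACTIVE_LINES of its height.
    return FULL_ASPECT * (width / active_ldots) / (height / ACTIVE_LINES)

# The two ends of the active-time range quoted above.
for active_us in (52.556, 53.056):
    ldots = active_us / LINE_US * LDOTS_PER_LINE
    print(f"{active_us} us -> {ldots:.2f} ldots, aspect {cga_aspect(active_us):.3f}")
```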

However, no TV or composite monitor of the time was manufactured to have aspect ratio tolerances as precise as 3% - 4:3 would have been well inside the error bars.

How to set 200-line text modes on EGA

Sunday, October 2nd, 2011

If you tell an EGA card that it's connected to a monitor capable of 350-line modes by setting the appropriate switches on the card itself, it will by default use these 350-line modes for its text mode (using the 14-scanline character set instead of the 8-scanline one, yielding higher fidelity text).

But sometimes you want a 200-line text mode. In particular, there is an obscure 160x100 16 colour mode of the CGA which was obtained by using 80-column text mode, filling the text characters with the "left vertical bar" or "right vertical bar" characters (221 and 222 respectively), disabling blinking and setting up the CRTC for 100 visible rows of 2-scanline characters. This was used by some Windmill Software games and a few others. With VGA you can do the same thing with 400-line text mode (by using 100 visible rows of 4-scanline characters).
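
To make the pixel layout concrete, here is a Python sketch of plotting into such a screen, modelled as a plain byte buffer rather than real video memory. It assumes the usual text-mode layout (character byte followed by attribute byte), that every cell holds character 221, and that the foreground colour of 221 covers the left half of the cell so the low attribute nybble is the left pixel and the high nybble the right pixel. The helper names are made up for this example.

```python
COLUMNS = 80             # text columns, two pixels each
ROWS = 100               # text rows of 2 scanlines each
LEFT_HALF_BLOCK = 221    # assumed: foreground fills the left half of the cell

def make_screen():
    """Return a buffer with every character cell set to the half-block character."""
    buf = bytearray(COLUMNS * ROWS * 2)
    buf[0::2] = bytes([LEFT_HALF_BLOCK]) * (COLUMNS * ROWS)
    return buf

def set_pixel(buf, x, y, colour):
    """Set pixel (x, y) of the 160x100 image to a colour in the range 0-15."""
    offset = (y * COLUMNS + x // 2) * 2 + 1   # attribute byte of the cell
    attr = buf[offset]
    if x % 2 == 0:
        attr = (attr & 0xF0) | (colour & 0x0F)          # left pixel: foreground
    else:
        attr = (attr & 0x0F) | ((colour & 0x0F) << 4)   # right pixel: background
    buf[offset] = attr

screen = make_screen()
set_pixel(screen, 0, 0, 5)
set_pixel(screen, 1, 0, 9)
print(hex(screen[1]))
```

The high nybble only gives 16 background colours because blinking is disabled; with blinking enabled its top bit would select blink instead.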

But how do you do it with EGA? One way is to program all the registers directly (as one must do on CGA) using the timings, sync polarity and palette from 200-line graphics modes and all other settings from the 350-line text modes. As mentioned in yesterday's post, we must use the palette 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17 instead of the usual 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x14, 0x07, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f because the monitor will interpret bit 4 as intensity instead of secondary green in 200-line modes.
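
The 200-line palette follows a simple pattern: the low three bits of the colour index pass straight through and the intensity bit of the index moves to bit 4. A Python sketch that regenerates the list above (the variable names are illustrative):

```python
def palette_200_line():
    """Palette register values for 16 colour indices on a 200-line monitor."""
    # Bits 0-2 carry blue, green and red; index bit 3 (intensity) moves to bit 4.
    return [(index & 0x07) | ((index & 0x08) << 1) for index in range(16)]

# The usual palette, copied from the text above for comparison.
USUAL_PALETTE = [0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x14, 0x07,
                 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f]

print([hex(value) for value in palette_200_line()])
```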

Another way (which may be either more or less compatible with clone EGA cards) is to fool the BIOS into thinking a 200-line monitor is connected. The EGA BIOS reads the card's switches only once at boot time and then stores the results in BIOS memory at 0x40:0x88 and uses this value instead of the hardware value at mode-setting time. The low nybble of the value at this location is 3 or 9 for 350-line monitors and the corresponding values for 200-line monitors are 2 and 8 respectively. So an alternate algorithm is to check the byte at this address, decrement the low nybble if it's 3 or 9, store it back, do the "int 0x10" to set text mode, set the Maximum Scan Line Register to 1, disable blink and fill the text characters with a vertical bar.
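
The byte manipulation in that algorithm is small enough to show in full. The Python sketch below works on a plain integer standing in for the byte at 0x40:0x88; on a real machine the read and write would be done in real mode, and the function name is hypothetical.

```python
def force_200_line(switch_byte):
    """Return the switch byte adjusted to claim a 200-line monitor.

    A low nybble of 3 or 9 (350-line monitor) becomes 2 or 8; any other
    value is returned unchanged.
    """
    low = switch_byte & 0x0F
    if low in (3, 9):
        return (switch_byte & 0xF0) | (low - 1)
    return switch_byte

# Example values only; the high nybble here is arbitrary.
for value in (0xF9, 0xF3, 0xF8):
    print(hex(value), "->", hex(force_200_line(value)))
```

The mode set, Maximum Scan Line Register write, blink disable and screen fill then follow as described above.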

Here is the Vintage Computer Forum thread that inspired me to find this out.

Rewinding DVDs

Saturday, September 10th, 2011

Some DVD players now remember where you stopped playing a DVD, even if the disk is removed. When you put the disk back in, it reads the disk identifier, looks in its memory to see if it has a previous position for that disk, and (if it does) starts playing from that point.

This is all very well, except for the situation where you have a rental house containing such a DVD player and a selection of DVDs to watch - at the end of the rental period, the remembered positions of the DVDs might not all be at the start, leading to somebody putting on a movie and it unexpectedly starting half way through.

What is needed to solve this problem is some kind of mechanism... for rewinding DVDs!

(Perhaps just a button on the front that says "rewind all" which causes all the remembered positions to be forgotten. Or what our DVD player does when it starts somewhere other than the beginning - putting up a "press such-and-such button to start from the beginning" prompt.)

Fortunately the problem is rather less severe than the problem of rewinding VHS tapes, so we won't have to remember to rewind rental DVDs.

Why is TV static in black and white?

Sunday, August 16th, 2009

Have you ever looked at the static that appears on a TV screen when it isn't tuned into anything? If so, you might have noticed that it's in black and white. That fact always used to puzzle me - the patterns are random so surely all the colours that can appear on the TV should be equally likely, right?

It wasn't until fairly recently that I learned why this is. Colour TV signals are a little bit different than black-and-white TV signals - a certain frequency band within the signal is used to transmit colour information. That band corresponds to high frequency horizontal detail (patterns about 1/200th of the width of the screen). In a colour TV signal, those details are elided and the information used to carry hue and saturation information instead.

However, if you're watching a black and white programme you can get a sharper picture by using those frequencies for horizontal detail. So colour TV sets were designed to have two "modes" - colour mode and black-and-white mode. A "colour burst" signal is broadcast in an otherwise unused part of the signal which has the dual purposes of signalling that colour information is available, and calibrating the correct hue phase offset (the "colour burst" signal, if it were on screen and within gamut, would be a very dark olive green colour). This signal has to be present for about half a field before the TV will switch to colour mode. This is an imperceptibly short time but stops the TV flickering in and out of colour mode if the signal is marginal.

Having a signal of the correct frequency at the correct time for that period of time is extremely unlikely to occur by chance (and even if it did, it would disappear again before you had the chance to notice it). So when the TV is showing static, it thinks it's showing an old black-and-white movie and turns off the colour interpretation circuitry, leading to black-and-white static.

Screen aspect ratios as musical notes

Saturday, July 5th, 2008

Aspect ratios of televisions and computer monitors tend to be ratios of small integers. On my desk right now I have 3 monitors that are 4:3 and one that is 8:5.

Another thing involving ratios of small integers is musical intervals in Just Intonation.

C   1:1
C#  16:15
D   9:8
D#  6:5
E   5:4    5:4 monitors exist but are not common
F   4:3    the most common non-widescreen aspect ratio
G   3:2    3:2 monitors exist but are not common
G#  8:5    the most common computer widescreen format
A   5:3    5:3 monitors exist but are not common
A#  16:9   HDTV format
B   15:8
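
For anyone who wants to find the note for their own monitor, the table turns into a simple lookup once the resolution is reduced to lowest terms. This Python sketch assumes square pixels; the names are illustrative.

```python
from fractions import Fraction

# Just Intonation ratios from the table above.
NOTES = {
    Fraction(1, 1): "C", Fraction(16, 15): "C#", Fraction(9, 8): "D",
    Fraction(6, 5): "D#", Fraction(5, 4): "E", Fraction(4, 3): "F",
    Fraction(3, 2): "G", Fraction(8, 5): "G#", Fraction(5, 3): "A",
    Fraction(16, 9): "A#", Fraction(15, 8): "B",
}

def note_for(width, height):
    """Return the note matching a width x height display, or None if there isn't one."""
    return NOTES.get(Fraction(width, height))

for width, height in ((640, 480), (1280, 1024), (1920, 1200), (1920, 1080)):
    print(f"{width}x{height}: {note_for(width, height)}")
```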

PAL version of demo machine

Wednesday, July 2nd, 2008

The demo machine described the other day could easily be generalized to PAL output. There are some complexities though. Because of the missing quarter cycle of the carrier frequency per line, the PAL signal for a still image repeats every 4 frames. This means that in order to do the same "extremely simple, highly standards-compliant demo" that was possible on the original demo machine for NTSC, we need 2.7Mb of sample data. Let's run with this and round up the PAL machine's memory to 4Mb - rather than making the PAL machine as similar as possible to the NTSC machine, we should take the opportunity to introduce some variety.

Similarly, the CPU clock speed for the PAL machine should be (exactly) 17.734475MHz.
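
Both figures can be checked with exact arithmetic. The sketch below assumes the standard PAL subcarrier definition of (1135/4 + 1/625) cycles per line at a 15625Hz line rate, a sample clock of four times the subcarrier, and one byte per sample; none of those are spelled out in this post, so treat them as assumptions.

```python
from fractions import Fraction

LINE_HZ = 15625
FRAME_HZ = 25
CYCLES_PER_LINE = Fraction(1135, 4) + Fraction(1, 625)   # includes the 25Hz offset

subcarrier_hz = CYCLES_PER_LINE * LINE_HZ
clock_hz = 4 * subcarrier_hz                 # assumed: four samples per subcarrier cycle

samples_per_frame = clock_hz / FRAME_HZ
samples_per_sequence = 4 * samples_per_frame  # the signal repeats every 4 frames

print(float(subcarrier_hz))
print(float(clock_hz))
print(int(samples_per_sequence), "samples,",
      round(float(samples_per_sequence) / 2**20, 1), "MB at one byte per sample")
```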

Generating interesting standard PAL signals does have some complications that NTSC signals don't have. Because of the 25Hz offset, the colour carrier frequency starts at a different phase on each line, meaning that a sprite needs to have different sample data depending on its vertical position. I expect that most demos written for the machine would use one of three simplifications of the PAL standard (as most if not all computers and consoles that generate PAL signals did):

  1. eliminate the 25Hz offset so that the colour carrier phase repeats every 4 lines
  2. use a whole number of subcarrier cycles per line (making the chroma patterns vertical)
  3. eliminate interlacing, doubling the frame rate at the expense of halving the vertical resolution

These simplifications change the horizontal and vertical retrace frequencies slightly from the standard 15.625KHz and 50Hz rates, but not so much that real hardware is likely to fail to display the intended image.
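
To get a feel for how small the changes are, the sketch below works out the resulting rates for one example of each simplification. The specific choices (283.75 cycles per line without the offset, 284 cycles per line for the whole-number case, 312 lines per non-interlaced frame) are illustrative assumptions, not figures from this post.

```python
from fractions import Fraction

SUBCARRIER_HZ = Fraction(17734475, 4)   # PAL colour subcarrier
STANDARD_LINE_HZ = 15625

def line_rate(cycles_per_line):
    """Horizontal rate in Hz for a given number of subcarrier cycles per line."""
    return SUBCARRIER_HZ / cycles_per_line

no_offset_hz = line_rate(Fraction(1135, 4))   # 1. drop the 25Hz offset
whole_cycles_hz = line_rate(284)              # 2. whole number of cycles (assumed 284)
progressive_hz = Fraction(STANDARD_LINE_HZ, 312)  # 3. assumed 312 lines, no interlace

for label, hz in (("no 25Hz offset", no_offset_hz),
                  ("284 cycles per line", whole_cycles_hz)):
    change = float(hz / STANDARD_LINE_HZ - 1) * 100
    print(f"{label}: {float(hz):.3f} Hz ({change:+.4f}%)")
print(f"312-line non-interlaced: {float(progressive_hz):.3f} Hz vertical")
```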

NTSC decoder

Tuesday, July 1st, 2008

I want to write a piece of software that simulates a colour television (well, monitor really). It would really be a filter - the input is a sampled, quantized composite (CVBS) signal at some sample rate and the output is a series of video frames scaled to some resolution. I'd want the composite->RGB transformation to be done at the same time as the horizontal scaling to maximize quality whilst minimizing computation time. That means dynamically generating the appropriate filter kernel for a particular pixel width and sample rate.
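
As a very rough illustration of the composite->colour step only, here is a Python sketch of the simplest possible demodulator. It assumes the signal is sampled at exactly four times the colour carrier, that the colour is constant over the four samples, and that the two chroma components are measured against an arbitrary reference phase (no burst locking, no I/Q rotation, no scaling). The real filter described above would replace this box filter with kernels generated for the chosen sample rate and output width.

```python
import math

def encode(luma, chroma_a, chroma_b, samples=4):
    """Composite samples for a flat colour, four samples per carrier cycle."""
    return [luma
            + chroma_a * math.cos(math.pi / 2 * k)
            + chroma_b * math.sin(math.pi / 2 * k)
            for k in range(samples)]

def decode(four_samples):
    """Recover (luma, chroma_a, chroma_b) from one carrier cycle of samples."""
    s0, s1, s2, s3 = four_samples
    luma = (s0 + s1 + s2 + s3) / 4   # the carrier averages out over a full cycle
    chroma_a = (s0 - s2) / 2         # in-phase component
    chroma_b = (s1 - s3) / 2         # quadrature component
    return luma, chroma_a, chroma_b

print(decode(encode(0.5, 0.2, -0.1)))
```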

Such a thing would be particularly useful for emulating old computers and consoles which generated a composite colour signal directly (including the demo machine of yesterday's post). In particular, it would render colour artifacts and interlacing perfectly. It would also be useful for simulating (for example) how an image would look on a TV for applications like DVD mastering.

This is the kind of image it will be able to generate (this mock-up was done at fixed resolution and not in real time):

No high-frequency chroma filtering was done on this image, so you can see the chroma artifacts. I wanted to include an animation showing the dot-crawl effects, but animated GIFs don't go up to 60fps. It does look uncannily like a TV screen though.

Writing this filter might even inspire me to fix up the NTSC emulation in MESS and improve the video emulation of machines like CGA, Apple II, CoCo and Atari 400/800.