Archive for October, 2009

Sometimes, figuring out the right architecture is half the battle

Sunday, October 11th, 2009

I went through quite a few design revisions to get to the pipeline architecture I described yesterday. Some ideas I tried out and then abandoned:

  • Giving the filters the responsibility of keeping the data they needed.
  • Filters telling other filters how many samples they should consume or produce.
  • A FilterGraph object which held all the buffers and which had methods to make and break connections.
  • All the reader and writer methods being on the buffers.
  • Lookahead reader methods for filters.
  • Filters encapsulating other filters that they communicate with.
  • Having separate consume() and produce() methods.
  • Having the Reader and Writer functionality as part of the Consumer and Producer classes (this introduced a surprisingly significant overhead)
  • A Connection object to encapsulate the buffer.

It's rather difficult to tell what's going to work well and what isn't until you actually write some code. And then it takes some trial and error work to hit upon the right pattern. Assumptions must be called into question. Prejudices must be discarded. Darlings must be killed. But you know when you've got it right because the rest then practically writes itself.

Pipeline architecture

Saturday, October 10th, 2009

My software TV/monitor emulator is best thought of as a bunch of filters for transforming a signal in certain ways, connected together in a pipeline:

  • Decoding composite signals to YIQ
  • Transforming YIQ signals to RGB
  • Horizontal rescaling
  • Ghosting due to signal reflection in the cable
  • Adding noise

Because that's the best way to think about it, that's how I'd like to implement it. Then it will be easier to remove/replace filters that I don't need or that I want to implement in a different way. The filters.h file in the crtsim source implements this architecture.

When you have a pipeline, there are two ways to drive it. One is "consumer pulls" and the other is "producer pushes". In this case, the consumer is the code that actually renders the window to the screen. In "consumer pulls" mode, this code will probably fire 60 times per second (depending on the refresh rate of the monitor on the host machine) and each time it does, it will ask the filter which supplies its data for enough data to render one frame (or field, if we're doing interlaced signals). This filter will then in turn ask the next one along the chain for data, and so on up the chain until we get to the code that actually generates the composite signal.

In "producer pushes" mode, the producer generates data at a constant rate (possibly fixed by some other source of time in the system such as the audio device outputting at the correct rate). This data is then pushed from each filter to the next in the chain until it gets to the consumer. When the consumer has collected enough data to render a frame, a frame is rendered.

So for the purposes of emulating a TV or monitor as part of a microcomputer system emulator, "consumer pulls" and "producer pushes" modes can be thought of as "video rate driven" and "audio rate driven" modes respectively. Most emulators are hard-coded to do one or the other. But which one is best is determined by the user's hardware and what they're doing with the emulated system (video driven mode will generally give smoother graphics while audio driven mode will generally give more stable audio). So ideally we'd like to make the choice user-selectable.
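To make the two modes concrete, here is a minimal sketch of what the driving interfaces might look like in C++. The names (PullSource, PushSink, Sample) are purely illustrative - they are not the actual classes from filters.h - but they show where the control flow originates in each case.

    // Hypothetical interfaces illustrating the two driving modes.
    typedef float Sample;

    // "Consumer pulls": each filter asks its upstream source for data on demand.
    class PullSource
    {
    public:
        virtual ~PullSource() { }
        // Fill 'buffer' with 'count' samples, pulling from upstream as needed.
        virtual void read(Sample* buffer, int count) = 0;
    };

    // "Producer pushes": each filter hands data to its downstream sink as it arrives.
    class PushSink
    {
    public:
        virtual ~PushSink() { }
        // Accept 'count' samples, process them and push the result downstream.
        virtual void write(const Sample* buffer, int count) = 0;
    };

    // Pull mode: the renderer drives the chain once per frame:
    //     source->read(frameBuffer, samplesPerFrame);
    // Push mode: the signal generator drives it at a fixed rate:
    //     sink->write(generatedSamples, samplesPerChunk);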

A third possibility is for the producer code to decide when to draw a frame and to call for a window redraw, which causes a data pull through the filter chain. However, I've discounted this idea because that is an incorrect placement of responsibility. The producer doesn't (and shouldn't) know about the state of the monitor. Even if it has just produced a vsync pulse it doesn't necessarily mean it's time for a new frame (if the monitor is "rolling" as it will do momentarily when the signal timebase changes, it won't be).

There is another factor in pipeline design which is how the data stream corresponds to function calls. The simplest way would be to have each sink call the corresponding source each time it needs a sample (in pull mode) or each source call its corresponding sink each time a sample is available (in push mode). However, there are potentially quite a few filters and (because they are all replaceable at run time) each call from one filter to the next will be a virtual function call. That means that the compiler can't inline the code and the CPU's pipelining will get screwed up. According to this one can expect a virtual function call overhead of maybe 13 nanoseconds compared to inlined code (a crude test, but sufficient for order-of-magnitude calculations). Since most of our samples will be at 14MHz (4 times the NTSC color carrier frequency) that's only about 5 virtual function calls per sample before we've used up all our CPU.

So each function call really needs to transfer a pile of samples, not just one, and we will need to have circular buffers in between the filters to keep the data. How many samples should we transfer at once? A good rule of thumb for figuring that out is that one's hottest code and data should fit in L1 cache (which is maybe 64KB on modern CPUs). Divide that up by the number of steps in the pipeline and we're looking at low-single-digit numbers of KB to be passed at once. A scanline's worth of data (910 samples, give or take) is probably about right. That reduces the virtual function call overhead by three orders of magnitude which puts it well into the negligible range. Conceivably one could try benchmarking with lots of different "samples per call" values and then pick the one with the best overall performance (taking into account both call overhead and cache misses). I'll probably do this at some point.
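As a rough illustration of the batching idea, here is a single-producer/single-consumer circular buffer holding a few scanlines' worth of samples. This is just a sketch of the general shape of the thing, not the code from filters.h.

    #include <vector>

    typedef float Sample;

    class CircularBuffer
    {
    public:
        CircularBuffer(int size) : _data(size), _readPos(0), _writePos(0), _count(0) { }
        int writable() const { return static_cast<int>(_data.size()) - _count; }
        int readable() const { return _count; }
        void write(const Sample* samples, int n)
        {
            for (int i = 0; i < n; ++i) {
                _data[_writePos] = samples[i];
                _writePos = (_writePos + 1) % _data.size();
            }
            _count += n;
        }
        void read(Sample* samples, int n)
        {
            for (int i = 0; i < n; ++i) {
                samples[i] = _data[_readPos];
                _readPos = (_readPos + 1) % _data.size();
            }
            _count -= n;
        }
    private:
        std::vector<Sample> _data;
        size_t _readPos;
        size_t _writePos;
        int _count;
    };

    // With ~910 samples (one scanline) moved per virtual call, the ~13ns call
    // overhead is amortized over the whole batch and becomes negligible.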

One disadvantage of the pipeline architecture is that it introduces some variable amount of latency - not enough to normally be visible to end users, but this does complicate one thing that I want to emulate - light pens. A light pen is just a fast light sensor that can be placed anywhere on the screen. When the electron beam passes underneath it, it sends a signal to the computer. The computer knows where the beam is supposed to be at any given moment, so it can figure out where the light pen is. However, for an emulator to have proper lightpen support, it needs to have very low latency between the screen and the machine emulation. For this reason, I might abandon the pipeline architecture and just hard-code all the signal munging effects I care about in the CRT simulator itself, processing a line at a time and stopping when the horizontal sync pulse is found. Then, if the lightpen is anywhere on the next line the CRT will be able to tell the machine emulation exactly when the lightpen is going to be triggered.

NTSC hacking

Friday, October 9th, 2009

Recently I've been playing about doing NTSC decoding in software, trying to build a software TV/monitor for emulation purposes. I originally wanted to do the decoding of sampled composite signals to RGB and the horizontal scaling in a single step (precomputing a finite impulse response filter which does it all). However, I have come to realize that while this would yield the fastest code, it's not sufficiently flexible for what I want to do.

Specifically, in the signals I want to decode, the horizontal sync pulses can happen at any point (within a certain range) which means that the relationship between samples and horizontal pixel positions is not fixed in advance. This means that it's better to do the decoding (at least composite to YIQ if not all the way to RGB) at a fixed frequency and then rescale the result to pixels in real time (possibly using linear or cubic rescaling).

Having determined this, I looked to see what other NTSC software implementations do. Blargg's NES filter rescales at a ratio of 3:7 at the same time as it decodes, then it's up to the calling code to rescale this to the right width. xanalogtv converts composite to YIQ at 4 samples per color carrier cycle, uses linear rescaling on the YIQ samples and then converts the result to RGB. The resulting pixels may be doubled or tripled to get to the right width. This also allows for nice effects such as "blooming" (widening brighter lines).
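For reference, the back end of the xanalogtv-style approach - linearly rescaling a line of YIQ samples to the output width and then converting to RGB - looks something like the sketch below. The matrix coefficients are the commonly quoted NTSC YIQ-to-RGB values (they vary slightly between references), and this is not the code from my simulator.

    #include <vector>
    #include <algorithm>

    struct YIQ { double y, i, q; };
    struct RGB { double r, g, b; };

    // Rescale one line of YIQ samples to 'outWidth' pixels (linear interpolation),
    // then convert to RGB. Assumes in.size() >= 2 and outWidth >= 2.
    std::vector<RGB> yiqLineToRgb(const std::vector<YIQ>& in, int outWidth)
    {
        std::vector<RGB> out(outWidth);
        for (int x = 0; x < outWidth; ++x) {
            double pos = static_cast<double>(x) * (in.size() - 1) / (outWidth - 1);
            int i0 = static_cast<int>(pos);
            int i1 = std::min(i0 + 1, static_cast<int>(in.size()) - 1);
            double f = pos - i0;
            double y = in[i0].y * (1 - f) + in[i1].y * f;
            double i = in[i0].i * (1 - f) + in[i1].i * f;
            double q = in[i0].q * (1 - f) + in[i1].q * f;
            out[x].r = y + 0.956 * i + 0.621 * q;
            out[x].g = y - 0.272 * i - 0.647 * q;
            out[x].b = y - 1.106 * i + 1.703 * q;
        }
        return out;
    }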

My current simulator is here and the source is here (Windows only for now - sorry). This uses similar techniques to xanalogtv, but the rescaling is done by the GPU, in RGB space. The scanline effects are a bit more accurate (all the scanlines appear to be the same width, no matter what size the window is), and a phosphor mask is displayed. Most reasonably modern machines should be able to display the images at full speed (60Hz). If your machine is too slow or your monitor doesn't run at 60Hz there may be some odd effects (most LCD panels run at 60Hz). I believe this is the only software CRT simulator that correctly renders both interlaced and non-interlaced signals, and has physically correct phase-locked-loop line frequency behavior. If I can figure out how to add a pixel shader for light bloom, I should be able to get images as good as these (except with arbitrary scaling in real time).

One other rough edge in this version is that the horizontal sync pulse is currently only found to the nearest sample. This means that the phase locked loop isn't very tunable, and will cause problems for PAL signals (where the horizontal sync position is at a different subsample offset on every line). That should be quite easy to fix, though.
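One straightforward fix would be to find the pair of samples straddling the sync threshold and linearly interpolate the crossing point, giving the sync position as a fractional sample offset. The sketch below shows the idea; it isn't necessarily how I'll end up doing it, and the threshold is just a placeholder.

    // Locate the falling edge of the horizontal sync pulse to sub-sample precision.
    // Returns the crossing position in samples, or -1 if no edge was found.
    double findSyncEdge(const float* samples, int count, float threshold)
    {
        for (int i = 1; i < count; ++i) {
            if (samples[i - 1] >= threshold && samples[i] < threshold) {
                // Fraction of a sample between i-1 and i where the signal crosses.
                float f = (samples[i - 1] - threshold) / (samples[i - 1] - samples[i]);
                return (i - 1) + f;
            }
        }
        return -1;
    }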

This simulator is going to form the basis for my demo machine emulator. The emulator itself is trivial - in fact I have already written it. But I haven't tried it out yet because I have no program to run on it. First I have to write an assembler for it. I might tweak the instruction set a bit in doing so, so I don't want to release the emulator just yet. Watch this space!

What does supernatural actually mean?

Thursday, October 8th, 2009

I was discussing philosophy with a theist friend recently and the argument "this only applies to natural things and God is supernatural so this doesn't apply" came up. I've seen this argument in other debates as well, but I have to confess that I don't completely understand it. What does "supernatural" actually mean? The dictionary definition that seems to best apply is "unexplainable by natural law or phenomena".

There are two possible meanings to that. One is "unexplainable by the laws of physics as they are currently known" and the other is "unexplainable by the laws of physics even if we knew them all".

The existence of supernatural things of the first type is not denied by any sufficiently well-informed scientist - it's no secret that the laws of physics are incomplete. One possible example might be the details of the event horizon around a black hole - at small scales this requires a theory of quantum gravity, which we don't yet have. I think we'll eventually eliminate all such supernatural things by having a complete theory of physics.

I suspect theists would therefore prefer the second definition. But does this definition even make any sense? What would it mean for there to be phenomena in our universe that no conceivable physical theory could explain? Well, supernatural phenomena of that sort can be said to exist too - when a quantum variable is measured and the wavefunction collapses, the result isn't necessarily determined by anything in the universe. But I don't see any theists suggesting that God acts on the universe by deciding how each and every wavefunction collapse occurs, despite the apparent omnipotence that such power would grant. I suspect this is because this would eliminate any possibility of human free will - quantum wavefunction collapse is part of all physical processes, so controlling quantum wavefunction collapse would mean controlling all our thoughts and actions. There would be no will except the will of God possible, making all religion (and indeed everything) rather pointless.

It seems to me there are philosophical reasons to reject the concept of a non-random supernatural process - if something is non-random there is some sort of (at least partially) predictable pattern to it, which means one could come up with a law of physics to describe that pattern, which means it's no longer supernatural. "Predictable" only means predictable in principle, though, not predictable in practice. Chaitin's constant (the probability that a random program will halt if run for long enough) for example isn't random but is uncomputable in the sense that only a finite number of its digits could be determined by any finite algorithm. Curiously, this number could be thought of as omniscient - it encompasses all mathematical knowledge (since it can be used as an oracle to solve the halting problem) but a number (even a real, uncomputable one) doesn't seem like it could match the theists' descriptions of God as having certain properties like compassion and goodness.

As well as the gaps in our knowledge of physics and the gaps caused by quantum improbability, there are also gaps due to the fact that there are some real world phenomena which we just can't do experiments on for one reason or another. We can't do experiments on UFOs because we can't predict when and where they will show up (though I'm sure that if one did show up in a suitably equipped science lab, laws of physics could be found to describe it).

Another thing we can't do physics on is subjective experience, simply because it's subjective. We don't currently have any technology by which one person can experience what it's like to be another person (and even if we did, there is no objective way to be sure that it's the same experience for both people - one can't compare subjective things with objective things). All we can do is ask people to report on their subjective experiences, and a personal report isn't as reliable a piece of evidence as a repeatable experiment.

Each one of us can't even be truly sure that other human beings actually *have* subjective experiences - maybe they're just p-zombies who say they do. It's a useful working hypothesis to assume that they do, though (and the opposite assumption would be rather dangerous for all concerned).

So perhaps God is Himself a subjective experience. That certainly dovetails with some things that theists say, like "I know God is in my heart and I've experienced His love, but I have no way to prove that to you". And objective evidence of God does seem to be rather thin on the ground, to put it mildly. If this is what God is then I am a teeny bit jealous of theists for having that experience that I never had (even when I was a theist, went to church, prayed regularly etc.).

If neuroscientists are able to generate religious feelings in others by stimulation of their temporal lobes with magnetic fields then, objectively, that suggests that religious experiences (at least these ones) are "fake" in the sense that there is no "objective God" causing them. But on the other hand, why should we treat subjective experiences as less "real" than objective reality? What could be more real to someone than something they have experienced first hand? By that logic, hallucinations caused by drugs or schizophrenia should also be considered "real" in this subjective sense.

Objective reality, on the other hand, consists of the things that we can in principle (given sufficient experimental data and suitable application of logic) convince other rational human beings of - in other words the things that we can in principle (if we're honest) all agree upon. As such, only objectively verifiable things should be used as a basis for public policy.

Memory handling in high-level and low-level languages

Wednesday, October 7th, 2009

The usual distinction between high-level and low-level languages is that high-level languages provide more abstractions. Assembly language is clearly a low-level language, and provides very few abstractions over the raw binary format consumed by the processor. Javascript is very high-level because it abstracts away almost all the details about the particular machine and operating system that it runs on, making it much easier to write actual end-user applications in (but impossible to write, for example, device drivers in).

One particular place where the distinction is particularly obvious is how different languages handle memory. There seem to be three basic strategies for handling memory:

  1. Implicit or explicit memory management (assembly language, C, C++) - calls to malloc() and free() (or equivalent) are inserted into the code by the programmer or by the compiler.
  2. Garbage collection with explicit state (C#, Java, Python, millions more) - no malloc() or free() are possible - all memory is "owned" by the garbage collector. Memory management is abstracted away.
  3. Referential transparency (Haskell) - there is no state, the concept of memory itself is abstracted away.

The boundaries can blur a little - you can plug a garbage collector into C++ for example, and high-level languages often provide "unsafe" functionality that allows one to escape to lower-level features.
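As a trivial illustration of the first strategy, here is the explicit version in C++ (in a language from the second category the release calls simply don't exist, and in the third the question never arises):

    #include <cstdlib>

    struct Widget { int value; };

    int main()
    {
        // Explicit management: every allocation is paired with a release by hand.
        Widget* w = new Widget();
        w->value = 42;
        delete w;                      // Forget this and you leak; do it twice and you corrupt the heap.

        void* raw = std::malloc(1024); // The C-style equivalent.
        std::free(raw);
        return 0;
    }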

Taxing hard work versus taxing luck

Tuesday, October 6th, 2009

In the year 2000, John McCain was asked by a student, "Why is it that someone like my father who goes to school for 13 years gets penalized in a huge tax bracket because he's a doctor. Why is that - why does he have to pay higher taxes than everybody else? Just because he makes more money. How is that fair?". To his credit, McCain did actually defend the idea of progressive taxation (those who make more money should pay a higher percentage of their income as tax).

It seems to be a common misconception amongst conservatives that progressive taxation is "punishing hard work" (or perhaps it just makes a good soundbite). But I don't think that even the most diehard liberals want to punish anybody for working hard - what should be (and is) punished is good luck. Many extremely wealthy people haven't had to work particularly hard for their wealth - especially those who have inherited it, but also those who have had the good fortune to be selling the right product in the right place at the right time (for example Microsoft in its early days with MS-DOS).

Also, there are a great many people who work extremely hard, but don't make a lot of money (those who work two or three low-paying jobs).

Progressive taxation decreases the reward for being lucky and thereby increases the amount of that reward that is due to hard work. So a (well implemented) progressive taxation strategy actually encourages hard work rather than discouraging it.

Not all progressive taxes are well implemented, though. In the UK there is a VAT limit of (currently) £68,000 per year. If your business makes more than that you have to register for VAT and pay 15 percent on all your profits (not just the profits over some limit). This greatly discourages people who are making just under the limit from working any harder, because if they did they would actually make less money.

I think one of the fairest taxes is the inheritance tax. This is much maligned as the "death tax", a moniker which doesn't make much sense to me as it isn't the dead person that pays the tax, it's the beneficiary. It's a fair tax because there's no hard work involved in inheriting money, you just have to have the good fortune to be born to wealthy parents. The downside of the inheritance tax is its effect on the "family business" - a child wouldn't necessarily be able to afford to take over a business from his or her parents if there are large inheritance taxes to be paid, and this may lead to the destruction of some otherwise profitable businesses. But why, as a society, should we give that child the gift of this business and not expect anything in return?

A business which would be affected by this must have a large intrinsic value to be subject to the inheritance tax, but also make a small enough profit that it would take an unreasonably long time for the beneficiary to pay it back. I think the answer to this conundrum is investors - the beneficiary ceases to be sole owner of the business but the business can continue to exist and benefit society. Inheritance taxes should be set up in such a way that this can work.

Larrabee: Devolving the CPU vindicated

Monday, October 5th, 2009

I was interested to read that the cores of Intel's new Larrabee graphics architecture are based on the old P54C design, originally used in the second generation of Pentium chips (dating back to before even MMX made its first appearance). The core has since been updated with the latest instructions (including x86-64) but is missing performance features such as out-of-order execution. This vindicates what I wrote in devolving the CPU - that individual cores will get simpler and slower again as they get more numerous.

Setting constraints when programming

Sunday, October 4th, 2009

When writing a program, I generally don't just write it in any old way that might work, I set myself some constraints. Generally these constraints are particular programming principles that I have found make my life easier in the long run, such as choosing appropriate names. Sometimes I'll try to maximize speed or minimize memory usage. For Argulator I tried to minimize the number of SQL queries I made to satisfy any given request (which led to some convolutions to do appropriate data caching on the PHP side.) I also tried to make the page sent by the server as close as possible to the page that actually gets rendered (i.e. make Javascript not needed to display the page, just to interact with it). I mostly succeeded, at the expense of having some code duplication between the server side and the client side (though the argument creation UI is partially created by Javascript).

Sometimes I'll try to eschew certain operations, such as eliminating the square root operation in the algorithm to determine if a point is in the Mandelbrot cardioid. Often I'll try to eliminate subtraction and division and write algorithms in terms of addition and multiplication instead. This might seem to be a strange thing to do but the resulting algorithms are often more concise and it's easier to tell that they're correct. This can also eliminate division by zero errors.
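For the curious, the square-root-free cardioid check is the standard one: with q = (x - 1/4)^2 + y^2, the point x + iy is in the main cardioid if q*(q + x - 1/4) <= y^2/4. In code it looks something like this (the well-known formulation, not necessarily exactly what my program does):

    // Returns true if c = x + iy lies inside the main cardioid of the Mandelbrot
    // set, using only additions and multiplications - no square root.
    bool inMainCardioid(double x, double y)
    {
        double xq = x - 0.25;
        double q = xq * xq + y * y;
        return q * (q + xq) <= 0.25 * y * y;
    }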

Tet4 is a game of luck

Saturday, October 3rd, 2009

I used to think it might be possible to come up with some kind of general algorithm for playing Tet4 indefinitely, but I now suspect that no matter how tall the pit is and how many pieces you can look ahead, there are always sequences of pieces that will guarantee a loss - in particular, I don't think there is any way to place the sequence SSZ without adding a row every 6 pieces. This means that ultimately, Tet4 is a game of luck. However, most sequences that occur in practice do seem to be solvable - many years ago I wrote a program to play Tet4 automatically, and it seems to be able to play indefinitely. So in practice speed and accuracy are more important than luck for winning the game.

Side note: On the most recent occasion that I tried to figure this out, I attempted to resurrect my old Tet4-auto-player program (not trivial, since it's a graphical DOS program and hence doesn't work on Vista) and modify it to see if it could solve the SSZ repeated sequence. Once I'd got it running, I was amused to see that it was already set up to solve the SSZ repeated sequence - I'd done the exact same thing before and forgotten all about it!

Talking of Tet4, it's had a couple of updates since I last mentioned it here. The original version has had a bug fix so that playback of games in which the curtain reaches the bottom works properly. Also, a second version has been added, which is exactly the same but with no undo (and which is therefore less forgiving). This may be one of very few occasions on which the second version of a piece of software is identical to the first version apart from the removal of a significant feature.

When I first wrote the Javascript version of Tet4, I took the opportunity to change the keys from the DOS version a bit, to make them more logical and faster (so that if there were 5 or fewer possible positions for a piece, either hand could be used, for example). This meant that I kept making mistakes because (although I hadn't played Tet4 for years) my muscle memory was still trained with the old keys. Hence I added the undo feature. But in making the game more forgiving, I changed its nature significantly.

There are two ways I could have added undo to Tet4. One way is to undo the random number generator one step when undoing. The other is not to. Undoing the random number generator essentially gives you unlimited lookahead capability. To look ahead n pieces, drop those n pieces (anywhere) and then undo n times. I didn't want to grant that power, so I wrote it in such a way that dropping n pieces and undoing them changes the random number generator state. However, this gives an even stronger ability - now you can essentially choose which piece comes next by dropping and undoing until you get the piece you want. I think this makes the game too easy (at least until it gets really fast), hence the new version.
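Here's a sketch of the difference, with a simple linear congruential generator standing in for whatever Tet4 actually uses (the names are illustrative, not Tet4's real code):

    #include <vector>

    struct Rng
    {
        unsigned int state;
        int nextPiece() { state = state * 1664525 + 1013904223; return (state >> 16) % 7; }
    };

    struct Game
    {
        Rng rng;
        std::vector<Rng> history;   // RNG state saved before each drop.

        int drop()
        {
            history.push_back(rng);
            return rng.nextPiece();
        }

        // Option 1: restore the RNG on undo. Dropping n pieces and undoing them
        // reveals the next n pieces - effectively unlimited lookahead.
        void undoRestoringRng()
        {
            rng = history.back();
            history.pop_back();
        }

        // Option 2: leave the RNG alone on undo. Now drop-and-undo re-rolls the
        // next piece, so the player can fish for whichever piece they want.
        void undoKeepingRng()
        {
            history.pop_back();
        }
    };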

Text editor as shell

Friday, October 2nd, 2009

The text editor interface I talked about yesterday is so good I find myself wanting to use it for things other than text editing. In particular, I'd love to be able to use it as my command line interface. Most command line shells are annoying because you can't use the keyboard to move the cursor up into a program's output to copy it - you generally have to resort to the mouse. I think it would be far better to treat the entire command-line session as a document. One way to do this would be to have a text editor with a special key combination (say Ctrl+Enter) which takes the contents of the current line, executes it as a command, and pastes the resulting output into the document immediately below, followed by a prompt, followed by the cursor.
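The command-running part of that is the easy bit - something along the lines of the sketch below (popen() on POSIX, _popen() on Windows); the interesting work is the editor integration, which this ignores entirely:

    #include <cstdio>
    #include <string>

    // Run 'command' in a shell and return everything it writes to standard output,
    // ready to be pasted into the buffer below the command line. Standard error is
    // ignored here for simplicity.
    std::string runCommand(const std::string& command)
    {
        std::string output;
        FILE* pipe = popen(command.c_str(), "r");   // _popen() on Windows.
        if (pipe == 0)
            return output;
        char buffer[4096];
        size_t n;
        while ((n = fread(buffer, 1, sizeof(buffer), pipe)) > 0)
            output.append(buffer, n);
        pclose(pipe);                               // _pclose() on Windows.
        return output;
    }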

One useful aspect of command line interfaces is tab completion - it would be nice if there were a way to make this continue to work in the editor interface. Perhaps tab could be interpreted to mean "tab completion" if the line above the cursor was a prompt (maybe prompt lines are "special" in some way, though that counteracts the principle that there should be no "hidden" information). If prompt lines are special, then maybe pressing Enter on the line below would suffice for running commands, instead of a special key combination for this.

A variation on this idea would be to allow editing of multiple files in a single buffer. Suppose you're using the editor interface and "cat" (Unix) or "type" (Windows) the contents of a file, causing it to be inserted into the buffer. I think it would be tremendously useful if that file was then "live" and could be edited by just moving the cursor up into its contents, changing things, and then hitting some other key (perhaps Alt+S?) to save it. Again this goes against the "no hidden information" principle, though. Perhaps one of the ASCII control codes that aren't usually found in text files could be repurposed to have special meaning to the editor, to signify a line that is a prompt or contains the path to an embedded file.