User code in kernel mode

Most modern operating systems have a great deal of complexity in the interface between user mode and kernel mode - 1100 syscalls in Linux 2.4.17 and some 400 odd in Windows Vista. This has implications for how security (since all those calls need be hardened against possible attack) and also implications for how difficult it is to learn to program a particular platform (albeit indirectly, since the syscall interface is not usually used directly by application programs).

It would be much better if the system call interface (and the programming API on top of that) was as minimalistic as possible whilst still securely multiplexing hardware and abstrating away hardware differences.

The reason it isn't is that doing so would make an operating system extremely slow. It takes some 1000-1500 cycles to switch between user mode and kernel mode. So you can't do a syscall for each sample in a realtime-generated waveform, for example. There are many other ways that an operating system could be simplified if not for this cost. TCP, for example, is generally implemented in the kernel despite the fact that it is technically possible to implement it in userspace on top of a raw IP packet API. The reason is that network performance (especially server performance) would be greatly impacted by all the extra ring switching that would be necessary whenever a packet is sent or received.

A much simpler system API would be possible if user programs could run code in kernel mode - that way they could avoid the ring switch and all the other associated overhead. This was the way things used to be done, back in the days of DOS. Of course, back then every application you used needed to support every piece of hardware you wanted to use it with, an application crash would take down the entire system and there was no such thing as multi-user. We certainly don't want to go back to those bad old days.

Microsoft's Singularity OS (as I've mentioned before) solves this problem by requiring that all code is statically verifiable, and then runs it all in ring 0. Given how much code performance enthusiasts like I like to write highly optimized but completely unverifiable code, I think it will be a while before memory protection becomes a thing of the past.

But maybe there's a middle path involving both approaches - unverified code using the MMU and verified code running in the kernel. A user application could make a system call to upload a piece of code to the kernel (perhaps a sound synthesis engine or a custom TCP implementation) and the kernel could statically verify and compile that code.

With a suitably smart compiler, parts of the kernel could also be left in an intermediate form so that when the user code is compiled, the kernel implementations could be inlined and security checks moved outside of loops for even more speed.

Another thing that such a system would make practical is allowing applications to subvert the system calls of less trusted applications.

Leave a Reply