Archive for October, 2012

The Bottom type in ALFE

Saturday, October 20th, 2012

This is part of the ALFE types series.

Bottom is a special type that has no values associated with it. So declaring a variable or member of type Bottom is forbidden. It's useful for using as the return type of a function that doesn't ever return normally (i.e. a function that always throws an exception or terminates the process).

[Bottom] is also another name for Null (or maybe it's a different one-element type, I haven't decided yet).

Another way of writing Bottom is Either<> (i.e. an algebraic data type with no types being summed).

I might end up calling it something different - the name has slightly too many connotations of anatomy and (worse) type theory for my taste. I'm not sure what would be better, though. Empty isn't quite explanatory enough, and NoReturn doesn't quite capture the full range of Bottom's behavior.

Fixed-length arrays in ALFE

Friday, October 19th, 2012

This is part of the ALFE types series.

Foo[2], Foo[3], Foo[4] etc. are arrays whose length is fixed at compile time (not to be confused with [Foo] whose length is unknown and possibly undefined even at runtime). Passing around a value of array type copies the entire array (arrays don't decay to pointers). Operations defined on the elements can also be applied (element-wise) to the array itself, so you can write things like:

Int[2] centre(Int[2] topLeft, Int[2] bottomRight)
{
    return (topLeft + bottomRight)/2;
}

Foo[2] also has a template form, but since template arguments can only be types and not integers like they can in C++, there is a twist: Array<Foo, Class {Int n=2;}>. Any type can be used for the second argument as long as it has a public member named n of type Int with value known at compile time. This allows templates to work generically with arrays of different lengths.

Foo[n] can be indexed by an integer to yield an (RValue or LValue) Foo. It can also be indexed by a sequence (whose values are known at run time) to yield a slice of the array.

An array can be coerced to a sequence, and the compiler will box up the element count and pointer as necessary.

Sequence types in ALFE

Thursday, October 18th, 2012

This is part of the ALFE types series.

[Foo] is the type of an immutable sequence of Foo values (the sequence itself is immutable but the values in it are not necessarily immutable). It is syntactic sugar for Sequence<Foo>. It has methods "Boolean isEmpty()", "Foo first()" and "[Foo] rest()". As usual the actual implementation is up to the compiler (and may even be different for the same sequence in different parts of the same program). Values like [], [foo] and [foo1, foo2] are allowed.

There is also a special operator for creating sequences of consecutive integers: "..". It's closed on the left and open on the right, so 0..5 is equivalent to [0, 1, 2, 3, 4]. Infinite sequences like 0.. are also allowed, since there's no need in the Sequence<> interface to know how many elements there are up front. I'd like to have consecutive sequences that count up in steps other than 1, but I haven't decided on a good syntax for that yet. Python's extended slicing syntax is quite nice, but I think ".." is more natural than ":", and I already have quite a lot of meanings for the latter.

The sequence type can also be used as the return type for a generator function:

[Int] primes()
{
    Primes = Class : [Int] {
        construct(Int m = 2) { n = m; }
        Boolean isEmpty() { return false; }
        Int first() { return n; }
        [Int] rest()
        {
            for (Int m in (n+1)..) {
                Int i = 2;
                while (m/i >= i) {
                    if (m % i == 0)
                        break;
                    ++i;
                }
                done
                    return Primes(m);
            }
        }
        Int n;
    };
    return Primes;
}

Disregarding the usage of trial division instead of a more efficient sieve of Eratosthenes, it may seem terribly inefficient to create a whole new Primes object each time we get the next element of the sequence. However, it is the expectation that the compiler should inline the entire rest function into the calling loop so that all the object creation and destruction is optimized away.

Product types in ALFE

Wednesday, October 17th, 2012

This is part of the ALFE types series.

(Foo, Bar) is a "product" type, aka Tuple<Foo, Bar> or a structure with unnamed fields. Tuple is another variadic template, so (Foo, Bar, Baz) is syntactic sugar for Tuple<Foo, Bar, Baz>. Values of Tuple type are written in the same way as the type itself, except with values instead of types inside the parentheses: (foo, bar). You can have tuples of LValues as well as RValues, which is handy for functions that return Tuple values:

(foo, bar) = getFooBar();

You can also have a tuple of declarations:

(Foo foo, Bar bar) = getFooBar();

(Foo) is identical to Foo just as it is in C, but in ALFE it's also a 1-tuple (so 1-tuplifying is really just a no-op). There is also a single 0-tuple type () aka Tuple<> which is another name for Null. There is a single value inhabiting this type, which is also (by slight abuse of notation that took me a while to determine wasn't too amiguous or confusing) called () and Null (a type specifier can be used as a value if it is default-constructable).

Maybes in ALFE

Tuesday, October 16th, 2012

This is part of the ALFE types series.

Foo? is a "Maybe" type, also known as an option type. It is syntactic sugar for Maybe<Foo>. This type has values of Null (a type with only one value) and Just<Foo> (which is derived from Foo) - it's equivalent to Null | Just<Foo>.

This is particularly handy if Foo is a pointer type, since ALFE's usual pointer type is not nullable, so this template will give you back C pointer semantics (except with more safety, since it'll force you to explicitly check for Null before dereferencing). Because of the Just<> template, it's possible to stack Maybes - so you can have Foo?? with two separate sentinels, Null and Just<Null>. If Foo is a pointer type, the compiler can implement the sentinels just as pointer values that are guaranteed not to point at real objects, just as C and C++ do, so there's no loss of efficiency.

Sum types in ALFE

Monday, October 15th, 2012

This is part of the ALFE types series.

Foo | Bar is a "sum" type, in other words a type that can have any values that Foo can as well as any values that Bar can. Also known as a tagged union or an algebraic data type.

Foo | Bar acts like a base class (supertype) of both Foo and Bar, since either of the latter are Liskov-substitutable for the former, and the downcast required to convert from the former to the latter is identical that required for casting from a base class. Foo | Baz also acts like a (completely separate) base class of Foo, so sum types provide something almost, but not quite, entirely unlike multiple inheritance.

Foo | Bar has all the methods and members of the most derived common base class of Foo and Bar (and, if that common base class has no other subclasses not derived from Foo or Bar, it'll be identical to Foo | Bar. In particular, Foo | Foo is identical to Foo).

Foo | Bar is syntactic sugar for a special template Either<Foo, Bar>. Similarly, Foo | Bar | Baz is syntactic sugar for Either<Foo, Bar, Baz> - Either is a variadic template. Either<Foo> is just another name for Foo.

How Either is actually implemented is up to the compiler, but some kind of boxing will probably be necessary in the case when Foo and Bar are not both pointers to types with vtable pointers.

Function types in ALFE

Sunday, October 14th, 2012

This is part of the ALFE types series.

Foo(Bar) is a function type - the type of a function taking Bar and returning Foo. It is syntactic sugar for Function<Foo, Bar>. Similarly, the zero-argument function type Foo() is short for Function<Foo> and the two-argument function Foo(Bar, Baz) is short for Function<Foo, Bar, Baz> so Function is a variadic template, a curious kind of type constructor which I'll talk more about in a future post.

A function type is a bit unusual in that it is not of definite size, so you can't declare a variable of this type on the stack or as a member of another type (the syntax for doing so would be too similar to declaring an actual local function or method instead). However, pointers to functions like Foo()* work fine, just as they do in C and C++ (albeit with an easier to understand syntax). Function types can also be used as template arguments. Raw function types might also come in handy for run-time code generation, but I haven't yet finalized the syntax for that.

Bar -> Foo is yet another way of writing Foo(Bar). I'm not sure if there's much of advantage to having this syntax as well as the other, but there might be some circumstances in which it makes more sense.

Pointers in ALFE

Saturday, October 13th, 2012

This is part of the ALFE types series.

As a high-performance language with C++ heritage, it should not be at all surprising that ALFE has pointers.

Given a type Foo, the type of a pointer that points to values of that type is Foo*.

In C and C++, the grammar binds the asterisk to the variable name rather than the type. So (particularly in C) declarations of pointers look like this:

Foo *foo;

That was very confusing to me when I first started learning C. Once I thought of it as:

Foo* foo;

instead, the whole thing made much more sense. In C++, this style is idiomatic, due to C++'s emphasis on types over expressions. However, the grammar hasn't caught up so:

Foo* foo, bar;

declares a Foo* called foo and a Foo called bar.

ALFE fixes this once and for all:

Foo* foo;

declares a real Foo* called foo rather than an imaginary Foo called *foo.

The second difference between ALFE and C++ is that pointers are by default not nullable. In other words, Foo* carries the connotation of pointing to an actual Foo, so null checking is not required (or even possible, since comparing a Foo* to Null will always yield false). In other words, ALFE fixes Tony Hoare's billion dollar mistake. Put like that, it's a no-brainer.

Of course, in practice there will be such things as an uninitialized Foo*, one that points to a Foo that went out of scope or one that points to one past the end of an array. However, those aren't things that could be checked for by a function which just takes a Foo* as an argument - if the pointer might be invalid there would have to be an out of band signalling mechanism saying whether it is safe to dereference that pointer or not. I'll be talking about an ideally suited mechanism in a future post in this series.

Foo* is syntactic sugar for the template instantiation Pointer<Foo>. This is mostly to simplify the implementation of the compiler, but it does also mean that the type constructor Pointer could be passed to some template that has a parameter of kind <>. Not sure why you might want to do such a thing, but it's there is somebody does find a use for it.

The ALFE type system

Friday, October 12th, 2012

I want to write a series of posts about the rich type system I want to implement for ALFE. This post will serve as an index to them.

I recently tried doing a bit of programming in Haskell, and although I didn't produce anything worth showing off, I did get a bit of a feel for the language and how it works. One of my favorite things about Haskell is the rich type system, and I want to use some of its ideas in ALFE. Some of the ALFE types below are inspired by Haskell types.

Given types Foo, Bar and Baz I would like ALFE to have the following types:

  • Foo* - a pointer type.
  • Foo(Bar) (aka Foo -> Bar) - a function type.
  • Foo | Bar - a "sum" type.
  • Foo? - a "Maybe" type.
  • (Foo, Bar) - a "product" type.
  • [Foo] - a "sequence" type.
  • Foo[2] - a fixed length array type.
  • Bottom - the "bottom" type.
  • Variant - the "variant" type.
  • Label - the "label" type.
  • Complex<Foo> - a complex number type.
  • Rational<Foo> - a rational type.
  • String - the string type (usually represented in memory as UTF-8 in some form).
  • CodePoint - a type capable of holding any Unicode code point.
  • Fixed<Foo, Bar> - a fixed point fractional type.
  • Boolean - the boolean type.
  • Int1, Int2, Int4, Int8, Byte, Int16, Int32, Int64, Int128, Int256, Int512, UInt1, UInt2, UInt4, UInt8, UByte, UInt16, UInt32, UInt64, UInt128, UInt256, UInt512, NInt1, NInt2, NInt4, NInt8, NByte NInt16, NInt32, NInt64, NInt128, NInt256, NInt512 - fixed-width integer types.
  • Word, Int, HalfWord, QuarterWord, DoubleWord, QuadWord, UWord, UInt, UHalfWord, UQuarterWord, UDoubleWord, UQuadWord, NWord, NInt, NHalfWord, NQuarterWord, NDoubleWord, NQuadWord, FastInt1, FastInt2, FastInt4, FastInt8, FastByte, FastInt16, FastInt32, FastInt64, FastInt128, FastInt256, FastInt512, FastUInt1, FastUInt2, FastUInt4, FastUInt8, FastUByte, FastUInt16, FastUInt32, FastUInt64, FastUInt128, FastUInt256, FastUInt512, FastNInt1, FastNInt2, FastNInt4, FastNInt8, FastNByte FastNInt16, FastNInt32, FastNInt64, FastNInt128, FastNInt256, FastNInt512 - machine-dependent integer types.
  • WordString, Integer, Unsigned - arbitrary-precision integer types.
  • Float16, Float32, Float64, Float128 - fixed-length floating-point types.
  • Float - machine-dependent floating point types.
  • Floating - arbitrary-precision floating point type.
  • Concrete (possibly also specializations like Length, Time, Area, MagneticFluxDensity etc.)

There's still a lot about this that isn't finalized. For example, I might use some templates to avoid having all those fixed-width and machine-dependent types.

Government as singleton

Thursday, October 11th, 2012

Sometimes I read things on the internet written by libertarians. I used to read reddit a lot and that has always been a bit of a libertarian stronghold. Occasionally I read things on reason.com. I think I'm pretty squarely on the left side of most political systems but libertarianism is one part of conservatism that I do have some sympathy for. I like the idea of legalizing all drugs, for example, not getting into wars without a really good reason, and generally not imposing unnecessary rules on people or companies.

Those who describe themselves as libertarians often seem to take this principle to extremes, though. One sentiment in particular that I've noticed being expressed repeatedly is that all taxation is theft. I disagree with this - theft is illegal while taxes are legal. Also, one has a say in taxes (via voting and other forms of participation in the political process) but no say in getting robbed. Finally, one doesn't get anything back from getting robbed but taxes pay for useful government services and infrastructure. To me the accusation seems like an instance of the worst argument in the world.

Another thing that libertarians say is that government has a monopoly on (legal) aggression and violence. That sounds like a bad thing (because monopolies are bad, and violence and aggression are bad). But in this case two bads make a good - you don't want multiple violent organizations competing, as that would not tend to minimize the amount of violence. Since we can't eliminate legalized violence altogether (since otherwise we would have no way to arrest an uncooperative murder suspect) it's best that a single (accountable) organization has that monopoly.

There are other things that government has a natural monopoly over - things that benefit society as a whole but won't get done by the markets because there isn't any profit in being the one to do them - things like making sure the poor have enough to eat and access to life-saving and preventative medical care. Another case is where having multiple competing organizations would cause practical difficulties: I don't want my house to be connected to six different electrical networks, six different water supplies, six different sewers, six different telephone networks, have driveways connecting to six different road networks, have six different garbage collectors coming by and I don't want my town to be served by six incompatible rail networks. The basic provision of utilities like these is best done by monopoly - even if building, servicing and billing can have competing providers (here in the UK I can choose from many different electricity companies to bill me each month, but they all have the same number to call if there is a power cut). Similarly, the military is probably best done centrally, otherwise different militaries representing different interests of the same country might end up fighting each other!

It makes sense to have one single organization take care of all the things that need to be done that can't or won't be taken care of by markets for one reason or another, and that organization is what we call government.

There's a similar concept in software engineering called a singleton object - a single chunk of memory where you put all the things that there is only one of in the program. It's bad engineering practice to stuff in the singleton that doesn't really need to be there, because it leads to inflexible programs that allow you to have only one at a time of something that you might want to have multiple instances of. Similarly, it's bad practice for the government to do stuff that the market can do better - I wouldn't want government to get into the business of designing and making laptops, for example, since the market does a great job at that.