Archive for the ‘language’ Category

Using templates when you don't really need to

Sunday, September 25th, 2011

Templated C++ classes are a bit nicer to use than non-templated classes, because one can define the entire class within its declaration without having to make sure that the types it uses are defined first (this check is done at instantiation time for template classes, but at parse time for non-template classes). I have found myself making classes templates when they don't really need to be, just so that I don't have to define member functions outside the declaration - i.e. when CTemplate<T> doesn't actually use T for anything, and CTemplate<T> is just used as "typedef CTemplate<X> C;" for some arbitrary type X. T may be used as the template parameter to other classes defined this way, though.
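
A minimal sketch of the trick, with made-up names (WidgetT, GadgetT and their members are purely illustrative, as is the choice of void as the dummy template argument). Two classes call into each other with every member function defined inline, which works because the bodies are only checked when the templates are instantiated:

```cpp
// Only a forward declaration: GadgetT is still incomplete when WidgetT is parsed.
template<class T> class GadgetT;

template<class T> class WidgetT
{
public:
    // GadgetT<T> depends on T, so this body is not checked until
    // WidgetT<T>::twiceGadget() is instantiated, by which time GadgetT is complete.
    int twiceGadget() const { return 2 * GadgetT<T>().value(); }
};

template<class T> class GadgetT
{
public:
    int value() const { return 21; }
    // GadgetT can refer back to WidgetT in the same way.
    int viaWidget() const { return WidgetT<T>().twiceGadget(); }
};

// T is never used for anything, so any type will do as the argument.
typedef WidgetT<void> Widget;
typedef GadgetT<void> Gadget;
```

The rest of the program only ever sees Widget and Gadget. Without the template wrappers, twiceGadget() would have to be defined out of line, after the definition of the gadget class.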

Is Haskell too abstract?

Saturday, September 24th, 2011

The Haskell programming language implements a lot of very cool ideas (many of which I want to liberate for my programming language). However, Haskell seems to have a reputation for being a very difficult language to learn. The IO monad seems to be one particular sticking point, but this didn't seem to be particularly difficult to me - it's just a clever little hack (I just read lots of "How I came to understand the IO monad" accounts until I got it).

Another sticking point seems to be that a lot of what is written about Haskell is in the form of academic papers (with all the academic jargon that entails). That shouldn't be a surprise (since it's a language with origins in academia which seems to be only reluctantly escaping to industry), but it is kind of interesting that there's a sort of language barrier between academia and industry - what the former calls "deforestation hylomorphism" might be called "tree flattening" by the latter, for example.

But I think the real problem is something shared by other functional languages - it's just a different level of abstraction than we're used to thinking about. In imperative programming languages programmers can think "okay, what do I want the machine to do here?" and write that code. In functional programming languages one instead has to think about what the inputs and outputs are, write functions to perform those transformations and trust that the compiler will optimize it well (to a first approximation). It's much more like doing mathematics than imperative programming. Compilers will do all sorts of crazy things to turn those pure functions into high-performance imperative code. So when things are inevitably too slow, it's not obvious why (since the relationship between the generated code and the source code is very complicated) and it's difficult to understand how to make it faster.

Functional programming languages do have some interesting advantages over imperative languages, though - it's much easier to reason about pure functions than about imperative programs with all their attendant state, which means lots more interesting automatic optimizations are possible and (increasingly importantly) the code can be automatically parallelized so it takes maximum advantage of modern multi-core CPUs.

I'm not sure which side to place my money on. I suspect both imperative and functional paradigms will continue to exist for the foreseeable future. On the imperative side: the low-level OS and runtime components can't be written using functional languages, and optimizations and multi-core abstractions for imperative languages will continue to improve. On the functional side: compilers will require less and less hand-holding to generate good code and will generate better code than imperative compilers for more and more situations.

When is your program compiled?

Thursday, September 22nd, 2011

When we're all using languages which can compile code at runtime it's going to be more important to think about when your code is being compiled. Take a fractal plotter for example - one in which users can supply a formula to be iterated. Obviously we'd like to do some compilation here instead of just interpreting the formula, as the latter would be very slow. But how often should we recompile it? The more often we recompile, the more information we have available to us and the more optimizations we will be able to use. For example, if we just compile when the formula changes, we would have to use arithmetic operators which can work with any precision, but if we recompile whenever the precision changes we can unroll those arithmetic operators and make the code much faster (especially for low precisions). On the other hand, recompiling does have some overhead so we probably wouldn't want to recompile for each pixel. Though for some formulae that might actually be helpful - if we can hoist a test from the per-iteration loop to the per-pixel loop and the iteration count is high it might be worth it.
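
To make the hoisting idea concrete, here is a rough ahead-of-time analogue in C++. C++ can't recompile at runtime, so a template parameter stands in for "compile a separate loop for each value of the test". The formula and the foldReal option are hypothetical, chosen only so that there is something to test inside the loop:

```cpp
#include <cmath>

// Version 1: the option is tested inside the loop, once per iteration.
int iterateTestInside(double cr, double ci, bool foldReal, int maxIterations)
{
    double zr = 0, zi = 0;
    int i = 0;
    while (i < maxIterations && zr*zr + zi*zi <= 4) {
        if (foldReal)
            zr = std::fabs(zr);
        double t = zr*zr - zi*zi + cr;
        zi = 2*zr*zi + ci;
        zr = t;
        ++i;
    }
    return i;
}

// Version 2: one loop is generated per value of the option, so inside the
// loop the test is a compile-time constant that the compiler can remove.
template<bool FoldReal>
int iterateSpecialized(double cr, double ci, int maxIterations)
{
    double zr = 0, zi = 0;
    int i = 0;
    while (i < maxIterations && zr*zr + zi*zi <= 4) {
        if (FoldReal)
            zr = std::fabs(zr);
        double t = zr*zr - zi*zi + cr;
        zi = 2*zr*zi + ci;
        zr = t;
        ++i;
    }
    return i;
}

// The test now runs once per pixel, choosing which specialized loop to enter.
int iterateTestHoisted(double cr, double ci, bool foldReal, int maxIterations)
{
    return foldReal ? iterateSpecialized<true>(cr, ci, maxIterations)
                    : iterateSpecialized<false>(cr, ci, maxIterations);
}
```

A runtime code generator could go further than this, since it can specialize on values that are only known once the program is running, but the trade-off is the same: more specialized loops mean more compilation work.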

One possibility might be to give the code-generation library the freedom to compile whenever it likes, so it can try various things and run with what works best.

A weird thing in Haskell

Wednesday, September 21st, 2011

Here's something odd I noticed while playing around with the Haskell programming language. Sometimes, a==b does not imply f(a)==f(b). Look:

> 1/0
Infinity
> 1/(-0)
-Infinity
> 0==(-0)
True
> (1/0)==(1/(-0))
False
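
This comes from IEEE 754 floating-point arithmetic rather than from Haskell itself: positive and negative zero compare equal but have different reciprocals. Here is a small C++ sketch of the same effect, assuming IEEE 754 doubles (the function names are made up for illustration):

```cpp
#include <cmath>
#include <limits>

// True when double follows IEEE 754 (IEC 559), which the rest of this sketch
// assumes. The C++ standard alone does not define floating-point division by
// zero; IEEE 754 defines it to give an infinity with the appropriate sign.
const bool ieee754 = std::numeric_limits<double>::is_iec559;

// The function f from the example: f(x) = 1/x.
double reciprocal(double x) { return 1.0 / x; }

// True when a == b and yet f(a) != f(b).
bool equalInputsUnequalOutputs(double a, double b)
{
    return a == b && reciprocal(a) != reciprocal(b);
}
```

With these definitions, equalInputsUnequalOutputs(0.0, -0.0) is true on an IEEE 754 platform, as long as the compiler is not run in a mode that relaxes floating-point rules.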

Thoughts on ActionScript

Thursday, September 15th, 2011

Declaring variables "var i:int" instead of "int i" is a nice experiment, but I don't think it works. When I need to turn an assignment into a declaration or vice-versa, I now need to do it in two places (before and after the variable name) instead of just one.

That this gives an error about redeclaring i is just plain dumb:

  for (var i:int = 0; i < 10; ++i) { foo(i); }
  for (var i:int = 0; i < 10; ++i) { bar(i); }

It means I have to declare all my variables at the top, old-fashioned-C-style, or spend ages fixing up the declarations whenever I move code around.

I hate it when compilers enforce a "one class per file, with the same name" rule. I'd rather have my source files organized for the convenience of humans, not for the convenience of machines, thank you very much - if I have a bunch of small related classes, it's much better to have them in the same file.

No stack objects means that if you want to write a function that returns multiple values (e.g. a function that multiplies a matrix by a vector and returns a vector) you have to allocate an object to put the vector components in, which is very slow. For reproject I ended up just doing everything with plain values in a small number of functions - in C++ the code could be much better layered.
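
For comparison, here is a sketch of how the matrix-times-vector case might look in C++, where the result is an ordinary stack object returned by value. The Vector3 and Matrix3 types are hypothetical and are not taken from reproject:

```cpp
// Three components stored directly in the object; no heap allocation needed.
struct Vector3
{
    double x, y, z;
};

struct Matrix3
{
    double m[3][3];   // m[row][column]
};

// Returns all three result components at once, by value.
Vector3 operator*(const Matrix3& a, const Vector3& v)
{
    Vector3 r;
    r.x = a.m[0][0]*v.x + a.m[0][1]*v.y + a.m[0][2]*v.z;
    r.y = a.m[1][0]*v.x + a.m[1][1]*v.y + a.m[1][2]*v.z;
    r.z = a.m[2][0]*v.x + a.m[2][1]*v.y + a.m[2][2]*v.z;
    return r;
}
```

Because the multiplication is cheap to call, higher-level functions can be built on top of it without having to work on loose components.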

Progressively optimizing compiler

Wednesday, August 31st, 2011

Normally when compiling a program, you tell the compiler how much optimization you want and what the input files are, it goes off and does its thing and comes back when it's done. Sometimes I think it might be nice if one instead told the compiler how much time to spend doing the compiling. It would then do the absolute minimum to make a working binary, and then gradually do more and more optimizations until the timer ran out. This would make the time it takes to rebuild things much more predictable. One downside is that the performance of the resulting program would depend on unpredictable things like how busy the build machine was. Another is that it's probably more efficient for a compiler to decide upfront what the optimizations should be rather than making lots of intermediate binaries of different optimization levels.

However, this sort of thing might be more useful as a run-time process - i.e. a JIT compiler. The system can monitor which bits of the code are being run most often (this is very easy to do - just interrupt at a random time and see what code is being run) and concentrate optimization efforts on those parts. The compilation can continue (gradually making the program faster and faster) until the point of diminishing returns is reached. I understand there's a Java Virtual Machine which can do this.

Language optimized for refactoring

Friday, October 24th, 2008

One property of computer languages that is important but often seems to be overlooked is how easy it is to refactor programs written in them.

The one example that springs immediately to mind is renaming a class. In C++ this is a bit more difficult than in many languages because the constructors and destructors have the same name as the class, so you have to go and change all of those too. PHP wins here for calling them __construct and __destruct respectively.

If you are in the school of thought that has C++ method definitions in a separate file (e.g. .cpp) from class declarations (.h), you have to go and change things in two different files (even if you're just adding a method that nobody calls yet). If that class implements a COM interface defined by a .idl file then there's yet another thing you need to change.

Python's syntactically-significant whitespace is another winner here because if (for example) you put another statement in an "if" clause that currently only has one statement, you don't have to add braces.

I'm sure there are many other, deeper examples.

Once you go OOP, there's no going back

Thursday, October 23rd, 2008

Object Oriented Programming is at least as much a state of mind as a set of programming language facilities. When I learnt C++ it was a bit difficult to get used to writing object-oriented programs but now that I've been doing it for many years I can't get used to thinking about my programs any other way.

I was writing some PHP code recently and (not knowing about PHP classes) started writing it in a procedural fashion. After a while I noticed that many of the functions I was writing started to fall naturally into classes (with a first parameter that gave the function context). So it was only natural to re-write it in object-oriented style once I figured out how to do so.

In the process of doing so, I found lots of bugs in my original code (which I had thought was rather nifty). Many functions became much simpler. I also found it was much easier to do various optimizations that would have been very difficult to do without classes (such as minimizing the number of database queries). My code file did become somewhat bigger, but I attribute this to the extra indentation most lines have, and the fact that PHP requires you to write "$this->" everywhere.

I also tried writing a C program (from scratch) for the first time in a very long time a while ago. I found myself using an object-oriented style and implementing vtables as structs.
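
As an illustration of what "vtables as structs" can look like, here is a sketch in C-style C++. It is not the original C program (which isn't shown here), and all the names are made up:

```cpp
struct Shape;

// One function pointer per "virtual" function, each taking the object explicitly.
struct ShapeVTable
{
    double (*area)(const Shape* self);
    int (*sides)(const Shape* self);
};

// Every object starts with a pointer to the vtable for its concrete type.
struct Shape
{
    const ShapeVTable* vtable;
};

struct Rectangle
{
    Shape base;   // first member, so a Shape* can be cast back to Rectangle*
    double width, height;
};

struct Triangle
{
    Shape base;   // first member, for the same reason
    double width, height;
};

static double rectangleArea(const Shape* self)
{
    const Rectangle* r = reinterpret_cast<const Rectangle*>(self);
    return r->width * r->height;
}
static int rectangleSides(const Shape*) { return 4; }

static double triangleArea(const Shape* self)
{
    const Triangle* t = reinterpret_cast<const Triangle*>(self);
    return t->width * t->height / 2;
}
static int triangleSides(const Shape*) { return 3; }

static const ShapeVTable rectangleVTable = { rectangleArea, rectangleSides };
static const ShapeVTable triangleVTable = { triangleArea, triangleSides };

// "Constructors": fill in the vtable pointer along with the data.
Rectangle makeRectangle(double width, double height)
{
    Rectangle r = { { &rectangleVTable }, width, height };
    return r;
}

Triangle makeTriangle(double width, double height)
{
    Triangle t = { { &triangleVTable }, width, height };
    return t;
}

// "Virtual calls": look the function up through the object's vtable.
double area(const Shape* shape) { return shape->vtable->area(shape); }
int sides(const Shape* shape) { return shape->vtable->sides(shape); }
```

Code that only has a Shape* can call area() and sides() without knowing the concrete type, which is roughly what a C++ compiler arranges for virtual functions.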

PHP could be more secure

Monday, October 20th, 2008

Given that PHP is designed to be used to write applications that run on web servers, you'd think it would have been designed rather more with security in mind.

In particular, PHP's dynamic typing seems to be a source of security weaknesses. Dynamic typing has advantages in rapid development and code malleability but is not particularly helpful for writing secure code - security is greatly helped by being able to restrict each variable to a specific set of values and having the compiler enforce this.

Similarly with the SQL API - because the interface is all just strings instead of strongly typed objects, SQL injection vulnerabilities become all too easy to write.

Variable scope is another one - because there are no variable declarations it's not obvious where variables are introduced, so one could be using variables declared earlier without realizing it (this is why register_globals changed from default-on, to default-off, to deprecated to removed).

Then there are ill-conceived features like magic quotes, and missing features like cryptographically secure random number generation.

A well-designed language for web development would be secure by default when doing the most obvious thing - one shouldn't have to go out of one's way to learn what all the security pitfalls are, write code to explicitly address each of them, and then update that code when the next such pitfall is discovered.

JavaScript vs PHP

Monday, October 13th, 2008

In order to implement Tet4 I learnt two new languages - JavaScript (or JScript, or ECMAScript - the language has a bit of an identity crisis) and PHP. Why PHP? It's installed on my web hosting server and seems to have a huge community of people writing code in it and pre-written scripts. It may not be the ideal language for writing web server apps, but it does seem to be the most well-supported.

JavaScript seems to be a very clean, pretty language. The whole closure thing seemed a bit weird at first but once I understood that "class" is spelled "function" and "public" is spelled "this." I got to rather liking it. I especially like how each scope has access to the variables from all the outer scopes - that saves a lot of messing about. It's very well integrated with the browser - manipulating the DOM feels very natural and not tacked on.

PHP on the other hand is a bit of a mess. It is as if its designers had a little spinner with markings "C, C++, Perl" which they spun each day to decide which language's features to copy that day. If JavaScript was sent by God, surely PHP was sent by the devil.

W3Schools has been an excellent reference for learning all this.

I have to say though that automatically promoting integers to double-precision floating point numbers on overflow is weird. On IE7, computing the value of 1111111111*1111111111 gives 1234567900987654400 (you can easily see this is wrong because it's even, whereas the product of two odd numbers must be odd). This caused a rather hard-to-debug problem with my random number generator (which assumed that when multiplying two 32-bit integers together, at least the low 32 bits of the result should be correct). If you're going to automatically promote numbers, at least have the decency to use a multiple-precision integer library - there are lots around.
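
The effect is easy to reproduce outside the browser. A double has a 53-bit significand, so a product of two 32-bit integers that needs more than 53 bits gets rounded and its low bits are lost. Here is a small C++ sketch comparing the exact 64-bit product with the one computed in double precision (the function names are made up for illustration):

```cpp
#include <cstdint>

// Exact product: two 32-bit values always fit in 64 bits.
std::uint64_t exactProduct(std::uint32_t a, std::uint32_t b)
{
    return static_cast<std::uint64_t>(a) * b;
}

// Product computed in double precision and then converted back to an integer.
// Only about 53 bits of the result are significant.
std::uint64_t doubleProduct(std::uint32_t a, std::uint32_t b)
{
    double p = static_cast<double>(a) * static_cast<double>(b);
    return static_cast<std::uint64_t>(p);
}

// The low 32 bits, which the random number generator relied on.
std::uint32_t low32(std::uint64_t value)
{
    return static_cast<std::uint32_t>(value);
}
```

For 1111111111*1111111111 the exact product is 1234567900987654321, while the double-precision product is 1234567900987654400, the nearest multiple of 256 (the spacing between adjacent doubles at that magnitude). The low 32 bits of the two results differ, which is exactly what breaks a generator that depends on them.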