Floating point is not evil

In response to this:

Floating point is not evil and it is deterministic, but you need to know what the values you're working with actually are. Basically, floating point is like scientific notation, except with 2s instead of 10s. In other words, instead of storing numbers like 1.234*10⁴ it stores numbers like 1.0011010010₂*2¹⁰. It's actually stored as a pair of numbers, a mantissa and an exponent (plus a sign bit). The number of bits in the mantissa is fixed (23 stored bits for single precision and 52 for double) and the leading "1" is implied rather than stored. Each number representable as an IEEE floating point value has exactly one representation except for zero (+0 and -0 have different bit patterns for complicated reasons). There are complications (denormals, NaNs and infinities) which can usually be ignored and which I won't go into.
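To make the mantissa/exponent pair concrete, here is a small Python sketch that pulls a double apart into its fields. The helper name `decompose` is made up for illustration, and it assumes the input is a normal, finite double (it ignores the denormal, NaN and infinity cases mentioned above).

```python
import struct

def decompose(x: float):
    """Split a normal, finite IEEE 754 double into (sign, exponent, fraction).

    Hypothetical helper for illustration; denormals, NaNs and infinities
    are not handled.
    """
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                      # 1 bit
    biased_exponent = (bits >> 52) & 0x7FF  # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)       # 52 stored mantissa bits
    return sign, biased_exponent - 1023, fraction

sign, exponent, fraction = decompose(1234.0)
mantissa_bits = format(fraction, "052b")

# 1234 is 10011010010 in binary, i.e. 1.0011010010 * 2**10.
print(sign, exponent, "1." + mantissa_bits[:10])  # 0 10 1.0011010010

# The implied leading 1 plus the stored fraction rebuilds the value exactly.
rebuilt = (1 + fraction / 2**52) * 2**exponent

# +0 and -0 compare equal but have different bit patterns.
plus_zero = struct.pack(">d", 0.0).hex()
minus_zero = struct.pack(">d", -0.0).hex()
```

The only difference between the two zeros is the sign bit, which is why they compare equal even though their bytes differ.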

Floating point numbers are handy for lots of purposes but they do have a couple of problems.

The first problem is that floating point numbers are inefficient. They are very quick with today's hardware, but consider what you'd do if neither floating point hardware nor floating point libraries were available. For most applications, you'd use fixed point numbers: you'd store an integer, and the type of number you're working with would imply that the actual numeric value is obtained by dividing this integer by 2ⁿ for some n. For most purposes you probably wouldn't store that n with each integer, since all your numbers would have the same resolution. For example, if you're writing a graphics program you might decide that units of 1/256 of a pixel width are always enough, so n would always be 8. When writing floating-point programs, most programmers don't do this calculation to figure out what the precision needs to be; they just use single precision floating point, or switch to double if that isn't precise enough. While constant precision is preferable for a general purpose calculator, most actual applications are better served by constant resolution.
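As a rough illustration of the fixed point idea, here is a minimal Python sketch using the 1/256-of-a-pixel units from the graphics example. The names `FRACTION_BITS`, `to_fixed`, `to_float` and `fixed_mul` are invented for this sketch, not taken from any library.

```python
FRACTION_BITS = 8            # n: the real value is stored_integer / 2**8
SCALE = 1 << FRACTION_BITS   # 256 units per pixel

def to_fixed(value: float) -> int:
    """Convert to the nearest multiple of 1/256, stored as a plain integer."""
    return round(value * SCALE)

def to_float(fixed: int) -> float:
    """Convert a stored integer back to its real value (for display only)."""
    return fixed / SCALE

def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed point values.

    The raw product carries 2n fractional bits, so shift n of them away.
    The shift rounds toward negative infinity.
    """
    return (a * b) >> FRACTION_BITS

x = to_fixed(3.5)           # 896
y = to_fixed(1.25)          # 320
total = x + y               # addition is just integer addition: 1216
product = fixed_mul(x, y)   # (896 * 320) >> 8 = 1120

print(to_float(total), to_float(product))  # 4.75 4.375
```

Note that n never appears in the stored values; it lives in the code (or the type), which is the point of the paragraph above.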

The other problem is that sooner or later you'll run out of precision. If you're plotting Mandelbrot sets, sooner or later you'll zoom in far enough that adjacent pixels have complex numbers with the same floating-point representation. If you're using FFTs to multiply big integers, sooner or later you'll want to multiply integers so large that floating-point numbers won't have sufficient precision. If you're using hardware floating point, this is quite difficult to solve (you need to find or write a big-float library) and will cause a big speed hit, so most people will give up at that point. However, if you're already using fixed point bignums, it's just a question of adding another digit.
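Here is a hedged sketch of the Mandelbrot case in Python, whose built-in integers are arbitrary precision and so can stand in for fixed point bignums. The coordinate 0.25, the pixel widths and the helper `to_fixed` are all assumptions picked for illustration.

```python
def to_fixed(numerator: int, denominator: int, fraction_bits: int) -> int:
    """numerator/denominator scaled by 2**fraction_bits, truncated to an integer.

    Hypothetical helper; Python ints grow as needed, like a bignum.
    """
    return (numerator << fraction_bits) // denominator

# Double precision: near 0.25 the spacing between doubles is 2**-54,
# so a pixel width of 2**-60 is lost entirely when added.
center_f = 0.25
width_f = 2.0 ** -60
float_collides = (center_f + width_f == center_f)   # True

# Fixed point with 64 fractional bits: the same step is still visible.
center_64 = to_fixed(1, 4, 64)
width_64 = to_fixed(1, 2**60, 64)                    # 16 units of 2**-64
fixed_collides = (center_64 + width_64 == center_64)  # False

# Zoom in further, to a pixel width of 2**-100: 64 bits is no longer enough...
width_too_small = to_fixed(1, 2**100, 64)            # truncates to 0

# ...so add another 64-bit "digit" and carry on.
center_128 = to_fixed(1, 4, 128)
width_128 = to_fixed(1, 2**100, 128)                 # 2**28 units of 2**-128
still_distinct = (center_128 + width_128 != center_128)  # True
```

In a real renderer the arithmetic would get slower as digits are added, but nothing else about the program has to change.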

This entry was posted on Monday, August 10th, 2009 at 4:00 pm and is filed under computer.

2 Responses to “Floating point is not evil”

Paul Brook says:

September 3, 2009 at 3:26 pm

However, not all floating point implementations are equal. If you require "double" to have at least 52 bits of precision then you're probably OK. If you require exactly 52 bits of precision (and consistent rounding of the excess) then you are liable to hit problems. Your favorite hardware may not even implement this (the x87 FPU is a common offender). From a user's point of view this means that the results are not deterministic: the result depends on how the compiler chose to optimize your code.
This has been true pretty much forever: traditionally Cray machines were notoriously lax when it came to floating point (I believe this is where negative zero comes from), but if you can accommodate this then you'll get an answer much faster than with a more pedantic implementation.

- Scali says:
  
  January 6, 2016 at 11:27 am
  
  The x87 actually does have control bits for internal precision and rounding.
  I think the problem is that most OSes just set it to extended precision by default, in which case having intermediate results on the FPU stack will yield different results than spilling to memory and reloading the values.
  
  I vaguely recall that Direct3D would modify the FPU precision, which led to some 'surprises' with some people, since code would yield different results when tested in a non-D3D application.
  There is a flag to change that behaviour. See here: https://msdn.microsoft.com/en-us/library/windows/desktop/bb153282(v=vs.85).aspx
  
