Improved geohashing algorithm

Geohashing is an awesome idea (though I've never actually gone to any of the hash points). However, it does suffer from some problems - often the chosen location will be too far away, or in an inaccessible area. It is also slightly biased towards the poles (though because few people live in the polar regions this isn't a particularly big deal for most people). It would be nice to have an improved algorithm which solves these problems.

We'll start the same way, by hashing the date and that date's DOW opening to get a set of 128 random bits $R$ .

Given any area A we want to find a function (which we'll call $f(A)$ ) mapping a set of bits $R$ onto a point $x$ , $(f(A))(R) = x$ . We would also like to have the property that if $B$ is a subset of $A$ containing $x$ and if $(f(A))(R) = x$ then $(f(B))(R) = x$ . So if the New York State meetup happened to be in New York City, all the people who were just going to the New York City meetup would be there too. Another desirable property would be if the points $(f(A))(R)$ were evenly spread throughout $A$ .

Here is an algorithm that I think fulfills both properties.

First, use $R$ to pick a random point on the globe, taking 64-bit random floating point fractions $u$ and $v$ in $[0,1)$ from $R$ in the usual way and then transforming them into longitude and latitude as:
$\displaystyle long = 360frac(u+a(N))-180$
$\displaystyle lat = \frac{180}{\pi}cos^{-1}(2frac(v+b(N))-1)$
Where $frac$ takes the fractional part, leaving a number between 0 and 1 (one could also use XOR instead of add and $frac$ ) and $N$ is the smallest integer $> 0$ that causes the resulting coordinates to end up inside $A$ .

$a$ and $b$ are functions that find points that are furthest away from any tried so far. They have binary expansions as follows:

N  a(N)  b(N)  c(N)     d(N)
0  0.000 0.000 0.000000 0.000000
1  0.100 0.100 0.110000 0.100000
2  0.100 0.000 0.010000 0.010000
3  0.000 0.100 0.100000 0.110000
4  0.010 0.010 0.001100 0.001000
5  0.110 0.110 0.111100 0.101000
6  0.110 0.010 0.011100 0.011000
7  0.010 0.110 0.101100 0.111000
8  0.010 0.000 0.000100 0.000100
9  0.110 0.100 0.110100 0.100100
10 0.110 0.000 0.010100 0.010100
11 0.010 0.100 0.100100 0.110100
12 0.000 0.010 0.001000 0.001100
13 0.100 0.110 0.111000 0.101100
14 0.100 0.010 0.011000 0.011100
15 0.000 0.110 0.101000 0.111100
16 0.001 0.001 0.000011 0.000010

And so on, to as many binary places as you need. If you interleave the bits of $a(N)$ and $b(N)$ you get $c(N)$ , which looks like $d(N)$ but with each even bit XORed with the bit to its left. $d(N)$ is the binary expansion of the integers but with the bits in the reverse order (and flipped to the other side of the binary point).

The sequences $a$ and $b$ together are related to the Bayer matrix which describe a 2D ordered dither. If you want 65 shades of grey but only have a grid of 8x8 pixels, each of which can only be black or white, the most regular patterns with n black bits are described by those that have $8a(N)$ and $8b(N)$ black if $\text{[math]}$ N

This entry was posted on Friday, August 14th, 2009 at 4:00 pm and is filed under random. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Reenigne blog

Improved geohashing algorithm

Leave a Reply