Privacy-Friendly Dutch Postcode Lookups


The video “A Privacy-Preserving Postcode lookup tool” (also available as a blog post) piqued my interest. Lex from Computing: The Details created a privacy-friendly lookup tool that gets locations from UK postcodes. I was intrigued: could this work for Dutch postcodes too?

But why?

Websites sometimes have a page where you can search for the nearest store. You enter your postcode and the website uses an API to get the coordinates of that postcode. It then sorts the stores by distance.

It would be nice if the website didn’t need to know your postcode and the third-party API could be left out. And that’s possible! This lookup can be done client-side.

Postcode format

A quick rundown of Dutch postcodes:

  • They consist of 4 digits, a space and 2 letters: 1234 AB (the space is useful for readability but is often omitted).
  • The ranges 1000-1999, 2000-2999, ... loosely follow the province borders.
  • They usually don’t cross municipality borders.
  • A full postcode + a house number is a unique combination and is all you need to send mail. But the postman will thank you for including the street name and town.
  • The Caribbean islands Bonaire, Sint Eustatius and Saba do not have postcodes yet.
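The format rules above can be turned into a small normalizer. This is a minimal sketch, not code from the project: the name `normalize_postcode` is made up for illustration, and rejecting a leading 0 is an assumption based on postcodes below 1000 AA not being in use at the moment.

```python
import re
from typing import Optional

# Assumption: the first digit is 1-9, because postcodes below 1000 AA
# are not in use at the moment.
POSTCODE_RE = re.compile(r"([1-9][0-9]{3})\s?([A-Z]{2})")


def normalize_postcode(raw: str) -> Optional[str]:
    """Return the postcode as '1234AB', or None if the format is wrong."""
    match = POSTCODE_RE.fullmatch(raw.strip().upper())
    if match is None:
        return None
    # Drop the optional space so every postcode has the same 6-character shape.
    return match.group(1) + match.group(2)
```

This only checks the shape of the input. Whether a postcode is actually in use is a separate question that the dataset has to answer.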

In theory there are

10 x 10 x 10 x 10 x 26 x 26 = 6,760,000

possible combinations, but in practice the number of postcodes currently in use is much lower. Some are assigned to PO boxes and many are reserved for future use.

Unfortunately I don’t own a 1400-page Postcodeboek to use as a reference, but luckily the CBS publishes a dataset with statistics of postcodes including locations under a Creative Commons license. It contains 464,964 active postcodes at the moment.

Accuracy

This table lists the precision of decimal degrees. And as always, there’s an xkcd for that!
For our use case of finding the nearest store we don’t need millimeter precision. The table tells us that three decimals give a precision of about 111 m. We could go to 11.1 m with four decimals, but that adds precious bytes for little practical gain.

Furthermore, the postcode locations are an approximation anyway: the user’s real location is rarely at the exact center of the postcode area.

I tried using just 4-digit postcodes, but the precision gets pretty low. It wouldn’t be as useful if there are multiple stores closer together.
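The precision figures in this section can be checked with a bit of arithmetic. The sketch below assumes one degree of latitude is roughly 111.32 km and uses 52 degrees north as a representative latitude for the Netherlands. Both numbers, and the function name, are my own assumptions for illustration.

```python
import math

# Assumption: approximate length of one degree of latitude in meters.
METERS_PER_DEGREE_LAT = 111_320


def precision_in_meters(decimals: int, latitude: float = 52.0):
    """Return the (north-south, east-west) size in meters of one step
    in the last decimal place at the given latitude."""
    step = 10 ** -decimals
    north_south = step * METERS_PER_DEGREE_LAT
    # Lines of longitude converge towards the poles, so an east-west
    # step shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude))
    return north_south, east_west


for decimals in (2, 3, 4):
    ns, ew = precision_in_meters(decimals)
    print(f"{decimals} decimals: {ns:.1f} m north-south, {ew:.1f} m east-west")
```

At Dutch latitudes the east-west step is noticeably smaller than the north-south one, so three decimals are slightly better than the table suggests in that direction.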

Packing bytes

The original post uses some really cool ways to store and optimize the data. But since UK postcodes have a completely different format it means we need to start from scratch.

Extracting the coordinates from the GeoPackage results in something like this:

postcode6,lat,lon
1034XZ,52.40376675629908,4.907777651195642
1058EH,52.36149008188852,4.844611782197562
1082MD,52.33777631760533,4.87075088206934
...

Applying the precision gives us this:
1234AB,52.123,4.123 (19 bytes)

But we can optimize that! We say goodbye to CSV and start packing bytes manually. The first thing that can be removed is the 5 from 52, as every latitude in the Netherlands starts with a 5 (they all lie between 50 and 54 degrees north). Also, since each character is now in a known position, we can omit the commas and decimal points, resulting in this:
1234AB21234123 (14 bytes)
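As a sketch, the fixed-width record above could be produced and read back like this. The function names are hypothetical, and this version rounds to three decimals, which may differ slightly from how the real script trims the digits.

```python
def pack_record(postcode: str, lat: float, lon: float) -> str:
    """Pack one record into 14 characters, e.g. '1234AB21234123'."""
    lat_text = f"{lat:.3f}"  # e.g. "52.123"
    lon_text = f"{lon:.3f}"  # e.g. "4.123"
    # Assumption: latitude is 50.000-59.999 and longitude is 0.000-9.999,
    # which holds for the European part of the Netherlands.
    if len(lat_text) != 6 or lat_text[0] != "5" or len(lon_text) != 5:
        raise ValueError(f"coordinates out of range for {postcode}")
    # Drop the leading 5 of the latitude and both decimal points.
    return postcode + lat_text[1:].replace(".", "") + lon_text.replace(".", "")


def unpack_record(record: str):
    """Reverse pack_record: every field sits at a known position."""
    postcode = record[:6]
    lat = float("5" + record[6] + "." + record[7:10])
    lon = float(record[10] + "." + record[11:14])
    return postcode, lat, lon
```

Because every field has a fixed width, record number n always starts at character 14 x n, so no separators are needed.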

To be exact, there are 464,964 active postcodes, so this results in a file of 14 x 464,964 = 6,509,496 bytes, or about 6.5 MB. Not ideal.

The next thing I tried was assigning each postcode (active or not) a position. If a postcode is active we store its coordinates, and if not, zeroes. Once the last active postcode is processed, we can stop packing. The result was a 10 MB file, which compressed quite well with gzip: down to 1.2 MB.

A new idea came to mind: what if we have two packed files? A file coords.bin that stores the coordinates like before, and a new file bitmap.bin that stores whether each postcode is valid or not. Both files have known positions for the postcodes, and this way we can get rid of the unused space between valid postcodes. And we can start at 1000AA because lower postcodes are not in use currently.

Postcode   Exists?
1000AA     1
1000AB     0
1000AC     1
bitmap.bin: 101...

coords.bin: 2123412324534562...
            <1000AA><1000AC>...

That is a small improvement, down to a combined 0.965 MB compressed:

               bitmap.bin   coords.bin
uncompressed   760 KB       1.86 MB
gzip           83 KB        882 KB
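To make the bitmap idea concrete, here is a minimal sketch of how a postcode maps to a bit position and how the bitmap could be filled. The function names are made up, and storing the most significant bit first within each byte is an assumption. The real files may order the bits differently.

```python
TOTAL_POSTCODES = 9000 * 26 * 26  # 1000AA through 9999ZZ: 6,084,000 positions


def postcode_index(postcode: str) -> int:
    """Position of a postcode when counting from 1000AA (index 0)."""
    digits = int(postcode[:4])
    first = ord(postcode[4]) - ord("A")
    second = ord(postcode[5]) - ord("A")
    return (digits - 1000) * 26 * 26 + first * 26 + second


def build_bitmap(active_postcodes) -> bytearray:
    """One bit per possible postcode: 1 if it is active, 0 if not."""
    bitmap = bytearray(TOTAL_POSTCODES // 8)
    for postcode in active_postcodes:
        index = postcode_index(postcode)
        # Assumption: most significant bit first within each byte.
        bitmap[index // 8] |= 0x80 >> (index % 8)
    return bitmap


def is_active(bitmap, postcode: str) -> bool:
    index = postcode_index(postcode)
    return bool(bitmap[index // 8] & (0x80 >> (index % 8)))
```

With 6,084,000 positions the bitmap is 760,500 bytes, which matches the 760 KB in the table.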

I also checked the Rijksdriehoek coordinate system which looks like this: 463000.155000. The precision is in meters so a precision of 100 m would be 4630.1550. That’s also 8 digits so not an improvement from using decimal coordinates. And there isn’t a common digit that could be omitted.

geohash

But what if the location is not stored as coordinates?

Meet geohash: an algorithm that encodes a location as a short string, for example the 9-character u179gke7c.
A precision of 6 characters gives a resolution of around 1 km. This is a step down in accuracy, but still adequate, and it can save a bunch of data.
All our coordinates are in the u1 area, so we can omit that prefix. Now the string only has 4 characters: 79gk.
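For readers who haven’t seen geohash before, here is a self-contained sketch of the standard encoding and decoding, including the trick of dropping the shared u1 prefix. This is not the library the project uses. The function names are my own, and decoding returns the center of the geohash cell.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    even = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_decode(code: str):
    """Return the (lat, lon) at the center of the geohash cell."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    even = True
    for char in code:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):
            rng = lon_range if even else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            even = not even
    return sum(lat_range) / 2, sum(lon_range) / 2


def to_suffix(lat: float, lon: float) -> str:
    """Encode at precision 6 and drop the 'u1' prefix (4 characters remain)."""
    code = geohash_encode(lat, lon, precision=6)
    if not code.startswith("u1"):
        raise ValueError("location is outside the u1 area")
    return code[2:]


def from_suffix(suffix: str):
    return geohash_decode("u1" + suffix)
```

The decoder is the "overhead" part: the client has to put the u1 prefix back and turn the string into coordinates before it can compute distances.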

With that we’re down to a combined 350 KB compressed. That’s not the full story because we also have some overhead for the scripts to unpack and to convert the geohash back to coordinates.

               bitmap.bin   coords.bin
uncompressed   760 KB       1.86 MB
gzip           83 KB        267 KB

You might look at these tables and think: coords.bin is 1.86 MB uncompressed in both!

The lat/lon coordinates can be stored as 2-byte unsigned shorts, totalling 4 bytes:

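import struct
# lat and lon are strings; drop the leading 5 and the decimal point, keep 4 digits
lat2 = lat[1:].replace('.', '')[:4]  # "52.4037..." -> "2403"
lon2 = lon.replace('.', '')[:4]      # "4.9077..." -> "4907"
# Two big-endian unsigned shorts: 4 bytes per postcode
coords = struct.pack('>HH', int(lat2), int(lon2))
coords_bytes.extend(coords)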

The now 4-character geohash is encoded as ASCII, which is also 4 bytes in total:

# Encode at precision 6 and drop the shared "u1" prefix, leaving 4 characters
suffix = geohash.encode(float(lat), float(lon), precision=6)[2:]
coords_bytes.extend(suffix.encode('ascii'))

It just turns out that gzip can compress the latter more in this case.


Combining the output files

It is a bit annoying we have two output files. But since bitmap.bin has a fixed length we can easily combine these:

-write_bytes('bitmap.bin', bitmap_bytes)
-write_bytes('coords.bin', coords_bytes)
+write_bytes('postcodes.pack', bitmap_bytes + coords_bytes)

When unpacking we just need to apply the known offset:

const bitmapLength = 6084000 / 8; // 1000AA - 9999ZZ = 9*10*10*10*26*26 bits
let packBytes = null; // all bytes in pack (bitmap + coords)
-const char0 = coordsBytes[(offset_validsum + coords_index)*4 + 0]
...
+const char0 = packBytes[bitmapLength + (offset_validsum + coords_index)*4 + 0]
...
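The demo does this in JavaScript, but the idea is easier to see in a few lines of Python. The sketch below reads one entry from a combined pack the slow way, by counting every valid postcode before the wanted one. The function name, the `bitmap_length` parameter and the bit order (most significant bit first) are assumptions for illustration.

```python
BITMAP_LENGTH = 6_084_000 // 8  # bytes in the bitmap part of the pack


def lookup_suffix(pack: bytes, index: int, bitmap_length: int = BITMAP_LENGTH):
    """Return the 4-character geohash suffix for the postcode at `index`,
    or None if that postcode is not valid."""
    byte, bit = divmod(index, 8)
    # Assumption: most significant bit first within each byte.
    if not pack[byte] & (0x80 >> bit):
        return None
    # Count the valid postcodes before this one: all full bytes first...
    rank = sum(bin(b).count("1") for b in pack[:byte])
    # ...then the bits in front of ours inside the current byte.
    rank += bin(pack[byte] >> (8 - bit)).count("1")
    # The coords part starts right after the bitmap, 4 bytes per entry.
    start = bitmap_length + rank * 4
    return pack[start:start + 4].decode("ascii")
```

The count over all earlier bytes is what makes lookups for high postcodes slow, which is the problem the next section deals with.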

The lookup speed

In the bitmap portion we stored whether each postcode is valid or not. To find any coordinate we need to count the number of valid postcodes before it. We could do this count on every lookup, but then a lookup would be faster for postcode 2000AA and slower for postcode 9000AA. A lookup takes 10-15 ms that way, and that’s not fun.

To speed this up we can calculate the number of valid postcodes for each 1000-range once after fetching the binary. Then during a lookup it’s just a matter of adding the number of valid postcodes in each prior block, plus the number of valid postcodes in the block the wanted postcode is in.

const offset_valid_count = {
    '1000s': 0, // 1000AA - 1999ZZ
    '2000s': 0, // 2000AA - 2999ZZ
    ...
};

After fetching the binary we calculate the number of valid postcodes for each block:

for (let index = 0; index < 676000; index++) {
    if (bitmapBits[index + 0*676000]) offset_valid_count['1000s']++;
    if (bitmapBits[index + 1*676000]) offset_valid_count['2000s']++;
    ...
}

That brings a lookup down from around 10-15 ms to 0.1 ms. The 10-15 ms of counting is now spent once, right after fetching the binary.
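Here is the same block-count idea as a Python sketch, to show how the precomputed counts and the in-block count add up. The names are hypothetical, the bit order (most significant bit first) is an assumption, and the block size is a parameter only so the example can be tried on a tiny bitmap.

```python
BLOCK_SIZE = 676_000  # postcodes per 1000-range: 1000 * 26 * 26


def count_bits(bitmap, start: int, end: int) -> int:
    """Count the set bits at bit positions start..end-1 (MSB first)."""
    return sum((bitmap[i // 8] >> (7 - i % 8)) & 1 for i in range(start, end))


def build_block_counts(bitmap, block_size: int = BLOCK_SIZE):
    """Done once after loading: number of valid postcodes per block."""
    total_bits = len(bitmap) * 8
    return [
        count_bits(bitmap, start, min(start + block_size, total_bits))
        for start in range(0, total_bits, block_size)
    ]


def rank(bitmap, block_counts, index: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of valid postcodes before `index`."""
    block = index // block_size
    # All earlier blocks come from the precomputed counts...
    before = sum(block_counts[:block])
    # ...so only the part of the current block has to be counted.
    return before + count_bits(bitmap, block * block_size, index)
```

The result of `rank` is the entry number in the coords part, so the worst case per lookup drops from counting the whole bitmap to counting a single block.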

Another optimization we can do is cache the lookups. This way if you slightly edit the input and go back it doesn’t have to do the whole lookup again. It makes any second lookup for the same postcode take ‘0 ms’. I like that!
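Caching is simple enough to sketch in a few lines. This is an illustration in Python rather than the JavaScript from the demo, and `make_cached` is a made-up helper that wraps whatever lookup function you already have.

```python
def make_cached(lookup):
    """Wrap a lookup function so repeated postcodes are answered from memory."""
    cache = {}

    def cached(postcode):
        if postcode not in cache:
            # First time we see this postcode: do the real lookup once.
            cache[postcode] = lookup(postcode)
        return cache[postcode]

    return cached
```

Misses are cached too, so typing an invalid postcode twice doesn’t trigger a second lookup either.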

What’s next?

Check out a demo here!

You can find the code and installation steps over on GitHub. And if you have any ideas or comments, send me a message on Bluesky or good old email.

This is not a finished chapter! There are more optimizations that can be done. 350 KB is still a lot, especially considering API calls are a few KBs.

With that said, here are some notes and ideas for future updates:

  • The geohash is currently stored as ASCII, which takes 8 bits per character, while geohash only uses a 32-character alphabet (the digits plus the lowercase letters except a, i, l and o). So that could be reduced to 5 bits per character.
  • Lex used delta encoding to store the coordinates. I haven’t tried that here and it’s something to investigate further.
  • The output file could include a header with version information. Added on 2026-05-21.
  • Since the valid postcodes are known we could add an autocomplete feature.
  • The Dutch overseas territories will get postcodes by the end of 2026 at the earliest. They will use the range 0001 AA through 0999 ZZ.
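To get a feel for the first idea, here is a minimal sketch of packing the geohash characters more tightly. The standard geohash alphabet has 32 symbols, so each character fits in 5 bits and a 4-character suffix takes 20 bits instead of 32. I haven’t measured how this interacts with gzip, and the function names are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def pack_suffixes(suffixes) -> bytes:
    """Pack 4-character geohash suffixes at 5 bits per character."""
    out = bytearray()
    acc, acc_bits = 0, 0  # bit accumulator and the number of bits in it
    for suffix in suffixes:
        for char in suffix:
            acc = (acc << 5) | BASE32.index(char)
            acc_bits += 5
            while acc_bits >= 8:  # flush every complete byte
                acc_bits -= 8
                out.append((acc >> acc_bits) & 0xFF)
                acc &= (1 << acc_bits) - 1
    if acc_bits:  # pad the final partial byte with zero bits
        out.append((acc << (8 - acc_bits)) & 0xFF)
    return bytes(out)


def unpack_suffix(packed: bytes, position: int) -> str:
    """Read the suffix at `position` (0-based) back out of the packed bytes."""
    start_bit = position * 20
    first_byte, offset = divmod(start_bit, 8)  # offset is always 0 or 4
    # 20 bits at an offset of 0 or 4 always fit inside 3 bytes.
    window = int.from_bytes(packed[first_byte:first_byte + 3], "big")
    chunk = (window >> (4 - offset)) & 0xFFFFF
    return "".join(BASE32[(chunk >> shift) & 31] for shift in (15, 10, 5, 0))
```

Uncompressed this would shrink the coordinate data from 4 bytes to 2.5 bytes per postcode. Whether the gzipped size also goes down is something that would need testing, because tightly packed bits tend to compress worse.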