While exploring awesome lists scattered around Github, I stumbled upon Awesome C. The list is overwhelming. One library that caught my attention is libgeohash since it’s my first time to hear of such.
Apparently there is another way to express a location aside from latitude and longitude coordinates. Geohashing is a technique that represents a location using alphanumeric strings. The longer the string, the more precise the representation is to the actual location. This representation could also identify it’s neighboring squares from the north, east, south, and west.
Geohashes are commonly useful for URL and electronic storage.
Paris ~= u09tvw0f6szye
Tokyo ~= xn774c06kdtd9
Ottawa ~= f244mkwzxdcuk
Clicking on the geohashes above will bring you to geohash.org which will reveal the location of these cities from the geohash code itself!
CASE. I want to use libgeohash in Python but not necessarily make a strict implementation. So I decided to build a wrapper around it and expose only the common usages such as encoding from latlong coordinates to a geohash, and decoding it back. For this I will be using C Foreign Function Interface for Python
What is CFFI?
Borrowing from Lisp description, CFFI stands for Common Foreign Function Interface. “Foreign function” means taking a function written in another programming language along with it’s data and calling conventions. Python CFFI follows the same principle and outlines the specific goals here
In our case the primary goal is to compile libgeohash as a shared object (.so), load it using cffi in Python, and build an interface for it.
Building the interface
Using a shared object in Python
First, we will need a shared object of libgeohash. Download the source of libgeohash here
gcc -shared -rdynamic -fPIC geohash.c -o geohash.so
-shared
Produce a shared object which can then be linked with other objects to form an executable.
-rdynamic
This instructs the linker to add all symbols, not only used ones, to the dynamic symbol table. This option is needed for some uses of dlopen or to allow obtaining backtraces from within a program.
-fPIC
The generated machine code is not dependent on being located at a specific address in order to work
For more information on these options see GCC Options for Linking.
Then we can load it in Python like this.
# geohash_build.py
from cffi import FFI
ffi = FFI()
geohash_lib = ffi.dlopen('path/to/geohash.so')
Now we can define our interface.
A simplistic anatomy of the builder
The builder follows the structure:
Definition
All declarations including C types, functions, globals needed to use the shared object. A valid C syntax is required.
Source
Give the interface (the python extension module) a name, a valid C source code. #include
statements belong here.
Compilation
Finally, let’s run the build!
# geohash_build.py
from cffi import FFI
ffi = FFI()
# DEFINITION
# For reference, the definition is stripped from libgeohash.h where the data structures are defined
ffi.cdef(
"""
// Metric in meters
typedef struct GeoBoxDimensionStruct {
double height;
double width;
} GeoBoxDimension;
typedef struct GeoCoordStruct {
double latitude;
double longitude;
double north;
double east;
double south;
double west;
GeoBoxDimension dimension;
} GeoCoord;
char* geohash_encode(double lat, double lng, int precision);
GeoCoord geohash_decode(char* hash);
"""
)
# SOURCE
# Since the .so is ready, we can simply put None as the valid C source
ffi.set_source("_geohash", None)
# BUILD
if __name__ == "__main__":
ffi.compile(verbose=True)
We only need to build this once and if the build is successful a _geohash.c
is produced and invoked in the compiler. At runtime, we should be able to do
from _geohash import ffi, lib
print(lib.geohash_encode(35.689487, 139.691706, 6))
Completing the wrapper
Notice that in the cdef
section we exposed the functions geohash_encode
and geohash_decode
. In our interface we could do something like
# geohash.py
from _geohash import ffi, lib
def geohash_encode(latitude: float, longitude: float, precision: int = -1):
"""Takes in latitude and longitude with a desired precision and returns the correct hash value.
If precision < 0 or precision > 20, a default value of 6 will be used."""
return ffi.string(lib.geohash_encode(latitude, longitude, precision)).decode()
def geohash_decode(hash_code: str):
"""Returns GeoCoord structure which contains the latitude and longitude
that was decoded from the geohash. A GeoCoord also provides the bounding box for the
geohash (north, east, south, west, dimension.height, dimension.width)."""
return lib.geohash_decode(hash_code.encode())
A few points to take note from here:
- Use
ffi.string
to interpret the result into a null-terminated string. Notice the C signaturechar *
in ourcdef
section. - What we get from ffi e.g.
<cdata 'char *' 0xdc8770>
are objects of typecdata
. A pointer object on the allocated memory. These pointers, structures, and arrays do not have an obvious mapping to native types. That’s why in this case it’s appropriate for us to useffi.string
to be explicit. - The result of
geohash_decode
is aGeoCoord
object e.g.<cdata 'GeoCoord' owning 64 bytes>
. Remember thisstruct
from ourcdef
section? - Member access operators
obj.x
orobj->x
normally works asobj.x
in Python. So the following works:location = geohash.geohash_decode('xn774c') location.latitude, location.longitude # >> (35.69183349609375, 139.6966552734375)
Testing
To see if our wrapper works as expected, we can create a simple test.
import pytest
import geohash
geohash_map = (
((35.689487, 139.691706), "xn774c"),
((35.179554, 129.075642), "wy7b1h"),
((48.856614, 2.352222), "u09tvw"),
)
@pytest.mark.parametrize("coordinates, expected", geohash_map)
def test_geohash_encode(coordinates, expected):
lat, long = coordinates
# precision >= 0 then precision = 12
assert geohash.geohash_encode(lat, long, -1) == expected
assert geohash.geohash_encode(lat, long) == expected
@pytest.mark.parametrize("expected, hash_code", geohash_map)
def test_geohash_decode(hash_code, expected):
location = geohash.geohash_decode(hash_code)
assert all(
[
location.longitude,
location.latitude,
location.north,
location.east,
location.south,
location.west,
location.dimension,
]
)
The test leverages the use of [pytest.mark.parametrize
](@TODO insert link here) to cover a series of input and expected results without the use of excessive loops.
To execute the tests, simply run py.test
Wrapping Up
The complete source code is available at https://github.com/aldnav/geohash And available at PyPi so you can use it directly for your projects! Documentation? 🍰
pip install geohashcx
If you spot issues feel free to submit an issue or better yet a pull request! 👋
References / Further readings
- CFFI Definition https://common-lisp.net/project/cffi/manual/cffi-manual.html#Introduction
- A good explanation of the gcc flag “-fPIC” https://stackoverflow.com/a/5311538
- More perhaps complete GCC options documentation. https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html
- Cython vs CFFI. Valuable insights from Eevee while building sanpera - an “ImageMagick wrapper”. Old but gold. https://eev.ee/blog/2013/09/13/cython-versus-cffi/
- A pure C implementation of the Geohash algorithm. https://github.com/simplegeo/libgeohash
- Geohash explorer! http://geohash.gofreerange.com/
- Geohash.org http://geohash.org/
- Last but not least. The Python CFFI Documentation. Most of the substance is taken from here. https://cffi.readthedocs.io/en/latest/index.html