GeoWave: How Space-Filling Curves Accelerate Ingest and Query of Geospatial Data
GeoWave is an open source project that bridges the gap between geospatial software and distributed compute systems. This presentation will primarily focus on the theory that enables the core functionality of GeoWave.
GeoWave was developed at the National Geospatial-Intelligence Agency (NGA) in collaboration with Booz Allen Hamilton and RadiantBlue Technologies. GeoWave uses a distributed key-value store to manage terabytes of raster and vector data, serving as an enterprise-level geospatial data store. To index geospatial data efficiently and answer queries with geospatial constraints, GeoWave employs a space-filling curve to form bidirectional mappings between multidimensional data and sorted keys. Because a space-filling curve tends to assign nearby keys to points that are close together in space, it provides an efficient, locality-preserving indexing scheme in which proximal data is stored close together in the sorted key space. As a complete offering, GeoWave also uses the space-filling curve beyond initial indexing: to render interactive maps, to compute uniformly sized MapReduce input splits, and to partition data by locality for the near-neighbor class of algorithms.
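The abstract does not say which space-filling curve GeoWave uses, so the following is only a minimal sketch under an assumption: a 2D Hilbert curve over a longitude/latitude grid. It shows the bidirectional mapping described above, quantizing a coordinate into a grid cell, mapping the cell to a sortable curve index, and mapping an index back to its cell. The helper names (`xy_to_d`, `d_to_xy`, `to_cell`, `hilbert_key`) and the `order` parameter (bits of precision per dimension) are hypothetical and are not GeoWave's API.

```python
"""Minimal 2D Hilbert curve sketch: grid cell <-> curve index (hypothetical helpers)."""


def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def xy_to_d(order, x, y):
    """Map cell (x, y) on a 2^order x 2^order grid to its Hilbert index d."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # which quadrant, in curve order
        x, y = _rotate(n, x, y, rx, ry)
        s >>= 1
    return d


def d_to_xy(order, d):
    """Inverse mapping: Hilbert index d back to cell (x, y)."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s <<= 1
    return x, y


def to_cell(value, lo, hi, order):
    """Quantize a coordinate in [lo, hi] to an integer cell in [0, 2^order - 1]."""
    n = 1 << order
    i = int((value - lo) / (hi - lo) * n)
    return min(max(i, 0), n - 1)  # clamp so value == hi lands in the last cell


def hilbert_key(lon, lat, order):
    """Sortable key for a point: quantize lon/lat, then locate the cell on the curve."""
    x = to_cell(lon, -180.0, 180.0, order)
    y = to_cell(lat, -90.0, 90.0, order)
    return xy_to_d(order, x, y)
```

For example, `hilbert_key(-77.03, 38.90, 16)` yields a key below 2^32; sorting points by such keys keeps spatially nearby points near each other in the key space, and `d_to_xy` recovers the grid cell from a key. The Hilbert curve is chosen here because consecutive indices always land in adjacent cells; a Z-order (Morton) curve would be simpler to compute but has weaker locality.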
Mapping large ranges of numeric data onto a multidimensional space-filling curve carries several key challenges. Indexing a numeric range, such as a bounding box or a period of time, on a highly granular space-filling curve can produce a large number of duplicate entries, because the range overlaps many curve cells and each overlapped cell receives its own copy. Furthermore, a query over a large range can decompose into many independent ranges on the space-filling curve, each of which must be scanned separately. We introduce several techniques to solve these problems.
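To make both problems concrete, here is a brute-force sketch under the same assumption of a 2D Hilbert curve (the abstract does not specify the curve or how GeoWave decomposes ranges). It enumerates every grid cell a longitude/latitude box touches, computes each cell's key, and merges consecutive keys into contiguous ranges. The cell count approximates how many copies a box-shaped entry would need if stored at that resolution, and the range count is how many separate scans a query over that box would issue. All function names and the example box are hypothetical; this does not show GeoWave's own techniques, and practical implementations typically decompose ranges recursively rather than cell by cell.

```python
"""Brute-force sketch: how a box maps to Hilbert cells (duplicates) and key ranges."""


def xy_to_d(order, x, y):
    """Hilbert index of cell (x, y) on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant for the next level
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def to_cell(value, lo, hi, order):
    """Quantize a coordinate in [lo, hi] to an integer cell in [0, 2^order - 1]."""
    n = 1 << order
    return min(max(int((value - lo) / (hi - lo) * n), 0), n - 1)


def box_to_cell_bounds(order, min_lon, min_lat, max_lon, max_lat):
    """Inclusive cell bounds (x0, y0, x1, y1) covered by a lon/lat box."""
    return (to_cell(min_lon, -180.0, 180.0, order), to_cell(min_lat, -90.0, 90.0, order),
            to_cell(max_lon, -180.0, 180.0, order), to_cell(max_lat, -90.0, 90.0, order))


def covered_keys(order, x0, y0, x1, y1):
    """Sorted curve keys of every cell the box touches (one row each if stored)."""
    return sorted(xy_to_d(order, x, y)
                  for x in range(x0, x1 + 1) for y in range(y0, y1 + 1))


def merge_ranges(keys):
    """Collapse sorted keys into contiguous (start, end) ranges, i.e. separate scans."""
    ranges = []
    for k in keys:
        if ranges and k == ranges[-1][1] + 1:
            ranges[-1][1] = k  # extend the current run
        else:
            ranges.append([k, k])  # gap in the key space: start a new range
    return [tuple(r) for r in ranges]


if __name__ == "__main__":
    box = (-77.12, 38.80, -76.91, 39.00)  # hypothetical box, roughly around Washington, DC
    for order in range(10, 17):
        keys = covered_keys(order, *box_to_cell_bounds(order, *box))
        print(f"order={order:2d} cells={len(keys):5d} ranges={len(merge_ranges(keys)):4d}")
```

Once the box spans many cells, each extra bit of precision roughly quadruples the cell count, and the number of ranges grows as well. Even a tiny box can split: `covered_keys(2, 1, 0, 2, 0)` covers two horizontally adjacent cells whose keys are 1 and 14, so that query needs two separate ranges.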