OpenAddresses: A collection of open address data

Status:
Accepted

In this talk, we'll give an overview of OpenAddresses, a constantly growing collection of address-level geographic data from around the world. We collect data from over 1,100 sources, lightly modify it to pick out the fields we need, and produce a CSV with over 215 million rows. The output of our community-built system is used by commercial and government entities to build geocoding tools. A community of 70+ people contributes to this dataset via GitHub, finding new data sources and submitting pull requests, while a background processing system downloads and manipulates the data in real time. The same setup can easily be used to fetch and merge similar datasets collaboratively, and we'll show you how it works.

Session details
Experience level:
Intermediate
Schedule info
Session Time Slot(s):
Wednesday, May 4, 2016 - 11:15 to 11:50

Comments

I've used your data quite a bit, and in my experience its quality varies from place to place: some records are missing city names, some are missing street names, some have a latitude/longitude of 0.0000, and in some cases the CSV is not properly formatted. Do you have (or plan to put) a system in place to detect these errors?
PS. I've already submitted a bug report on GitHub about several Eastern European countries.

Public comment

The missing street names and lat/lon issues are probably from the source data, but the CSV formatting sounds bad. I'd love for you to file a new ticket that shows an example of the incorrect formatting.
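
To make the kinds of problems mentioned above concrete, here is a rough sketch of row-level checks for missing street or city names and for 0.0000 coordinates. It is an illustration only, not something the project is known to run: the column names (LON, LAT, STREET, CITY) and the function names are assumptions, and the real output schema may differ.

```python
import csv
import io

# Assumed column names, for illustration only.
REQUIRED_FIELDS = ("STREET", "CITY")


def row_problems(row):
    """Return a list of problem labels for one address row (a dict)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"missing {field.lower()}")
    try:
        lon = float(row.get("LON") or "")
        lat = float(row.get("LAT") or "")
    except ValueError:
        problems.append("unparseable coordinates")
    else:
        if lon == 0.0 and lat == 0.0:
            problems.append("zero coordinates")
        elif not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append("coordinates out of range")
    return problems


def check_csv(text):
    """Yield (line_number, problems) for each row that has problems."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        problems = row_problems(row)
        if problems:
            yield reader.line_num, problems
```

A report like this cannot tell whether a gap comes from the source data or from processing, but it would show which sources are affected.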

In general, we are planning to add a set of cleanup steps for the compiled data: things like deduping, expanding abbreviations, etc. Discussion about that is happening here: https://github.com/openaddresses/machine/issues/283
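
As a minimal sketch of what deduping and abbreviation expansion could look like (not the design under discussion in that issue), the snippet below expands a street-type abbreviation in the last word of a street name and then drops rows that share the same number, street, and city. The abbreviation table, the column names, and both function names are hypothetical.

```python
# Illustrative abbreviation table; a real one would be larger and locale-specific.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard"}


def expand_abbreviations(street):
    """Expand a known abbreviation in the last word of a street name."""
    words = street.strip().split()
    if not words:
        return ""
    last = words[-1].rstrip(".").lower()
    if last in ABBREVIATIONS:
        words[-1] = ABBREVIATIONS[last].title()
    return " ".join(words)


def dedupe(rows):
    """Keep the first row for each normalized (number, street, city) key."""
    seen = set()
    unique = []
    for row in rows:
        street = expand_abbreviations(row["STREET"])
        key = (
            row["NUMBER"].strip().lower(),
            street.lower(),
            row["CITY"].strip().lower(),
        )
        if key in seen:
            continue
        seen.add(key)
        unique.append({**row, "STREET": street})
    return unique
```

Only the last word is expanded so that a leading "St" (as in "St Mary Ave") is not mistaken for "Street". Comparing keys in lower case means "Main St" and "main street" count as the same address.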
