Machine Learning on Geospatial Datasets for Segmentation, Prediction and Modeling


Machine learning is a powerful technique that allows us to create data-driven predictive models that can learn from complex, multivariable data. The geospatial world is full of such datasets, where it's hard to know exactly how the input variables to a model will affect the outcomes. There is a growing ecosystem of libraries and frameworks, such as TensorFlow and scikit-learn, that make sophisticated machine learning possible, but very few of them are easily interoperable with geospatial databases like PostgreSQL.

In this talk, I will discuss ongoing work at CartoDB to integrate machine learning as a key analysis tool for geospatial data. Focusing on our work with random forests, neural networks and Markov chains, I will talk about how these methods need to be adapted to work with geospatial data, how we can use the PL/Python extension in PostgreSQL to bring the power of these models to our geospatial datasets, and the new kinds of analysis these methods open up.

In particular, I will discuss our work developing segmentation models that take a set of example observations, train a predictive model on underlying multivariate geospatial datasets such as the census, and then use that model to predict new observations in regions where the original data was missing.
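The segmentation workflow described above (labelled example regions, census-style features, predictions for unlabelled regions) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not CartoDB's implementation: it swaps in a simple k-nearest-neighbours classifier for the random forests mentioned earlier so that it needs only the standard library. A dependency-free body like this could, in principle, be adapted into a PL/Python function. The feature columns, values, segment labels, and the helpers `train_segmenter` and `predict_segment` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical census-style features per region:
# (median_income_thousands, share_of_renters, pop_density_thousands_per_km2).
# All values and labels below are invented for illustration.

def fit_scaling(rows):
    """Compute per-column mean and standard deviation (std falls back to 1.0)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    """Standardize one feature row so no single variable dominates distances."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

def train_segmenter(features, labels):
    """Hypothetical 'training' step for k-NN: store scaled examples and scaling."""
    means, stds = fit_scaling(features)
    scaled = [scale(r, means, stds) for r in features]
    return {"means": means, "stds": stds, "X": scaled, "y": list(labels)}

def predict_segment(model, row, k=3):
    """Predict a segment by majority vote among the k closest labelled regions."""
    x = scale(row, model["means"], model["stds"])
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(model["X"], model["y"]))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Example observations: regions where the segment label is known.
known_features = [
    (95.0, 0.20, 1.5),
    (102.0, 0.15, 1.2),
    (88.0, 0.25, 2.0),
    (35.0, 0.70, 12.0),
    (40.0, 0.65, 15.0),
    (30.0, 0.80, 10.0),
]
known_labels = ["suburban", "suburban", "suburban", "urban", "urban", "urban"]

model = train_segmenter(known_features, known_labels)

# Regions with no observed label: predict from their census features alone.
unlabeled = {"tract_A": (92.0, 0.22, 1.8), "tract_B": (38.0, 0.75, 11.0)}
predictions = {tract: predict_segment(model, feats) for tract, feats in unlabeled.items()}
print(predictions)  # {'tract_A': 'suburban', 'tract_B': 'urban'}
```

Standardizing the features before measuring distance matters here because census variables live on very different scales (income in thousands versus a 0 to 1 renter share). A real model of this kind would also need to account for spatial structure, which this sketch ignores.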



Schedule info
Session Time Slot(s):
302A - Thursday, May 5, 2016 - 10:30 to 11:05


Dear Speaker,

Your talk was not selected for Big Data Day this year. (Aside: we may need to make Big Data Day two days next year!) We have switched the track marker to All Things Data, which seemed a good fit. Thank you for submitting such a great talk!

Andrea