4 steps to unleash the valuable insights hidden in your (spatial) data

Egge-Jan Pollé

Esri Technology Specialist

R for Spatial Data Science

These are exciting times for Data Scientists who want to incorporate location awareness into their workflow in R. R, an open source software environment for statistical computing and graphics, has been around since the early 1990s. Although geospatial analysis in R has a long-standing tradition, developments in recent years have made incorporating geographic information in R easier and more intuitive than ever before. In addition, both ArcGIS and FME provide bridges to R, creating added value for both Data Scientists and GIS specialists.

So what is new in R to be so thrilled about?

First, the leaflet package brought interactive mapping to R a few years ago. More recently, the mapview and tmap packages, which build on top of leaflet, have further extended the options for interactive exploration on the map. Second, the recent release of sf, Simple Features for R, has caused a revolution in the R ecosystem with its simple and consistent approach to managing spatial data.

In this blog I share four steps to incorporate the full potential of geography into your data analysis with R.

1.    Reading and writing geographic data

You can load spatial data directly into R and combine them with data from other sources. The package sf supports a wide range of geographic vector formats (points, lines and polygons with attribute data attached): from CAD files and shapefiles to data stored in spatial databases like PostGIS. With the function st_read() you can also access data directly from online sources like a WFS server or the ArcGIS Living Atlas of the World.
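As a minimal sketch of reading vector data with st_read(), the example below loads the North Carolina counties shapefile that ships with the sf package. The choice of this sample file is an assumption made for illustration; any file path, database connection string or URL supported by GDAL could take its place.

```r
library(sf)

# Path to a sample shapefile bundled with sf (chosen here only for illustration)
nc_path <- system.file("shape/nc.shp", package = "sf")

# Read the file into an sf object; quiet = TRUE suppresses the driver summary
nc <- st_read(nc_path, quiet = TRUE)

# The result is a data frame with one row per county and a geometry column
class(nc)
nrow(nc)
head(nc[, c("NAME", "AREA")], 3)
```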

sf links directly to three important geospatial libraries to unlock their power in R: GDAL for reading and writing data, GEOS for geometric operations and PROJ (formerly PROJ.4) for projection conversions and datum transformations. Raster GIS data can be used in your analysis as well, with the R package raster.
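To see these libraries at work, the sketch below prints the versions sf was built against and reprojects a layer with st_transform(). The sample dataset and the target coordinate system (EPSG:3857, Web Mercator) are assumptions chosen for illustration.

```r
library(sf)

# Versions of the external libraries (GDAL, GEOS, PROJ) linked by sf
versions <- sf_extSoftVersion()
print(versions)

# Sample data bundled with sf, stored in geographic coordinates
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
st_is_longlat(nc)

# Reproject to Web Mercator; the coordinate conversion is handled by PROJ
nc_merc <- st_transform(nc, 3857)
st_crs(nc_merc)$input
```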

With both the sf and the raster package you can also export the results of your analysis in R directly to a geographic file format or spatial database for further use in other mature applications like the ArcGIS platform or FME. With ArcGIS Online you can easily share the results of your analysis with a broad audience. And you can use FME, the ETL tool for geospatial data, both at the beginning and at the end of your analysis to automate and manage the flow of data within your organization.
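Exporting could look roughly like this: the sketch writes an sf object to a GeoPackage with st_write() and reads it back to check the round trip. The output file name and the temporary directory are hypothetical; in practice you would point to a shared folder or a database connection.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Hypothetical output location; GDAL infers the GeoPackage driver from the extension
out_file <- file.path(tempdir(), "nc_counties.gpkg")
unlink(out_file)  # remove the result of an earlier run, if any

# Write the layer so other applications can pick it up
st_write(nc, out_file, quiet = TRUE)

# Read it back to confirm that all features were written
nc_copy <- st_read(out_file, quiet = TRUE)
nrow(nc_copy)
```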

2.    Spatial analysis and modeling

With R you have access to the full range of analytical capabilities you would expect from a traditional desktop GIS, from spatial subsetting and aggregation to buffer and overlay analysis and the modeling of spatial phenomena. This is where the revolutionary Simple Features approach comes into play: the sf package treats spatial data almost as an ‘ordinary’ R data.frame. Almost, but not entirely, because sf data frames carry a ‘special’ geometry column. And because sf objects also inherit from the class data.frame, you can apply many R functions directly to your spatial data.
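The sketch below illustrates this data.frame behaviour with an attribute subset, a buffer and a spatial subset. The sample dataset, the county name "Wake", the 10 km distance and the projected coordinate system (EPSG:32119, in metres) are all assumptions picked for the example.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Project to a metric coordinate system so buffer distances are in metres
nc_proj <- st_transform(nc, 32119)

# Attribute subsetting works exactly as on an ordinary data.frame
wake <- nc_proj[nc_proj$NAME == "Wake", ]

# Buffer analysis: a 10 km zone around the selected county
wake_buffer <- st_buffer(wake, dist = 10000)

# Spatial subsetting: keep the counties that intersect the buffer
near_wake <- nc_proj[wake_buffer, ]

# Ordinary R functions apply directly to the attribute columns
summary(near_wake$BIR74)
sort(near_wake$NAME)
```

Both subsets are still sf objects, so the geometry column travels along with the attributes through every step.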

3.    Visualization

With R you can easily put your data on a map. There is no longer any need to switch back and forth between R and a separate GIS application to view your data in their spatial context. Both static and interactive maps can be created quickly with only a few lines of code. R has traditionally been very good at producing high-quality, publication-ready graphs and figures, and this is also true for static maps. In recent years R has been extended with packages like leaflet, mapview, tmap and shiny, which allow you to create interactive web (mapping) applications to share your results. So, whether you want to create a thematic map or plot multiple layers on top of each other, R offers the tools to do so.
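A minimal sketch of both map types follows. The static map uses the plot() method from sf; the interactive map assumes the mapview package is installed and is only drawn in an interactive session. The dataset, the mapped column BIR74 and the output file name are assumptions made for illustration.

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Static thematic map of one attribute, written to a PDF file
out_file <- file.path(tempdir(), "nc_births.pdf")
pdf(out_file)
plot(nc["BIR74"], main = "Births per county, 1974")
dev.off()

# Interactive map; skipped when mapview is missing or R runs as a batch job
if (interactive() && requireNamespace("mapview", quietly = TRUE)) {
  m <- mapview::mapview(nc, zcol = "BIR74")
  print(m)
}
```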

4.    Reproducibility

R’s command line allows you to script complex spatial and statistical analyses in an efficient and completely reproducible manner. This reproducibility allows you to clarify and explain to stakeholders every step taken and every decision made in your analysis. And of course, it also makes it easy to extend or modify your analysis the moment new, updated or more accurate data become available.
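One way to make this concrete is to wrap the whole analysis in a function that takes its input and output locations as arguments, so it can be rerun unchanged when new data arrive. The function name run_analysis(), the derived column sid_rate_74 and the file names below are hypothetical and serve only as an illustration.

```r
library(sf)

# Hypothetical analysis: read counties, derive a rate per 1000 births, write the result
run_analysis <- function(input_path, output_path) {
  counties <- st_read(input_path, quiet = TRUE)
  counties$sid_rate_74 <- 1000 * counties$SID74 / counties$BIR74
  unlink(output_path)  # start from a clean output file on every run
  st_write(counties, output_path, quiet = TRUE)
  invisible(counties)
}

input <- system.file("shape/nc.shp", package = "sf")
output <- file.path(tempdir(), "nc_rates.gpkg")

# Running the script twice on the same input gives the same result
first_run <- run_analysis(input, output)
second_run <- run_analysis(input, output)
identical(first_run$sid_rate_74, second_run$sid_rate_74)
```

When updated data become available, only the input path changes; every step in between stays documented in the script.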

Do you want to really unleash the valuable insights hidden in your data? Without geography you are nowhere! Whether you have already looked into R’s geospatial capabilities or you are completely new to the subject: feel free to contact me to investigate the possibilities.

Please comment: How do you use R? Standalone / in conjunction with ArcGIS / in conjunction with FME?
