Revealing Hidden Issues in Spatial Data with FME

Oliver Morris

Business Director

The growing importance and complexity of spatial data present unique challenges. Geometry issues in spatial data complicate its visualisation and interpretation. Complex geometries can also lead to data loss when the data is saved to databases or file formats that do not support them. More critically, software that assumes the data conforms to the standard OGC geometry specifications, and performs geometric operations without validating it first, may produce invalid results. It is therefore crucial to reveal hidden issues within spatial data, and that is where FME comes into play.

Why Does Geometry Validation Matter?

Spatial data can be displayed and analysed in various ways to uncover relationships between spatial and non-spatial data, yielding valuable insights. Geometry issues undermine this, because they make the data harder to visualise and interpret. Below are some of the more common geometry validation issues.

Self-Intersections

Self-intersections occur when the boundary of a single geometry crosses itself, so that one feature effectively breaks into several parts. This can cause problems in data analysis, for example when a single feature is counted more than once.

Image 1: A self-intersecting polygon (the intersection is usually too small to be seen).

Image 2: A self-intersecting line.
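
To make the idea concrete, here is a minimal Python sketch of a self-intersection check. It is an illustration only: the review described here used FME's GeometryValidator, not custom code, and the function names below are hypothetical. The check assumes a line or closed ring given as a list of (x, y) pairs, and it only detects segments that properly cross each other, not segments that merely touch or overlap.

```python
from itertools import combinations

Point = tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle a-b-c (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _properly_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segments p1-p2 and q1-q2 cross at a point interior to both."""
    # Each segment's endpoints must lie strictly on opposite sides of the other.
    # Segments that only share an endpoint give a zero product and are ignored.
    return (_orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
            and _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0)


def has_self_intersection(coords: list[Point]) -> bool:
    """Check whether any two segments of a line or closed ring cross."""
    segments = list(zip(coords, coords[1:]))
    return any(_properly_cross(*s, *t) for s, t in combinations(segments, 2))


# A "bow tie" ring crosses itself; a plain square does not.
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
print(has_self_intersection(bowtie))  # True
print(has_self_intersection(square))  # False
```

Comparing every pair of segments is quadratic in the number of vertices, which is fine for a demonstration but too slow for large datasets.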

Gaps and Overlaps

Gaps, overlaps and slivers between polygons can arise from errors in the data creation process. They cause problems when interpreting and analysing the data, because they are easily overlooked or misinterpreted. For instance, where two polygons overlap, it can be difficult to determine which features belong to which polygon, leading to incorrect area calculations or spatial analyses. With these complexities in mind, Tensing undertook a comprehensive review of spatial data quality.

Image 3: A collection of polygons with a gap (usually too small to be noticed).
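
One crude way to illustrate gaps and overlaps is an area budget: if a set of polygons is supposed to tile a known extent exactly, the sum of their areas should equal the area of that extent. The Python sketch below assumes simple, closed rings (first point repeated at the end) and uses hypothetical function names. It reports only the net discrepancy, so a gap and an overlap of equal size would cancel out. It is an illustration, not the topology check that was run in this review.

```python
Point = tuple[float, float]
Ring = list[Point]


def ring_area(ring: Ring) -> float:
    """Unsigned area of a simple closed ring, using the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


def area_discrepancy(parts: list[Ring], extent: Ring) -> float:
    """Sum of the part areas minus the area of the extent they should tile."""
    return sum(ring_area(part) for part in parts) - ring_area(extent)


def classify(discrepancy: float, tolerance: float = 1e-6) -> str:
    """Label the net discrepancy as a gap, an overlap, or neither."""
    if discrepancy < -tolerance:
        return "gap"
    if discrepancy > tolerance:
        return "overlap"
    return "ok"


# Two parcels that should fill a 2 x 1 rectangle; the right one starts
# 0.001 units too far to the right, leaving a sliver between them.
extent = [(0, 0), (2, 0), (2, 1), (0, 1), (0, 0)]
left = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
right = [(1.001, 0), (2, 0), (2, 1), (1.001, 1), (1.001, 0)]

print(classify(area_discrepancy([left, right], extent)))  # gap
```

A sliver of 0.001 units is invisible at normal map scales, yet it shows up immediately as a shortfall in total area.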

The Role of FME

At Tensing, we use APIs from open data portals to enable our clients to integrate frequently changing datasets within broader data workflows. We pull data from multiple formats and sources to create valuable derived data. We rely on FME Form (formerly FME Desktop) and FME Flow (formerly FME Server) by Safe Software to construct these data workflows and connect to APIs using minimal or no code.

The Review Process

We queried open data portals hosted by Opendatasoft for this review. Their robust platform, used by public and private organisations worldwide, has a consistently structured and user-friendly API.

We used FME Form to build a solution that reviewed the spatial data hosted within the open data portals, and tasked FME Flow with running the checks. FME includes a range of tools, known as transformers. One of these, the GeometryValidator, was used extensively to check for a wide array of geometry errors. We also ran a few other transformers to check for potential topology issues, such as tiny gaps between polygons. All results were stored in a cloud-hosted database.

Over a few days, we allowed FME Flow to process the portals autonomously. During this time, it accessed over 10,000 spatial data services and performed geometry validation and topology checks.

Our Findings

We found that 65% of API endpoints containing line data had geometry validation issues. FME processed more than 900 API endpoints with line data from 160 portals and identified over 600 with geometry validation issues.

However, fewer than 2% of API endpoints with polygon data had geometry validation issues. Topology checks for tiny gaps and overlaps showed that 45% of the data may need attention.

Point data did not raise any flags; owing to its nature, it is less likely to have geometry issues. There were, however, about 400 points located on Null Island, the location at 0° latitude and 0° longitude.

Image 4: The Null Island buoy.
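
Points on Null Island (0° latitude, 0° longitude) are simple to flag, because both coordinates are zero. Here is a minimal Python sketch, assuming points are (longitude, latitude) pairs in degrees. The function names are hypothetical and this is not the check used in the review.

```python
Point = tuple[float, float]  # (longitude, latitude) in degrees


def is_null_island(point: Point, tolerance: float = 1e-9) -> bool:
    """True if the point sits at 0 degrees longitude and 0 degrees latitude."""
    lon, lat = point
    return abs(lon) <= tolerance and abs(lat) <= tolerance


def split_null_island(points: list[Point]) -> tuple[list[Point], list[Point]]:
    """Separate plausible points from those located on Null Island."""
    plausible = [p for p in points if not is_null_island(p)]
    suspect = [p for p in points if is_null_island(p)]
    return plausible, suspect


points = [(4.9, 52.37), (0.0, 0.0), (-0.13, 51.51), (0, 0)]
plausible, suspect = split_null_island(points)
print(len(plausible), len(suspect))  # 2 2
```

Flagged points are best reviewed rather than deleted outright, since the check cannot tell a placeholder coordinate from a genuine location at the origin.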

Ensuring Clean and Usable Data

Our aim with this review was not to criticise the open data that has been made available, but to showcase how FME can help make the data as valuable and useful as possible for use in analysis and visualisation.

Geometry issues can significantly impact the quality and usefulness of spatial data. It's crucial to identify and rectify these issues to ensure data is accurate and reliable. Fortunately, FME offers numerous ways to identify and fix many common geometry validation issues automatically.

At Tensing, we can assist you in validating your data by creating automated workflows that ensure your data is consistent and compliant with geospatial data standards. We can also work with you to clean your datasets using a range of fully and partially automated processes.

For more information about our validation workflow and how we can assist you, please contact us.
