A local STAC deployment for UK Agri-Tech Centre

Oliver Morris

Business Director

In this blog, we introduce STAC, an innovative open standard that's reshaping geospatial data cataloguing. We've been working with UK Agri-Tech Centre, a leader in the UK Agritech space, to demonstrate how STAC optimises their drone dataset management. Additionally, we'll explore how FME can be used to generate new cloud-native data formats and automate STAC asset management, providing practical insights for professionals in the geospatial and agricultural technology sectors.

What is STAC?

The SpatioTemporal Asset Catalog (STAC) is an open standard designed to keep geospatial data neat, tidy and structured. It helps address the challenge of making vast amounts of spatial data discoverable and usable across platforms and systems with as little overhead as possible.

STAC provides a simple, consistent way to describe geospatial data. Each asset, whether it’s satellite imagery, drone-derived point clouds, aerial photos, or another spatial dataset (vector or raster), is described using a common schema, making it easier to use.

One of STAC’s primary goals is to enhance interoperability between different geospatial data providers and users. By adhering to the STAC specification, data providers ensure that their datasets can be easily discovered and used alongside other STAC-compliant data, regardless of the platform or application.

Metadata for billions of STAC items now exists within STACs created by NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT), Microsoft Planetary Computer, and the Amazon Sustainability Data Initiative. Thousands of image tiles are dynamically generated within STACs daily.

STAC Structure

A STAC has the following components:

  • Catalog: A STAC Catalog can hold one or many collections of data.
  • Collections: Within each Catalog, Collections group similar data, like satellite images from a specific sensor or aerial photos of a particular region.
  • Items: Each collection is composed of items, representing individual data pieces. For example, a single satellite image or a LiDAR point cloud file would be an item.
  • Assets: Each item contains one or more assets, which are the actual data files themselves. These can be raster or vector formats.
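
To make the hierarchy above concrete, here is a minimal sketch of a Collection and one Item with a single Asset, written as plain Python dictionaries that serialise to STAC JSON. The identifiers, coordinates and URL are hypothetical and are not taken from the UK Agri-Tech Centre deployment.

```python
import json

# Hypothetical identifiers, extents and URLs, for illustration only.
collection = {
    "type": "Collection",
    "stac_version": "1.0.0",
    "id": "drone-orthomosaics",
    "description": "Example drone orthomosaic collection",
    "license": "proprietary",
    "extent": {
        "spatial": {"bbox": [[-1.60, 52.20, -1.50, 52.30]]},
        "temporal": {"interval": [["2024-06-01T00:00:00Z", None]]},
    },
    "links": [],
}

# An Item is a GeoJSON Feature that points back to its Collection.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "field-a-2024-06-01",
    "collection": collection["id"],
    "bbox": [-1.56, 52.24, -1.54, 52.26],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-1.56, 52.24], [-1.54, 52.24], [-1.54, 52.26],
            [-1.56, 52.26], [-1.56, 52.24],
        ]],
    },
    "properties": {"datetime": "2024-06-01T10:30:00Z"},
    # Assets are the actual files; here a single Cloud Optimised GeoTIFF.
    "assets": {
        "data": {
            "href": "https://example.com/data/field-a-2024-06-01.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data"],
        }
    },
    "links": [],
}

print(json.dumps(item, indent=2))
```

The Item carries the searchable metadata (footprint and timestamp), while the Asset entry only holds a link to the file and its media type.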

Figure 1: Example view of a collection inside a STAC using the STAC Browser

Why use STAC?

STAC offers several advantages for managing and accessing geospatial data:

  • Cloud-optimised: STAC is designed specifically for cloud-optimised file formats, making it efficient in cataloguing and accessing large datasets.
  • Lazy access & intelligent subsetting: STAC allows you to specify filters based on time and location, allowing you to retrieve only the data you need. This dramatically reduces download sizes and processing time.
  • Scalable performance: STAC enables efficient handling of massive datasets because you're accessing only necessary data chunks.
  • Interoperability: Being an open standard, STAC promotes data sharing and collaboration across platforms and organisations.

STAC makes it easier to search and filter geospatial data by common attributes such as time, location, or data type. This searchability is crucial for users who need to quickly find relevant data across large and diverse datasets. Because STAC provides a common standard, it reduces the friction in discovering and using spatial data.
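
As an illustration of filtering by time and location, the sketch below builds the JSON body for a STAC API item search (`POST /search`). The helper name `build_search_body`, the collection name and the bounding box are assumptions made for this example; the request is only constructed, not sent.

```python
import json
from datetime import datetime, timezone


def build_search_body(collections, bbox, start, end, limit=10):
    """Build the JSON body for a STAC API `POST /search` request.

    `bbox` is [west, south, east, north] in WGS84 longitude/latitude.
    `start` and `end` must be timezone-aware datetimes.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must be [west, south, east, north]")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_utc = start.astimezone(timezone.utc).strftime(fmt)
    end_utc = end.astimezone(timezone.utc).strftime(fmt)
    return {
        "collections": list(collections),
        "bbox": list(bbox),
        # A closed interval is written as "start/end".
        "datetime": f"{start_utc}/{end_utc}",
        "limit": limit,
    }


# Hypothetical collection name and area of interest.
body = build_search_body(
    collections=["drone-orthomosaics"],
    bbox=[-1.60, 52.20, -1.50, 52.30],
    start=datetime(2024, 6, 1, tzinfo=timezone.utc),
    end=datetime(2024, 6, 30, 23, 59, 59, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```

Only Items whose footprint intersects the box and whose timestamp falls inside the interval are returned, so a client never has to list or download the rest of the catalogue.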

How can you use STAC in FME?

Since the release of FME 2024, FME has included two STAC transformers:

STAC Asset Reader

This transformer lets you connect directly to a STAC, filter items based on time and location, and retrieve assets.

Figure 2: The STAC Asset Reader in FME has preset STACs including Microsoft Planetary Computer

STAC Metadata Reader

This transformer reads metadata properties from a STAC Item and its ancestor Catalogs and/or Collections.

Safe Software ran a great webinar covering cloud-native formats and STAC, available here: Safe Webinar - YouTube.

These transformers are great for searching, filtering and retrieving data from a STAC, but FME doesn't currently offer any STAC writers. However, the STAC API has an OpenAPI description that can be loaded directly into the OpenAPICaller, so new collections, items and assets can be created from FME through API POST requests.
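
Outside FME, the same kind of POST request can be sketched in a few lines of Python. This assumes the STAC API has the Transaction extension enabled, which defines `POST /collections/{collection_id}/items`. The API root, collection name and Item below are hypothetical, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_create_item_request(api_root, collection_id, item):
    """Build (but do not send) a POST request that creates a STAC Item."""
    url = f"{api_root.rstrip('/')}/collections/{collection_id}/items"
    data = json.dumps(item).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical API root and a deliberately minimal Item.
new_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "field-a-2024-06-01",
    "collection": "drone-orthomosaics",
    "bbox": [-1.56, 52.24, -1.54, 52.26],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-1.56, 52.24], [-1.54, 52.24], [-1.54, 52.26],
            [-1.56, 52.26], [-1.56, 52.24],
        ]],
    },
    "properties": {"datetime": "2024-06-01T10:30:00Z"},
    "assets": {},
    "links": [],
}

request = build_create_item_request(
    "http://localhost:8081", "drone-orthomosaics", new_item
)
print(request.get_method(), request.full_url)

# To actually send it against a running API:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

In FME, the OpenAPICaller plays the role of this helper: it selects the endpoint and sends the JSON body that the workspace has assembled.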

Figure 3: The OpenAPICaller in FME can load an OpenAPI Specification helping you find the right API endpoint for the task at hand – in this case, creating a new Item in STAC

Tensing developed a range of Workspaces to manage collections and add items. FME can also convert existing GeoTIFF and Point Cloud datasets into the new cloud-optimised formats, including COG (Cloud Optimised GeoTIFF) and COPC (Cloud Optimised Point Cloud), making it easy to publish assets into STAC Items.

STAC Requirements and Deployment

UK Agri-Tech Centre wanted to deploy a local STAC to catalogue all the drone datasets it collects, so that users work with these datasets from the same source and the datasets carry robust attribution, are indexed and, most importantly, are discoverable. This includes hyperspectral and point cloud datasets from agricultural land in the UK.

There are future plans to expose certain datasets to partners. STAC was a suitable solution because it is easy to manage, open source and works with the latest cloud-native formats. One additional request was to set permissions on datasets by collection.

STAC was deployed via containers on an Ubuntu server running Docker, which hosted the eoAPI fork of STAC.

Figure 4: The eoAPI STAC fork created by Development Seed contains Infrastructure as Code (IaC) libraries for all the core components that support STAC, including FastAPI for STAC metadata search.

STAC does not inherently have a permissions model, so we looked for simple methods of controlling access. STAC typically works with object storage such as S3 or Azure Blob Storage, and for a local deployment we considered MinIO inside a Docker container. Ultimately, we settled on Active Directory, as permissions were already being managed there.

The cloud-optimised files were stored on a Windows NTFS file system, and access permissions were managed at the AD level. Files were served via a reverse proxy and file storage gateway. This ensured that both viewing the data within the STAC and downloading it were controlled via standard Active Directory groups for granular access control.
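
The rule being enforced, access per collection based on group membership, can be sketched as below. In the deployment described here, Active Directory and NTFS apply this rule, not application code. The group names, collection names and the `can_access` helper are all hypothetical.

```python
# Hypothetical mapping of STAC collections to the AD groups allowed to read them.
COLLECTION_GROUPS = {
    "drone-orthomosaics": {"GIS-Staff", "Partners-Read"},
    "hyperspectral-trials": {"GIS-Staff"},
}


def can_access(collection_id, user_groups):
    """Return True if the user belongs to at least one allowed group.

    Collections with no mapping are denied by default.
    """
    allowed = COLLECTION_GROUPS.get(collection_id, set())
    return bool(allowed & set(user_groups))


print(can_access("drone-orthomosaics", ["Partners-Read"]))    # True
print(can_access("hyperspectral-trials", ["Partners-Read"]))  # False
```

Denying unmapped collections by default means a newly created collection stays private until someone explicitly grants a group access to it.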

Figure 5: UK Agri-Tech Centre - Example Drone Cloud Optimised GeoTIFF Asset in STAC

FME Workspaces

Tensing created a STAC Loader - FME Flow App.

Figure 6: The STAC Loader FME Flow App

The STAC Loader FME Flow App was designed to automatically handle the complexities of adding new data to a collection. It first checks whether the target Collection exists and creates it if necessary. You can pass it either a single file or an entire folder containing GeoTIFF or LAZ Point Cloud datasets, specifying the appropriate collection for the data and any additional pre-baked metadata such as the data owner.
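
The "check, then create if missing" step can be sketched as follows. This is not the Tensing workspace itself, which is built in FME. `InMemoryStac` is a hypothetical stand-in for a STAC API client so that the example runs without a server, and the collection metadata is placeholder data.

```python
class InMemoryStac:
    """Stand-in for a STAC API client; a real one would issue HTTP calls."""

    def __init__(self):
        self.collections = {}

    def get_collection(self, collection_id):
        # A real client would call GET /collections/{id} and map a 404 to None.
        return self.collections.get(collection_id)

    def create_collection(self, collection):
        # A real client would call POST /collections.
        self.collections[collection["id"]] = collection
        return collection


def ensure_collection(client, collection_id, description):
    """Return (collection, created), creating the Collection only if missing."""
    existing = client.get_collection(collection_id)
    if existing is not None:
        return existing, False
    collection = {
        "type": "Collection",
        "stac_version": "1.0.0",
        "id": collection_id,
        "description": description,
        "license": "proprietary",
        # Placeholder extent; a loader would update this as Items are added.
        "extent": {
            "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
            "temporal": {"interval": [["2024-01-01T00:00:00Z", None]]},
        },
        "links": [],
    }
    return client.create_collection(collection), True


client = InMemoryStac()
_, created_first = ensure_collection(client, "drone-orthomosaics", "Drone orthomosaics")
_, created_second = ensure_collection(client, "drone-orthomosaics", "Drone orthomosaics")
print(created_first, created_second)  # True False
```

Making the step idempotent means the same loader can be run repeatedly on a folder of files without creating duplicate collections.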

Handling GeoTIFF Datasets - when you add a GeoTIFF dataset, the STAC Loader gets to work by:

  • Extracting key file properties.
  • Assessing the collection’s permissions (whether it’s restricted or open).
  • Generating a thumbnail image for easy visual reference.
  • Calculating and storing raster and band statistics.
  • Defining the bounding box and, if needed, reprojecting the dataset using the CoordinateSystemDescriptionConverter to ensure precise projection details.
  • Formatting all this information into the standard STAC JSON schema.
  • Creating a Cloud Optimised GeoTIFF for efficient storage and retrieval.
  • Updating the STAC record with essential metadata, including timestamps, the creator’s name, and their contact information.
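
Two of the steps above, defining the footprint and calculating band statistics, can be sketched in plain Python. The Loader does this with FME transformers, not with this code. The statistics field names follow our understanding of the STAC raster extension, and the sample values and bounding box are made up.

```python
import statistics


def bbox_to_polygon(bbox):
    """Convert [west, south, east, north] (WGS84) into a GeoJSON Polygon."""
    west, south, east, north = bbox
    return {
        "type": "Polygon",
        # Exterior ring, counter-clockwise, closed on the first vertex.
        "coordinates": [[
            [west, south], [east, south], [east, north],
            [west, north], [west, south],
        ]],
    }


def band_statistics(values):
    """Summarise one band's pixel values (nodata already removed)."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.fmean(values),
        "stddev": statistics.pstdev(values),  # population standard deviation
    }


# Made-up inputs for illustration.
bbox = [-1.56, 52.24, -1.54, 52.26]
pixels = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

geometry = bbox_to_polygon(bbox)
stats = band_statistics(pixels)
print(geometry["coordinates"][0][0], stats)
```

STAC expects the Item `bbox` and `geometry` in WGS84 longitude/latitude, which is why a dataset in a projected coordinate system needs its extent reprojected before the Item is written.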

Handling Point Cloud (LiDAR) Datasets:

  • The tool calculates vital statistics using the PointCloudPropertyExtractor and PointCloudStatisticsCalculator.
  • It defines the bounding box and reprojects the dataset if necessary, with the CoordinateSystemDescriptionConverter ensuring accurate projection details.
  • The information is then formatted into the STAC JSON schema.
  • A Cloud Optimised GeoTIFF is created, even for Point Cloud data, to ensure consistency across the collection.
  • The record is updated in STAC with the same level of detail as with GeoTIFF datasets.

Once the data has been processed in either format, the STAC Loader automatically generates an HTML report, summarising all the actions taken. This report provides a clear, concise overview of the dataset's integration into the STAC collection, making it easier to manage and track your geospatial data.

The STAC Loader simplifies and automates what can otherwise be a time-consuming process, ensuring that your geospatial data is consistently formatted, accurately described, and readily accessible within your STAC collections.

Future developments may include the automated loading of vector data, including FlatGeobuf and GeoParquet datasets.

While you are here - FME Skills Booster and training

I hope you liked this brief introduction to the world of STAC and how we helped support a STAC deployment using FME and cloud-native geospatial formats. If you did, and would like to get some hands-on experience using cloud-native geospatial formats with FME, we will be running a Tensing FME Skills Booster on this topic soon.

At Tensing we offer a range of FME Training covering Intro, Advanced and Authoring. If you would like to brush up your skills, get in touch.

A special thank you to UK Agri-Tech Centre for allowing us to share this story.
