Spatial Data Science with Snowflake
Ivo de Liefde, Esri Certified Professional
At Tensing we’ve created an integration between FME and the Snowflake platform. This allows data science teams to include geospatial data in their projects on top of all the functionality Snowflake offers natively. In this blog post we will take you through what Snowflake is, what the integration with FME can do for you and how all of this can be accessed from your favorite notebook or IDE environment.
Snowflake & Data Science
Snowflake is a cloud computing platform that enables organizations to get rid of data silos. It offers a platform that handles any data format, can perform analysis at near-unlimited scale and allows users to easily and securely share data without copying or even moving it. Snowflake integrates with ETL platforms for data ingestion and synchronization, and it can handle streaming data. Users can interact with the data using interactive dashboards built on top of it, or via a host of data science and ML environments.
Snowflake & Geospatial data
The one thing that is still relatively new to Snowflake is the ability to integrate and analyze spatial data. Although it offers essential support for coordinates in WGS84, out of the box it lacks full geometry and geography data support. To close this gap, Snowflake reached out to Safe Software and Tensing. At Tensing we have built a proof-of-concept (PoC) solution that integrates FME Server in Snowflake, providing a Snowflake-only experience that harnesses all the (geospatial) analytical power of FME.
Snowflake data science capabilities
Snowflake is an incredibly useful tool for working on data science projects, as it provides you with a single point of access to all the data you need. This includes data from your own organization as well as external data sources through the global network of trusted data. This data can be processed on the multi-cluster compute architecture, allowing highly scalable preprocessing and data preparation. As a data scientist you can build your pipelines in your language of choice, with any machine learning tool or framework, using the native connectors.
Jupyter Notebooks, ArcGIS Notebooks and IDEs
Analyzing data and creating machine learning models is an iterative process of exploring, testing and validating methods and their corresponding results. It always starts and ends with discussing your work with domain experts and end users, who will present new ideas or point out flaws in the current approach. Your counterparts in this discussion will not always be able to read your code. Heck, even advanced programmers can have a hard time reading each other’s code. Creating well-structured and documented notebooks while working on data science projects is therefore extremely important. It allows you to share what you’ve done, what the outcomes are and which new insights or new questions this brings up.
Connecting to Snowflake from your notebook environment can be done using the Snowflake connector package for Python. This allows you to run SQL queries and fetch the results, or even to create a SQLAlchemy engine that lets you read data from Snowflake directly into pandas DataFrames. Combining this integration with the ArcGIS API for Python, or with open source alternatives such as GeoPandas, creates an incredibly powerful platform for scalable (geo)data science projects.
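To make this a bit more concrete, here is a minimal sketch of what such a notebook cell could look like. It assumes the snowflake-sqlalchemy, pandas and geopandas packages are installed, and that your query returns numeric longitude and latitude columns in WGS84. The function names, the LON/LAT column names and the credentials are hypothetical placeholders for illustration, not part of the Tensing integration.

```python
from urllib.parse import quote


def snowflake_url(user, password, account, database, schema, warehouse):
    """Build a connection URL in the format used by snowflake-sqlalchemy."""
    # Percent-encode the credentials so characters such as '@' or spaces
    # cannot break the URL structure.
    return (
        f"snowflake://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{account}/{database}/{schema}?warehouse={warehouse}"
    )


def read_points(url, query, lon_col="lon", lat_col="lat"):
    """Run a query and return the rows as a GeoDataFrame of WGS84 points.

    Assumes the result contains numeric longitude and latitude columns;
    the default column names are hypothetical.
    """
    # Imported here so the URL helper above works without these packages.
    import geopandas as gpd
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(url)
    try:
        df = pd.read_sql(query, engine)
    finally:
        engine.dispose()

    # Normalise column names so the lookup works regardless of letter case.
    df.columns = [c.lower() for c in df.columns]
    geometry = gpd.points_from_xy(df[lon_col], df[lat_col])
    return gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")
```

In a notebook you would then call something like `read_points(snowflake_url(...), "SELECT NAME, LON, LAT FROM STATIONS")`, where the table and its columns are again made up for this example. The resulting GeoDataFrame can be plotted or joined with other spatial layers using GeoPandas.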
Using the FME integration in Snowflake you can now perform large scale spatial processing from within your notebook environment. This enables you to exploit the full potential of spatial data in your projects and create better data science solutions.
Do you want to know more about the background and capabilities of Snowflake? Watch the webinar 'Empowering spatial insights with FME in Snowflake', where we take you on an initial exploration of this revolutionary platform and show how we link Snowflake to other data platforms such as Esri ArcGIS.