Easy Snowflake-Python Connection using BAG Data with Snowpark

Michael Tuijp

Geospatial Data Scientist

The Snowflake Marketplace BAG dataset is a free dataset that can be accessed with one simple query. Our previous article of Snowflake showed how to use Tableau to connect to the dataset. This time, we show how to use Python and the Snowpark package to access the data from Snowflake, directly in a local Python Jupyter Notebook.

What is Snowpark?

Snowpark is part of Snowflake. At its core, Snowpark provides an API that developers can use to construct DataFrames that are executed on Snowflake's platform. It allows coding in languages other than SQL such as Scala, Java, and Python to take advantage of Snowflake's powerful platform, without having to leave Snowflake. This allows data application developers to perform complex transformations within Snowflake while benefiting from built-in, unlimited scalability, performance, governance, and security features.

Snowflake

Start using the DATA ANALYSIS 

First, you need a (free) Snowflake account. You can sign up via this link. The default role is set to 'public'. All you need to do to process your queries is to create a warehouse for your account. If you go to the 'Worksheets' and click on '+ Worksheet', you can create a new warehouse with the following commands:

CREATE OR REPLACE WAREHOUSE MY_WH WITH WAREHOUSE_SIZE='X-SMALL';
GRANT USAGE ON WAREHOUSE MY_WH TO ROLE PUBLIC

With this, you create a new warehouse (the smallest version) and assign user rights to your public role. After this step, we are ready to install Jupyter Notebook. The instructions for this can be found here. The next step is to install the Snowpark package. You can easily do this with the pip command in the environment you are working in. 

pip install snowflake-snowpark-python

Make sure you install the standard pandas package, because we want to turn our requested table into a pandas data frame.

pip install pandas

Once installed, we can start making our connection.

Import the library

To get started, we import the relevant Snowflake libraries. We do that with this code:

### Import Snowpark for Python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col
import pandas as pd

Create a session object

After importing the Snowpark package, we create a session object. Here you can enter your accounts, credentials, role, and warehouse to connect to the BAG dataset within the Marketplace. You do that with this function.

Connecting to the bag dataset in the snowflake marketplace

Once a session object is created, we can connect to the BAG dataset. You can consult the full dataset or write a specific query as required (like these examples). Then you convert the consulted tables into Pandas data frames, on which you can perform ordinary data transformations.

After running the code, you should see a Pandas data frame, like the image below.

Pandas dataframe

Congratulations, you Jupyter Notebook! You are now ready to use complex Python data analysis tools for your queried data.

Do you have any questions or comments about this topic? Contact us and we will be happy to help you with this.

Plan consultation

Do you want to know more about this topic?

Schedule an appointment with one of our experts today!