Experience Snowflake with Legacy Snowflake Notebooks

Snowflake Notebooks is a development environment that you can use with other Snowflake features. This topic describes ways to leverage other Snowflake features within notebooks.

笔记本中的 Snowpark Python

The Snowpark library provides an intuitive API for querying and processing data in a data pipeline. Using the Snowpark library, you can build applications that process data in Snowflake without moving data to the system where your application code runs. You can also automate data transformation and processing by writing stored procedures and scheduling those procedures as tasks in Snowflake.

您可以在笔记本的 Python 单元中编写 Snowpark 代码,从而使用 Snowpark 在 Snowflake 中大规模查询和处理数据。

示例用法

Snowpark Python comes pre-installed in the Snowflake Notebooks environment. The following example uses the Snowpark library in a notebook to read in a CSV file and a Snowflake table and display its contents as output.

  1. In your notebook, add a Python cell, either using a keyboard shortcut or by selecting + Python. Snowflake Notebooks and Snowpark both support Python 3.9.

  2. Set up a Snowpark session. In notebooks, the session context variable is preconfigured. You can use the get_active_session method to get the session context variable:

    from snowflake.snowpark.context import get_active_session
    session = get_active_session()
  3. Use Snowpark to load a CSV file into a Snowpark DataFrame from a stage location. This example uses a stage called tastybyte_stage.

    df = session.read.options({"infer_schema":True}).csv('@TASTYBYTE_STAGE/app_order.csv')
  4. Load an existing Snowflake table, app_order, into the Snowpark DataFrame.

    df = session.table("APP_ORDER")
  5. 显示 Snowpark DataFrame。

    df

Note

Outside of the Snowflake Notebooks environment, you must call df.show() to print out the DataFrame. In Snowflake Notebooks, DataFrames are evaluated eagerly when df is printed out. The DataFrame is printed out as an interactive Streamlit DataFrame display (st.dataframe). DataFrames output is limited to 10,000 rows or 8 MB, whichever is lower.

Snowpark 限制

  • A Snowflake Notebook creates a Snowpark session, so you can use most of the methods available in a Snowpark session class. However, because a notebook runs inside Snowflake rather than in your local development environment, you cannot use the following methods:

    • session.add_import
    • session.add_packages
    • session.add_requirements
  • Some Snowpark Python operations don’t work with SPROCs. For a complete list of operations, see Python stored procedure limitations.

Note

These quickstarts are only shown as examples. Following along with the example may require additional rights to third-party data, products, or services that are not owned or provided by Snowflake. Snowflake does not guarantee the accuracy of these examples or cover them under any Service Level Agreement.

笔记本中的 Streamlit

Streamlit is an open-source Python library that makes it easy to create and share custom web apps for machine learning and data science. You can build interactive data applications with Streamlit directly in your notebook. You can test and develop your app directly in a notebook. Streamlit comes preinstalled in notebooks, so you can start quickly.

示例用法

Streamlit comes pre-installed with the Snowflake Notebooks environment. The example in this section creates an interactive data app using Streamlit.

  1. 导入必要的库

    import streamlit as st
    import pandas as pd
  2. 首先为应用程序创建一些样本数据。

    species = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3
    measurements = ["sepal_length", "sepal_width", "petal_length"] * 3
    values = [5.1, 3.5, 1.4, 6.2, 2.9, 4.3, 7.3, 3.0, 6.3]
    df = pd.DataFrame({"species": species,"measurement": measurements,"value": values})
    df
  3. 从 Streamlit 库中设置交互式滑块。

    st.markdown("""# Interactive Filtering with Streamlit! :balloon:
             Values will automatically cascade down the notebook cells""")
    value = st.slider("Move the slider to change the filter value 👇", df.value.min(), df.value.max(), df.value.mean(), step = 0.3 )
  4. 最后,根据滑块值显示筛选后的表。

    df[df["value"]>value].sort_values("value")

您可以在笔记本上与应用程序进行实时交互。根据您在滑块上设置的值,查看筛选后的表的变化。

Tip

For the complete example, see the interactive data app section of the Visual Data Stories with Snowflake Notebooks (https://github.com/Snowflake-Labs/snowflake-demo-notebooks/blob/main/Visual%20Data%20Stories%20with%20Snowflake%20Notebooks/Visual%20Data%20Stories%20with%20Snowflake%20Notebooks.ipynb) notebook.

笔记本中的 Streamlit 支持

Mapbox and Carto provide map tiles when you use the st.map (https://docs.streamlit.io/develop/api-reference/charts/st.map) or st.pydeck_chart (https://docs.streamlit.io/develop/api-reference/charts/st.pydeck_chart) Streamlit commands.

In warehouse runtimes, which manage their packages with conda, Mapbox and Carto are third-party applications that are subject to Snowflake’s External Offerings Terms.

Important

Mapbox is not available in the People’s Republic of China. Snowflake recommends using Carto instead.

To use these commands in warehouse runtimes, you must acknowledge the External Offerings Terms. Container runtimes don’t require this acknowledgement.

笔记本中不支持以下 Streamlit 元素:

容器运行时笔记本

Container Runtime provides software and hardware options to support advanced data science and machine learning workloads. For details on Container Runtime, see Notebooks on Container Runtime.

笔记本中的 Snowflake ML 注册表

The Snowflake Model Registry allows you to securely manage models and your metadata in Snowflake, regardless of origin. The model registry stores machine learning models as first-class schema-level objects in Snowflake so they can easily be found and used by others in your organization. You can create registries, and store models in them, using classes in the Snowpark ML library. Models can have multiple versions, and you can designate a version as the default.

示例用法

To use the Snowflake ML registry, install the snowflake-ml-python library for your notebook:

  1. From your notebook, select Packages at the top.
  2. Search for the snowflake-ml-python package and select the library to install it.

以下是有关如何使用 Snowflake ML 注册表记录模型的示例:

from snowflake.ml.registry import Registry
# Create a registry and log the model
native_registry = Registry(session=session, database_name=db, schema_name=schema)

# Let's first log the very first model we trained
model_ver = native_registry.log_model(
    model_name=model_name,
    version_name='V0',
    model=regressor,
    sample_input_data=X, # to provide the feature schema
)

# Add evaluation metric
model_ver.set_metric(metric_name="mean_abs_pct_err", value=mape)

# Add a description
model_ver.comment = "This is the first iteration of our Diamonds Price Prediction model. It is used for demo purposes."

# Show Models
native_registry.get_model(model_name).show_versions()

Tip

View this end-to-end example of how to use Snowflake ML Registry.

笔记本中的 Pandas on Snowflake

pandas on Snowflake lets you run your pandas code in a distributed manner directly on your data in Snowflake. Just by changing the import statement and a few lines of code, you can get the same familiar pandas-native experience with the scalability and security benefits of Snowflake.

借助 Pandas on Snowflake,您就可以处理更大的数据集,并避免将 pandas 管道移植到其他大数据框架或配置大型昂贵机器所需的时间和费用。它通过转译到 SQL 在 Snowflake 中本地运行工作负载,使其能够利用并行化优势以及 Snowflake 的数据管理和安全优势。

Pandas on Snowflake 通过 Snowpark Pandas API 交付,是 Snowpark Python 库的一部分,可在 Snowflake 平台内对 Python 代码进行可扩展的数据处理。

示例用法

Snowpark pandas is available in Snowpark Python version 1.17 and later. Snowpark Python comes pre-installed with the Snowflake Notebooks environment.

  1. To install Modin, select modin from Packages and ensure that the version is 0.28.1 or later.
  2. To set the pandas version, select pandas from Packages and ensure that the version is 2.2.1.

在 Python 单元中,导入 Snowpark Python 和 Modin:

import modin.pandas as pd
import snowflake.snowpark.modin.plugin
  1. 创建 Snowpark 会话:

    from snowflake.snowpark.context import get_active_session
    session = get_active_session()
  2. 开始使用 Snowpark Python API:

    # Create a Snowpark Pandas DataFrame with sample data.
    df = pd.DataFrame([[1, 'Big Bear', 8],[2, 'Big Bear', 10],[3, 'Big Bear', None],
                     [1, 'Tahoe', 3],[2, 'Tahoe', None],[3, 'Tahoe', 13],
                     [1, 'Whistler', None],['Friday', 'Whistler', 40],[3, 'Whistler', 25]],
                     columns=["DAY", "LOCATION", "SNOWFALL"])
    # Drop rows with null values.
    df.dropna()
    # Compute the average daily snowfall across locations.
    df.groupby("LOCATION").mean()["SNOWFALL"]

Tip

For more examples of how to use pandas on Snowflake, see Getting Started with pandas on Snowflake (https://quickstarts.snowflake.com/guide/getting_started_with_pandas_on_snowflake/#0).

笔记本中的 Snowflake Python API

The Snowflake Python API is a unified library that seamlessly connects Python with Snowflake workloads. It is intended to provide comprehensive APIs for interacting with Snowflake resources across data engineering, Snowpark, Snowpark ML, and application workloads using a first-class Python API.

You can use the Snowflake Python API to manage Snowflake resources by creating, deleting, or modifying them, and more. You can use Python to perform tasks you might otherwise perform with Snowflake SQL commands.

In Notebooks, the session context variable is preconfigured. You can use the get_active_session method to get the session context variable:

from snowflake.snowpark.context import get_active_session
session = get_active_session()

Create a Root object from which to use the Snowflake Python API:

from snowflake.core import Root
api_root = Root(session)

下面是一个如何使用 Python API 创建数据库和架构的示例:

# Create a database and schema by running the following cell in the notebook:
database_ref = api_root.databases.create(Database(name="demo_database"), mode="orreplace")
schema_ref = database_ref.schemas.create(Schema(name="demo_schema"), mode="orreplace")

Tip

For a more detailed example of how to use Snowflake’s Python API, see the Creating Snowflake object using Python API notebook example (https://github.com/Snowflake-Labs/snowflake-demo-notebooks/blob/main/Creating%20Snowflake%20Object%20using%20Python%20API/Creating%20Snowflake%20Object%20using%20Python%20API.ipynb) on Github.