You are viewing documentation about an older version (1.16.0).

snowflake.snowpark.Session.add_import

Session.add_import(path: str, import_path: Optional[str] = None, chunk_size: int = 8192, whole_file_hash: bool = False) → None [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.16.0/src/snowflake/snowpark/session.py#L644-L723)

Registers a remote file in a stage or a local file as an import of a user-defined function (UDF). The local file can be a compressed file (e.g., zip), a Python file (.py), a directory, or any other file resource. You can also find examples in UDFRegistration.

Parameters:
  • path – The path of a local file or a remote file in the stage. In each case:

    • If the path points to a local file, the file is uploaded to the stage where the UDF is registered, and Snowflake imports it when executing that UDF.

    • If the path points to a local directory, the directory is compressed into a zip file and uploaded to the stage where the UDF is registered, and Snowflake imports that zip file when executing that UDF.

    • If the path points to a file in a stage, the file is included in the imports when executing a UDF.

  • import_path – The relative Python import path for a UDF. If it is not provided or it is None, the UDF will import the package directly without any leading package/module. This argument is a no-op if the path points to a stage file or a non-Python local file.

  • chunk_size – The number of bytes to hash per chunk of the uploaded files.

  • whole_file_hash – By default, only the first chunk of the uploaded import is hashed to save time. When this is set to True, each uploaded file is fully hashed instead.
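
The trade-off between chunk_size and whole_file_hash can be illustrated with a minimal sketch. This is not Snowpark's implementation: sketch_checksum and write_temp_file are hypothetical helpers, and the use of SHA-256 over a single leading chunk is an assumption based on the parameter descriptions above and the notes below. It shows that hashing only the first chunk cannot detect a change that occurs later in the file, while whole-file hashing can.

```python
import hashlib
import os
import tempfile


def sketch_checksum(path: str, chunk_size: int = 8192, whole_file_hash: bool = False) -> str:
    """Hypothetical helper (not the Snowpark implementation).

    Hashes only the first chunk_size bytes by default, or the whole file
    (read chunk by chunk) when whole_file_hash is True.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            if not whole_file_hash:
                break  # default behavior: stop after the first chunk
    return digest.hexdigest()


def write_temp_file(data: bytes) -> str:
    """Write data to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# Two files that share their first 8192 bytes but differ afterwards.
shared_prefix = b"a" * 8192
file_a = write_temp_file(shared_prefix + b"version 1")
file_b = write_temp_file(shared_prefix + b"version 2")

same_first_chunk = sketch_checksum(file_a) == sketch_checksum(file_b)
same_whole_file = (
    sketch_checksum(file_a, whole_file_hash=True)
    == sketch_checksum(file_b, whole_file_hash=True)
)
print(same_first_chunk)  # True: the late change is invisible to the first chunk
print(same_whole_file)   # False: full hashing detects the change

os.remove(file_a)
os.remove(file_b)
```

Under these assumptions, whole_file_hash=True is the safer choice for large files whose changes may fall outside the first chunk_size bytes, at the cost of reading each file in full.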

Example:

>>> from snowflake.snowpark.types import IntegerType
>>> from resources.test_udf_dir.test_udf_file import mod5
>>> session.add_import("tests/resources/test_udf_dir/test_udf_file.py", import_path="resources.test_udf_dir.test_udf_file")
>>> mod5_and_plus1_udf = session.udf.register(
...     lambda x: mod5(x) + 1,
...     return_type=IntegerType(),
...     input_types=[IntegerType()]
... )
>>> session.range(1, 8, 2).select(mod5_and_plus1_udf("id")).to_df("col1").collect()
[Row(COL1=2), Row(COL1=4), Row(COL1=1), Row(COL1=3)]
>>> session.clear_imports()

Note

1. To support lazy execution, the file is not uploaded to the stage immediately; it is uploaded when a UDF is created.

2. The Snowpark library calculates a SHA-256 checksum for every file or directory. Each file is uploaded to a subdirectory of the stage named after the file's checksum. If a file or directory already exists in the stage, the Snowpark library compares the checksums to determine whether it should be re-uploaded. Therefore, if you change a local file after uploading it and want to upload it again, call this function again with the same file path; the existing file in the stage will be overwritten by the re-uploaded file.

3. Adding two files with the same file name is not allowed, because UDFs can’t be created with two imports with the same name.

4. This method registers the file for all UDFs created later in the current session. If you only want to import a file for a specific UDF, use the imports argument in functions.udf() or session.udf.register().
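
The checksum comparison in note 2 and the name restriction in note 3 can be sketched in plain Python. The helpers below (staged_location, needs_upload, duplicate_file_names) are hypothetical names introduced for illustration and are not part of the Snowpark API; the "<prefix>/<checksum>/<file name>" layout and the stage name @mystage are assumptions derived from the wording of the notes, not a description of the library's internals.

```python
import hashlib
import posixpath
from collections import Counter
from typing import Iterable, List, Optional


def staged_location(stage_prefix: str, checksum: str, file_name: str) -> str:
    """Build '<prefix>/<checksum>/<file_name>' (layout assumed from note 2)."""
    return posixpath.join(stage_prefix, checksum, file_name)


def needs_upload(local_checksum: str, staged_checksum: Optional[str]) -> bool:
    """Upload when nothing is staged yet or when the checksums differ."""
    return staged_checksum is None or staged_checksum != local_checksum


def duplicate_file_names(paths: Iterable[str]) -> List[str]:
    """Return base names that occur more than once (disallowed by note 3)."""
    counts = Counter(posixpath.basename(p.rstrip("/")) for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)


# Checksums of two versions of the same (hypothetical) file content.
v1 = hashlib.sha256(b"def mod5(x):\n    return x % 5\n").hexdigest()
v2 = hashlib.sha256(b"def mod5(x):\n    return x % 5  # edited\n").hexdigest()

print(needs_upload(v1, None))  # True: first upload
print(needs_upload(v1, v1))    # False: unchanged file is skipped
print(needs_upload(v2, v1))    # True: edited file is re-uploaded

print(staged_location("@mystage/imports", "abc123", "test_udf_file.py"))
# @mystage/imports/abc123/test_udf_file.py

print(duplicate_file_names([
    "tests/a/test_udf_file.py",
    "@mystage/test_udf_file.py",
    "tests/b/other.py",
]))
# ['test_udf_file.py']
```

A check like duplicate_file_names could be run over a list of candidate paths before calling add_import, so that name clashes surface before any UDF is created.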
