snowflake.ml.fileset.sfcfs.SFFileSystem

class snowflake.ml.fileset.sfcfs.SFFileSystem(*args, **kwargs)

Bases: AbstractFileSystem

A file system that lets users access Snowflake stages and stage files through valid Snowflake locations.

The file system is based on fsspec (https://filesystem-spec.readthedocs.io/). It is a wrapper built on top of SFStageFileSystem. It takes a Snowflake stage file path as input and supports read operations. A valid Snowflake location has the form “@{database_name}.{schema_name}.{stage_name}/{path_to_file}”.
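The location format above can be checked before a path is handed to the file system. The following is a minimal sketch, not the library's own parser: `StageLocation` and `parse_stage_location` are hypothetical names, and the pattern assumes simple unquoted identifiers and a "/" after the stage name.

```python
import re
from typing import NamedTuple


class StageLocation(NamedTuple):
    """The four parts of an '@db.schema.stage/path' location (hypothetical type)."""
    database: str
    schema: str
    stage: str
    path: str


# Assumption: unquoted identifiers only (letters, digits, "_" and "$").
_IDENT = r"[A-Za-z_][A-Za-z0-9_$]*"
_LOCATION_RE = re.compile(
    rf"^@(?P<database>{_IDENT})"
    rf"\.(?P<schema>{_IDENT})"
    rf"\.(?P<stage>{_IDENT})"
    r"/(?P<path>.*)$"
)


def parse_stage_location(location: str) -> StageLocation:
    """Split a stage location into its parts, or raise ValueError."""
    match = _LOCATION_RE.match(location)
    if match is None:
        raise ValueError(f"Not a valid stage location: {location!r}")
    return StageLocation(**match.groupdict())


loc = parse_stage_location("@MYDB.public.FOO/nytrain/data_0_0_1.csv")
print(loc.database, loc.schema, loc.stage, loc.path)
```

Quoted identifiers are deliberately left out to keep the pattern short; a stricter check would need to handle them.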

Example 1: Create a file system object and perform file operations

>>> conn = snowflake.connector.connect(**connection_parameters)
>>> sffs = SFFileSystem(sf_connection=conn)
>>> sffs.ls("@MYDB.public.FOO/nytrain")
['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
>>> with sffs.open('@MYDB.public.FOO/nytrain/data_0_0_1.csv', mode='rb') as f:
...     print(f.readline())
b'2014-02-05 14:35:00.00000054,13,2014-02-05 14:35:00 UTC,-74.00688,40.73049,-74.00563,40.70676,2\n'

Example 2: Create a file system object through fsspec using the "sfc" protocol

>>> conn = snowflake.connector.connect(**connection_parameters)
>>> sffs = fsspec.filesystem("sfc", sf_connection=conn)
>>> sffs.ls("@MYDB.public.FOO/nytrain")
['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
>>> with sffs.open('@MYDB.public.FOO/nytrain/data_0_0_1.csv', mode='rb') as f:
...     print(f.readline())
b'2014-02-05 14:35:00.00000054,13,2014-02-05 14:35:00 UTC,-74.00688,40.73049,-74.00563,40.70676,2\n'

Example 3: Open a file directly through fsspec using an "sfc://" URL

>>> conn = snowflake.connector.connect(**connection_parameters)
>>> with fsspec.open("sfc://@MYDB.public.FOO/nytrain/data_0_0_1.csv", mode='rb', sf_connection=conn) as f:
...     print(f.readline())
b'2014-02-05 14:35:00.00000054,13,2014-02-05 14:35:00 UTC,-74.00688,40.73049,-74.00563,40.70676,2\n'

Initialize the file system with a Snowflake Python connection or a Snowpark session.

Parameters:
  • sf_connection – A Snowflake Python connection object. Exactly one of sf_connection and snowpark_session must be provided.

  • snowpark_session – A Snowpark session. Exactly one of snowpark_session and sf_connection must be provided.

  • kwargs

    Optional. Other parameters that can be passed on to fsspec. Currently supports:

    • skip_instance_cache: Bool. Controls reuse of instances.

    • cache_type, cache_options, block_size: Configure file buffering.

    For more information on these options, see https://filesystem-spec.readthedocs.io/en/latest/features.html

Raises:
  • ValueError – Raised unless exactly one of sf_connection and snowpark_session is given.

  • SnowflakeMLException – A failure was encountered while recreating the SFFileSystem from a serialized state.
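The "exactly one of" rule behind the ValueError can be sketched as follows. This is only an illustration of the rule as stated here, not the library's actual check; `require_exactly_one` is a hypothetical helper.

```python
from typing import Any, Optional


def require_exactly_one(
    sf_connection: Optional[Any] = None,
    snowpark_session: Optional[Any] = None,
) -> Any:
    """Return whichever argument was supplied; raise ValueError otherwise."""
    # Keep only the arguments that were actually provided.
    supplied = [arg for arg in (sf_connection, snowpark_session) if arg is not None]
    if len(supplied) != 1:
        raise ValueError(
            "Exactly one of sf_connection and snowpark_session must be given."
        )
    return supplied[0]
```

Passing neither argument, or both, raises ValueError; passing one returns it unchanged.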

Methods

info(path: str, **kwargs: Any) -> Dict[str, Any]

Overrides the fsspec info method. Returns details of the entry at path.

ls(path: str, detail: bool = False, **kwargs: Any) -> Union[List[str], List[Dict[str, Any]]]

Overrides the fsspec ls method. Lists a single “directory”, with or without details.

Parameters:
  • path – Location at which to list files, in the format “@{database}.{schema}.{stage}/{path}”.

  • detail – If True, each list item is a dict of file properties; otherwise, each item is a file name.

  • kwargs – Additional arguments passed on.

Returns:

A list of file names if detail is False, or a list of dicts if detail is True.

Example:

>>> sffs.ls("@MYDB.public.FOO/")
['@MYDB.public.FOO/nytrain']
>>> sffs.ls("@MYDB.public.FOO/nytrain")
['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
>>> sffs.ls("@MYDB.public.FOO/nytrain/")
['@MYDB.public.FOO/nytrain/data_0_0_0.csv', '@MYDB.public.FOO/nytrain/data_0_0_1.csv']
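With detail=True, each entry is a dict rather than a string. The sketch below sums file sizes from such a listing. It assumes the entries follow the usual fsspec convention of "name", "size" and "type" keys, which this page does not spell out; `total_file_size` is a hypothetical helper and the sample sizes are made up for illustration.

```python
from typing import Any, Dict, List


def total_file_size(entries: List[Dict[str, Any]]) -> int:
    """Sum the sizes of the file entries in a detailed listing."""
    return sum(
        entry.get("size") or 0          # treat a missing or None size as 0
        for entry in entries
        if entry.get("type") == "file"  # skip "directory" entries
    )


# Illustrative data only; real entries would come from sffs.ls(path, detail=True).
sample = [
    {"name": "@MYDB.public.FOO/nytrain/data_0_0_0.csv", "size": 10, "type": "file"},
    {"name": "@MYDB.public.FOO/nytrain/data_0_0_1.csv", "size": 32, "type": "file"},
]
print(total_file_size(sample))
```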
optimize_read(files: Optional[List[str]] = None) -> None

Prefetches and caches the presigned URLs for all the given files to speed up file opening.

Every file passed here has its URL cached. If a later open() call hits a cached URL that is no longer active, the cached URLs in the same stage are refreshed as a batch.

Parameters:

files – A list of file paths whose presigned URLs should be cached.
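A typical pattern is to call optimize_read once with every path, then open the files one by one. The helper below is a minimal sketch of that pattern: `read_first_lines` is a hypothetical name, and `fs` is assumed to be any object exposing optimize_read() and open() as documented here, such as an SFFileSystem instance.

```python
from typing import Any, Dict, List


def read_first_lines(fs: Any, files: List[str]) -> Dict[str, bytes]:
    """Prefetch presigned URLs for `files`, then read the first line of each."""
    # Prefetch and cache the presigned URLs before any file is opened.
    fs.optimize_read(files)
    first_lines: Dict[str, bytes] = {}
    for path in files:
        with fs.open(path, mode="rb") as f:
            first_lines[path] = f.readline()
    return first_lines
```

For instance, `read_first_lines(sffs, sffs.ls("@MYDB.public.FOO/nytrain"))` would return a dict mapping each listed file to its first line.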

Attributes

async_impl = False
blocksize = 4194304
cachable = True
fsid

Persistent filesystem id that can be used to compare filesystems across sessions.

mirror_sync_methods = False
protocol = 'sfc'
root_marker = ''
sep = '/'
transaction

A context within which files are committed together upon exit.

Requires the file class to implement .commit() and .discard() for the normal and exception cases, respectively.
