Unstructured data types¶
Snowflake offers native support for unstructured data. That includes document, image, audio, and video files stored in internal and external stages.
Data types¶
Snowflake provides the following data type for unstructured data:
The FILE data type, which represents a file stored in an internal or external stage.
FILE data type¶
The FILE data type represents a reference to a file stored in an internal or external stage and it includes the following metadata:
STAGE: the name of the stage on which the file resides.
RELATIVE_PATH: the relative path of the file in a stage.
STAGE_FILE_URL: the stage file URL.
SCOPED_FILE_URL: a scoped file URL.
CONTENT_TYPE: the mime type of the file.
SIZE: the size, in bytes, of the file.
ETAG: a unique hash of the file contents.
LAST_MODIFIED: the last modified date of the file.
Not all the above fields are required. A valid FILE is required to have CONTENT_TYPE, SIZE, ETAG, LAST_MODIFIED and one of the STAGE/RELATIVE_PATH, STAGE_FILE_URL, or SCOPED_FILE_URL.
FILE data type functions¶
Sub-category |
Function |
---|---|
Constructor |
|
Accessor Functions |
|
Utility Functions |
|
Usage notes¶
No consistency guarantees are provided with respect to values of type FILE and the underlying stage files. If the underlying file is deleted or modified, the FILE value will not be updated. If the a FILE value is deleted from a table, the underlying file is not affected.
Permissions on the underlying files are governed by the type of URL that was specified upon the creation of the FILE, i.e. stage file URLs and stage/relative_path give permanent permission to callers that have access to the associated stage. Scoped URLs give temporary user-based access to the underlying files.
Using unstructured data in Snowflake via SQL¶
Create a table with a FILE column.
create table images_table(img file);
Load data from an external stage MY_IMAGES that contains all the image files. MY_IMAGES can be an internal or external stage. Enable directory table support if not already enabled.
ALTER STAGE MY_IMAGES
DIRECTORY=(ENABLE=true);
Load data into the Snowflake table.
insert into images_table select to_file(file_url) from directory(@my_images);
Run SQL statements against images_table.
select fl_get_relative_path(f)
from images_table
where fl_get_last_modified(f) between '2021-01-01' and '2023-01-01';