Set up the Openflow Connector for Google Drive¶

Note

This connector is subject to the Snowflake Connector Terms.

This topic describes the steps to set up the Openflow Connector for Google Drive.

Prerequisites¶

Ensure that you have reviewed About Openflow Connector for Google Drive.
Ensure that you have Set up Openflow - BYOC or Set up Openflow - Snowflake Deployments.
If using Openflow - Snowflake Deployments, ensure that you have reviewed configuring required domains and have granted access to the required domains for the Google Drive connector.

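Get the credentials¶

Setting up the connector requires specific permissions and account settings so that the Snowflake Openflow processors can read data from Google. Part of granting this access is creating a service account and a key that Openflow uses to authenticate as that service account. For more information, see:

Configure access to the Google Cloud Search API (https://developers.google.com/cloud-search/docs/guides/project-setup#create_service_account_credentials)
Delegating domain-wide authority to the service account (https://developers.google.com/identity/protocols/oauth2/service-account#delegatingauthority)

As a Google Drive administrator, perform the following steps:

Prerequisites¶

Ensure that you meet the following requirements:

You have a Google user with Super Admin permissions
You have a Google Cloud project with the following roles:
- Organization Policy Administrator
- Organization Administrator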

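Enable service account key creation¶

By default, Google disables service account key creation. For Openflow to use the service account JSON, this key creation policy must be turned off.

Log in to the Google Cloud Console (https://console.cloud.google.com/) with a super admin account that has the Organization Policy Administrator role.
Ensure that you are in the project associated with your organization, not a project within your organization.
Click Organization Policies.
Select the Disable service account key creation policy.
Click Manage Policy and turn off enforcement.
Click Set Policy.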

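Create a service account and key¶

Open the Google Cloud Console (https://console.cloud.google.com/) and authenticate as a user who has been granted access to create service accounts.
Ensure that you are in a project of your organization.
In the left navigation, under IAM & Admin, select the Service Accounts tab.
Click Create Service Account.
Enter the service account name and click Create and Continue.
Click Done. In the table listing the service accounts, find the OAuth 2 Client ID column. Copy the client ID; you need it later when you set up domain-wide delegation in the next section.
For the newly created service account, open the menu in its row of the service accounts table, and then select Manage keys.
Select Add key, and then select Create new key.
Keep the default selection of JSON and click Create.

The key is downloaded as a .json file to your browser's Downloads directory.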
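
Before pasting the downloaded key into the connector parameters, a quick structural check can catch a wrong or truncated file early. The following is a minimal sketch, assuming the field names normally present in a Google service account key file (`type`, `client_email`, `client_id`, `private_key`). The helper name `missing_key_fields` is illustrative and is not part of Openflow or any Google library.

```python
import json

# Fields a Google service account key file is expected to contain.
REQUIRED_FIELDS = ("type", "client_email", "client_id", "private_key")


def missing_key_fields(key_json: str) -> list[str]:
    """Return the problems found in a service account key, or [] if none."""
    data = json.loads(key_json)
    # Collect required fields that are absent or empty.
    problems = [name for name in REQUIRED_FIELDS if not data.get(name)]
    # A key for another credential type will not work as a service account.
    if data.get("type") and data["type"] != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

The `client_id` value in the file should match the OAuth 2 client ID copied from the service accounts table.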

Grant domain-wide delegation of the listed scopes to the service account¶

Log in to your Google Admin account.
Select Admin from the Google Apps selector.
In the left navigation, expand Security, then Access and data control, and then click API Controls.
On the API Controls screen, select Manage domain wide delegation.
Click Add new.
Enter the OAuth 2 client ID obtained in the "Create a service account and key" section, and the following scopes:
- https://www.googleapis.com/auth/drive
- https://www.googleapis.com/auth/drive.metadata.readonly
- https://www.googleapis.com/auth/admin.directory.group.member.readonly
- https://www.googleapis.com/auth/admin.directory.group.readonly
- https://www.googleapis.com/auth/drive.file
- https://www.googleapis.com/auth/drive.metadata
Click Authorize.
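
The OAuth scopes field in the Admin console takes a single comma-delimited value. The following sketch assembles that value from the scopes listed in this section so that it can be pasted in one step. The names `DELEGATED_SCOPES` and `scopes_for_admin_console` are illustrative only.

```python
# OAuth scopes listed in this section, in the same order.
DELEGATED_SCOPES = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/drive.metadata.readonly",
    "https://www.googleapis.com/auth/admin.directory.group.member.readonly",
    "https://www.googleapis.com/auth/admin.directory.group.readonly",
    "https://www.googleapis.com/auth/drive.file",
    "https://www.googleapis.com/auth/drive.metadata",
]


def scopes_for_admin_console(scopes: list[str]) -> str:
    """Join scopes into one comma-delimited string, dropping duplicates."""
    unique = list(dict.fromkeys(scopes))  # keeps the original order
    return ",".join(unique)


if __name__ == "__main__":
    print(scopes_for_admin_console(DELEGATED_SCOPES))
```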

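Set up the Snowflake account¶

As a Snowflake account administrator, perform the following tasks manually or by using the script included below:

Create a new role or use an existing role, and grant it the database privileges.
Create a new Snowflake service user with the type SERVICE.
Grant the Snowflake service user the role that you created in the previous steps.
Configure key-pair authentication for the Snowflake SERVICE user from step 2.
Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow (for example, AWS, Azure, or Hashicorp) and store the public and private keys in the secret store.

Note

If, for any reason, you do not want to use a secrets manager, you are responsible for safeguarding the public and private key files used for key-pair authentication according to the security policies of your organization.
1. After the secrets manager is configured, determine how you will authenticate to it. On AWS, Snowflake recommends using the EC2 instance role associated with Openflow, because that way no other secrets have to be persisted.
2. In Openflow, configure a Parameter Provider associated with this secrets manager from the hamburger menu in the upper right. Navigate to Controller Settings » Parameter Provider, and then fetch your parameter values.
3. At this point, all credentials can be referenced with the associated parameter paths, and no sensitive values need to be persisted within Openflow.
If any other Snowflake users require access to the raw documents and tables ingested by the connector (for example, for custom processing in Snowflake), grant those users the role created in step 1.
Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with the size depending on the number of tables being replicated and the amount of data transferred. Large numbers of tables typically scale better with multi-cluster warehouses than with larger warehouse sizes.

Example setup¶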

--The following script assumes you'll need to create all required roles, users, and objects.
--However, you may want to reuse some that are already in existence.

--Create a Snowflake service user to manage the connector
USE ROLE USERADMIN;
CREATE USER <openflow_service_user> TYPE=SERVICE COMMENT='Service user for Openflow automation';

--Create a pair of secure keys (public and private). For more information, see
--key-pair authentication. Store the private key for the user in a file to supply
--to the connector’s configuration. Assign the public key to the Snowflake service user:
ALTER USER <openflow_service_user> SET RSA_PUBLIC_KEY = '<pubkey>';


--Create a role to manage the connector and the associated data and
--grant it to that user
USE ROLE SECURITYADMIN;
CREATE ROLE <openflow_connector_admin_role>;
GRANT ROLE <openflow_connector_admin_role> TO USER <openflow_service_user>;


--The following block is for USE CASE 2 (Cortex connect) ONLY
--Create a role for read access to the cortex search service created by this connector.
--This role should be granted to any role that will use the service
CREATE ROLE <cortex_search_service_read_only_role>;
GRANT ROLE <cortex_search_service_read_only_role> TO ROLE <whatever_roles_will_access_search_service>;

--Create the database the data will be stored in and grant usage to the roles created
USE ROLE ACCOUNTADMIN; --use whatever role you want to own your DB
CREATE DATABASE IF NOT EXISTS <destination_database>;
GRANT USAGE ON DATABASE <destination_database> TO ROLE <openflow_connector_admin_role>;

--Create the schema the data will be stored in and grant the necessary privileges
--on that schema to the connector admin role:
USE DATABASE <destination_database>;
CREATE SCHEMA IF NOT EXISTS <destination_schema>;
GRANT USAGE ON SCHEMA <destination_schema> TO ROLE <openflow_connector_admin_role>;
GRANT CREATE TABLE, CREATE DYNAMIC TABLE, CREATE STAGE, CREATE SEQUENCE, CREATE CORTEX
SEARCH SERVICE ON SCHEMA <destination_schema> TO ROLE <openflow_connector_admin_role>;

--The following block is for USE CASE 2 (Cortex connect) ONLY
--Grant the Cortex read-only role access to the database and schema
GRANT USAGE ON DATABASE <destination_database> TO ROLE <cortex_search_service_read_only_role>;
GRANT USAGE ON SCHEMA <destination_schema> TO ROLE <cortex_search_service_read_only_role>;

--Create the warehouse this connector will use if it doesn't already exist. Grant the
--appropriate privileges to the connector admin role. Adjust the size according to your needs.
CREATE WAREHOUSE IF NOT EXISTS <openflow_warehouse>
WITH
   WAREHOUSE_SIZE = 'MEDIUM'
   AUTO_SUSPEND = 300
   AUTO_RESUME = TRUE;
GRANT USAGE, OPERATE ON WAREHOUSE <openflow_warehouse> TO ROLE <openflow_connector_admin_role>;

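
The script above uses `<placeholder>` tokens for every name. The following sketch shows one way to fill them in before running the script, failing loudly if a value is missing. The function name `render_sql` and the sample values in the usage note are hypothetical. Values are inserted verbatim, so they should be trusted identifiers rather than user input.

```python
import re

# Matches tokens such as <destination_database> used in the setup script.
PLACEHOLDER = re.compile(r"<([a-z_]+)>")


def render_sql(template: str, values: dict[str, str]) -> str:
    """Replace every <placeholder> token; raise KeyError if one has no value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for <{name}>")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)
```

For example, `render_sql("USE DATABASE <destination_database>;", {"destination_database": "GDRIVE_DB"})` returns `USE DATABASE GDRIVE_DB;`.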

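Use case 1: Use the connector definition to ingest files only¶

Use the connector definition to:

Perform custom processing on ingested files
Ingest Google Drive files and permissions and keep them up to date

Set up the connector¶

As a data engineer, perform the following tasks to install and configure the connector:

Install the connector¶

Navigate to the Openflow overview page. In the Featured connectors section, select View more connectors.
On the Openflow connectors page, find the connector and select Add to runtime.
In the Select runtime dialog, select your runtime from the Available runtimes drop-down list and click Add.

Note

Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
Authenticate to the deployment with your Snowflake account credentials and select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

Configure the connector¶

Right-click the imported process group and select Parameters.
Enter the required parameter values as described in Google Drive source parameters, Google Drive destination parameters, and Google Drive ingestion parameters.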

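Google Drive source parameters¶

Parameter	Description
Google Delegation User	The user that the service account acts as
GCP Service Account JSON	The service account JSON downloaded from the Google Cloud Console, which allows the connector to access Google APIs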

Google Drive destination parameters¶

Parameter	Description	Required
Destination Database	The database where data will be persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.	Yes
Destination Schema	The schema where data will be persisted, which must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. Examples: for `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`, use `SCHEMA_NAME`; for `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`, use `schema_name` or `SCHEMA_NAME`, respectively.	Yes
Snowflake Authentication Strategy	Snowflake Openflow Deployment or BYOC: use SNOWFLAKE_SESSION_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured runtime roles to use SNOWFLAKE_SESSION_TOKEN. BYOC only: alternatively, use KEY_PAIR as the value for the authentication strategy.	Yes
Snowflake Account Identifier	Session token authentication strategy: must be blank. KEY_PAIR: the Snowflake account name, formatted as [organization-name]-[account-name], where data will be persisted.	Yes
Snowflake Private Key	Session token authentication strategy: must be blank. KEY_PAIR: must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.	No
Snowflake Private Key File	Session token authentication strategy: the private key file must be blank. KEY_PAIR: upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and including standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the Reference asset checkbox.	No
Snowflake Private Key Password	Session token authentication strategy: must be blank. KEY_PAIR: provide the password associated with the Snowflake private key file.	No
Snowflake Role	Session token authentication strategy: use your runtime role. You can find your runtime role in the Openflow UI by navigating to View Details for your runtime. KEY_PAIR authentication strategy: use a valid role configured for your service user.	Yes
Snowflake Username	Session token authentication strategy: must be blank. KEY_PAIR: provide the user name used to connect to the Snowflake instance.	Yes
Snowflake Warehouse	The Snowflake warehouse used to run queries.	Yes
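
The case rule for the Destination Database and Destination Schema parameters can be sketched as a small helper: given the identifier exactly as it was written in the `CREATE` statement, it returns the value to enter in the parameter. The function name `parameter_value_for` is illustrative, and the handling of doubled quotes inside a quoted identifier is an addition based on general Snowflake identifier rules rather than on this table.

```python
def parameter_value_for(identifier: str) -> str:
    """Map an identifier as written in CREATE SCHEMA/DATABASE to the parameter value."""
    name = identifier.strip()
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Quoted identifiers keep their exact case; "" is an escaped quote.
        return name[1:-1].replace('""', '"')
    # Unquoted identifiers are stored in uppercase.
    return name.upper()
```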

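Google Drive ingestion parameters¶

Parameter	Description
Google Drive ID	The Google shared drive to watch for content and updates
Google Folder Name	Optionally, the Google Drive folder identifier (the human-readable folder name) can be set to filter incoming files. Select "Set Empty String" if all file types are required. When set, only files in the provided folder or its subfolders are retrieved. When empty or unset, no folder filtering is applied and all files under the drive are retrieved.
Google Domain	The Google Workspace domain in which the Google groups and drive reside.
File Extensions To Ingest	A comma-separated list that specifies the file extensions to ingest. If possible, the connector tries to convert files to PDF format first. Nonetheless, the extension check is performed on the original file extension. If Cortex Parse Document does not support some of the specified file extensions, the connector ignores those files, logs a warning message in the event log, and continues processing other files.
Snowflake File Hash Table Name	Internal table that stores file content hashes to prevent updates when the content has not changed.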
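
To illustrate the File Extensions To Ingest parameter, the following sketch checks a file name against a comma-separated extension list, using the original file name as described above. It is not the connector's implementation: matching case-insensitively and tolerating spaces or leading dots in the list are assumptions, and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def parse_extensions(csv_value: str) -> set[str]:
    """Split a comma-separated extension list into a normalized set."""
    return {
        part.strip().lower().lstrip(".")
        for part in csv_value.split(",")
        if part.strip()
    }


def matches_extension_filter(file_name: str, csv_value: str) -> bool:
    """Return True if the original file extension is in the configured list."""
    suffix = PurePosixPath(file_name).suffix.lower().lstrip(".")
    return suffix in parse_extensions(csv_value)
```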

Right-click the canvas and select Enable all Controller Services.
Right-click the imported process group and select Start. The connector starts the data ingestion.

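Use case 2: Use the connector definition to ingest files and perform processing with Cortex¶

Use the predefined flow definition to:

Create AI assistants for public documents within your organization's Google Drive.
Enable your AI assistants to adhere to the access controls specified in your organization's Google Drive.

Set up the connector¶

As a data engineer, perform the following tasks to install and configure the connector:

Install the connector¶

Navigate to the Openflow overview page. In the Featured connectors section, select View more connectors.
On the Openflow connectors page, find the connector and select Add to runtime.
In the Select runtime dialog, select your runtime from the Available runtimes drop-down list and click Add.

Note

Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
Authenticate to the deployment with your Snowflake account credentials and select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

Configure the connector¶

Right-click the imported process group and select Parameters.
Enter the required parameter values as described in Google Drive Cortex Connect source parameters, Google Drive Cortex Connect destination parameters, and Google Drive Cortex Connect ingestion parameters.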

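Google Drive Cortex Connect source parameters¶

Parameter	Description
Google Delegation User	The user that the service account acts as
GCP Service Account JSON	The service account JSON downloaded from the Google Cloud Console, which allows the connector to access Google APIs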

Google Drive Cortex Connect destination parameters¶

Parameter	Description
Destination Database	The database where data will be persisted. It must already exist in Snowflake.
Destination Schema	The schema where data will be persisted. It must already exist in Snowflake.
Snowflake Account Identifier	Leave this blank when using Session Token for your authentication strategy. When using KEY_PAIR, provide your Snowflake account name, formatted as [organization-name]-[account-name], where data will be persisted.
Snowflake Authentication Strategy	Snowflake Openflow Deployment or BYOC: use SNOWFLAKE_SESSION_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured runtime roles to use SNOWFLAKE_SESSION_TOKEN. BYOC only: alternatively, use KEY_PAIR as the value for the authentication strategy.
Snowflake Private Key	Leave this blank when using Session Token for your authentication strategy. When using KEY_PAIR, provide the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either Snowflake Private Key File or Snowflake Private Key must be defined.
Snowflake Private Key File	Leave this blank when using Session Token for your authentication strategy. When using KEY_PAIR, upload the file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and having standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. Select the Reference asset checkbox to upload the private key file.
Snowflake Private Key Password	Leave this blank when using Session Token for your authentication strategy. When using KEY_PAIR, provide the password associated with the Snowflake private key file.
Snowflake Role	When using Session Token for your authentication strategy, use your runtime role. You can find your runtime role in the Openflow UI by navigating to View Details for your runtime. When using KEY_PAIR for your authentication strategy, use a valid role configured for your service user.
Snowflake Username	Leave this blank when using Session Token for your authentication strategy. When using KEY_PAIR, provide the user name used to connect to the Snowflake instance.
Snowflake Warehouse	The Snowflake warehouse used to run queries
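
The rules in this table about which fields must be blank or filled for each authentication strategy can be summarized in a short check. This is a sketch only: the dictionary keys (`authentication_strategy`, `account_identifier`, and so on) are hypothetical names chosen for the example and are not the parameter names used by Openflow.

```python
# Fields that only apply to KEY_PAIR and must be blank for session tokens.
KEY_PAIR_ONLY = (
    "account_identifier",
    "username",
    "private_key",
    "private_key_file",
    "private_key_password",
)


def check_auth_parameters(params: dict[str, str]) -> list[str]:
    """Return a list of rule violations for the chosen authentication strategy."""
    strategy = params.get("authentication_strategy", "")
    problems = []
    if strategy == "SNOWFLAKE_SESSION_TOKEN":
        for name in KEY_PAIR_ONLY:
            if params.get(name):
                problems.append(f"{name} must be blank")
    elif strategy == "KEY_PAIR":
        for name in ("account_identifier", "username"):
            if not params.get(name):
                problems.append(f"{name} is required")
        # Either the inline key or the key file must be defined.
        if not (params.get("private_key") or params.get("private_key_file")):
            problems.append("private_key or private_key_file is required")
    else:
        problems.append("unknown authentication strategy")
    return problems
```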

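Google Drive Cortex Connect ingestion parameters¶

Parameter	Description
Google Drive ID	The Google shared drive to watch for content and updates
Google Folder Name	Optionally, the Google Drive folder identifier (the human-readable folder name) can be set to filter incoming files. Select "Set Empty String" if all file types are required. When set, only files in the provided folder or its subfolders are retrieved. When empty or unset, no folder filtering is applied and all files under the drive are retrieved.
Google Domain	The Google Workspace domain in which the Google groups and drive reside.
OCR Mode	The OCR mode to use when parsing files with the AI_PARSE_DOCUMENT function (see Parsing documents with AI_PARSE_DOCUMENT). The value can be `OCR` or `LAYOUT`.
File Extensions To Ingest	A comma-separated list that specifies the file extensions to ingest. If possible, the connector tries to convert files to PDF format first. Nonetheless, the extension check is performed on the original file extension. If Cortex Parse Document does not support some of the specified file extensions, the connector ignores those files, logs a warning message in the event log, and continues processing other files.
Snowflake File Hash Table Name	Internal table that stores file content hashes to prevent updates when the content has not changed.
Snowflake Cortex Search Service User Role	The identifier of a role that is assigned usage permissions on the Cortex Search service.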
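
The purpose of the file hash table, skipping updates when content has not changed, can be illustrated with a small content-hash check. This is a sketch of the general technique, not the connector's logic: the use of SHA-256 and an in-memory dictionary in place of the Snowflake table are assumptions, and the function names are hypothetical.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Return a hex digest that identifies the file content."""
    return hashlib.sha256(content).hexdigest()


def needs_update(file_id: str, content: bytes, known_hashes: dict[str, str]) -> bool:
    """Return True and record the new hash only if the content changed."""
    digest = content_hash(content)
    if known_hashes.get(file_id) == digest:
        return False  # same content as last time, skip the update
    known_hashes[file_id] = digest
    return True
```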

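Right-click the canvas and select Enable all Controller Services.
Right-click the imported process group and select Start. The connector starts the data ingestion.
Query the Cortex Search service.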

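Use case 3: Customize the connector definition¶

Customize the connector definition to:

Process the ingested files with Document AI.
Perform custom processing on ingested files.

Set up the connector¶

As a data engineer, perform the following tasks to install and configure the connector:

Install the connector¶

Navigate to the Openflow overview page. In the Featured connectors section, select View more connectors.
On the Openflow connectors page, find the connector and select Add to runtime.
In the Select runtime dialog, select your runtime from the Available runtimes drop-down list and click Add.

Note

Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
Authenticate to the deployment with your Snowflake account credentials and select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

Configure the connector¶

Customize the connector definition.
1. Remove the following process groups:
  - Check If Duplicate Content
  - Snowflake Stage and Parse PDF
  - Update Snowflake Cortex
2. Attach any custom processing to the output of the Process Google Drive Metadata process group. Each FlowFile represents a single Google Drive file change. The FlowFile attributes are listed in the Fetch Google Drive Metadata documentation.
Populate the process group parameters. Follow the same process as for Use case 1: Use the connector definition to ingest files only. Note that after you modify the connector definition, not all parameters might be required.

Run the flow¶

Run the flow.
1. Start the process group. The flow creates all required objects in Snowflake.
2. Right-click the imported process group and select Start.
Query the Cortex Search service.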

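Query the Cortex Search service¶

You can use the Cortex Search service to build chat and search applications to chat with or query your documents in Google Drive.

After you install and configure the connector and it begins ingesting content from Google Drive, you can query the Cortex Search service. For more information about using Cortex Search, see Query a Cortex Search service.

Filter responses

To restrict responses from the Cortex Search service to documents that a specific user has access to in Google Drive, specify a filter containing the user ID or email address of that user when you query Cortex Search. For example, filter.@contains.user_ids or filter.@contains.user_emails. The Cortex Search service created by the connector is named search_service and is located in the Cortex schema.

Run the following SQL code in a SQL worksheet to query the Cortex Search service with files ingested from your Google Drive.

Replace the following:

application_instance_name: Name of your database and connector application instance.
user_emailID: Email ID of the user who you want to filter the responses for.
your_question: The question that you want to get responses for.
number_of_results: Maximum number of results to return in the response. The maximum value is 1000 and the default value is 10.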

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    '<application_instance_name>.cortex.search_service',
      '{
        "query": "<your_question>",
         "columns": ["chunk", "web_url"],
         "filter": {"@contains": {"user_emails": "<user_emailID>"} },
         "limit": <number_of_results>
       }'
   )
)['results'] AS results

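
The JSON argument in the query above can also be assembled programmatically, which avoids hand-editing quotes and brackets. The following sketch builds the same structure with the `@contains` filter on `user_emails`. The function name `build_search_request` is illustrative, and the lower bound of 1 for `limit` is an assumption; only the maximum of 1000 and the default of 10 come from this section.

```python
import json


def build_search_request(question: str, user_email: str, limit: int = 10,
                         columns: tuple[str, ...] = ("chunk", "web_url")) -> str:
    """Return the JSON request body for a filtered Cortex Search query."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    body = {
        "query": question,
        "columns": list(columns),
        # Restrict results to documents this user can access.
        "filter": {"@contains": {"user_emails": user_email}},
        "limit": limit,
    }
    return json.dumps(body)
```

If the resulting string is embedded in a SQL literal, any single quotes in the question need to be escaped; passing it as a bind parameter avoids that.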

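The following is the full list of values that you can enter for columns:

Column name	Type	Description
`full_name`	String	The full path to the file from the Google Drive documents root. Example: `folder_1/folder_2/file_name.pdf`.
`web_url`	String	A URL that displays the original Google Drive file in a browser.
`last_modified_date_time`	String	The date and time of the most recent modification of the item.
`chunk`	String	A piece of text from the document that matched the Cortex Search query.
`user_ids`	Array	An array of Microsoft 365 user IDs that have access to the document. It also includes the user IDs from all the Microsoft 365 groups that are assigned to the document. To find a specific user ID, see Get a user (https://learn.microsoft.com/en-us/graph/api/user-get?view=graph-rest-1.0&tabs=http).
`user_emails`	Array	An array of Microsoft 365 user email IDs that have access to the document. It also includes the user email IDs from all the Microsoft 365 groups that are assigned to the document.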
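
A request that names a column outside this list is easy to produce by typo. The following sketch checks a requested column list against the names in the table before the query is sent; the names `SEARCH_COLUMNS` and `unknown_columns` are illustrative.

```python
# Column names from the table above.
SEARCH_COLUMNS = {
    "full_name",
    "web_url",
    "last_modified_date_time",
    "chunk",
    "user_ids",
    "user_emails",
}


def unknown_columns(requested: list[str]) -> list[str]:
    """Return the requested column names that are not in the documented list."""
    return [name for name in requested if name not in SEARCH_COLUMNS]
```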

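Example: Query an AI assistant for human resources (HR) information

You can use Cortex Search to query an AI assistant so that employees can chat with the latest versions of HR information, such as onboarding, code of conduct, team processes, and organization policies. Using response filters, you can also allow HR team members to query employee contracts while adhering to the access controls configured in Google Drive.

Run the following in a SQL worksheet to query the Cortex Search service with files ingested from your Google Drive. Select the application instance name as the database and Cortex as the schema.

Replace the following:

application_instance_name: Name of your database and connector application instance.
user_emailID: Email ID of the user who you want to filter the responses for.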

SELECT PARSE_JSON(
     SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
          '<application_instance_name>.cortex.search_service',
          '{
             "query": "What is my vacation carry over policy?",
             "columns": ["chunk", "web_url"],
             "filter": {"@contains": {"user_emails": "<user_emailID>"} },
             "limit": 1
          }'
     )
 )['results'] AS results


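Run the following code in a Python worksheet to query the Cortex Search service with files ingested from your Google Drive. Ensure that you add the snowflake.core package to your database.

Replace the following:

application_instance_name: Name of your database and connector application instance.
user_emailID: Email ID of the user who you want to filter the responses for.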

import snowflake.snowpark as snowpark
from snowflake.snowpark import Session
from snowflake.core import Root

def main(session: snowpark.Session):

   root = Root(session)

   # fetch service
   my_service = (root
     .databases["<application_instance_name>"]
     .schemas["cortex"]
     .cortex_search_services["search_service"]
   )

   # query service
   resp = my_service.search(
     query="What is my vacation carry over policy?",
     columns = ["chunk", "web_url"],
     filter = {"@contains": {"user_emails": "<user_emailID>"} },
     limit=1
   )
   return (resp.to_json())


Run the following code in a command-line interface to query the Cortex Search service with files ingested from your Google Drive. You need to authenticate through key-pair authentication and OAuth to access the Snowflake REST APIs. For more information, see REST API and Authenticating Snowflake REST APIs with Snowflake.

Replace the following:

application_instance_name: Name of your database and connector application instance.
account_url: Your Snowflake account URL. For instructions on finding your account URL, see Finding the organization and account name for an account.

curl --location "https://<account_url>/api/v2/databases/<application_instance_name>/schemas/cortex/cortex-search-services/search_service:query" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer <CORTEX_SEARCH_JWT>" \
     --data '{
         "query": "What is my vacation carry over policy?",
         "columns": ["chunk", "web_url"],
         "limit": 1
     }'


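Find files in the stage¶

Files stored in the stage can have names that are not human-readable. To find a specific file, use the metadata tables as the source of truth. These tables contain the mapping between file names and the corresponding file IDs in the stage.

For Cortex-enabled setups, use the following query to find a file: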

SELECT DISTINCT METADATA:id FROM DOCS_CHUNKS WHERE METADATA:fullName LIKE '%<file_name>%';


For non-Cortex setups, use the following query:

SELECT FILE_ID FROM DOC_METADATA WHERE FILE_NAME = '<file_name>';


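Replace <file_name> with the name or partial name of the file that you are looking for.

Files in the stage have names that start with the ID returned by these queries.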
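
Once a query has returned an ID, the matching staged files can be picked out of a stage listing by prefix. The following sketch assumes you already have a list of staged file paths (for example, collected from a stage listing); the path layout and the function name `staged_files_for` are hypothetical.

```python
from pathlib import PurePosixPath


def staged_files_for(file_id: str, staged_paths: list[str]) -> list[str]:
    """Return the staged paths whose file name starts with the given ID."""
    return [
        path for path in staged_paths
        if PurePosixPath(path).name.startswith(file_id)
    ]
```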