Set up the Openflow Connector for SharePoint¶

Note

This connector is subject to the Snowflake Connector Terms.

This topic describes the steps to set up the Openflow Connector for SharePoint.

Prerequisites¶

Ensure that you have reviewed About Openflow Connector for SharePoint.
Ensure that you have completed Set up Openflow - BYOC or Set up Openflow - Snowflake Deployments.
If using Openflow - Snowflake Deployments, ensure that you've reviewed configuring required domains and have granted access to the required domains for the SharePoint connector.

Set up access to your SharePoint site¶

As an Azure or Office 365 account administrator, perform the following actions:

Ensure that you have a Microsoft Graph (https://learn.microsoft.com/en-us/graph/overview) application registered and that it is configured with the following application permissions (https://learn.microsoft.com/en-us/graph/permissions-overview?tabs=http#application-permissions) based on your requirements:

For Microsoft SharePoint (Cortex Search, document ACLs) and Microsoft SharePoint (Simple Ingest, document ACLs):

Sites.Selected: Limits access to only specified sites.
For more information, see Sites.Selected (https://learn.microsoft.com/en-us/graph/permissions-reference#sitesselected).

GroupMember.Read.All: Used for resolving SharePoint group permissions.
For more information, see GroupMember.Read.All (https://learn.microsoft.com/en-us/graph/permissions-reference#groupmemberreadall).

User.ReadBasic.All: Used for resolving Microsoft 365 user emails.
For more information, see User.ReadBasic.All (https://learn.microsoft.com/en-us/graph/permissions-reference#userreadbasicall).

For Microsoft SharePoint (Cortex Search, no document ACLs) and Microsoft SharePoint (Simple Ingest, no document ACLs):

Sites.Selected: Limits access to only specified sites.
For more information, see Sites.Selected (https://learn.microsoft.com/en-us/graph/permissions-reference#sitesselected).

Grant the fullcontrol role to the application in the selected sites.

This role handles folder access changes during CDC ingestion. Grant it by using the Grant-PnPAzureADAppSitePermission (https://github.com/pnp/powershell/blob/dev/documentation/Grant-PnPAzureADAppSitePermission.md) cmdlet, or by calling the Microsoft Graph API permissions endpoint (https://learn.microsoft.com/en-us/graph/api/site-post-permissions), for example with curl.

For more information, see Roles (https://learn.microsoft.com/en-us/graph/permissions-selected-overview?tabs=http#roles).

Note

If you cannot grant the fullcontrol role, grant the narrower read role to the application instead. However, if access to a folder in the ingested site changes, the connector may enter an irreparable state and will require a full re-ingestion of data. Snowflake recommends granting the fullcontrol role to fully mitigate this issue.

Configure application credentials based on your use case:

For Microsoft SharePoint (Cortex Search, document ACLs) and Microsoft SharePoint (Simple Ingest, document ACLs):
- Add a new certificate or ensure that you have access to the existing certificate file and its private key. For more information, see Option 1: Add a certificate (https://learn.microsoft.com/en-us/graph/auth-register-app-v2#option-1-add-a-certificate).
- Create a new client secret and record the secret's value.
  For more information, see Option 2: Add a client secret (https://learn.microsoft.com/en-us/graph/auth-register-app-v2#option-2-add-a-client-secret).
For Microsoft SharePoint (Cortex Search, no document ACLs) and Microsoft SharePoint (Simple Ingest, no document ACLs):
- Create a new client secret and record the secret's value.
  For more information, see Option 2: Add a client secret (https://learn.microsoft.com/en-us/graph/auth-register-app-v2#option-2-add-a-client-secret).
Record the following information from your Microsoft Graph application:
- The client ID of your application.
  For more information, see Application ID (client ID) (https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#application-id-client-id).
- The tenant ID of your application.
  For more information, see Find your Microsoft 365 tenant ID (https://learn.microsoft.com/en-us/sharepoint/find-your-office-365-tenant-id).
- The site URL of the Microsoft 365 SharePoint site with the files or folders that you want to ingest into Snowflake; for example, https://yourtenant.sharepoint.com/sites/YourSite.
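
The role grant described above can also be prepared programmatically. The following is a minimal sketch, not an official procedure: it only builds the URL and JSON body for the Microsoft Graph permissions endpoint linked above and does not send anything. The function name, the site ID, the client ID, and the display name are hypothetical placeholders, and acquiring an access token is out of scope here.

```python
import json

GRAPH_BASE = "https://graph.microsoft.com/v1.0"

# Only the two roles discussed in this topic are accepted by this sketch.
ALLOWED_ROLES = {"fullcontrol", "read"}


def build_site_permission_request(site_id, client_id, display_name, role="fullcontrol"):
    """Build (but do not send) a POST /sites/{site-id}/permissions request.

    site_id, client_id, and display_name are placeholders that you supply;
    nothing here is specific to the connector itself.
    """
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role for this sketch: {role!r}")
    body = {
        "roles": [role],
        "grantedToIdentities": [
            {"application": {"id": client_id, "displayName": display_name}}
        ],
    }
    return {
        "method": "POST",
        "url": f"{GRAPH_BASE}/sites/{site_id}/permissions",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


if __name__ == "__main__":
    request = build_site_permission_request(
        site_id="<site-id>",                # hypothetical placeholder
        client_id="<application-client-id>",  # hypothetical placeholder
        display_name="Openflow SharePoint connector",
    )
    print(request["url"])
    print(request["body"])
```

You could pass the resulting URL and body to curl or any HTTP client, together with an `Authorization: Bearer` header for an identity that is allowed to manage site permissions.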

Set up your Snowflake account¶

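As a Snowflake account administrator, perform the following tasks manually or by using the script included below:

Create a new role or use an existing role, and grant it the database privileges.
Create a new Snowflake service user with the type SERVICE.
Grant the Snowflake service user the role that you created in the previous steps.
Configure key-pair authentication for the Snowflake SERVICE user created in step 2.
Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow (for example, AWS, Azure, or Hashicorp), and store the public and private keys in the secret store.

Note

If, for any reason, you do not want to use a secrets manager, you are responsible for safeguarding the public and private key files used for key-pair authentication according to the security policies of your organization.
1. After the secrets manager is configured, determine how you will authenticate to it. On AWS, it is recommended that you use the EC2 instance role associated with Openflow, because this way no other secrets have to be persisted.
2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right. Navigate to Controller Settings » Parameter Provider and then fetch your parameter values.
3. At this point, all credentials can be referenced through the associated parameter paths, and no sensitive values need to be persisted within Openflow.
If any other Snowflake users require access to the raw documents and tables ingested by the connector (for example, for custom processing in Snowflake), grant those users the role created in step 1.
Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with the size depending on the number of tables being replicated and the amount of data transferred. Large table counts typically scale better with multi-cluster warehouses than with larger warehouse sizes.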

Example setup¶

--The following script assumes you'll need to create all required roles, users, and objects.
--However, you may want to reuse some that are already in existence.

--Create a Snowflake service user to manage the connector
USE ROLE USERADMIN;
CREATE USER <openflow_service_user> TYPE=SERVICE COMMENT='Service user for Openflow automation';

--Create a pair of secure keys (public and private). For more information, see
--key-pair authentication. Store the private key for the user in a file to supply
--to the connector’s configuration. Assign the public key to the Snowflake service user:
ALTER USER <openflow_service_user> SET RSA_PUBLIC_KEY = '<pubkey>';


--Create a role to manage the connector and the associated data and
--grant it to that user
USE ROLE SECURITYADMIN;
CREATE ROLE <openflow_connector_admin_role>;
GRANT ROLE <openflow_connector_admin_role> TO USER <openflow_service_user>;


--The following block is for USE CASE 2 (Cortex connect) ONLY
--Create a role for read access to the cortex search service created by this connector.
--This role should be granted to any role that will use the service
CREATE ROLE <cortex_search_service_read_only_role>;
GRANT ROLE <cortex_search_service_read_only_role> TO ROLE <whatever_roles_will_access_search_service>;

--Create the database the data will be stored in and grant usage to the roles created
USE ROLE ACCOUNTADMIN; --use whatever role you want to own your DB
CREATE DATABASE IF NOT EXISTS <destination_database>;
GRANT USAGE ON DATABASE <destination_database> TO ROLE <openflow_connector_admin_role>;

--Create the schema the data will be stored in and grant the necessary privileges
--on that schema to the connector admin role:
USE DATABASE <destination_database>;
CREATE SCHEMA IF NOT EXISTS <destination_schema>;
GRANT USAGE ON SCHEMA <destination_schema> TO ROLE <openflow_connector_admin_role>;
GRANT CREATE TABLE, CREATE DYNAMIC TABLE, CREATE STAGE, CREATE SEQUENCE, CREATE CORTEX
SEARCH SERVICE ON SCHEMA <destination_schema> TO ROLE <openflow_connector_admin_role>;

--The following block is for USE CASE 2 (Cortex connect) ONLY
--Grant the Cortex read-only role access to the database and schema
GRANT USAGE ON DATABASE <destination_database> TO ROLE <cortex_search_service_read_only_role>;
GRANT USAGE ON SCHEMA <destination_schema> TO ROLE <cortex_search_service_read_only_role>;

--Create the warehouse this connector will use if it doesn't already exist. Grant the
--appropriate privileges to the connector admin role. Adjust the size according to your needs.
CREATE WAREHOUSE IF NOT EXISTS <openflow_warehouse>
WITH
   WAREHOUSE_SIZE = 'MEDIUM'
   AUTO_SUSPEND = 300
   AUTO_RESUME = TRUE;
GRANT USAGE, OPERATE ON WAREHOUSE <openflow_warehouse> TO ROLE <openflow_connector_admin_role>;

Use case 1: Ingest files only¶

Use a connector to:

Ingest and continuously update SharePoint files for custom processing within Snowflake
Optionally ingest file permissions (ACL connectors) to persist access controls downstream

Set up the connector¶

As a data engineer, perform the following tasks to configure the connector:

Install the connector¶

Note

There are multiple variants of the SharePoint connector. Choose the variant that best fits your use case as described in Variants of the Openflow Connector for SharePoint.

To install the connector, do the following as a data engineer:

Navigate to the Openflow overview page. In the Featured connectors section, select View more connectors.
On the Openflow connectors page, find the connector and select Add to runtime.
In the Select runtime dialog, select your runtime from the Available runtimes drop-down list and click Add.

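Note

Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
Authenticate to the deployment with your Snowflake account credentials, and select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

Configure the connector¶

Populate the process group parameters
1. Right-click on the imported process group and select Parameters.
2. Enter the required parameter values as described in Sharepoint source parameters, Sharepoint destination parameters, and Sharepoint ingestion parameters.

Sharepoint source parameters¶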

For all connectors:


Parameter	Description
SharePoint Site URL	URL of the SharePoint site from which the connector ingests content.
SharePoint Client ID	Microsoft Entra client ID. To learn about the client ID and how to find it in Microsoft Entra, see Application ID (client ID) (https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#application-id-client-id).
SharePoint Client Secret	Microsoft Entra client secret. To learn about the client secret and how to find it in Microsoft Entra, see Certificates & secrets (https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#certificates--secrets).
SharePoint Tenant ID	Microsoft Entra tenant ID. To learn about the tenant ID and how to find it in Microsoft Entra, see Find your Microsoft 365 tenant ID (https://learn.microsoft.com/en-us/sharepoint/find-your-office-365-tenant-id).

For ACL connectors only:


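Parameter	Description
Sharepoint Application Private Key	Generated application private key in PEM format. The key must be unencrypted.
Sharepoint Site Domain	Domain name of the synchronized Sharepoint site.
Sharepoint Application Certificate	Generated application certificate in PEM format.

Sharepoint destination parameters¶


Parameter	Description	Required
Destination Database	The database where data is persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.	Yes
Destination Schema	The schema where data is persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples: `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`. `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively.	Yes
Snowflake Authentication Strategy	When using: Snowflake Openflow Deployment or BYOC: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured runtime roles to use SNOWFLAKE_MANAGED_TOKEN. BYOC: Alternatively, BYOC can use KEY_PAIR as the value for authentication strategy.	Yes
Snowflake Account Identifier	When using: Session Token Authentication Strategy: Must be blank. KEY_PAIR: The Snowflake account name, formatted as [organization-name]-[account-name], where data is persisted.	Yes
Snowflake Private Key	When using: Session Token Authentication Strategy: Must be blank. KEY_PAIR: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.	No
Snowflake Private Key File	When using: Session Token Authentication Strategy: The private key file must be blank. KEY_PAIR: Upload the file that contains the RSA private key used for authentication to Snowflake. The file must be formatted according to PKCS8 standards and include standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the Reference asset checkbox.	No
Snowflake Private Key Password	When using: Session Token Authentication Strategy: Must be blank. KEY_PAIR: Provide the password associated with the Snowflake private key file.	No
Snowflake Role	When using: Session Token Authentication Strategy: Use your Snowflake role. You can find your Snowflake role in the Openflow UI by navigating to View Details for your runtime. KEY_PAIR Authentication Strategy: Use a valid role configured for your service user.	Yes
Snowflake Username	When using: Session Token Authentication Strategy: Must be blank. KEY_PAIR: Provide the user name used to connect to the Snowflake instance.	Yes
Oversized Value Strategy	Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are: Fail Table (default): The table is marked as permanently failed, and replication stops for that table. Set Null: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.	No
Snowflake Warehouse	The Snowflake warehouse used to run queries.	Yes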
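
The casing rule for Destination Database and Destination Schema is easy to get wrong, so here is a small sketch of it. The helper name is hypothetical and is not part of the connector; it simply applies the rule from the table above to an identifier as it was written in the `CREATE` statement.

```python
def parameter_value_for_identifier(identifier_in_ddl: str) -> str:
    """Return the value to enter in the connector parameter.

    Unquoted identifiers are stored by Snowflake in uppercase, so the
    parameter must be uppercase. Quoted identifiers keep their exact case,
    so the parameter is the text between the double quotes.
    """
    ident = identifier_in_ddl.strip()
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        # Inside a quoted identifier, a doubled quote stands for one quote.
        return ident[1:-1].replace('""', '"')
    return ident.upper()


if __name__ == "__main__":
    for written in ("SCHEMA_NAME", "schema_name", '"schema_name"', '"SCHEMA_NAME"'):
        print(written, "->", parameter_value_for_identifier(written))
```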

Sharepoint ingestion parameters¶

For all connectors:


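Parameter	Description
SharePoint Source Folder	Supported files from this folder and all of its subfolders are ingested into Snowflake. The folder path is relative to the Shared Documents library.
File Extensions To Ingest	A comma-separated list that specifies the file extensions to ingest. Where possible, the connector first tries to convert files to PDF format. Nonetheless, the extension check is performed on the original file extension. To learn about the formats that can be converted, see Format options (https://learn.microsoft.com/en-us/graph/api/driveitem-get-content-format?view=graph-rest-1.0&tabs=http#format-options). If Cortex Parse Document does not support some of the specified file extensions, the connector ignores those files, logs a warning message in the event log, and continues processing other files.
Sharepoint Document Library Name	The library in the SharePoint site from which to ingest files.
Snowflake File Hash Table Name	Name of the table that stores file hashes used to determine whether content has changed. This parameter should generally not be changed.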
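
To preview which files a given File Extensions To Ingest value would select, a rough local check such as the following can help. This is only an illustration of the comma-separated list and the check against the original file extension. The function names are hypothetical, and the case-insensitive matching and tolerance for leading dots are assumptions of this sketch, not documented connector behavior.

```python
from pathlib import PurePosixPath


def parse_extensions(value: str) -> set:
    """Split a comma-separated extension list into a normalized set."""
    return {
        part.strip().lstrip(".").lower()
        for part in value.split(",")
        if part.strip()
    }


def is_selected(file_path: str, extensions: set) -> bool:
    """Check the file's original extension against the configured list."""
    suffix = PurePosixPath(file_path).suffix  # for example ".docx"
    return suffix.lstrip(".").lower() in extensions


if __name__ == "__main__":
    configured = parse_extensions("pdf, docx, pptx")
    for path in ("folder_1/policy.pdf", "folder_1/notes.docx", "folder_2/archive.zip"):
        print(path, is_selected(path, configured))
```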

For ACL connectors only:


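Parameter	Description
Sharepoint Site Groups Enabled	Specifies whether the site groups functionality is enabled.

Run the flow.
1. Start the process group. The flow creates all required objects in Snowflake.
2. Right-click on the imported process group and select Start.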

Use case 2: Ingest files and perform processing with Cortex¶

Use the predefined flow definition to:

Create AI assistants for documents within your organization's SharePoint site
Enable your AI assistants to adhere to access controls specified in your organization's SharePoint site

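Set up the connector¶

As a data engineer, perform the following tasks to configure the connector:

Install the connector¶

Create a database and schema in Snowflake for the connector to store ingested data. Grant the required database privileges to the role created in the first step. Substitute the role placeholder with the actual value and use the following SQL commands: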

CREATE DATABASE DESTINATION_DB;
CREATE SCHEMA DESTINATION_DB.DESTINATION_SCHEMA;
GRANT USAGE ON DATABASE DESTINATION_DB TO ROLE <CONNECTOR_ROLE>;
GRANT USAGE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE <CONNECTOR_ROLE>;
GRANT CREATE TABLE ON SCHEMA DESTINATION_DB.DESTINATION_SCHEMA TO ROLE <CONNECTOR_ROLE>;

To install the connector, do the following as a data engineer:

Navigate to the Openflow overview page. In the Featured connectors section, select View more connectors.
On the Openflow connectors page, find the connector and select Add to runtime.
In the Select runtime dialog, select your runtime from the Available runtimes drop-down list and click Add.

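Note

Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
Authenticate to the deployment with your Snowflake account credentials, and select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

Configure the connector¶

Populate the process group parameters
1. Right-click on the imported process group and select Parameters.
2. Enter the required parameter values as described in Sharepoint Cortex Connect source parameters, Sharepoint Cortex Connect destination parameters, and Sharepoint Cortex Connect ingestion parameters.

Sharepoint Cortex Connect source parameters¶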

For all connectors:


Parameter	Description
SharePoint Site URL	URL of the SharePoint site from which the connector ingests content.
SharePoint Client ID	Microsoft Entra client ID. To learn about the client ID and how to find it in Microsoft Entra, see Application ID (client ID) (https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#application-id-client-id).
SharePoint Client Secret	Microsoft Entra client secret. To learn about the client secret and how to find it in Microsoft Entra, see Certificates & secrets (https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application#certificates--secrets).
SharePoint Tenant ID	Microsoft Entra tenant ID. To learn about the tenant ID and how to find it in Microsoft Entra, see Find your Microsoft 365 tenant ID (https://learn.microsoft.com/en-us/sharepoint/find-your-office-365-tenant-id).

For ACL connectors only:


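Parameter	Description
Sharepoint Application Private Key	Generated application private key in PEM format. The key must be unencrypted.
Sharepoint Site Domain	Domain name of the synchronized Sharepoint site.
Sharepoint Application Certificate	Generated application certificate in PEM format.

Sharepoint Cortex Connect destination parameters¶


Parameter	Description	Required
Destination Database	The database where data is persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase.	Yes
Destination Schema	The schema where data is persisted. It must already exist in Snowflake. The name is case-sensitive. For unquoted identifiers, provide the name in uppercase. See the following examples: `CREATE SCHEMA SCHEMA_NAME` or `CREATE SCHEMA schema_name`: use `SCHEMA_NAME`. `CREATE SCHEMA "schema_name"` or `CREATE SCHEMA "SCHEMA_NAME"`: use `schema_name` or `SCHEMA_NAME`, respectively.	Yes
Snowflake Authentication Strategy	When using: Snowflake Openflow Deployment or BYOC: Use SNOWFLAKE_MANAGED_TOKEN. This token is managed automatically by Snowflake. BYOC deployments must have previously configured runtime roles to use SNOWFLAKE_MANAGED_TOKEN. BYOC: Alternatively, BYOC can use KEY_PAIR as the value for authentication strategy.	Yes
Snowflake Account Identifier	When using: Session Token Authentication Strategy: Must be blank. KEY_PAIR: The Snowflake account name, formatted as [organization-name]-[account-name], where data is persisted.	Yes
Snowflake Private Key	When using: Session Token Authentication Strategy: Must be blank. KEY_PAIR: Must be the RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either a Snowflake Private Key File or a Snowflake Private Key must be defined.	No
Snowflake Private Key File	When using: Session Token Authentication Strategy: The private key file must be blank. KEY_PAIR: Upload the file that contains the RSA private key used for authentication to Snowflake. The file must be formatted according to PKCS8 standards and include standard PEM headers and footers. The header line begins with `-----BEGIN PRIVATE`. To upload the private key file, select the Reference asset checkbox.	No
Snowflake Private Key Password	When using: Session Token Authentication Strategy: Must be blank. KEY_PAIR: Provide the password associated with the Snowflake private key file.	No
Snowflake Role	When using: Session Token Authentication Strategy: Use your Snowflake role. You can find your Snowflake role in the Openflow UI by navigating to View Details for your runtime. KEY_PAIR Authentication Strategy: Use a valid role configured for your service user.	Yes
Snowflake Username	When using: Session Token Authentication Strategy: Must be blank. KEY_PAIR: Provide the user name used to connect to the Snowflake instance.	Yes
Oversized Value Strategy	Determines how the connector handles values that exceed its internal size limits (16 MB) during replication. Possible values are: Fail Table (default): The table is marked as permanently failed, and replication stops for that table. Set Null: The value is replaced with `NULL` in the destination table. Use this to prevent table failures when it is acceptable to lose data in tables beyond the oversized value.	No
Snowflake Warehouse	The Snowflake warehouse used to run queries.	Yes

Sharepoint Cortex Connect ingestion parameters¶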

For all connectors:


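Parameter	Description
SharePoint Source Folder	Supported files from this folder and all of its subfolders are ingested into Snowflake. The folder path is relative to the Shared Documents library.
File Extensions To Ingest	A comma-separated list that specifies the file extensions to ingest. Where possible, the connector first tries to convert files to PDF format. Nonetheless, the extension check is performed on the original file extension. To learn about the formats that can be converted, see Format options (https://learn.microsoft.com/en-us/graph/api/driveitem-get-content-format?view=graph-rest-1.0&tabs=http#format-options). If Cortex Parse Document does not support some of the specified file extensions, the connector ignores those files, logs a warning message in the event log, and continues processing other files.
Sharepoint Document Library Name	The library in the SharePoint site from which to ingest files.
Snowflake File Hash Table Name	Name of the table that stores file hashes used to determine whether content has changed. This parameter should generally not be changed.
OCR Mode	The OCR mode to use when parsing files with the AI_PARSE_DOCUMENT function (see Parsing documents with AI_PARSE_DOCUMENT). The value can be `OCR` or `LAYOUT`. In `OCR` mode, only the raw text content is extracted, and formatting and table structure are ignored. In `LAYOUT` mode, the output preserves table structure as Markdown.
Snowflake Cortex Search Service User Role	An identifier of a role that is assigned usage permissions on the Cortex Search service.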

For ACL connectors only:


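Parameter	Description
Sharepoint Site Groups Enabled	Specifies whether the site groups functionality is enabled.

Right-click on the canvas and select Enable all Controller Services.
Right-click on the imported process group and select Start. The connector starts the data ingestion.
Query the Cortex Search service.

Use case 3: Customize the connector definition¶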

Customize the connector definition to perform custom processing on ingested files.

Set up the connector¶

As a data engineer, perform the following tasks to configure the connector:

Install the connector¶

To install the connector, do the following as a data engineer:

Navigate to the Openflow overview page. In the Featured connectors section, select View more connectors.
On the Openflow connectors page, find the connector and select Add to runtime.
In the Select runtime dialog, select your runtime from the Available runtimes drop-down list and click Add.

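Note

Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
Authenticate to the deployment with your Snowflake account credentials, and select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

Configure the connector¶

Customize the connector definition.
1. Remove the following process groups:
  - Check If Duplicate Content
  - Snowflake Stage and Parse PDF
  - Update Snowflake Cortex
  - (Optional) Process Microsoft365 Groups
2. Attach any custom processing to the output of the Process SharePoint Metadata process group. Each flowfile represents a single SharePoint file change.
Populate the process group parameters. Follow the same process as for use case 1. Note that after you modify the connector definition, not all parameters might be required.
Run the flow.
1. Start the process group. The flow creates all required objects in Snowflake.
2. Right-click on the imported process group and select Start.
Query the Cortex Search service.

Enable Sharepoint site groups¶

Microsoft Graph application for site groups¶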

In addition to the steps specified in Set up access to your SharePoint site, do the following:

Add the Sites.Selected (https://learn.microsoft.com/en-us/graph/permissions-reference#sitesselected) SharePoint permission.

Note

You should see Sites.Selected in both the Microsoft Graph and the SharePoint permissions.
Generate a key pair (https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-self-signed-certificate). Alternatively, you can create a self-signed certificate with openssl by running the following command:
```
openssl req -x509 -nodes -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365
```
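Note

The command above does not encrypt the generated private key. If you want to generate an encrypted key, remove the -nodes argument.
Attach the certificate (https://learn.microsoft.com/en-us/graph/applications-how-to-add-certificate?tabs=http) to the Microsoft Graph application.

Query the Cortex Search service¶

You can use the Cortex Search service to build chat and search applications that chat with or query your documents in SharePoint.

After you install and configure the connector and it begins ingesting content from SharePoint, you can query the Cortex Search service. For more information about using Cortex Search, see Query a Cortex Search service.

Filter responses

To restrict responses from the Cortex Search service to documents that a specific user can access in SharePoint, specify a filter that contains the user ID or email address of the user when you query Cortex Search. For example, filter.@contains.user_ids or filter.@contains.user_emails. The Cortex Search service created by the connector is named search_service and is located in the Cortex schema.

Run the following SQL code in a SQL worksheet to query the Cortex Search service with files ingested from your SharePoint site.

Replace the following:

application_instance_name: Name of your database and connector application instance.
user_emailID: Email ID of the user for whom you want to filter the responses.
your_question: The question that you want to get responses for.
number_of_results: Maximum number of results to return in the response. The maximum value is 1000 and the default value is 10.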

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    '<application_instance_name>.cortex.search_service',
      '{
        "query": "<your_question>",
         "columns": ["chunk", "web_url"],
         "filter": {"@contains": {"user_emails": "<user_emailID>"} },
         "limit": <number_of_results>
       }'
   )
)['results'] AS results
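
When the question or email address comes from user input, hand-editing the JSON inside a SQL string is error-prone. The sketch below shows one way to assemble the same statement with `json.dumps`. The helper names are hypothetical, and the escaping is a simplified assumption (backslashes and single quotes only), so treat it as an illustration rather than a hardened implementation.

```python
import json


def build_search_payload(question, user_email, limit=10, columns=("chunk", "web_url")):
    """Serialize the query document shown above as a JSON string."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    payload = {
        "query": question,
        "columns": list(columns),
        "filter": {"@contains": {"user_emails": user_email}},
        "limit": limit,
    }
    return json.dumps(payload)


def sql_string_literal(text):
    """Wrap text in single quotes, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def build_search_preview_sql(service_name, payload_json):
    """Assemble the SEARCH_PREVIEW statement from its two string arguments."""
    return (
        "SELECT PARSE_JSON(SNOWFLAKE.CORTEX.SEARCH_PREVIEW("
        f"{sql_string_literal(service_name)}, {sql_string_literal(payload_json)}"
        "))['results'] AS results;"
    )


if __name__ == "__main__":
    payload = build_search_payload("What is my vacation carry over policy?", "<user_emailID>", limit=1)
    print(build_search_preview_sql("<application_instance_name>.cortex.search_service", payload))
```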

The following is the complete list of values that you can enter for columns:

For all connectors:


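Column name	Type	Description
`full_name`	String	The full path to the file from the SharePoint site documents root. Example: `folder_1/folder_2/file_name.pdf`.
`web_url`	String	A URL that displays the original SharePoint file in a browser.
`last_modified_date_time`	String	Date and time when the item was most recently modified.
`chunk`	String	A piece of text from the document that matched the Cortex Search query.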

For ACL connectors only:


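Column name	Type	Description
`user_ids`	Array	An array of Microsoft 365 user IDs that have access to the document. It also includes user IDs from all the Microsoft 365 groups that are assigned to the document. To find a specific user ID, see Get a user (https://learn.microsoft.com/en-us/graph/api/user-get?view=graph-rest-1.0&tabs=http).
`user_emails`	Array	An array of Microsoft 365 user email IDs that have access to the document. It also includes user email IDs from all the Microsoft 365 groups that are assigned to the document.

Example: Query an AI assistant for human resources (HR) information

You can use Cortex Search to query an AI assistant so that employees can chat with the latest versions of HR information, such as onboarding, code of conduct, team processes, and organization policies. By using response filters, you can also allow HR team members to query employee contracts while adhering to the access controls configured in SharePoint.

Run the following in a SQL worksheet to query the Cortex Search service with files ingested from SharePoint. Select the application instance name as the database and Cortex as the schema.

Replace the following:

application_instance_name: Name of your database and connector application instance.
user_emailID: Email ID of the user for whom you want to filter the responses.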

SELECT PARSE_JSON(
     SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
          '<application_instance_name>.cortex.search_service',
          '{
             "query": "What is my vacation carry over policy?",
             "columns": ["chunk", "web_url"],
             "filter": {"@contains": {"user_emails": "<user_emailID>"} },
             "limit": 1
          }'
     )
 )['results'] AS results

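Run the following code in a Python worksheet to query the Cortex Search service with files ingested from SharePoint. Ensure that you add the snowflake.core package to your database.

Replace the following:

application_instance_name: Name of your database and connector application instance.
user_emailID: Email ID of the user for whom you want to filter the responses.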

import snowflake.snowpark as snowpark
from snowflake.snowpark import Session
from snowflake.core import Root

def main(session: snowpark.Session):

   root = Root(session)

   # fetch service
   my_service = (root
     .databases["<application_instance_name>"]
     .schemas["cortex"]
     .cortex_search_services["search_service"]
   )

   # query service
   resp = my_service.search(
     query="What is my vacation carry over policy?",
     columns = ["chunk", "web_url"],
     filter = {"@contains": {"user_emails": "<user_emailID>"} },
     limit=1
   )
   return (resp.to_json())

Execute the following code in a command-line interface to query the Cortex Search service with files ingested from your SharePoint. You need to authenticate through key pair authentication and OAuth to access the Snowflake REST APIs. For more information, see REST API and Authenticating Snowflake REST APIs with Snowflake.

Replace the following:

application_instance_name: Name of your database and connector application instance.
account_url: Your Snowflake account URL. For instructions on finding your account URL, see Finding the organization and account name for an account.

curl --location "https://<account_url>/api/v2/databases/<application_instance_name>/schemas/cortex/cortex-search-services/search_service:query" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer <CORTEX_SEARCH_JWT>" \
     --data '{
         "query": "What is my vacation carry over policy?",
         "columns": ["chunk", "web_url"],
         "limit": 1
     }'

Sample response:

{
  "results" : [ {
  "web_url" : "https://<domain>.sharepoint.com/sites/<site_name>/<path_to_file>",
  "chunk" : "Answer to the question asked."
  } ]
}
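
A response of this shape can be unpacked with a few lines of Python. This sketch assumes only the structure shown in the sample above (a top-level `results` array whose entries carry the requested columns); the function name is hypothetical.

```python
import json


def extract_results(response_text):
    """Return (web_url, chunk) pairs from a Cortex Search response body."""
    data = json.loads(response_text)
    return [
        (item.get("web_url"), item.get("chunk"))
        for item in data.get("results", [])
    ]


if __name__ == "__main__":
    sample = """
    {
      "results": [
        {
          "web_url": "https://<domain>.sharepoint.com/sites/<site_name>/<path_to_file>",
          "chunk": "Answer to the question asked."
        }
      ]
    }
    """
    for url, chunk in extract_results(sample):
        print(url)
        print(chunk)
```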

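Find files in the stage¶

Files stored in the stage might have names that are not human-readable. To find a specific file, use the metadata tables as the source of truth. These tables contain the mapping between file names and their corresponding file IDs in the stage.

For Cortex-enabled setups, use the following query to find files: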

SELECT DISTINCT METADATA:id FROM DOCS_CHUNKS WHERE METADATA:fullName LIKE '%<file_name>%';

For non-Cortex setups, use the following query:

SELECT FILE_ID FROM DOC_METADATA WHERE FILE_NAME = '<file_name>';

Replace <file_name> with the name or partial name of the file that you want to find.

Files in the stage are named starting with the ID returned by these queries.
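
Once you have an ID, matching it against a listing of the stage is a simple prefix check. The sketch below assumes you have already collected the staged file names into a list, for example from a `LIST` command. The function name and the sample names are hypothetical, and the exact naming after the ID prefix is not specified here.

```python
def files_for_id(stage_file_names, file_id):
    """Return the staged files whose base name starts with the given ID."""
    return [
        name
        for name in stage_file_names
        # Compare only the last path segment, ignoring any stage folders.
        if name.rsplit("/", 1)[-1].startswith(file_id)
    ]


if __name__ == "__main__":
    listing = [  # hypothetical stage listing
        "docs/01ABCDEF_policy.pdf",
        "docs/01XYZ123_contract.pdf",
    ]
    print(files_for_id(listing, "01ABCDEF"))
```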