Setting up data ingestion for your Snowflake Connector for Google Analytics Raw Data

This topic describes how to access Snowflake Connector for Google Analytics Raw Data in your Snowflake account.

Note

一次只能从一个 GCP 项目中引入任何单个属性。目前,为以前配置的属性更改项目需要重新安装连接器。未来将取消此限制。

如果您更改属性的导出设置,然后开始将其导出到其他 GCP 项目,则还应手动移动先前 BigQuery 实例中的数据,并将其合并到新配置的实例中。

使用 Snowsight 设置数据引入

To set up data ingestion using Snowsight, do the following:

  1. Sign in to Snowsight as a user with the ACCOUNTADMIN role.
  2. In the navigation menu, select Catalog » Apps.
  3. Search for the Snowflake Connector for Google Analytics Raw Data, then select the tile for the connector.
  4. In the page for the Snowflake Connector for Google Analytics Raw Data, navigate to the Data Sync section.

此部分将显示所有 Google Analytics 属性的列表。

  1. 选择要引入的属性:

    1. 搜索要引入的属性。
    2. Select the checkbox in the Status column next to the property you want to select.
    3. 对要引入 Snowflake 的每个属性重复这些步骤。
  2. Select the heading of the Status column to see the properties you have currently selected.

  3. Select Start sync to begin ingesting data into your Snowflake account.

所选属性显示在属性列表中。

Data Ingestion status will be displayed in the right top corner of the Manage data synchronization section.

每项属性的数据同步将产生两个负载:

  • 初始加载会引入历史数据。从当前日期开始向后运行,直到获得可获取数据的第一天为止。
  • 当前负载,从当前日期引入数据并向前运行。

如果您只想同步当前数据,则可以通过工作表这样做。

Enabling a property using Snowsight will cause the connector to attempt ingestion for all possible export types. If you want to ingest only specific export types, for example if you only have events_ tables in BigQuery, you can do so by using SQL statements.

Note

一旦启用了 带有 初始负载的属性,就可以禁用初始负载。另一方面,当属性在 没有 初始加载的情况下启用时,之后无法再启用初始加载。

使用 Snowsight 修改数据引入

若要修改要引入的 Google Analytics 表或表的同步计划,请执行以下步骤:

  1. Sign in to Snowsight as a user with the ACCOUNTADMIN role.

  2. In the navigation menu, select Catalog » Apps.

  3. Search for the Snowflake Connector for Google Analytics Raw Data, then select the tile for the connector.

  4. In the page for the Snowflake Connector for Google Analytics Raw Data, navigate to the Data Sync section.

  5. Select Edit properties.

  6. 修改要引入的表:

    1. 搜索要引入的表。
    2. Select the checkbox in the Status column next to the table you want to select or deselect.
  7. Select Update data sync.

使用 SQL 语句设置数据引入

要使用 SQL 语句设置数据引入,请执行以下操作:

Note

要配置这些设置,请使用在用作连接器安装数据库实例的数据库的 PUBLIC 架构中定义的存储过程。

在调用这些存储过程之前,选择该数据库作为会话使用的数据库。

For example, if that database is named snowflake_connector_for_google_analytics_raw_data, run the following command:

USE DATABASE snowflake_connector_for_google_analytics_raw_data;

列出可供引入的属性

要列出给定 GCP 项目中的所有可用属性,请调用以下存储过程:

CALL LIST_GA_PROPERTIES();

结果显示了所有可供授权账户引入的项目和属性。如果未返回任何结果,请检查:

  • 是否已配置从 Google Analytics 到 BigQuery 的数据导出。
  • 导出的数据在 BigQuery 中是否可见。
  • 是否向使用的服务账户/经过身份验证的用户分配了适当的角色。

Please be advised that it can take up to 24 hours between setting up the data export and storing data in BigQuery. This delay can be a cause for the LIST_GA_PROPERTIES procedure producing no results.

Turning the Google Analytics export off does not mean the property is ommited by LIST_GA_PROPERTIES. Even though the export was turned off, data can still persist in BigQuery and can be synchronized by the connector.

准备目标数据库

在启用引入之前,您需要向连接器授予在目标数据库和架构内创建表和视图的权限。

GRANT USAGE ON DATABASE <destination database> TO APPLICATION <application name>;

GRANT USAGE ON SCHEMA <destination database>.<destination schema> TO APPLICATION <application name>;

GRANT CREATE TABLE ON SCHEMA <destination database>.<destination schema> TO APPLICATION <application name>;

GRANT CREATE VIEW ON SCHEMA <destination database>.<destination schema> TO APPLICATION <application name>;

启用或禁用属性的引入

To enable or disable the synchronization of data for a specific property in Google Analytics, call the ENABLE_PROPERTIES stored procedure with the following arguments:

CALL ENABLE_PROPERTIES('<gcp_project>', ['<properties_to_configure>'], <enable_initial_load>, <exclude_nulls>, <disable_auto_reloads>, <disable_present_load>, <enabled_export_types>);

其中:

gcp_project

指定已启用属性的 GCP 项目。

properties_to_configure

使用单引号指定以逗号分隔的 Google Analytics 属性名称列表。

Use the property name without the analytics_ prefix.

enable_initial_load

一个布尔值,表示是启用还是禁用初始数据加载,该加载会引入属性的所有历史数据,然后继续进行当前同步。

This is an optional argument and the default value for it is true.

当一个属性之前被启用时,此标记将被忽略,数据引入将从属性被禁用时停止的点继续。

exclude_nulls

Optional boolean indicating whether to exclude fields containing null values from the ingested data. Setting this parameter to true can improve the data ingestion throughput. Default value is false.

disable_auto_reloads

An optional boolean indicating whether to disable automatic reloads. For more information about automatic reload, see Data ingestion model for the Snowflake Connector for Google Analytics Raw Data. Setting this value to true can reduce credit consumption, but late data won’t be ingested into Snowflake. This property cannot be set to true for the FRESH_DAILY export type. Default value is false.

disable_present_load

An optional boolean indicating whether to disable scheduled ingestion. For more information about scheduling, see Data ingestion model for the Snowflake Connector for Google Analytics Raw Data. When setting this value to true, data will only be ingested into Snowflake if you trigger a reload manually. Manual reloads can still trigger automatic reloads unless you also set disable_auto_reloads to true. The initial load still runs unless you set enable_initial_load to false. The default value of disable_present_load is false.

enabled_export_types

An optional list of export types, which connector will try to ingest data for. Possible values are: DAILY, FRESH_DAILY, INTRADAY, USERS and PSEUDONYMOUS_USERS. By default, all export types, except FRESH_DAILY, will be enabled.

For example, to enable the synchronization of the properties named property1, property2, and property3 in the project gcp_example_project, run the following query:

CALL ENABLE_PROPERTIES(PROJECT_ID => 'gcp_example_project', PROPERTY_IDS => ['property1','property2','property3']);

To enable properties without the initial data loading, use the following query:

CALL ENABLE_PROPERTIES(PROJECT_ID => 'gcp_example_project', PROPERTY_IDS => ['property1','property2','property3'], INITIAL_LOAD => FALSE);

If you only have daily and user data in BigQuery, you can explicitly omit the intraday export by running the following query:

CALL ENABLE_PROPERTIES(PROJECT_ID => 'gcp_example_project', PROPERTY_IDS => ['property1'], ENABLED_EXPORT_TYPES => ['DAILY', 'FRESH_DAILY', 'USERS', 'PSEUDONYMOUS_USERS']);

Snowflake recommends using named arguments so that you can specify the arguments you want and leave the rest unchanged. For example, to enable properties with the initial load and exclude fields containing null values, run the following query:

 CALL ENABLE_PROPERTIES(
    PROJECT_ID => 'gcp_example_project',
    PROPERTY_IDS => ['property1', 'property2', 'property3'],
    INITIAL_LOAD => TRUE,
    EXCLUDE_NULLS => TRUE
);

要阻止引入这些属性,请运行以下命令:

CALL DISABLE_PROPERTIES('gcp_example_project', ['property1','property2','property3']);

禁用该属性会停止其同步。禁用属性同步后,已启动但尚未完成的整个引入将从目标数据库中移除。

The ENABLE_PROPERTIES procedure adds the specified property names to the ENABLED_PROPERTIES view.

初始加载

在启用新属性后,连接器开始并行引入在 BigQuery 中找到的所有历史数据,以便与负责收集新事件的当前同步一起进行。初始加载向后运行,从当前日期开始,直到达到可用数据的第一天。

重新加载已引入的数据

要重新加载已经引入的数据,或者加载完全没有引入的数据(例如,因为您启用了没有初始加载的属性,或者数据在 BigQuery 中缺失而现在可用),您可以调用以下程序之一:

CALL RELOAD_PROPERTY('<property id>');

This procedure triggers reload of all data (DAILY, FRESH_DAILY, INTRADAY, USERS and PSEUDONYMOUS_USERS) of a given property, between the earliest table it can find in BigQuery and the last ingested (or terminally marked as DATA_NOT_FOUND) table date between the connector.

CALL RELOAD_PROPERTY('<property id>', <first date>, <last date>);

Triggers reload of all data (DAILY, FRESH_DAILY, INTRADAY, USERS and PSEUDONYMOUS_USERS) of a given property, between provided dates.

CALL RELOAD_PROPERTY('<property id>', '<export type>', <first date>, <last date>);

Triggers reload of DAILY, FRESH_DAILY, INTRADAY, USERS or PSEUDONYMOUS_USERS data of a given property, between provided dates.

Note

  • 重新加载与主加载并行处理。
  • 您可以根据需要触发任意数量的属性重新加载,前提是日期范围不重叠。
  • 从 BigQuery 下载每个表格后,数据会被交换。
  • 仅当特定日期的 BigQuery 中有数据时,才会重新加载交换数据。

可以通过专用视图观察到持续的重新加载:

SELECT * FROM PUBLIC.ONOGOING_RELOADS;

要取消正在进行的重新加载,请执行以下查询:

CALL CANCEL_RELOAD_PROPERTY('<load id>');