故障转移账户对象¶
本主题介绍在不同 区域 用于灾难恢复的多个账户之间对复制的账户对象进行故障转移所需的步骤。
For information about the purpose of the failover mechanism and when to use it, see 业务连续性和灾难恢复简介.
Prerequisites¶
Enable replication in a set of accounts within the same organization, across multiple regions in one cloud service provider or across different cloud service providers.
Create a primary failover group that defines the kinds of objects to replicate, and specifies the target accounts to which to replicate. You can optionally divide the replicated objects across multiple failover groups, for example if some databases should be replicated more frequently than others.
Create at least one secondary failover group (replica) of each primary failover group in one or more secondary accounts.
Refresh (synchronize) each replica with the latest updates to the objects in the failover group. Perform an initial refresh, and set up a schedule to regularly bring the latest changes to each secondary account.
For instructions, see 复制账户对象和数据库.
将目标账户提升为源账户¶
您可以使用 Snowsight 或 SQL 将目标账户提升为源账户(故障转移)。
For more information about the kinds of objects you can specify in a failover group, see 复制组和故障转移组.
Promote a target account to serve as the source account using Snowsight¶
备注
Only account administrators can edit a replication or failover group using Snowsight (see 使用 Snowsight 进行复制配置的限制).
For the most consistent and reliable failover experience, select all the applicable failover groups and connections and promote them all at the same time. We refer to this operation as a bulk failover.
To promote a target account to serve as the source account using Snowsight, follow these steps:
Sign in to Snowsight. Make sure to sign in using the target account.
In the navigation menu, select Admin » Accounts.
Select Replication, then select Initiate failover. Doing so brings up a dialog where you make the remaining choices.
Select any failover groups to promote. After the failover, the objects specified in those failover groups become writable on the newly promoted primary account. Those objects become read-only on the account that formerly was the primary and is now a secondary account.
Select Next.
Select any connections to promote. After the failover, those connections connect to the account that you're promoting to be the new primary account.
Select Next.
Select Fail over in the confirmation window.
If any refresh operations are in progress for the failover groups you selected, you can wait for those refreshes to complete, or choose an alternative approach if your failover is urgent and should take priority.
The default action is to wait for the refreshes to complete. That way, the primary and secondary systems are all in a consistent state when the bulk failover runs. Snowflake uses your currently selected warehouse to poll the status of the ongoing refreshes. If you don't have a selected warehouse, you select one now using the Select warehouse option.
Or, you can proceed with the failover immediately by selecting Show advanced options.
To fail over only the failover groups that aren't currently being refreshed, select Exit with current progress. In that case, you perform additional refreshes later for the groups that were skipped during the bulk failover.
To cancel the refresh operations and continue the failover, select Cancel refreshes and force failover. In that case, you might need to clean up any inconsistencies on the secondary system from the interrupted refreshes.
If the failover operation didn't complete for all failover groups, you can perform another bulk failover. Or you can fail over the remaining failover groups one at a time, using the procedure in Promote a single failover group to serve as the primary using Snowsight.
Promote a single failover group to serve as the primary using Snowsight¶
备注
Only account administrators can edit a replication or failover group using Snowsight (see 使用 Snowsight 进行复制配置的限制).
To promote a single failover group to be the primary using Snowsight, follow these steps:
Sign in to Snowsight. Make sure to sign in using the target account.
In the navigation menu, select Admin » Accounts.
依次选择 Replication、Groups。
找到要提升的故障转移组,并选择位于该行最后一列的 More 菜单 (...)。
选择 Fail over,然后在确认窗口中选择 Fail over。
小技巧
You typically use this procedure if you encounter a problem failing over one group, and you need to retry the failover only for that group. To promote an entire account to be the primary, select multiple failover groups and connections and perform a bulk failover. For more information, see Promote a target account to serve as the source account using Snowsight.
使用 SQL 将目标账户提升为源账户¶
To promote a target account to serve as the source account using SQL, you sign in to the target account and execute the ALTER FAILOVER GROUP ... PRIMARY command.
将辅助故障转移组提升为主要故障转移组¶
备注
本节中的示例必须由具有 FAILOVER 权限的角色执行。
以下示例将当前 myorg 组织中的 myaccount2 提升为源账户。
登录到目标账户
myaccount2。列出账户中的故障转移组:
SHOW FAILOVER GROUPS;
对要提升为主故障转移组的每个辅助故障转移组,执行以下语句:
ALTER FAILOVER GROUP myfg PRIMARY;
备注
在源区域发生部分中断期间,复制服务可能仍可用,并且可能仍刷新目标区域中的辅助故障转移组。
为确保数据完整性,如果正在进行刷新操作,Snowflake 会阻止故障转移。这意味着,如果复制操作正在刷新辅助故障转移组,则无法将其提升为主故障转移组。在这种情况下,ALTER FAILOVER GROUP ...PRIMARY 命令将返回错误。
解决因刷新操作正在进行而引起的故障转移语句失败的问题¶
如果正在对尝试提升的辅助故障转移组执行刷新操作,则故障转移语句将生成以下错误:
Replication group "<GROUP_NAME>" cannot currently be set as primary because it is being
refreshed. Either wait for the refresh to finish or cancel the refresh and try again.
要成功进行故障转移,必须完成以下步骤。
选择并完成以下选项之一:
重要
在 SECONDARY_DOWNLOADING_METADATA 或 SECONDARY_DOWNLOADING_DATA 阶段暂停刷新操作可能会导致目标账户的状态不一致。有关更多信息,请参阅 查看正在进行的刷新操作的当前阶段。
暂停以后的故障转移组刷新操作。如果刷新操作正在进行,则必须等待该操作完成,然后才能进行故障转移:
ALTER FAILOVER GROUP myfg SUSPEND;
暂停以后的刷新操作,并且 取消当前正在进行的计划刷新操作(如果有的话)。
如果正在进行的刷新操作是手动触发的,请参阅 取消未自动安排的正在进行的刷新操作。
ALTER FAILOVER GROUP myfg SUSPEND IMMEDIATE;
备注
在语句返回的时间与取消刷新操作完成的时间之间可能会有稍许延迟。
验证故障转移组
myfg未进行任何刷新操作。以下查询不应返回任何结果:SELECT phase_name, start_time, job_uuid FROM TABLE(INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_HISTORY('myfg')) WHERE phase_name <> 'COMPLETED' and phase_name <> 'CANCELED';
要查看故障转移组
myfg的已取消刷新操作,可以执行以下语句:SELECT phase_name, start_time, job_uuid FROM TABLE(INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_HISTORY('myfg')) WHERE phase_name = 'CANCELED';
现在,可以将辅助故障转移组
myfg提升为主故障转移组:ALTER FAILOVER GROUP myfg PRIMARY;
恢复目标账户中的计划复制¶
故障转移时,所有辅助故障转移组上的计划刷新都会暂停。ALTER FAILOVER GROUP ...RESUME 必须在具有辅助故障转移组的每个 目标账户 中执行,以恢复自动刷新。
ALTER FAILOVER GROUP myfg RESUME;
查看正在进行的刷新操作的当前阶段¶
在刷新操作的大部分阶段,都可以安全地取消刷新操作。不过,在 SECONDARY_DOWNLOADING_METADATA 或 SECONDARY_DOWNLOADING_DATA 阶段取消刷新操作可能会导致目标账户的状态不一致。 如果刷新操作已启动其中一个阶段,则无论源账户是否可用,都会继续完成该阶段。在进行故障切换之前,允许该阶段完成,可确保副本处于一致的状态。在副本处于一致状态后,可以恢复或重放引入和转换管道,将副本更新为当前状态。
要查看故障转移组正在进行的刷新操作的当前阶段,请使用 Information Schema REPLICATION_GROUP_REFRESH_PROGRESS、REPLICATION_GROUP_REFRESH_PROGRESS_BY_JOB、REPLICATION_GROUP_REFRESH_PROGRESS_ALL 表函数。
例如,要查看故障转移组 myfg 正在进行的刷新操作的当前阶段,请执行以下语句:
SELECT phase_name, start_time, end_time
FROM TABLE(
INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_PROGRESS('myfg')
);
有关刷新操作阶段的列表,请参阅该函数的 使用说明。
取消未自动安排的正在进行的刷新操作¶
要取消并非由复制计划自动触发的正在进行的刷新操作,必须使用 SYSTEM$CANCEL_QUERY 函数:
使用以下选项之一查找用于运行刷新操作的查询 ID 或 JOB_UUID:
查找所有正在运行的刷新操作的查询 IDs:
SELECT query_id, query_text FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY()) WHERE query_type = 'REFRESH REPLICATION GROUP' AND execution_status = 'RUNNING' ORDER BY start_time;
使用 QUERY_TEXT 列从列表中确定用于故障转移组刷新操作的 QUERY_ID。
查找特定故障转移组
myfg正在进行的刷新操作的 JOB_UUID:SELECT phase_name, start_time, job_uuid FROM TABLE(INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_HISTORY('myfg')) WHERE phase_name <> 'COMPLETED' and phase_name <> 'CANCELED';
使用 SYSTEM$CANCEL_QUERY 函数以及 QUERY_ID 或 JOB_UUID 取消刷新操作:
SELECT SYSTEM$CANCEL_QUERY('<QUERY_ID | JOB_UUID>');
返回以下输出结果:
query [<QUERY_ID>] terminated.取消正在进行的刷新操作后,继续执行 后续步骤。
在新提升的源账户中重新打开 Snowpipe Streaming 的活动通道¶
主数据库中由 Snowpipe Streaming 填充的表会复制 到辅助数据库。故障转移后,为表重新打开活动的 Snowpipe Streaming 通道,并为通道重新插入任何缺失的数据行:
通过调用 openChannel (https://javadoc.io/doc/net.snowflake/snowflake-ingest-sdk/latest/net/snowflake/ingest/streaming/SnowflakeStreamingIngestClient.html) API 可重新打开表的活动通道。
提取偏移令牌:
调用 getLatestCommittedOffsetToken (https://javadoc.io/doc/net.snowflake/snowflake-ingest-sdk/latest/net/snowflake/ingest/streaming/SnowflakeStreamingIngestChannel.html#getLatestCommittedOffsetToken()) API 或
执行 SHOW CHANNELS 命令检索表中活动通道的列表。
从提取的偏移令牌中为通道重新插入数据行。
备注
这些步骤仅适用于支持 Snowflake Ingest SDK 的 Snowpipe Streaming;不适用于支持 Kafka Connector 的 Snowpipe Streaming。故障转移后,按照 以下步骤 重新启动 Kafka Connector。
Snowpipe Streaming 和 Kafka Connector¶
如果使用 Kafka Connector 和 Snowpipe Streaming,请在故障转移后按照以下步骤操作:
更新 Kafka Connector 配置,以指向新提升的源账户。
执行 SHOW CHANNELS 命令,检索活动通道列表和偏移令牌。每个通道都属于 Kafka 主题中的一个分区。
为每个分区(通道)手动重置 Kafka 主题中的偏移。
重新启动 Kafka Connector。
For more information, see: