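Performance tuning for Openflow Connector for Kafka¶
Note
This connector is subject to the Snowflake Connector Terms.
This topic provides guidance on tuning the Snowflake Openflow Connector for Kafka to achieve optimal throughput and minimize latency when ingesting data into Snowflake.
Performance considerations¶
When configuring the Openflow Connector for Kafka for optimal performance, consider the following key factors that affect ingestion throughput and latency:
Kafka configuration¶
Partition count¶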
More partitions allow for higher parallelism but require careful coordination with consumer configuration. Excessive partitions can cause several issues: increased memory usage, slower leader elections during failures, and significant metadata management overhead on brokers.
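Compression¶
Message compression reduces network bandwidth usage but increases CPU overhead.
Flowfile optimization¶
Flowfile size¶
For optimal performance, flowfiles should be in the 1 to 10 MB range rather than containing small individual messages. Larger flowfiles reduce processing overhead and improve throughput by minimizing the number of individual file operations. The default settings should produce flowfiles within the acceptable size range. Smaller flowfiles are expected when throughput is low.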
If you observe small flowfiles with high throughput, contact Snowflake Support for assistance.
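To judge whether observed flowfiles fall in the recommended range, a quick tally can help. The following is a minimal sketch, not part of the connector: the function name `size_report` is hypothetical, the sizes are assumed to have been collected by some other means (for example, read off the canvas), and 1 MB is assumed to mean 1024 × 1024 bytes.

```python
from typing import Dict, Iterable

MB = 1024 * 1024  # assumption: binary megabytes

# Recommended flowfile size range from this topic.
MIN_BYTES = 1 * MB
MAX_BYTES = 10 * MB


def size_report(sizes_bytes: Iterable[int]) -> Dict[str, int]:
    """Count flowfiles below, within, and above the 1 to 10 MB range."""
    counts = {"small": 0, "in_range": 0, "large": 0}
    for size in sizes_bytes:
        if size < MIN_BYTES:
            counts["small"] += 1
        elif size <= MAX_BYTES:
            counts["in_range"] += 1
        else:
            counts["large"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample of observed flowfile sizes in bytes.
    print(size_report([500_000, 2 * MB, 10 * MB, 11 * MB]))
```

A large share of `small` entries is only a concern when throughput is high; at low throughput, small flowfiles are expected.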
Network and infrastructure¶
Network latency¶
Lower latency between Kafka brokers and Openflow improves overall performance. Snowflake recommends deploying Kafka brokers and Openflow in the same CSP region.
Node size recommendations¶
The following table provides configuration recommendations based on expected workload characteristics:
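| Node size | Recommended for | Message rate capacity |
|---|---|---|
| Small (S) | Low to moderate throughput scenarios | Up to 18 MB/s per node |
| Medium (M) | Moderate to high throughput scenarios | Up to 145 MB/s per node |
| Large (L) | High throughput scenarios | Up to 250 MB/s per node |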
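As a rough illustration of how the table above might be applied, the sketch below picks the smallest node size whose per-node bound covers an expected load. The helper `smallest_node_size` is hypothetical, it assumes the load is spread evenly across nodes, and it treats the "up to" figures as upper bounds rather than guaranteed rates.

```python
from typing import Optional

# Per-node upper bounds in MB/s, copied from the table above.
NODE_CAPACITY_MB_S = {"S": 18, "M": 145, "L": 250}


def smallest_node_size(expected_mb_per_s: float, node_count: int = 1) -> Optional[str]:
    """Return the smallest node size whose per-node bound covers the load.

    Assumes an even split of the load across node_count nodes (a
    simplification). Returns None when even large nodes fall short.
    """
    if expected_mb_per_s < 0 or node_count < 1:
        raise ValueError("expected_mb_per_s must be >= 0 and node_count >= 1")
    per_node = expected_mb_per_s / node_count
    for size in ("S", "M", "L"):
        if per_node <= NODE_CAPACITY_MB_S[size]:
            return size
    return None


if __name__ == "__main__":
    print(smallest_node_size(100))                # M
    print(smallest_node_size(400, node_count=2))  # L (200 MB/s per node)
```

A result of `None` suggests adding nodes rather than choosing a larger size. Real capacity depends on message size, compression, and network conditions, so treat the result as a starting point.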
Performance optimization best practices¶
Adjust processor concurrent tasks¶
To optimize processor performance, you can adjust the number of concurrent tasks for both ConsumeKafka and PublishSnowpipeStreaming processors. Concurrent tasks allow processors to run multiple threads simultaneously, improving throughput for high-volume scenarios.
To adjust concurrent tasks for a processor, perform the following tasks:
1. Right-click the processor in the Openflow canvas.
2. Select Configure from the context menu.
3. Navigate to the Scheduling tab.
4. In the Concurrent tasks field, enter the preferred number of concurrent tasks.
5. Select Apply to save the configuration.
Recommended concurrent task settings¶
The following table provides recommended concurrent task settings for different node sizes:
| Node size | ConsumeKafka tasks | PublishSnowpipeStreaming tasks |
|---|---|---|
| Small (S) | 1 | 1 |
| Medium (M) | 4 | 2 |
| Large (L) | 8 | 2 |
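Important considerations¶
- Memory usage
Each concurrent task uses additional memory. Monitor JVM heap usage when increasing concurrent tasks.
- Kafka partitions
For ConsumeKafka, the number of concurrent tasks multiplied by the number of runtime nodes should not exceed the total number of Kafka partitions across all topics.
- Start conservatively
Start with lower values and increase them gradually while monitoring performance metrics.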
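The partition constraint above is simple arithmetic: concurrent tasks × runtime nodes ≤ total partitions. The following sketch checks a planned setting against it. Both function names are hypothetical, and the partition count is assumed to be known from your Kafka cluster.

```python
def max_consume_kafka_tasks(total_partitions: int, runtime_nodes: int) -> int:
    """Largest per-node task count with tasks * nodes <= total_partitions.

    A result of 0 means there are more runtime nodes than partitions.
    """
    if total_partitions < 0 or runtime_nodes < 1:
        raise ValueError("total_partitions must be >= 0 and runtime_nodes >= 1")
    return total_partitions // runtime_nodes


def exceeds_partitions(concurrent_tasks: int, runtime_nodes: int,
                       total_partitions: int) -> bool:
    """True when the planned setting breaks the partition guideline."""
    return concurrent_tasks * runtime_nodes > total_partitions


if __name__ == "__main__":
    # 24 partitions across all topics, 3 runtime nodes.
    print(max_consume_kafka_tasks(24, 3))  # 8
    print(exceeds_partitions(8, 4, 24))    # True: 32 tasks for 24 partitions
```

The result is only a ceiling. The recommended settings per node size and the memory consideration above still apply.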
Troubleshooting performance issues: Common performance bottlenecks¶
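High consumer lag or Snowflake ingestion bottlenecks¶
If Kafka consumer lag is increasing or ingestion into Snowflake is slow, perform the following tasks:
- Verify network connectivity and bandwidth between Openflow and the Kafka brokers.
- Check whether the queue in front of the PublishSnowpipeStreaming processor is growing.
- Consider using a larger node type.
- Consider increasing the maximum number of nodes for the runtime.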
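Memory pressure¶
If you experience memory-related issues:
- Reduce the batch sizes to lower the memory footprint.
- Reduce the number of concurrent tasks for the ConsumeKafka processor.
- Consider upgrading to a larger node type.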
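Network latency issues¶
If you experience high latency:
- Verify the network configuration between Openflow and external systems.
- Consider deploying Openflow closer to the Kafka cluster.
- If working with low throughput, consider lowering the Client Lag setting in the PublishSnowpipeStreaming processor and the Max Uncommitted Time setting in the ConsumeKafka processor.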