2024 Performance Improvements
Important
Performance improvements often target specific query patterns or workloads. These improvements might or might not have a material impact on a specific workload.
The following performance improvements were introduced in 2024:
| Released | Description | Impact |
|---|---|---|
| December 2024 | Improved sharing of common or similar parts of a query. | Reduces query execution time for queries with multiple WITH clauses. |
| December 2024 | Improved scaling of document pre-processing and inference in Document AI. | Decreases processing time of documents. |
| November 2024 | Top-K pruning for queries that contain aggregate functions. | Expands top-K pruning to include queries that contain aggregate functions. |
| October 2024 | Improved performance for queries that have equivalent (or similar) subqueries or sub-expressions. | Reduces query execution time by eliminating duplicate parts of a query plan. |
| October 2024 | Improved handling of skew. | Reduces query execution time by automatically detecting and resolving skew in the build side of joins. |
| October 2024 | Search Optimization Update: Support for join queries. (General Availability) | Improves the performance of join queries that have a small number of distinct values on the build side of the join. |
| October 2024 | Improved metadata replication. | Reduces the time spent in the SECONDARY_UPLOADING_INVENTORY, PRIMARY_UPLOADING_METADATA, and SECONDARY_DOWNLOADING_METADATA phases of a replication refresh by optimizing serverless compute allocation. This improvement targets refreshes with larger metadata sizes. |
| September 2024 | Improved cloning operations through parallelization. | Reduces the time it takes to clone objects, especially for databases and schemas with extensive metadata. |
| September 2024 | Improved replication refreshes through parallelization. | Reduces the overall refresh time when replicating large volumes of data. |
| August 2024 | Improved performance for LIMIT queries. | Reduces compilation and execution time for queries that use a LIMIT clause to return |
| July 2024 | Improved table column synchronization for replication. | Reduces the time spent in the SECONDARY_DOWNLOADING_METADATA phase of a refresh operation. |
| July 2024 | Improved warehouse utilization for queries that scan only a small number of micro-partitions relative to the compute resources available to the virtual warehouse. | Faster execution for queries with expensive operations when scanning data from a small number of micro-partitions, which is common in BI and dashboard use cases. |
| July 2024 | Improved query processing that: | Faster execution for some queries with LIMIT clauses and GROUP BY statements. |
| June 2024 | Improved single instruction, multiple data (SIMD) processing. | |
| May 2024 | Improved efficiency of Automatic Clustering. | Reduces the cost of Automatic Clustering because it works more efficiently. |
| May 2024 | Improved object replication. | Reduces the time spent in the SECONDARY_UPLOADING_INVENTORY and SECONDARY_DOWNLOADING_METADATA phases of a refresh operation by optimizing the synchronization of some objects and the authorization mechanism for replication operations. |
| May 2024 | Reduced the latency for loading most Parquet files by up to 50% when the file format option, USE_VECTORIZED_SCANNER, is set to | The vectorized scanner is well suited for the columnar format of a Parquet (https://parquet.apache.org/docs/file-format/) file and reduces the ingestion latency by downloading only relevant sections of the Parquet file into memory, such as the subset of selected columns. |
| May 2024 | Improved evaluation of aggregations so they are made at more intermediate join trees. | Reduces query execution time for complex queries with aggregations by reducing the amount of data that needs to be processed at the earliest point possible. |
| May 2024 | Improved query execution times for queries that spend a significant amount of time communicating across virtual warehouse nodes. | Increases throughput between compute resources in a warehouse. Each warehouse is a cluster of compute resources. |
| May 2024 | Improved top-k pruning for LIMIT and ORDER BY queries. | Reduces execution time for top-k queries due to fewer scanned files and file header reads. Expands existing top-k improvements to include STRING/BINARY support in ORDER BY columns. Further increases pruning efficiency by sorting the scan set in order of largest/smallest files with respect to the value domain. |
| May 2024 | Improved join order decisions by calculating selectivity estimates with more granularity. | Reduces compilation time and query execution time by calculating selectivity estimates at the micro-partition level. |
| May 2024 | Faster loading time for Python. | Improves performance for Streamlit in Snowflake apps (including Streamlit apps within a Snowflake Native App), Python worksheets, Python UDFs, and stored procedures in Python. |
| April 2024 | Reduced lock/mutex contention. | Reduces query execution times by improving scan performance in a variety of scenarios such as highly concurrent queries running on a warehouse. |
| April 2024 | Improved broadcast join decisions. | Reduces query execution time and improves memory management by optimizing broadcast joins in scenarios like right-deep join trees. |
| April 2024 | Faster query results in Snowsight. | Reduces the time it takes for query results to appear when run in Snowsight. Improvements are most noticeable for queries that return result sets larger than 10,000 rows. |
| March 2024 | Improved metadata replication. | Reduces the time spent in the PRIMARY_UPLOADING_METADATA, SECONDARY_DOWNLOADING_METADATA, and SECONDARY_UPLOADING_INVENTORY phases for metadata. |
| March 2024 | Improved query performance as a result of more accurately calculating selectivity estimates in order to optimize the order of joins. | Reduces execution time when there are mismatches between partition metadata and actual cardinality from join filters. |
| March 2024 | Improved performance for loading JSON files. | Results in lower ingestion latency of up to 25% for many JSON loading scenarios. |
| February 2024 | Improved object replication. | Reduces the time spent in the PRIMARY_UPLOADING_METADATA, SECONDARY_DOWNLOADING_METADATA, and SECONDARY_UPLOADING_INVENTORY phases of a refresh operation by optimizing portions of the snapshot operation and the way some objects are added to the replication inventory. |
| February 2024 | Support for the | Ability to set the |
| January 2024 | Improved execution time for LIMIT 0 queries. | Reduces execution time for queries that use a count of |
| January 2024 | General Availability of larger warehouses (5X-LARGE and 6X-LARGE) in Microsoft Azure regions, excluding Azure Government regions. | Ability to use larger compute resources for memory-intensive queries compared to smaller warehouses. |
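
The top-K pruning entries above (May 2024 and November 2024) describe skipping files that cannot contribute to an ORDER BY ... LIMIT result. The following Python sketch illustrates that general idea only; it is not Snowflake's implementation. The `Partition` class and the `top_k_desc` function are hypothetical names, and the sketch assumes each partition exposes its maximum value as metadata and is non-empty.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Partition:
    """Hypothetical stand-in for a micro-partition (assumed non-empty)."""
    values: List[int]

    @property
    def max_value(self) -> int:
        # In a real system this would come from stored metadata,
        # not from reading the data itself.
        return max(self.values)


def top_k_desc(partitions: List[Partition], k: int) -> Tuple[List[int], int]:
    """Emulate ORDER BY value DESC LIMIT k.

    Returns the k largest values in descending order and the number of
    partitions that actually had to be scanned.
    """
    if k <= 0:
        return [], 0

    # Visit partitions with the largest maximum first, so the heap fills
    # with large values early and the pruning boundary rises quickly.
    ordered = sorted(partitions, key=lambda p: p.max_value, reverse=True)

    heap: List[int] = []  # min-heap holding the current k largest values
    scanned = 0
    for part in ordered:
        # Once k values are held, a partition whose maximum does not exceed
        # the smallest held value cannot change the result. Because the
        # partitions are ordered by maximum, all remaining ones are pruned too.
        if len(heap) == k and part.max_value <= heap[0]:
            break
        scanned += 1
        for value in part.values:
            if len(heap) < k:
                heapq.heappush(heap, value)
            elif value > heap[0]:
                heapq.heapreplace(heap, value)

    return sorted(heap, reverse=True), scanned


if __name__ == "__main__":
    parts = [
        Partition([1, 5, 3]),
        Partition([90, 95, 99]),
        Partition([40, 42, 41]),
        Partition([97, 98, 96]),
    ]
    top, scanned = top_k_desc(parts, 3)
    print(top, f"scanned {scanned} of {len(parts)} partitions")
```

In this sketch, two of the four partitions are skipped because their maximum values fall below the current top-3 boundary. As the note at the top of this page says, whether a comparable saving appears in practice depends on the specific workload.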