snowflake.snowpark.DataFrameAIFunctions.summarize_agg

DataFrameAIFunctions.summarize_agg(input_column: Union[snowflake.snowpark.column.Column, str], *, output_column: Optional[str] = None) → snowflake.snowpark.DataFrame [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.39.0/src/snowflake/snowpark/dataframe_ai_functions.py#L962-L1062)

Summarize a column of text data using AI.

This method aggregates and summarizes text data from multiple rows into a single comprehensive summary. It’s particularly useful for creating summaries from collections of reviews, feedback, transcripts, or other text content.

Parameters:
  • input_column – The column (Column object or column name as string) containing the text data to summarize.

  • output_column – The name of the output column to be appended. If not provided, a column named AI_SUMMARIZE_AGG_OUTPUT is appended.

Returns:

A new DataFrame with a single row containing the summarized text.
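Because the result is a DataFrame rather than a string, callers usually need one more step to get at the summary text. The following is a minimal sketch of that step, not part of the Snowpark API: summarize_to_text is a hypothetical helper name, and it assumes the returned DataFrame holds exactly one column, as the examples on this page suggest.

```python
from typing import Optional


def summarize_to_text(df, input_column, output_column: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: run summarize_agg and return the summary as a str.

    `df` is expected to be a Snowpark DataFrame (anything exposing
    `.ai.summarize_agg` works). Returns None if no row comes back.
    """
    summary_df = df.ai.summarize_agg(
        input_column=input_column,
        output_column=output_column,  # None falls back to the documented default name
    )
    rows = summary_df.collect()
    if not rows:
        return None
    # Assumption: the summary is the only column, so read it by position.
    # This avoids depending on how the output column name is normalized.
    return rows[0][0]
```

Reading the value by position rather than by name is a deliberate choice here: the examples show the output column name being upper-cased (reviews_summary becomes REVIEWS_SUMMARY), so a name lookup would need to account for that.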

Examples:

>>> # Summarize product reviews
>>> df = session.create_dataframe([
...     ["The product quality is excellent and shipping was fast."],
...     ["Great value for money, highly recommend!"],
...     ["Customer service was very helpful and responsive."],
...     ["The packaging could be better, but the product itself is good."],
...     ["Easy to use and works as advertised."],
... ], schema=["review"])
>>> summary_df = df.ai.summarize_agg(
...     input_column="review",
...     output_column="reviews_summary"
... )
>>> summary_df.columns
['REVIEWS_SUMMARY']
>>> summary_df.count()
1
>>> results = summary_df.collect()
>>> len(results[0]["REVIEWS_SUMMARY"]) > 10
True

>>> # Summarize with Column object
>>> from snowflake.snowpark.functions import col
>>> df = session.create_dataframe([
...     ["Meeting started with project updates"],
...     ["Discussed timeline and deliverables"],
...     ["Identified key risks and mitigation strategies"],
...     ["Assigned action items to team members"],
... ], schema=["meeting_notes"])
>>> summary_df = df.ai.summarize_agg(
...     input_column=col("meeting_notes"),
...     output_column="meeting_summary"
... )
>>> summary_df.columns
['MEETING_SUMMARY']
>>> summary_df.count()
1

Note

  • This is an aggregation function: it combines multiple rows into a single summary.

  • For best results, provide clear and coherent text in the input column.

  • The summary captures the main themes and important points from all input rows.

  • Unlike the agg method, which requires a task description, summarize_agg generates a comprehensive summary automatically.
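One way to act on the advice about clean input is to drop rows with no text before aggregating. The sketch below is an illustration only: summarize_non_null is a hypothetical helper name, and this page does not say how summarize_agg itself treats NULL rows, so the filter simply removes them explicitly. It relies on DataFrame.filter, column lookup via df[name], and Column.is_not_null.

```python
from typing import Optional


def summarize_non_null(df, input_column: str, output_column: Optional[str] = None):
    """Hypothetical helper: drop NULL rows, then summarize what remains.

    `df` is expected to be a Snowpark DataFrame and `input_column`
    a column name given as a string.
    """
    # df[input_column] yields a Column; is_not_null() builds the predicate.
    cleaned = df.filter(df[input_column].is_not_null())
    # The result is still a DataFrame, with the summary in a single row.
    return cleaned.ai.summarize_agg(
        input_column=input_column,
        output_column=output_column,
    )
```

The helper takes the column as a string so that it can be used both for the lookup and as the input_column argument.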

This method has been experimental since version 1.39.0.
