snowflake.snowpark.DataFrameAIFunctions.summarize_agg
- DataFrameAIFunctions.summarize_agg(input_column: Union[snowflake.snowpark.column.Column, str], *, output_column: Optional[str] = None) → snowflake.snowpark.DataFrame [source] (https://github.com/snowflakedb/snowpark-python/blob/v1.39.0/src/snowflake/snowpark/dataframe_ai_functions.py#L962-L1062)
Summarize a column of text data using AI.
This method aggregates and summarizes text data from multiple rows into a single comprehensive summary. It’s particularly useful for creating summaries from collections of reviews, feedback, transcripts, or other text content.
- Parameters:
input_column – The column (Column object or column name as string) containing the text data to summarize.
output_column – The name of the output column to be appended. If not provided, a column named AI_SUMMARIZE_AGG_OUTPUT is appended.
- Returns:
A new DataFrame with a single row containing the summarized text.
Examples:
>>> # Summarize product reviews
>>> df = session.create_dataframe([
...     ["The product quality is excellent and shipping was fast."],
...     ["Great value for money, highly recommend!"],
...     ["Customer service was very helpful and responsive."],
...     ["The packaging could be better, but the product itself is good."],
...     ["Easy to use and works as advertised."],
... ], schema=["review"])
>>> summary_df = df.ai.summarize_agg(
...     input_column="review",
...     output_column="reviews_summary"
... )
>>> summary_df.columns
['REVIEWS_SUMMARY']
>>> summary_df.count()
1
>>> results = summary_df.collect()
>>> len(results[0]["REVIEWS_SUMMARY"]) > 10
True

>>> # Summarize with Column object
>>> from snowflake.snowpark.functions import col
>>> df = session.create_dataframe([
...     ["Meeting started with project updates"],
...     ["Discussed timeline and deliverables"],
...     ["Identified key risks and mitigation strategies"],
...     ["Assigned action items to team members"],
... ], schema=["meeting_notes"])
>>> summary_df = df.ai.summarize_agg(
...     input_column=col("meeting_notes"),
...     output_column="meeting_summary"
... )
>>> summary_df.columns
['MEETING_SUMMARY']
>>> summary_df.count()
1
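Because the result is a DataFrame with a single row, callers often want the summary as a plain Python string. The sketch below shows one way to do that. The helper name summarize_to_text is hypothetical and not part of Snowpark; it assumes only the summarize_agg signature documented above, and that the returned DataFrame holds the summary in its first column, as in the examples.

```python
from typing import Any, Optional


def summarize_to_text(
    df: Any,
    input_column: Any,
    output_column: Optional[str] = None,
) -> str:
    """Hypothetical helper: run summarize_agg and return the summary text.

    `df` is expected to be a Snowpark DataFrame (anything exposing
    `df.ai.summarize_agg` with the documented signature works).
    """
    summary_df = df.ai.summarize_agg(
        input_column=input_column,
        output_column=output_column,
    )
    # Assumption: the result has exactly one row, and the summary is in
    # its first column, so positional access avoids depending on how the
    # output column name is cased.
    first_row = summary_df.collect()[0]
    return first_row[0]
```

With a live session, a call might look like summarize_to_text(df, "review", "reviews_summary"). The returned text is generated by a model, so its exact wording should not be relied on in tests.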
Note
This is an aggregation function that combines multiple rows into a single summary.
For best results, provide clear and coherent text in the input column.
The summary will capture the main themes and important points from all input rows.
Unlike the agg method, which requires a task description, summarize_agg automatically generates a comprehensive summary.
This function or method is experimental since 1.39.0.