snowflake.snowpark.DataFrame.groupBy¶

DataFrame.groupBy(*cols: Union[Column, str, Iterable[Union[Column, str]]]) → RelationalGroupedDataFrame[source] (https://github.com/snowflakedb/snowpark-python/blob/v1.48.0/src/snowflake/snowpark/dataframe.py#L2493-L2564)¶

Groups rows by the columns specified by expressions (similar to GROUP BY in SQL).

This method returns a RelationalGroupedDataFrame that you can use to perform aggregations on each group of data.

Parameters:: *cols – The columns to group by.

Valid inputs are:

Empty input

One or multiple Column object(s) or column name(s) (str)

A list of Column objects or column names (str)

Examples

>>> from snowflake.snowpark.functions import col, lit, sum as sum_, max as max_
>>> df = session.create_dataframe([(1, 1),(1, 2),(2, 1),(2, 2),(3, 1),(3, 2)], schema=["a", "b"])
>>> df.group_by().agg(sum_("b")).collect()
[Row(SUM(B)=9)]
>>> df.group_by("a").agg(sum_("b")).sort("a").collect()
[Row(A=1, SUM(B)=3), Row(A=2, SUM(B)=3), Row(A=3, SUM(B)=3)]
>>> df.group_by("a").agg(sum_("b").alias("sum_b"), max_("b").alias("max_b")).sort("a").collect()
[Row(A=1, SUM_B=3, MAX_B=2), Row(A=2, SUM_B=3, MAX_B=2), Row(A=3, SUM_B=3, MAX_B=2)]
>>> df.group_by(["a", lit("snow")]).agg(sum_("b")).sort("a").collect()
[Row(A=1, LITERAL()='snow', SUM(B)=3), Row(A=2, LITERAL()='snow', SUM(B)=3), Row(A=3, LITERAL()='snow', SUM(B)=3)]
>>> df.group_by("a").agg((col("*"), "count"), max_("b")).sort("a").collect()
[Row(A=1, COUNT(LITERAL())=2, MAX(B)=2), Row(A=2, COUNT(LITERAL())=2, MAX(B)=2), Row(A=3, COUNT(LITERAL())=2, MAX(B)=2)]
>>> df.group_by("a").median("b").sort("a").collect()
[Row(A=1, MEDIAN(B)=Decimal('1.500')), Row(A=2, MEDIAN(B)=Decimal('1.500')), Row(A=3, MEDIAN(B)=Decimal('1.500'))]
>>> df.group_by("a").function("avg")("b").sort("a").collect()
[Row(A=1, AVG(B)=Decimal('1.500000')), Row(A=2, AVG(B)=Decimal('1.500000')), Row(A=3, AVG(B)=Decimal('1.500000'))]