snowflake.snowpark.Column
class snowflake.snowpark.Column(expr1: Union[str, Expression], expr2: Optional[str] = None) [[source]](https://github.com/snowflakedb/snowpark-python/blob/v1.23.0/src/snowflake/snowpark/column.py#L140-L780)
Bases: `object`

Represents a column or an expression in a `DataFrame`.

To access a Column object that refers to a column in a `DataFrame`, you can:

- Use the column name.
- Use the `functions.col()` function.
- Use the `DataFrame.col()` method.
- Use the index operator `[]` on a dataframe object with a column name.
- Use the dot operator `.` on a dataframe object with a column name.
```python
>>> from snowflake.snowpark.functions import col
>>> df = session.create_dataframe([["John", 1], ["Mike", 11]], schema=["name", "age"])
>>> df.select("name").collect()
[Row(NAME='John'), Row(NAME='Mike')]
>>> df.select(col("name")).collect()
[Row(NAME='John'), Row(NAME='Mike')]
>>> df.select(df.col("name")).collect()
[Row(NAME='John'), Row(NAME='Mike')]
>>> df.select(df["name"]).collect()
[Row(NAME='John'), Row(NAME='Mike')]
>>> df.select(df.name).collect()
[Row(NAME='John'), Row(NAME='Mike')]
```

Snowflake object identifiers, including column names, may or may not be case-sensitive depending on a set of rules. Refer to Snowflake Object Identifier Requirements for details. Follow these rules when you use column names with a DataFrame.

The column names returned after a DataFrame is evaluated follow the same case-sensitivity rules. The `df` above was created with the column name "name", but the column name returned after `collect()` was called became "NAME": the unquoted name is treated as case-insensitive, so the Snowflake database returns it in upper case.

To create a Column object that represents a constant value, use `snowflake.snowpark.functions.lit()`:

```python
>>> from snowflake.snowpark.functions import lit
>>> df.select(col("name"), lit("const value").alias("literal_column")).collect()
[Row(NAME='John', LITERAL_COLUMN='const value'), Row(NAME='Mike', LITERAL_COLUMN='const value')]
```

This class also defines utility functions for constructing expressions with Columns.
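The case-sensitivity rule described above (why `collect()` returned `NAME` rather than `name`) can be sketched with a plain-Python helper. This is an illustrative simplification, not part of Snowpark, and it covers only the basic quoting rule:

```python
def stored_identifier(identifier: str) -> str:
    """Illustrative sketch of Snowflake's identifier case rule:
    unquoted identifiers are stored and returned in upper case,
    while double-quoted identifiers keep their exact case.
    (Simplified; see Snowflake Object Identifier Requirements.)"""
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1]  # quoted: case preserved
    return identifier.upper()    # unquoted: folded to upper case

print(stored_identifier("name"))    # NAME -- matches the collect() output above
print(stored_identifier('"name"'))  # name
```

To keep the lower-case name in query results, you would create the column with a double-quoted identifier.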
Column objects can be built with the operators summarized in the following table, in order of decreasing precedence:

| Operator | Description |
| --- | --- |
| `x[index]` | Index operator, to get an item out of a Snowflake ARRAY or OBJECT |
| `**` | Power |
| `-x`, `~x` | Unary minus, unary not |
| `*`, `/`, `%` | Multiply, divide, remainder |
| `+`, `-` | Plus, minus |
| `&` | And |
| `\|` | Or |
| `==`, `!=`, `<`, `<=`, `>`, `>=` | Equal to, not equal to, less than, less than or equal to, greater than, greater than or equal to |

The following examples demonstrate how to use Column objects in expressions:

```python
>>> df = session.create_dataframe([[20, 5], [1, 2]], schema=["a", "b"])
>>> df.filter((col("a") == 20) | (col("b") <= 10)).collect()  # use parentheses around the operands of |
[Row(A=20, B=5), Row(A=1, B=2)]
>>> df.filter((df["a"] + df.b) < 10).collect()
[Row(A=1, B=2)]
>>> df.select((col("b") * 10).alias("c")).collect()
[Row(C=50), Row(C=20)]
```

When you use `|`, `&`, and `~` as logical operators on columns, you must always enclose column expressions in parentheses, as illustrated in the example above, because their precedence is higher than that of `==`, `<`, etc.

Do not use the `and`, `or`, and `not` logical operators on Column objects; for instance, `(df.col1 > 1) and (df.col2 > 2)` is wrong. Python has no magic (dunder) methods for these operators, so using them raises an error telling you to use `|`, `&`, or `~` instead, for which Python does have magic methods. A side effect is that `if column:` also raises an error, because it makes a hidden call to `bool(a_column)`, just like the `and` operator does.
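The mechanism behind this error can be illustrated with a minimal stand-in class (this is not Snowpark's actual `Column` implementation, just a sketch of the same idea): `&`, `|`, and `~` can be overloaded to build expressions, but `and`, `or`, `not`, and `if` must call `bool()`, which cannot return an expression.

```python
class Expr:
    """Minimal stand-in for a column expression (illustrative only)."""

    def __init__(self, text: str):
        self.text = text

    def __eq__(self, other):
        return Expr(f"({self.text} = {other})")

    def __and__(self, other):
        return Expr(f"({self.text} AND {other.text})")

    def __or__(self, other):
        return Expr(f"({self.text} OR {other.text})")

    def __invert__(self):
        return Expr(f"(NOT {self.text})")

    def __bool__(self):
        # `and`, `or`, `not`, and `if expr:` all call bool() under the hood,
        # and bool() must return a real bool -- it cannot return an expression.
        raise TypeError("Cannot convert an expression to bool; use &, |, or ~")


a, b = Expr("A"), Expr("B")
print(((a == 20) | (b == 5)).text)  # ((A = 20) OR (B = 5))

try:
    (a == 20) and (b == 5)  # `and` needs the truth value of the left operand
except TypeError as exc:
    print(exc)              # Cannot convert an expression to bool; use &, |, or ~
```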
Use `if a_column is None:` instead.

To access elements of a semi-structured OBJECT or ARRAY, use `[]` on a Column object:

```python
>>> from snowflake.snowpark.types import StringType, IntegerType
>>> df_with_semi_data = session.create_dataframe([[{"k1": "v1", "k2": "v2"}, ["a0", 1, "a2"]]], schema=["object_column", "array_column"])
>>> df_with_semi_data.select(df_with_semi_data["object_column"]["k1"].alias("k1_value"), df_with_semi_data["array_column"][0].alias("a0_value"), df_with_semi_data["array_column"][1].alias("a1_value")).collect()
[Row(K1_VALUE='"v1"', A0_VALUE='"a0"', A1_VALUE='1')]
>>> # The two returned columns above hold JSON literal values because children of semi-structured data are semi-structured.
>>> # The next line casts the JSON literals to plain values.
>>> df_with_semi_data.select(df_with_semi_data["object_column"]["k1"].cast(StringType()).alias("k1_value"), df_with_semi_data["array_column"][0].cast(StringType()).alias("a0_value"), df_with_semi_data["array_column"][1].cast(IntegerType()).alias("a1_value")).collect()
[Row(K1_VALUE='v1', A0_VALUE='a0', A1_VALUE=1)]
```

This class has methods for the most frequently used column transformations and operators. Module `snowflake.snowpark.functions` defines many functions to transform columns.

Methods

| Method | Description |
| --- | --- |
| `alias(alias)` | Returns a new renamed Column. |
| `as_(alias)` | Returns a new renamed Column. |
| `asc()` | Returns a Column expression with values sorted in ascending order. |
| `asc_nulls_first()` | Returns a Column expression with values sorted in ascending order (null values sorted before non-null values). |
| `asc_nulls_last()` | Returns a Column expression with values sorted in ascending order (null values sorted after non-null values). |
| `astype(to)` | Casts the value of the Column to the specified data type. |
| `between(lower_bound, upper_bound)` | Between lower bound and upper bound. |
| `bitand(other)` | Bitwise and. |
| `bitor(other)` | Bitwise or. |
| `bitwiseAnd(other)` | Bitwise and. |
| `bitwiseOR(other)` | Bitwise or. |
| `bitwiseXOR(other)` | Bitwise xor. |
| `bitxor(other)` | Bitwise xor. |
| `cast(to)` | Casts the value of the Column to the specified data type. |
| `collate(collation_spec)` | Returns a copy of the original Column with the specified `collation_spec` property, rather than the original collation specification property. |
| `contains(string)` | Returns true if the column contains `string` for each row. |
| `desc()` | Returns a Column expression with values sorted in descending order. |
| `desc_nulls_first()` | Returns a Column expression with values sorted in descending order (null values sorted before non-null values). |
| `desc_nulls_last()` | Returns a Column expression with values sorted in descending order (null values sorted after non-null values). |
| `endswith(other)` | Returns true if this Column ends with another string. |
| `eqNullSafe(other)` | Equal to. |
| `equal_nan()` | Is NaN. |
| `equal_null(other)` | Equal to. |
| `getField(field)` | Accesses an element of ARRAY column by ordinal position, or an element of OBJECT column by key. |
| `getItem(field)` | Accesses an element of ARRAY column by ordinal position, or an element of OBJECT column by key. |
| `getName()` | Returns the column name (if the column has a name). |
| `get_name()` | Returns the column name (if the column has a name). |
| `in_(*vals)` | Returns a conditional expression that you can pass to `DataFrame.filter()` or `DataFrame.where()` to perform the equivalent of a WHERE ... IN query. |
| `isNotNull()` | Is not null. |
| `isNull()` | Is null. |
| `is_not_null()` | Is not null. |
| `is_null()` | Is null. |
| `isin(*vals)` | Returns a conditional expression that you can pass to `DataFrame.filter()` or `DataFrame.where()` to perform the equivalent of a WHERE ... IN query. |
| `like(pattern)` | Allows case-sensitive matching of strings based on comparison with a pattern. |
| `name(alias)` | Returns a new renamed Column. |
| `over([window])` | Returns a window frame, based on the specified `WindowSpec`. |
| `regexp(pattern[, parameters])` | Returns true if this Column matches the specified regular expression. |
| `rlike(pattern[, parameters])` | Returns true if this Column matches the specified regular expression. |
| `startswith(other)` | Returns true if this Column starts with another string. |
| `substr(start_pos, length)` | Returns a substring of this string column. |
| `substring(start_pos, length)` | Returns a substring of this string column. |
| `try_cast(to)` | Tries to cast the value of the Column to the specified data type. |
| `within_group(*cols)` | Returns a Column expression that adds a WITHIN GROUP clause to sort the rows by the specified columns. |
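A usage note on `substr()` and `substring()`: Snowflake's SUBSTR counts character positions from 1, not 0 as Python slicing does. The convention can be sketched with a plain-Python helper (illustrative only, not Snowpark code):

```python
def substr_1_based(s: str, start_pos: int, length: int) -> str:
    """Illustrative sketch of Snowflake's SUBSTR position convention:
    the first character is at position 1, while Python slices start at 0."""
    return s[start_pos - 1 : start_pos - 1 + length]

print(substr_1_based("testing", 1, 3))  # tes
print(substr_1_based("testing", 2, 4))  # esti
```

So `col("s").substr(1, 3)` takes the first three characters of each value, where the equivalent Python slice would be `s[0:3]`.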