| agg(Column, Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#agg(expr:org.apache.spark.sql.Column,exprs:org.apache.spark.sql.Column*):org.apache.spark.sql.DataFrame) | Aggregates on the entire Dataset without groups. Shorthand for groupBy().agg(). |
| agg(Map[String, String]) | (Scala-specific) Aggregates on the entire Dataset without groups, using a map of column names to aggregate functions. |
| agg(java.util.Map[String, String]) | (Java-specific) Aggregates on the entire Dataset without groups, using a map of column names to aggregate functions. |
| agg((String, String), (String, String)*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#agg(aggExpr:(String,String),aggExprs:(String,String)*):org.apache.spark.sql.DataFrame) | (Scala-specific) Aggregates on the entire Dataset without groups, using pairs of column names and aggregate function names. |
| alias(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#alias(alias:String):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset with an alias set. Same as as. |
| alias(Symbol) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#alias(alias:Symbol):org.apache.spark.sql.Dataset[T]) | (Scala-specific) Returns a new Dataset with an alias set. |
| apply(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#apply(colName:String):org.apache.spark.sql.Column) | Selects column based on the column name and returns it as a Column. |
| as(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#as(alias:String):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset with an alias set. |
| as(Symbol) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#as(alias:Symbol):org.apache.spark.sql.Dataset[T]) | (Scala-specific) Returns a new Dataset with an alias set. |
| as[U](Encoder[U]) | Returns a new Dataset where each record has been mapped on to the specified type U. |
| cache() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#cache():Dataset.this.type) | Persists this Dataset with the default storage level (MEMORY_AND_DISK). |
| coalesce(Int) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#coalesce(numPartitions:Int):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset that has exactly numPartitions partitions when fewer partitions are requested; if more partitions are requested, it stays at the current number of partitions. |
| col(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#col(colName:String):org.apache.spark.sql.Column) | Selects column based on the column name and returns it as a Column. |
| colRegex(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#colRegex(colName:String):org.apache.spark.sql.Column) | Selects column based on the column name specified as a regex and returns it as a Column. |
| collect() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#collect():Array[T]) | Returns an Array[T] that contains all rows in this Dataset. |
| collectAsList() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#collectAsList():java.util.List[T]) | Returns a java.util.List[T] that contains all rows in this Dataset. |
| count() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#count():Long) | Returns the number of rows in the Dataset as a Long. |
| createGlobalTempView(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#createGlobalTempView(viewName:String):Unit) | Creates a global temporary view using the given name. |
| createOrReplaceGlobalTempView(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#createOrReplaceGlobalTempView(viewName:String):Unit) | Creates or replaces a global temporary view using the given name. |
| createOrReplaceTempView(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#createOrReplaceTempView(viewName:String):Unit) | Creates or replaces a local temporary view using the given name. |
| createTempView(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#createTempView(viewName:String):Unit) | Creates a local temporary view using the given name. |
| crossJoin(Dataset[_]) | Explicit cartesian join with another DataFrame. |
| cube(String, String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#cube(col1:String,cols:String*):org.apache.spark.sql.RelationalGroupedDataset) | Creates a multi-dimensional cube for the current Dataset using column names for running aggregations. |
| cube(Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#cube(cols:org.apache.spark.sql.Column*):org.apache.spark.sql.RelationalGroupedDataset) | Creates a multi-dimensional cube for the current Dataset using Column expressions for running aggregations. |
| describe(String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#describe(cols:String*):org.apache.spark.sql.DataFrame) | Computes basic statistics for numeric and string columns, including count, mean, stddev, min, and max. |
| distinct() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#distinct():org.apache.spark.sql.Dataset[T]) | Returns a new Dataset that contains only the unique rows. This is an alias for dropDuplicates. |
| drop(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#drop(colName:String):org.apache.spark.sql.DataFrame) | Returns a new Dataset with a column dropped by name. This is a no-op if the schema doesn’t contain the column name. |
| drop(String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#drop(colNames:String*):org.apache.spark.sql.DataFrame) | Returns a new Dataset with multiple columns dropped by name. |
| drop(Column) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#drop(col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame) | Returns a new Dataset with a column dropped. Accepts a Column rather than a name. |
| drop(Column, Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#drop(col:org.apache.spark.sql.Column,cols:org.apache.spark.sql.Column*):org.apache.spark.sql.DataFrame) | Returns a new Dataset with multiple columns dropped using Column expressions. |
| dropDuplicates() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#dropDuplicates():org.apache.spark.sql.Dataset[T]) | Returns a new Dataset with duplicate rows removed. |
| dropDuplicates(Seq[String]) | (Scala-specific) Returns a new Dataset with duplicate rows removed, considering only the subset of columns. |
| dropDuplicates(Array[String]) | (Java-specific) Returns a new Dataset with duplicate rows removed, considering only the subset of columns. |
| dropDuplicates(String, String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#dropDuplicates(col1:String,cols:String*):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset with duplicate rows removed, considering only the subset of columns. |
| except(Dataset[T]) | Returns a new Dataset containing rows in this Dataset but not in another Dataset. Equivalent to EXCEPT DISTINCT in SQL. |
| exceptAll(Dataset[T]) | Returns a new Dataset containing rows in this Dataset but not in another Dataset while preserving duplicates. Equivalent to EXCEPT ALL in SQL. |
| explain() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#explain():Unit) | Prints the physical plan to the console for debugging purposes. |
| explain(Boolean) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#explain(extended:Boolean):Unit) | Prints the plans (logical and physical) to the console for debugging purposes. |
| explain(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#explain(mode:String):Unit) | Prints the plans with a format specified by a given explain mode (simple, extended, codegen, cost, formatted). |
| filter(Column) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#filter(condition:org.apache.spark.sql.Column):org.apache.spark.sql.Dataset[T]) | Filters rows using the given Column condition. |
| filter(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#filter(conditionExpr:String):org.apache.spark.sql.Dataset[T]) | Filters rows using the given SQL expression string. |
| filter(FilterFunction[T]) | (Java-specific) Returns a new Dataset that only contains elements where func returns true. |
| filter(T => Boolean) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#filter(func:T=%3EBoolean):org.apache.spark.sql.Dataset[T]) | (Scala-specific) Returns a new Dataset that only contains elements where func returns true. |
| first() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#first():T) | Returns the first row. Alias for head(). |
| flatMap[U](FlatMapFunction[T, U], Encoder[U]) | (Java-specific) Returns a new Dataset by first applying a function to all elements and then flattening the results. |
| flatMap[U](T => TraversableOnce[U])(Encoder[U]) | (Scala-specific) Returns a new Dataset by first applying a function to all elements and then flattening the results. |
| foreach(ForeachFunction[T]) | (Java-specific) Runs func on each element of this Dataset. |
| foreach(T => Unit) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#foreach(f:T=%3EUnit):Unit) | (Scala-specific) Applies a function to all rows. |
| foreachPartition(ForeachPartitionFunction[T]) | (Java-specific) Runs func on each partition of this Dataset. |
| foreachPartition(Iterator[T] => Unit) | (Scala-specific) Applies a function to each partition of this Dataset. |
| groupBy(Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#groupBy(cols:org.apache.spark.sql.Column*):org.apache.spark.sql.RelationalGroupedDataset) | Groups the Dataset using the specified Column expressions for running aggregations. |
| groupBy(String, String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#groupBy(col1:String,cols:String*):org.apache.spark.sql.RelationalGroupedDataset) | Groups the Dataset using the specified column names for running aggregations. |
| groupByKey[K](MapFunction[T, K], Encoder[K]) | (Java-specific) Returns a KeyValueGroupedDataset where the data is grouped by the given key function. |
| groupByKey[K](T => K)(Encoder[K]) | (Scala-specific) Returns a KeyValueGroupedDataset where the data is grouped by the given key function. |
| head() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#head():T) | Returns the first row. |
| head(Int) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#head(n:Int):Array[T]) | Returns the first n rows as an Array[T]. |
| hint(String, Any*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#hint(name:String,parameters:Any*):org.apache.spark.sql.Dataset[T]) | Specifies some hint on the current Dataset (for example, broadcast hint for joins). |
| intersect(Dataset[T]) | Returns a new Dataset containing rows only in both this Dataset and another Dataset. Equivalent to INTERSECT in SQL. |
| intersectAll(Dataset[T]) | Returns a new Dataset containing rows only in both Datasets while preserving duplicates. Equivalent to INTERSECT ALL in SQL. |
| join(Dataset[_]) | Joins with another DataFrame. Behaves as an inner join and requires a subsequent join predicate. |
| join(Dataset[_], String) | Inner equi-join with another DataFrame using the given column name. |
| join(Dataset[_], Seq[String]) | (Scala-specific) Inner equi-join with another DataFrame using the given column names. |
| join(Dataset[_], Array[String]) | (Java-specific) Inner equi-join with another DataFrame using the given column names. |
| join(Dataset[_], String, String) | Equi-join with another DataFrame using the given column name and join type. |
| join(Dataset[_], Seq[String], String) | (Scala-specific) Equi-join with another DataFrame using the given column names and join type. |
| join(Dataset[_], Array[String], String) | (Java-specific) Equi-join with another DataFrame using the given column names and join type. |
| join(Dataset[_], Column) | Inner join with another DataFrame using the given join expression. |
| join(Dataset[_], Column, String) | Joins with another DataFrame using the given join expression and join type. |
| joinWith[U](Dataset[U], Column) | Inner equi-join with another Dataset, returning a Dataset[(T, U)] for each pair where the condition evaluates to true. |
| joinWith[U](Dataset[U], Column, String) | Joins this Dataset returning a Dataset[(T, U)] for each pair where the condition evaluates to true, using the specified join type. |
| limit(Int) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#limit(n:Int):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset by taking the first n rows. |
| map[U](MapFunction[T, U], Encoder[U]) | (Java-specific) Returns a new Dataset that contains the result of applying func to each element. |
| map[U](T => U)(Encoder[U]) | (Scala-specific) Returns a new Dataset that contains the result of applying func to each element. |
| mapPartitions[U](MapPartitionsFunction[T, U], Encoder[U]) | (Java-specific) Returns a new Dataset that contains the result of applying func to each partition. |
| mapPartitions[U](Iterator[T] => Iterator[U])(Encoder[U]) | (Scala-specific) Returns a new Dataset that contains the result of applying func to each partition. |
| melt(Array[Column], Array[Column], String, String) | Unpivots a DataFrame from wide format to long format. This is an alias for unpivot. |
| melt(Array[Column], String, String) | Unpivots a DataFrame from wide format to long format, where values are set to all non-id columns. |
| metadataColumn(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#metadataColumn(colName:String):org.apache.spark.sql.Column) | Selects a metadata column based on its logical column name and returns it as a Column. |
| observe(String, Column, Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#observe(name:String,expr:org.apache.spark.sql.Column,exprs:org.apache.spark.sql.Column*):org.apache.spark.sql.Dataset[T]) | Defines named metrics to observe on the Dataset. |
| observe(Observation, Column, Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#observe(observation:org.apache.spark.sql.Observation,expr:org.apache.spark.sql.Column,exprs:org.apache.spark.sql.Column*):org.apache.spark.sql.Dataset[T]) | Observes named metrics through an Observation instance. |
| offset(Int) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#offset(n:Int):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset by skipping the first n rows. |
| orderBy(Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#orderBy(sortExprs:org.apache.spark.sql.Column*):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset sorted by the given Column expressions. This is an alias for sort. |
| orderBy(String, String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#orderBy(sortCol:String,sortCols:String*):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset sorted by the given column names. |
| persist() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#persist():Dataset.this.type) | Persists this Dataset with the default storage level (MEMORY_AND_DISK). |
| persist(StorageLevel) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#persist(newLevel:org.apache.spark.storage.StorageLevel):Dataset.this.type) | Persists this Dataset with the given StorageLevel. |
| printSchema() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#printSchema():Unit) | Prints the schema to the console in a nice tree format. |
| printSchema(Int) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#printSchema(level:Int):Unit) | Prints the schema up to the given level to the console in a nice tree format. |
| repartition(Int) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#repartition(numPartitions:Int):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset that has exactly numPartitions partitions. |
| repartition(Int, Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#repartition(numPartitions:Int,partitionExprs:org.apache.spark.sql.Column*):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset hash-partitioned by the given Column expressions into numPartitions. |
| repartition(Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#repartition(partitionExprs:org.apache.spark.sql.Column*):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset hash-partitioned by the given partitioning Column expressions. |
| repartitionByRange(Int, Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#repartitionByRange(numPartitions:Int,partitionExprs:org.apache.spark.sql.Column*):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset range-partitioned by the given Column expressions into numPartitions. |
| repartitionByRange(Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#repartitionByRange(partitionExprs:org.apache.spark.sql.Column*):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset range-partitioned by the given partitioning Column expressions. |
| rollup(String, String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#rollup(col1:String,cols:String*):org.apache.spark.sql.RelationalGroupedDataset) | Creates a multi-dimensional rollup for the current Dataset using column names for running aggregations. |
| rollup(Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#rollup(cols:org.apache.spark.sql.Column*):org.apache.spark.sql.RelationalGroupedDataset) | Creates a multi-dimensional rollup for the current Dataset using Column expressions for running aggregations. |
| sameSemantics(Dataset[T]) | Returns true when the logical query plans inside both Datasets are equal and therefore return the same results. |
| sample(Double) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#sample(fraction:Double):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset by sampling a fraction of rows (without replacement). |
| sample(Double, Long) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#sample(fraction:Double,seed:Long):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset by sampling a fraction of rows (without replacement), using a user-supplied seed. |
| sample(Boolean, Double) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#sample(withReplacement:Boolean,fraction:Double):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset by sampling a fraction of rows, using a random seed. |
| sample(Boolean, Double, Long) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#sample(withReplacement:Boolean,fraction:Double,seed:Long):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset by sampling a fraction of rows, using a user-supplied seed. |
| select(Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#select(cols:org.apache.spark.sql.Column*):org.apache.spark.sql.DataFrame) | Selects a set of column-based expressions. |
| select(String, String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#select(col:String,cols:String*):org.apache.spark.sql.DataFrame) | Selects a set of columns by name. |
| select[U1](TypedColumn[T, U1]) | Returns a new Dataset by computing the given TypedColumn expression for each element. |
| selectExpr(String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#selectExpr(exprs:String*):org.apache.spark.sql.DataFrame) | Selects a set of SQL expressions. This is a variant of select that accepts SQL expression strings. |
| semanticHash() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#semanticHash():Int) | Returns a hashCode of the logical query plan against this Dataset. |
| show() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#show():Unit) | Displays the top 20 rows of the Dataset in a tabular form. |
| show(Int) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#show(numRows:Int):Unit) | Displays the Dataset in a tabular form, showing numRows rows. |
| show(Boolean) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#show(truncate:Boolean):Unit) | Displays the top 20 rows with truncation control. |
| show(Int, Boolean) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#show(numRows:Int,truncate:Boolean):Unit) | Displays the Dataset in a tabular form with truncation control. |
| show(Int, Int) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#show(numRows:Int,truncate:Int):Unit) | Displays the Dataset in a tabular form with truncation to a specific character count. |
| show(Int, Int, Boolean) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#show(numRows:Int,truncate:Int,vertical:Boolean):Unit) | Displays the Dataset in a tabular form with truncation and vertical display options. |
| sort(Column*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#sort(sortExprs:org.apache.spark.sql.Column*):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset sorted by the given Column expressions. |
| sort(String, String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#sort(sortCol:String,sortCols:String*):org.apache.spark.sql.Dataset[T]) | Returns a new Dataset sorted by the specified column names, all in ascending order. |
| summary(String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#summary(statistics:String*):org.apache.spark.sql.DataFrame) | Computes specified statistics for numeric and string columns. Available statistics include count, mean, stddev, min, max, arbitrary percentiles, count_distinct, and approx_count_distinct. |
| tail(Int) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#tail(n:Int):Array[T]) | Returns the last n rows in the Dataset as an Array[T]. |
| take(Int) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#take(n:Int):Array[T]) | Returns the first n rows in the Dataset as an Array[T]. |
| takeAsList(Int) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#takeAsList(n:Int):java.util.List[T]) | Returns the first n rows in the Dataset as a java.util.List[T]. |
| to(StructType) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#to(schema:org.apache.spark.sql.types.StructType):org.apache.spark.sql.DataFrame) | Returns a new DataFrame where each row is reconciled to match the specified schema. |
| toDF() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#toDF():org.apache.spark.sql.DataFrame) | Converts this strongly typed collection of data to a generic DataFrame. |
| toDF(String*) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#toDF(colNames:String*):org.apache.spark.sql.DataFrame) | Converts this strongly typed collection of data to a generic DataFrame with columns renamed. |
| transform[U](Dataset[T] => Dataset[U]) | Concise syntax for chaining custom transformations. |
| union(Dataset[T]) | Returns a new Dataset containing the union of rows in this Dataset and another Dataset. Equivalent to UNION ALL in SQL. Resolves columns by position. |
| unionAll(Dataset[T]) | Returns a new Dataset containing the union of rows in this Dataset and another Dataset. This is an alias for union. |
| unionByName(Dataset[T]) | Returns a new Dataset containing the union of rows in this Dataset and another Dataset. Resolves columns by name (not by position). |
| unionByName(Dataset[T], Boolean) | Returns a new Dataset containing the union of rows, with support for missing columns. Missing columns are filled with null. |
| unpersist() (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#unpersist():Dataset.this.type) | Marks the Dataset as non-persistent and removes all blocks for it from memory and disk. |
| unpersist(Boolean) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#unpersist(blocking:Boolean):Dataset.this.type) | Marks the Dataset as non-persistent, optionally blocking until all blocks are deleted. |
| unpivot(Array[Column], Array[Column], String, String) | Unpivots a DataFrame from wide format to long format, optionally leaving identifier columns set. |
| unpivot(Array[Column], String, String) | Unpivots a DataFrame from wide format to long format, where values are set to all non-id columns. |
| where(Column) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#where(condition:org.apache.spark.sql.Column):org.apache.spark.sql.Dataset[T]) | Filters rows using the given Column condition. This is an alias for filter. |
| where(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#where(conditionExpr:String):org.apache.spark.sql.Dataset[T]) | Filters rows using the given SQL expression string. |
| withColumn(String, Column) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame) | Returns a new Dataset by adding a column or replacing the existing column that has the same name. |
| withColumnRenamed(String, String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#withColumnRenamed(existingName:String,newName:String):org.apache.spark.sql.DataFrame) | Returns a new Dataset with a column renamed. This is a no-op if the schema doesn’t contain the existing name. |
| withColumns(Map[String, Column]) | (Scala-specific) Returns a new Dataset by adding columns or replacing existing columns that have the same names. |
| withColumns(java.util.Map[String, Column]) | (Java-specific) Returns a new Dataset by adding columns or replacing existing columns that have the same names. |
| withColumnsRenamed(Map[String, String]) | (Scala-specific) Returns a new Dataset with columns renamed. This is a no-op if the schema doesn’t contain the existing name. |
| withColumnsRenamed(java.util.Map[String, String]) | (Java-specific) Returns a new Dataset with columns renamed. This is a no-op if the schema doesn’t contain the existing name. |
| withMetadata(String, Metadata) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#withMetadata(columnName:String,metadata:org.apache.spark.sql.types.Metadata):org.apache.spark.sql.DataFrame) | Returns a new Dataset by updating an existing column with metadata. |
| writeTo(String) (https://spark.apache.org/docs/3.5.6/api/scala/org/apache/spark/sql/Dataset.html#writeTo(table:String):org.apache.spark.sql.DataFrameWriterV2[T]) | Creates a write configuration builder for v2 sources. |
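Most of the methods above compose fluently because each returns a new (immutable) Dataset. A minimal sketch of a few common combinations, assuming a local Spark 3.5.x build with the spark-sql dependency on the classpath; the session setup and the `sales` data are illustrative, not part of the API reference:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session for experimentation.
val spark = SparkSession.builder()
  .appName("dataset-api-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Illustrative DataFrame built from a local collection.
val sales = Seq(("east", "a", 10), ("east", "b", 5), ("west", "a", 7))
  .toDF("region", "product", "amount")

// groupBy followed by agg with (columnName, functionName) pairs,
// matching the agg((String, String), (String, String)*) variant.
val totals = sales.groupBy("region").agg("amount" -> "sum")

// filter(Column), select(Column*), and orderBy(Column*) chained together;
// orderBy is an alias for sort.
val topEast = sales
  .filter($"region" === "east")
  .select($"product", $"amount")
  .orderBy($"amount".desc)

// dropDuplicates on a subset of columns, then withColumn to add
// (or replace) a derived column.
val deduped = sales
  .dropDuplicates("region", "product")
  .withColumn("amount_x2", $"amount" * 2)

totals.show()        // tabular output of per-region sums
topEast.show()
deduped.printSchema()
```

Note that `show`, `collect`, `count`, and friends trigger execution, while the transformations above them (`filter`, `select`, `groupBy`, …) only build up the logical plan, which `explain()` can print before anything runs.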