snowflake.ml.modeling.pipeline.Pipeline
- class snowflake.ml.modeling.pipeline.Pipeline(steps: List[Tuple[str, Any]])
Bases: BaseTransformer
Pipeline of transforms.
Sequentially applies a list of transforms. Intermediate steps of the pipeline must be transforms, that is, they must implement fit and transform methods. The final step can be a transform or an estimator, that is, it must implement fit and either transform or predict.
Note: a scikit-learn pipeline expects the last step (and only the last step) to be an estimator object or a dummy estimator (such as None or passthrough). This Pipeline class currently accepts either a list made up entirely of transforms or a list of transforms ending with an estimator.
- Parameters:
steps – List of (name, transform) tuples (implementing fit/transform) that are chained in sequential order. The last transform can be an estimator.
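The requirement on steps can be illustrated in plain Python. This is a minimal sketch, not the library's implementation: `validate_steps`, `Scaler`, and `MeanModel` are hypothetical names, and the check is assumed to be simple duck typing on `fit`, `transform`, and `predict`.

```python
from typing import Any, List, Tuple


class Scaler:
    """Hypothetical transform: implements fit and transform."""

    def fit(self, data: List[float]) -> "Scaler":
        self.max_ = max(data)
        return self

    def transform(self, data: List[float]) -> List[float]:
        return [x / self.max_ for x in data]


class MeanModel:
    """Hypothetical estimator: implements fit and predict, but not transform."""

    def fit(self, data: List[float]) -> "MeanModel":
        self.mean_ = sum(data) / len(data)
        return self

    def predict(self, data: List[float]) -> List[float]:
        return [self.mean_ for _ in data]


def validate_steps(steps: List[Tuple[str, Any]]) -> None:
    """Raise TypeError if a step lacks the methods its position requires."""
    intermediate, (last_name, last) = steps[:-1], steps[-1]
    # Every step except the last must be a transform.
    for name, step in intermediate:
        if not (hasattr(step, "fit") and hasattr(step, "transform")):
            raise TypeError(f"Intermediate step {name!r} must implement fit and transform")
    # The last step may be a transform or an estimator.
    if not hasattr(last, "fit") or not (hasattr(last, "transform") or hasattr(last, "predict")):
        raise TypeError(f"Final step {last_name!r} must implement fit and transform or predict")


# Accepted: transforms followed by an estimator, or transforms only.
validate_steps([("scale", Scaler()), ("model", MeanModel())])
validate_steps([("scale", Scaler())])
```

Placing the estimator anywhere but last, for example `[("model", MeanModel()), ("scale", Scaler())]`, fails the check in this sketch because `MeanModel` has no `transform` method.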
Methods
- fit(dataset: Union[DataFrame, DataFrame], squash: Optional[bool] = False) → Pipeline
Fit the entire pipeline using the dataset.
- Parameters:
dataset – Input dataset (a Snowpark or pandas DataFrame).
squash – Run the whole pipeline within a stored procedure.
- Returns:
Fitted pipeline.
- Raises:
ValueError – A pipeline incompatible with sklearn is used on MLRS.
- fit_predict(dataset: Union[DataFrame, DataFrame]) → Union[DataFrame, DataFrame]
Fits each transformer in order and transforms the data, then fits the estimator and predicts with it. This method is only available if the estimator (or final step) has a fit_predict or predict method.
- Parameters:
dataset – Input dataset.
- Returns:
Output dataset.
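The fit_predict flow described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: `AddOne`, `MeanModel`, and the free function `fit_predict` are hypothetical, data is a plain list instead of a DataFrame, and preferring the final step's own `fit_predict` over `fit` followed by `predict` is an assumption.

```python
from typing import Any, List, Tuple


class AddOne:
    """Hypothetical transform that shifts every value by one."""

    def fit(self, data: List[float]) -> "AddOne":
        return self

    def transform(self, data: List[float]) -> List[float]:
        return [x + 1 for x in data]


class MeanModel:
    """Hypothetical estimator that predicts the mean of its training data."""

    def fit(self, data: List[float]) -> "MeanModel":
        self.mean_ = sum(data) / len(data)
        return self

    def predict(self, data: List[float]) -> List[float]:
        return [self.mean_ for _ in data]


def fit_predict(steps: List[Tuple[str, Any]], data: List[float]) -> List[float]:
    """Fit and apply each transform in order, then fit and predict with the final step."""
    final = steps[-1][1]
    # The operation is only available when the final step can predict.
    if not (hasattr(final, "fit_predict") or hasattr(final, "predict")):
        raise AttributeError("Final step has neither fit_predict nor predict")
    for _, step in steps[:-1]:
        data = step.fit(data).transform(data)
    if hasattr(final, "fit_predict"):
        return final.fit_predict(data)
    return final.fit(data).predict(data)


result = fit_predict([("shift", AddOne()), ("model", MeanModel())], [1.0, 2.0, 3.0])
print(result)  # [3.0, 3.0, 3.0]: the mean of the shifted data [2.0, 3.0, 4.0]
```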
- fit_transform(dataset: Union[DataFrame, DataFrame]) → Union[DataFrame, DataFrame]
Fits each transformer in order and transforms the data, then fits the estimator and transforms the data with it. This method is only available if the estimator (or final step) has a fit_transform or transform method.
- Parameters:
dataset – Input dataset.
- Returns:
Output dataset.
- get_input_cols() → List[str]
Input columns getter.
- Returns:
Input columns.
- get_label_cols() → List[str]
Label column getter.
- Returns:
Label column(s).
- get_output_cols() → List[str]
Output columns getter.
- Returns:
Output columns.
- get_params(deep: bool = True) → Dict[str, Any]
Get parameters for this transformer.
- Parameters:
deep – If True, will return the parameters for this transformer and contained subobjects that are transformers.
- Returns:
Parameter names mapped to their values.
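The effect of deep can be sketched as follows. This is a hypothetical illustration that assumes the scikit-learn convention of reporting nested parameters under `<component>__<parameter>` keys; `Component` and `collect_params` are made-up names, not part of the library.

```python
from typing import Any, Dict


class Component:
    """Hypothetical object exposing a scikit-learn style get_params."""

    def __init__(self, **params: Any) -> None:
        self._params = dict(params)

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        return dict(self._params)


def collect_params(params: Dict[str, Any], deep: bool = True) -> Dict[str, Any]:
    """Return params, plus the parameters of contained sub-objects when deep is True."""
    out = dict(params)
    if deep:
        for name, value in params.items():
            # Only objects that expose get_params contribute nested entries.
            if hasattr(value, "get_params"):
                for key, nested_value in value.get_params(deep=True).items():
                    out[f"{name}__{key}"] = nested_value
    return out


scaler = Component(clip=True)
shallow = collect_params({"scaler": scaler, "verbose": False}, deep=False)
deep_params = collect_params({"scaler": scaler, "verbose": False}, deep=True)
print(sorted(shallow))      # ['scaler', 'verbose']
print(sorted(deep_params))  # ['scaler', 'scaler__clip', 'verbose']
```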
- get_passthrough_cols() → List[str]
Passthrough columns getter.
- Returns:
Passthrough column(s).
- get_sample_weight_col() → Optional[str]
Sample weight column getter.
- Returns:
Sample weight column.
- get_sklearn_args(default_sklearn_obj: Optional[object] = None, sklearn_initial_keywords: Optional[Union[str, Iterable[str]]] = None, sklearn_unused_keywords: Optional[Union[str, Iterable[str]]] = None, snowml_only_keywords: Optional[Union[str, Iterable[str]]] = None, sklearn_added_keyword_to_version_dict: Optional[Dict[str, str]] = None, sklearn_added_kwarg_value_to_version_dict: Optional[Dict[str, Dict[str, str]]] = None, sklearn_deprecated_keyword_to_version_dict: Optional[Dict[str, str]] = None, sklearn_removed_keyword_to_version_dict: Optional[Dict[str, str]] = None) → Dict[str, Any]
Get sklearn keyword arguments.
This method enables modifying object parameters for special cases.
- Parameters:
default_sklearn_obj – Sklearn object used to get default parameter values. Necessary when sklearn_added_keyword_to_version_dict is provided.
sklearn_initial_keywords – Initial keywords in sklearn.
sklearn_unused_keywords – Sklearn keywords that are unused in snowml.
snowml_only_keywords – snowml only keywords not present in sklearn.
sklearn_added_keyword_to_version_dict – Added keywords mapped to the sklearn versions in which they were added.
sklearn_added_kwarg_value_to_version_dict – Added keyword argument values mapped to the sklearn versions in which they were added.
sklearn_deprecated_keyword_to_version_dict – Deprecated keywords mapped to the sklearn versions in which they were deprecated.
sklearn_removed_keyword_to_version_dict – Removed keywords mapped to the sklearn versions in which they were removed.
- Returns:
Sklearn parameter names mapped to their values.
- predict(dataset: Union[DataFrame, DataFrame]) → Union[DataFrame, DataFrame]
Transform the dataset by applying all the transformers in order and predict using the estimator.
- Parameters:
dataset – Input dataset.
- Returns:
Output dataset.
- Raises:
ValueError – An sklearn object has not been fit and stored before calling this function.
- predict_log_proba(dataset: Union[DataFrame, DataFrame]) → Union[DataFrame, DataFrame]
Transform the dataset by applying all the transformers in order and apply predict_log_proba using the estimator.
- Parameters:
dataset – Input dataset.
- Returns:
Output dataset.
- Raises:
ValueError – An sklearn object has not been fit before calling this function.
- predict_proba(dataset: Union[DataFrame, DataFrame]) → Union[DataFrame, DataFrame]
Transform the dataset by applying all the transformers in order and apply predict_proba using the estimator.
- Parameters:
dataset – Input dataset.
- Returns:
Output dataset.
- Raises:
ValueError – An sklearn object has not been fit before calling this function.
- score(dataset: Union[DataFrame, DataFrame]) → Union[DataFrame, DataFrame]
Transform the dataset by applying all the transformers in order and apply score using the estimator.
- Parameters:
dataset – Input dataset.
- Returns:
Output dataset.
- Raises:
ValueError – An sklearn object has not been fit before calling this function.
- score_samples(dataset: Union[DataFrame, DataFrame]) → Union[DataFrame, DataFrame]
Transform the dataset by applying all the transformers in order and apply score_samples using the estimator.
- Parameters:
dataset – Input dataset.
- Returns:
Output dataset.
- Raises:
ValueError – An sklearn object has not been fit before calling this function.
- set_drop_input_cols(drop_input_cols: Optional[bool] = False) → None
- set_input_cols(input_cols: Optional[Union[str, Iterable[str]]]) → Base
Input columns setter.
- Parameters:
input_cols – A single input column or multiple input columns.
- Returns:
self
- set_label_cols(label_cols: Optional[Union[str, Iterable[str]]]) → Base
Label column setter.
- Parameters:
label_cols – A single label column, or multiple label columns for multi-task learning.
- Returns:
self
- set_output_cols(output_cols: Optional[Union[str, Iterable[str]]]) → Base
Output columns setter.
- Parameters:
output_cols – A single output column or multiple output columns.
- Returns:
self
- set_params(**params: Dict[str, Any]) → None
Set the parameters of this transformer.
The method works on simple transformers as well as on nested objects. The latter have parameters of the form <component>__<parameter>, so that it’s possible to update each component of a nested object.
- Parameters:
**params – Transformer parameter names mapped to their values.
- Raises:
SnowflakeMLException – Invalid parameter keys.
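The `<component>__<parameter>` naming can be illustrated with a small routing helper. This is a hypothetical sketch of how such keys might be split, not the library's implementation; `split_nested_params` and the parameter names in the example are made up.

```python
from typing import Any, Dict


def split_nested_params(params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group '<component>__<parameter>' keys by component.

    Keys without a double underscore belong to the top-level object and are
    stored under the empty-string component.
    """
    grouped: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        # partition splits on the first "__" only, so deeper nesting such as
        # "a__b__c" is passed on to component "a" as "b__c".
        component, separator, sub_key = key.partition("__")
        if separator:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


grouped = split_nested_params({"scaler__clip": True, "model__alpha": 0.5, "verbose": False})
print(grouped)
# {'scaler': {'clip': True}, 'model': {'alpha': 0.5}, '': {'verbose': False}}
```

Each inner dictionary could then be forwarded to the matching component's own set_params, which is how nested objects are typically updated one level at a time.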
- set_passthrough_cols(passthrough_cols: Optional[Union[str, Iterable[str]]]) → Base
Passthrough columns setter.
- Parameters:
passthrough_cols – Column(s) that should not be used or modified by the estimator/transformer. The estimator/transformer passes these columns through without modification.
- Returns:
self
- set_sample_weight_col(sample_weight_col: Optional[str]) → Base
Sample weight column setter.
- Parameters:
sample_weight_col – A single column that represents sample weight.
- Returns:
self
- to_lightgbm() → Any
- to_sklearn() → Pipeline
Returns an sklearn Pipeline representing the object, if possible.
- Returns:
The previously fit sklearn Pipeline if present; otherwise an unfit pipeline.
- Raises:
ValueError – The pipeline cannot be represented as an sklearn pipeline.
- to_xgboost() → Any
- transform(dataset: Union[DataFrame, DataFrame]) → Union[DataFrame, DataFrame]
Call transform of each transformer in the pipeline.
- Parameters:
dataset – Input dataset.
- Returns:
Transformed data. Output datatype will be same as input datatype.
Attributes
- model_signatures