Base model classes#

class giskard.models.base.BaseModel(model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], name: str | None = None, description: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, id: str | None = None, **kwargs)[source]#

The BaseModel class is an abstract base class that defines the common interface for all the models used in this project.

model#

Could be any function or ML model. The standard model output required for Giskard is:

  • if classification:

    an array (n x m) of probabilities corresponding to n data entries (rows of pandas.DataFrame) and m classification_labels. In the case of binary classification, an array of (n x 1) probabilities is also accepted; each probability must then refer to the second label in classification_labels.

  • if regression or text_generation:

    an array of predictions, one per data entry (row of pandas.DataFrame).

Type:

Any
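
The binary-classification convention described above can be illustrated with a small, dependency-free sketch. The helper expand_binary_probabilities and the labels are hypothetical and not part of Giskard; the sketch only assumes, as stated above, that an (n x 1) output holds the probability of the second label, and shows the equivalent (n x 2) layout.

```python
from typing import List, Sequence


def expand_binary_probabilities(
    single_column: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Expand an (n x 1) binary output into an (n x 2) probability matrix.

    Assumption: each row holds the probability of the SECOND label in
    classification_labels, so the first label receives the complement.
    """
    expanded = []
    for row in single_column:
        if len(row) != 1:
            raise ValueError("expected exactly one probability per row")
        p_second = row[0]
        expanded.append([1.0 - p_second, p_second])
    return expanded


classification_labels = ["no", "yes"]  # hypothetical labels
raw_output = [[0.25], [0.75]]  # (n x 1): probability of "yes" for each row
full_output = expand_binary_probabilities(raw_output)
```

Each row of full_output follows the order of classification_labels, which is the layout expected for the general (n x m) case.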

name#

The name of the model.

Type:

Optional[str]

model_type#

The type of the model: regression, classification or text_generation.

Type:

ModelType

feature_names#

List of feature names matching the column names in the data that correspond to the features the model was trained on. By default, feature_names are all the Dataset columns except the target.

Type:

Optional[Iterable[str]]

classification_threshold#

The classification threshold, used for binary classification models.

Type:

float
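
As a rough illustration of how a threshold turns binary probabilities into labels, consider the sketch below. The helper apply_threshold is hypothetical, and whether the comparison is strict (>) or inclusive (>=) is an assumption made here, not something this page specifies.

```python
from typing import List, Sequence


def apply_threshold(
    second_label_probabilities: Sequence[float],
    classification_labels: Sequence[str],
    classification_threshold: float = 0.5,
) -> List[str]:
    """Map probabilities of the second label to predicted labels.

    Assumption: a probability strictly above the threshold selects the
    second label; anything else selects the first label.
    """
    if len(classification_labels) != 2:
        raise ValueError("a threshold only applies to binary classification")
    first, second = classification_labels
    return [
        second if p > classification_threshold else first
        for p in second_label_probabilities
    ]


predicted = apply_threshold([0.2, 0.5, 0.9], ["no", "yes"], 0.5)
```

Raising the threshold makes the second label harder to predict, which is the usual way to trade recall for precision on that label.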

classification_labels#

The classification labels, used if model_type is classification. Make sure the labels are in the same order as the output columns of the classifier.

Type:

Optional[Iterable[str]]

Raises:

ValueError – If an invalid model type is specified. If duplicate values are found in the classification_labels.

Initialize a new instance of the BaseModel class.

Parameters:
  • model_type (ModelType) – Type of the model: 'classification', 'regression' or 'text_generation', as listed in the constructor signature.

  • name (Optional[str]) – Name of the model. If not provided, defaults to the class name.

  • description (Optional[str]) – Description of the model’s task. Mandatory for non-langchain text_generation models.

  • feature_names (Optional[Iterable]) – A list of names of the input features.

  • classification_threshold (Optional[float]) – Threshold value used for classification models. Defaults to 0.5.

  • classification_labels (Optional[Iterable]) – A list of labels for classification models.

Raises:
  • ValueError – If an invalid model_type value is provided.

  • ValueError – If duplicate values are found in the classification_labels list.
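
The two error conditions listed above can be sketched as a standalone check. The function validate_model_arguments and the error messages are hypothetical; only the accepted model type strings are taken from the constructor signature, and labels are assumed to be hashable.

```python
from typing import Iterable, List, Optional

# Model type strings as they appear in the constructor signature.
SUPPORTED_MODEL_TYPES = ("classification", "regression", "text_generation")


def validate_model_arguments(
    model_type: str, classification_labels: Optional[Iterable] = None
) -> None:
    """Raise ValueError for an unknown model type or duplicate labels."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"Invalid model_type: {model_type!r}")
    if classification_labels is not None:
        labels = list(classification_labels)
        # Assumption: labels are hashable, so a set can detect duplicates.
        if len(labels) != len(set(labels)):
            raise ValueError("Duplicate values found in classification_labels")


# Collect the messages raised for one valid and two invalid configurations.
error_messages: List[str] = []
for candidate_type, candidate_labels in [
    ("classification", ["yes", "no"]),   # valid
    ("clustering", None),                # unknown model type
    ("classification", ["yes", "yes"]),  # duplicate labels
]:
    try:
        validate_model_arguments(candidate_type, candidate_labels)
    except ValueError as error:
        error_messages.append(str(error))
```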

Notes

This class uses the @configured_validate_arguments decorator to validate the input arguments. The initialized object contains the following attributes:

  • meta: a ModelMeta object containing metadata about the model.

classmethod download(client: GiskardClient | None, project_key, model_id, *_args, **_kwargs)[source]#

Downloads the specified model from the Giskard hub and loads it into memory.

Parameters:
  • client (GiskardClient) – The client instance that will connect to the Giskard hub.

  • project_key (str) – The key for the project that the model belongs to.

  • model_id (str) – The ID of the model to download.

Returns:

An instance of the class calling the method, with the specified model loaded into memory.

Raises:

AssertionError – If the local directory where the model should be saved does not exist.

property is_binary_classification: bool#

Compute if the model is of type binary classification.

Returns:

True if the model is of type binary classification, False otherwise.

Return type:

bool

property is_classification: bool#

Compute if the model is of type classification.

Returns:

True if the model is of type classification, False otherwise.

Return type:

bool

property is_regression: bool#

Compute if the model is of type regression.

Returns:

True if the model is of type regression, False otherwise.

Return type:

bool

property is_text_generation: bool#

Compute if the model is of type text generation.

Returns:

True if the model is of type text generation, False otherwise.

Return type:

bool
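
One plausible way such flags could be derived from model_type and classification_labels is sketched below. ModelTypeFlags is a hypothetical stand-in, not BaseModel itself, and the rule that "binary" means exactly two labels is an assumption.

```python
from typing import Optional, Sequence


class ModelTypeFlags:
    """Hypothetical stand-in showing how the four flags could be computed."""

    def __init__(
        self, model_type: str, classification_labels: Optional[Sequence] = None
    ) -> None:
        self.model_type = model_type
        self.classification_labels = (
            list(classification_labels) if classification_labels is not None else None
        )

    @property
    def is_classification(self) -> bool:
        return self.model_type == "classification"

    @property
    def is_binary_classification(self) -> bool:
        # Assumption: a binary classifier has exactly two labels.
        return (
            self.is_classification
            and self.classification_labels is not None
            and len(self.classification_labels) == 2
        )

    @property
    def is_regression(self) -> bool:
        return self.model_type == "regression"

    @property
    def is_text_generation(self) -> bool:
        return self.model_type == "text_generation"


binary = ModelTypeFlags("classification", ["no", "yes"])
multiclass = ModelTypeFlags("classification", ["a", "b", "c"])
regressor = ModelTypeFlags("regression")
```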

predict(dataset: Dataset, *_args, **_kwargs) ModelPredictionResults[source]#

Generates predictions for the input giskard dataset. This method uses the prepare_dataframe() method to preprocess the input dataset before making predictions. The predict_df() method is used to generate raw predictions for the preprocessed data. The type of predictions generated by this method depends on the model type:

  • For regression models, the prediction field of the returned ModelPredictionResults object will contain the same values as the raw_prediction field.

  • For binary or multiclass classification models, the prediction field of the returned ModelPredictionResults object will contain the predicted class labels for each example in the input dataset. The probabilities field will contain the predicted probabilities for the predicted class label. The all_predictions field will contain the predicted probabilities for all class labels for each example in the input dataset.

Parameters:

dataset (Dataset) – The input dataset to make predictions on.

Raises:

ValueError – If the prediction task is not supported by the model.

Returns:

The prediction results for the input dataset.

Return type:

ModelPredictionResults
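
The relationship between the classification fields described above can be sketched with plain Python lists. The function summarize_classification is hypothetical and returns a dictionary rather than a real ModelPredictionResults object; it uses a simple argmax, so the threshold-based handling of binary models is not covered.

```python
from typing import Dict, List, Sequence


def summarize_classification(
    raw_prediction: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, list]:
    """Derive prediction, probabilities and all_predictions from raw output."""
    prediction: List[str] = []
    probabilities: List[float] = []
    all_predictions: List[Dict[str, float]] = []
    for row in raw_prediction:
        # Index of the most probable class for this example.
        best_index = max(range(len(row)), key=lambda i: row[i])
        prediction.append(labels[best_index])
        probabilities.append(row[best_index])
        all_predictions.append(dict(zip(labels, row)))
    return {
        "raw_prediction": [list(row) for row in raw_prediction],
        "prediction": prediction,
        "probabilities": probabilities,
        "all_predictions": all_predictions,
    }


results = summarize_classification(
    [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]], ["cat", "dog", "bird"]
)
```

For a regression model the analogous summary would simply reuse the raw values as the prediction.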

abstract predict_df(df: DataFrame, *args, **kwargs)[source]#

Inner method that does the actual inference of a prepared dataframe.

Parameters:

df (DataFrame) – The dataframe to predict.

prepare_dataframe(df, column_dtypes=None, target=None, *_args, **_kwargs)[source]#

Prepares a Pandas DataFrame for inference by ensuring the correct columns are present and have the correct data types.

Parameters:

df (pd.DataFrame) – The dataframe to prepare.

Returns:

The prepared Pandas DataFrame.

Return type:

pd.DataFrame

Raises:
  • ValueError – If the target column is found in the dataset.

  • ValueError – If a specified feature name is not found in the dataset.
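
The feature check mentioned above can be sketched without pandas by representing the data as a dictionary of column lists. The helper select_feature_columns is hypothetical; it covers only the selection of feature columns and the missing-feature error, and leaves out data type casting and any target handling.

```python
from typing import Dict, Sequence


def select_feature_columns(
    columns: Dict[str, list], feature_names: Sequence[str]
) -> Dict[str, list]:
    """Keep only the requested feature columns, in the requested order."""
    missing = [name for name in feature_names if name not in columns]
    if missing:
        raise ValueError(f"Feature(s) not found in the data: {missing}")
    return {name: columns[name] for name in feature_names}


# Hypothetical data: two features and one extra column.
columns = {"age": [31, 45], "income": [40_000, 52_000], "approved": [1, 0]}
prepared = select_feature_columns(columns, ["income", "age"])

try:
    select_feature_columns(columns, ["age", "height"])
    missing_feature_detected = False
except ValueError:
    missing_feature_detected = True
```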

upload(client: GiskardClient, project_key, validate_ds=None, *_args, **_kwargs) str[source]#

Uploads the model to a Giskard project using the provided Giskard client. Also validates the model using the given validation dataset, if any.

Parameters:
  • client (GiskardClient) – A Giskard client instance to use for uploading the model.

  • project_key (str) – The project key to use for the upload.

  • validate_ds (Optional[Dataset]) – A validation dataset to use for validating the model. Defaults to None.

Notes

This method saves the model to a temporary directory before uploading it. The temporary directory is deleted after the upload is completed.

class giskard.models.base.CloudpickleSerializableModel(model: Any, model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], data_preprocessing_function: Callable[[DataFrame], Any] | None = None, model_postprocessing_function: Callable[[Any], Any] | None = None, name: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, id: str | None = None, batch_size: int | None = None, **kwargs)[source]#

A base class for models that are serializable by cloudpickle.

Parameters:
  • model (Any) – The model that will be wrapped.

  • model_type (ModelType) – The type of the model. Must be a value from the ModelType enumeration.

  • data_preprocessing_function (Optional[Callable[[pd.DataFrame], Any]]) – A function that will be applied to incoming data. Default is None.

  • model_postprocessing_function (Optional[Callable[[Any], Any]]) – A function that will be applied to the model’s predictions. Default is None.

  • name (Optional[str]) – A name for the wrapper. Default is None.

  • feature_names (Optional[Iterable]) – A list of feature names. Default is None.

  • classification_threshold (Optional[float]) – The probability threshold for classification. Default is 0.5.

  • classification_labels (Optional[Iterable]) – A list of classification labels. Default is None.

  • batch_size (Optional[int]) – The batch size to use for inference. Default is None, which means inference will be done on the full dataframe.

classmethod load_model(local_dir, model_py_ver: Tuple[str, str, str] | None = None, *args, **kwargs)[source]#

Loads the wrapped model object.

Parameters:
  • local_dir (Union[str, Path]) – Path from which the model should be loaded.

  • model_py_ver (Optional[Tuple[str, str, str]]) – Python version used to save the model, used for validation if model loading fails.

save_model(local_path: str | Path, *args, **kwargs) None[source]#

Saves the wrapped model object.

Parameters:

local_path (Union[str, Path]) – Path to which the model should be saved.
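
A save and load round trip of this kind can be sketched as follows. To stay dependency-free, the sketch uses the standard pickle module in place of cloudpickle (which can additionally serialize lambdas and locally defined functions); the file name model.pkl and both functions are hypothetical and do not reflect the actual on-disk layout.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Union

MODEL_FILE_NAME = "model.pkl"  # hypothetical file name


def save_model(model: Any, local_path: Union[str, Path]) -> None:
    """Serialize the wrapped model object into the given directory."""
    directory = Path(local_path)
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / MODEL_FILE_NAME, "wb") as handle:
        pickle.dump(model, handle)


def load_model(local_dir: Union[str, Path]) -> Any:
    """Load the wrapped model object back from the given directory."""
    model_path = Path(local_dir) / MODEL_FILE_NAME
    if not model_path.exists():
        raise FileNotFoundError(f"No model file found at {model_path}")
    with open(model_path, "rb") as handle:
        return pickle.load(handle)


# Round trip through a temporary directory with a stand-in "model".
with tempfile.TemporaryDirectory() as tmp_dir:
    original = {"weights": [0.1, 0.2], "bias": 0.5}
    save_model(original, tmp_dir)
    restored = load_model(tmp_dir)
```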

class giskard.models.base.MLFlowSerializableModel(model: Any, model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], data_preprocessing_function: Callable[[DataFrame], Any] | None = None, model_postprocessing_function: Callable[[Any], Any] | None = None, name: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, id: str | None = None, batch_size: int | None = None, **kwargs)[source]#

A base class to serialize models with MLFlow.

This class provides functionality for saving the model with MLFlow in addition to saving other metadata with the save method. Subclasses should implement the save_model method to provide their own MLFlow-specific model saving functionality.

Parameters:
  • model (Any) – The model that will be wrapped.

  • model_type (ModelType) – The type of the model. Must be a value from the ModelType enumeration.

  • data_preprocessing_function (Optional[Callable[[pd.DataFrame], Any]]) – A function that will be applied to incoming data. Default is None.

  • model_postprocessing_function (Optional[Callable[[Any], Any]]) – A function that will be applied to the model’s predictions. Default is None.

  • name (Optional[str]) – A name for the wrapper. Default is None.

  • feature_names (Optional[Iterable]) – A list of feature names. Default is None.

  • classification_threshold (Optional[float]) – The probability threshold for classification. Default is 0.5.

  • classification_labels (Optional[Iterable]) – A list of classification labels. Default is None.

  • batch_size (Optional[int]) – The batch size to use for inference. Default is None, which means inference will be done on the full dataframe.

class giskard.models.base.WrapperModel(model: Any, model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], data_preprocessing_function: Callable[[DataFrame], Any] | None = None, model_postprocessing_function: Callable[[Any], Any] | None = None, name: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, id: str | None = None, batch_size: int | None = None, **kwargs)[source]#

Base class for model wrappers.

This is a subclass of BaseModel that wraps an existing model object and uses it to perform inference.

This class introduces a data_preprocessing_function which can be used to preprocess incoming data before it is passed to the underlying model.

Parameters:
  • model (Any) – The model that will be wrapped.

  • model_type (ModelType) – The type of the model. Must be a value from the ModelType enumeration.

  • data_preprocessing_function (Optional[Callable[[pd.DataFrame], Any]]) – A function that will be applied to incoming data. Default is None.

  • model_postprocessing_function (Optional[Callable[[Any], Any]]) – A function that will be applied to the model’s predictions. Default is None.

  • name (Optional[str]) – A name for the wrapper. Default is None.

  • feature_names (Optional[Iterable]) – A list of feature names. Default is None.

  • classification_threshold (Optional[float]) – The probability threshold for classification. Default is 0.5.

  • classification_labels (Optional[Iterable]) – A list of classification labels. Default is None.

  • batch_size (Optional[int]) – The batch size to use for inference. Default is None, which means inference will be done on the full dataframe.
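
The effect of batch_size can be sketched as a simple chunking loop. The function predict_in_batches is hypothetical and operates on a plain list of rows rather than a dataframe; it assumes, as described above, that None means a single pass over all rows.

```python
from typing import Callable, List, Optional, Sequence


def predict_in_batches(
    rows: Sequence,
    model_predict: Callable[[Sequence], List],
    batch_size: Optional[int] = None,
) -> List:
    """Run model_predict on consecutive chunks and concatenate the outputs."""
    if batch_size is None:
        # No batching: a single call on the full input.
        return list(model_predict(rows))
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    outputs: List = []
    for start in range(0, len(rows), batch_size):
        outputs.extend(model_predict(rows[start:start + batch_size]))
    return outputs


observed_batch_sizes: List[int] = []


def double_all(batch: Sequence[int]) -> List[int]:
    """Stand-in model that records how many rows it receives per call."""
    observed_batch_sizes.append(len(batch))
    return [2 * value for value in batch]


predictions = predict_in_batches([1, 2, 3, 4, 5], double_all, batch_size=2)
```

Batching mainly limits peak memory use; the concatenated result is the same as a single pass when the model treats rows independently.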

abstract classmethod load_model(path: str | Path, model_py_ver: Tuple[str, str, str] | None = None, *args, **kwargs)[source]#

Loads the wrapped model object.

Parameters:
  • path (Union[str, Path]) – Path from which the model should be loaded.

  • model_py_ver (Optional[Tuple[str, str, str]]) – Python version used to save the model, used for validation if model loading fails.

abstract model_predict(data)[source]#

Performs the model inference/forward pass.

Parameters:

data (Any) – The input data for making predictions. If you did not specify a data_preprocessing_function, this will be a pd.DataFrame, otherwise it will be whatever the data_preprocessing_function returns.

Returns:

If the model is classification, it should return an array of probabilities of shape (num_entries, num_classes). If the model is regression or text_generation, it should return an array of num_entries predictions.

Return type:

numpy.ndarray

predict_df(df: DataFrame, *_args, **_kwargs)[source]#

Inner method that does the actual inference of a prepared dataframe.

Parameters:

df (DataFrame) – The dataframe to predict.
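
How the optional preprocessing and postprocessing functions might fit around model_predict can be sketched with a small abstract class. SketchWrapper and CallableWrapper are hypothetical and do not inherit from the real WrapperModel; rows are represented as a list of dictionaries instead of a dataframe, and the order of the three stages is an assumption based on the parameter descriptions above.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional


class SketchWrapper(ABC):
    """Hypothetical wrapper: preprocess, predict, then postprocess."""

    def __init__(
        self,
        model: Any,
        data_preprocessing_function: Optional[Callable[[Any], Any]] = None,
        model_postprocessing_function: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.model = model
        self.data_preprocessing_function = data_preprocessing_function
        self.model_postprocessing_function = model_postprocessing_function

    @abstractmethod
    def model_predict(self, data: Any) -> Any:
        """Run the wrapped model on already preprocessed data."""

    def predict_df(self, df: Any) -> Any:
        data = df
        if self.data_preprocessing_function is not None:
            data = self.data_preprocessing_function(df)
        raw = self.model_predict(data)
        if self.model_postprocessing_function is not None:
            raw = self.model_postprocessing_function(raw)
        return raw


class CallableWrapper(SketchWrapper):
    """Concrete subclass for models that are plain callables."""

    def model_predict(self, data: Any) -> Any:
        return self.model(data)


wrapper = CallableWrapper(
    model=lambda values: [2 * v for v in values],
    data_preprocessing_function=lambda rows: [row["x"] for row in rows],
    model_postprocessing_function=lambda preds: [max(0.0, p) for p in preds],
)
outputs = wrapper.predict_df([{"x": 1.0}, {"x": -2.5}])
```

Without a preprocessing function, model_predict receives the input unchanged, which matches the description of the data parameter above.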

abstract save_model(path: str | Path, *args, **kwargs) None[source]#

Saves the wrapped model object.

Parameters:

path (Union[str, Path]) – Path to which the model should be saved.