Models#

To scan, test and debug your model, you need to wrap it in a Giskard Model. Your model can use any ML library (built-in support covers sklearn, catboost, pytorch, tensorflow, huggingface and langchain) and can be any Python function with the expected signature.

You can wrap your model in two different ways:

  1. Wrap a prediction function that contains all your data pre-processing steps. A prediction function is any Python function that takes a raw pandas dataframe as input and returns the probabilities for each classification label (classification) or the predictions (regression or text_generation).

    Make sure that:

    • prediction_function encapsulates all the data pre-processing steps (categorical encoding, numerical scaling, etc.).

    • prediction_function(df[feature_names]) runs without raising an error.

  2. Wrap a model object in addition to a data pre-processing function. Providing the model object to Model allows us to automatically infer the ML library of your model object and provide a suitable serialization method (through the save_model and load_model methods).

    This requires:

    • Mandatory: Overriding the model_predict method, which should take a raw pandas dataframe as input and return the probabilities for each classification label (classification) or the predictions (regression or text_generation).

    • Optional: Our pre-defined serialization and prediction methods cover the sklearn, catboost, pytorch, tensorflow, huggingface and langchain libraries. If none of these libraries is detected, cloudpickle is used as the default for serialization. If this fails, we will ask you to also override the save_model and load_model methods, where you provide your own serialization of the model object.
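As a rough illustration of the first option, the sketch below defines a prediction function that does its own pre-processing and returns one probability column per label. The feature names, labels and scoring rule are hypothetical stand-ins for a trained model. The giskard.Model call uses only the parameters listed in the class signature and is skipped when giskard is not installed.

```python
import numpy as np
import pandas as pd

# Hypothetical feature and label names, used only for this sketch.
FEATURE_NAMES = ["age", "income"]
CLASSIFICATION_LABELS = ["rejected", "approved"]


def prediction_function(df: pd.DataFrame) -> np.ndarray:
    """Take a raw dataframe and return an (n, 2) array of class probabilities."""
    # Pre-processing lives inside the function: scale the raw columns.
    age = (df["age"].to_numpy(dtype=float) - 40.0) / 10.0
    income = (df["income"].to_numpy(dtype=float) - 50_000.0) / 20_000.0
    # Toy scoring rule standing in for a trained model.
    p_approved = 1.0 / (1.0 + np.exp(-(0.5 * age + 1.5 * income)))
    # Column order must match CLASSIFICATION_LABELS.
    return np.column_stack([1.0 - p_approved, p_approved])


# Check that the function runs on the raw feature columns.
raw_df = pd.DataFrame({"age": [25, 52], "income": [30_000, 90_000]})
probabilities = prediction_function(raw_df[FEATURE_NAMES])

try:
    import giskard
except ImportError:  # keep the sketch runnable without giskard
    giskard = None

if giskard is not None:
    wrapped_model = giskard.Model(
        model=prediction_function,
        model_type="classification",
        classification_labels=CLASSIFICATION_LABELS,
        feature_names=FEATURE_NAMES,
    )
```

Calling the function on df[feature_names] before wrapping it is a cheap way to confirm both requirements listed for option 1.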

The giskard.Model class#

class giskard.Model(model: Any, model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], data_preprocessing_function: Callable[[DataFrame], Any] | None = None, model_postprocessing_function: Callable[[Any], Any] | None = None, name: str | None = None, description: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, **kwargs)[source]#
Parameters:
  • model (Any) – The model that will be wrapped. Could be any function or ML model. The standard model output required for Giskard is:

    • if classification: an array (nxm) of probabilities corresponding to n data entries (rows of pandas.DataFrame) and m classification_labels. In the case of binary classification, an array of (nx1) probabilities is also accepted. Make sure that the probability provided is for the second label provided in classification_labels.

    • if regression or text_generation: an array of predictions corresponding to data entries (rows of pandas.DataFrame) and outputs.

  • name (Optional[str]) – Name of the model. Default is None.

  • description (Optional[str]) – Description of the model’s task. Mandatory for non-langchain text_generation models.

  • model_type (ModelType) – The type of the model: regression, classification or text_generation (the values of the ModelType enumeration).

  • data_preprocessing_function (Optional[Callable[[pd.DataFrame], Any]]) – A function applied to incoming data. It takes a pandas.DataFrame as raw input, applies preprocessing and returns any object that can be fed directly to model. You can also include your preprocessing inside model, in which case this argument is not needed. Default is None.

  • model_postprocessing_function (Optional[Callable[[Any], Any]]) – A function applied to the model’s predictions. It takes the output of model as input, applies postprocessing and returns an object of the same type and shape as that output. Default is None.

  • feature_names (Optional[Iterable]) – List of feature names matching the column names in the data that correspond to the features the model was trained on. By default (None), feature_names are all the Dataset columns except the target.

  • classification_threshold (Optional[float]) – The probability threshold for binary classification models. Default is 0.5.

  • classification_labels (Optional[Iterable]) – The classification labels, if model_type is classification. Make sure the labels have the same order as the output columns of model. Default is None.

  • **kwargs – Additional keyword arguments.

  • batch_size (Optional[int]) – The batch size to use for inference. Default is None, which means inference will be done on the full dataframe.
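The two accepted output shapes for binary classification can be compared with a small sketch. The labels and probabilities below are made up for illustration; the point is only that a single-column output is read as the probability of the second entry in classification_labels.

```python
import numpy as np

# Hypothetical labels; the second one ("yes") is the label that a
# single-column output refers to.
classification_labels = ["no", "yes"]

# (n x 1) output: one probability per row, for the second label.
single_column = np.array([[0.9], [0.2], [0.55]])

# Equivalent (n x m) output: one column per label, in label order.
full = np.hstack([1.0 - single_column, single_column])

# Most likely label per row, read off the (n x m) form.
predicted = [classification_labels[i] for i in full.argmax(axis=1)]
```

If the single column held the probability of the first label instead, every prediction would be flipped, which is why the order of classification_labels matters.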

static __new__(cls, model: Any, model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], data_preprocessing_function: Callable[[DataFrame], Any] | None = None, model_postprocessing_function: Callable[[Any], Any] | None = None, name: str | None = None, description: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, **kwargs)[source]#

Used for dynamic inheritance: returns an instance of one of the following classes: PredictionFunctionModel, SKLearnModel, CatboostModel, HuggingFaceModel, PyTorchModel, TensorFlowModel or LangchainModel, depending on the ML library detected in the model object. If the model object provided does not belong to one of these libraries, an instance of CloudpickleSerializableModel is returned, in which case:

  1. the default serialization method used will be cloudpickle

  2. you will be asked to provide your own model_predict method.
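The general pattern of choosing a subclass inside __new__ can be sketched in plain Python. This is not Giskard's implementation: the class names, the module-prefix lookup and the FakeEstimator stand-in are all hypothetical and only illustrate the dispatch idea.

```python
import types


class Wrapper:
    """Hypothetical base class that chooses a concrete subclass in __new__."""

    def __new__(cls, model, **kwargs):
        # Only redirect when the base class itself is being instantiated.
        target = select_wrapper_class(model) if cls is Wrapper else cls
        return super().__new__(target)

    def __init__(self, model, **kwargs):
        self.model = model


class FunctionWrapper(Wrapper):
    """Used when the wrapped object is a plain Python function."""


class SklearnLikeWrapper(Wrapper):
    """Used when the wrapped object comes from a known library."""


class FallbackWrapper(Wrapper):
    """Used when no known library is detected."""


# Hypothetical lookup from top-level module name to wrapper class.
LIBRARY_WRAPPERS = {"sklearn": SklearnLikeWrapper}


def select_wrapper_class(model):
    if isinstance(model, types.FunctionType):
        return FunctionWrapper
    root_module = type(model).__module__.split(".")[0]
    return LIBRARY_WRAPPERS.get(root_module, FallbackWrapper)


class FakeEstimator:
    # Pretend this class was defined inside a known library.
    __module__ = "sklearn.linear_model"


from_function = Wrapper(lambda df: df)
from_library = Wrapper(FakeEstimator())
from_unknown = Wrapper(object())
```

Because every returned object is still an instance of the base class, Python runs __init__ on it as usual, so callers only ever need to name the base class.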

is_classification()#

Checks whether the model is of type classification.

Returns:

True if the model is of type classification, False otherwise.

Return type:

bool

is_binary_classification()#

Checks whether the model is of type binary classification.

Returns:

True if the model is of type binary classification, False otherwise.

Return type:

bool

is_regression()#

Checks whether the model is of type regression.

Returns:

True if the model is of type regression, False otherwise.

Return type:

bool

is_text_generation()#

Checks whether the model is of type text generation.

Returns:

True if the model is of type text generation, False otherwise.

Return type:

bool

abstract model_predict(data)[source]#

Performs the model inference/forward pass.

Parameters:

data (Any) – The input data for making predictions. If you did not specify a data_preprocessing_function, this will be a pd.DataFrame, otherwise it will be whatever the data_preprocessing_function returns.

Returns:

If the model is classification, it should return an array of probabilities of shape (num_entries, num_classes). If the model is regression or text_generation, it should return an array of num_entries predictions.

Return type:

numpy.ndarray
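A minimal sketch of a model_predict body that satisfies this contract for a classification model is shown below. ToyScorer and ToyWrapper are hypothetical: in a real wrapper the method would be defined on a subclass of giskard.Model, and the stand-in class is used here only so the example runs on its own.

```python
import numpy as np
import pandas as pd


class ToyScorer:
    """Hypothetical underlying model exposing a custom scoring method."""

    def score(self, features: np.ndarray) -> np.ndarray:
        logits = features @ np.array([0.8, -0.4])
        return 1.0 / (1.0 + np.exp(-logits))


class ToyWrapper:
    """Stand-in for a giskard.Model subclass; only model_predict is sketched."""

    def __init__(self, model):
        self.model = model

    def model_predict(self, data: pd.DataFrame) -> np.ndarray:
        # No data_preprocessing_function is assumed, so data is a DataFrame.
        positive = self.model.score(data.to_numpy(dtype=float))
        # Shape (num_entries, num_classes), one column per label.
        return np.column_stack([1.0 - positive, positive])


df = pd.DataFrame({"x1": [0.0, 2.0, -1.0], "x2": [0.0, 1.0, 3.0]})
probabilities = ToyWrapper(ToyScorer()).model_predict(df)
```

For a regression or text_generation model the same method would return a one-dimensional array with one prediction per row instead.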

predict(dataset: Dataset, *_args, **_kwargs) ModelPredictionResults[source]#

Generates predictions for the input giskard dataset. This method uses the prepare_dataframe() method to preprocess the input dataset before making predictions. The predict_df() method is used to generate raw predictions for the preprocessed data. The type of predictions generated by this method depends on the model type:

  • For regression models, the prediction field of the returned ModelPredictionResults object will contain the same values as the raw_prediction field.

  • For binary or multiclass classification models, the prediction field of the returned ModelPredictionResults object will contain the predicted class labels for each example in the input dataset. The probabilities field will contain the predicted probabilities for the predicted class label. The all_predictions field will contain the predicted probabilities for all class labels for each example in the input dataset.

Parameters:

dataset (Dataset) – The input dataset to make predictions on.

Raises:

ValueError – If the prediction task is not supported by the model.

Returns:

The prediction results for the input dataset.

Return type:

ModelPredictionResults
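The relationship between the classification fields can be sketched with plain NumPy. The helper below is hypothetical and returns a dict rather than a ModelPredictionResults object. The rule that binary models compare the second column against classification_threshold is an assumption made for this sketch, not something stated above.

```python
import numpy as np


def summarize_classification(raw, classification_labels, classification_threshold=0.5):
    """Derive prediction, probabilities and all_predictions from raw probabilities."""
    raw = np.asarray(raw, dtype=float)
    labels = np.asarray(classification_labels)
    if len(labels) == 2 and classification_threshold is not None:
        # Assumed rule: pick the second label when its probability exceeds the threshold.
        label_index = (raw[:, 1] > classification_threshold).astype(int)
    else:
        # Multiclass: pick the most probable label.
        label_index = raw.argmax(axis=1)
    return {
        "prediction": labels[label_index],
        # Probability of the predicted label, one value per row.
        "probabilities": raw[np.arange(len(raw)), label_index],
        # Probabilities of every label, one row per example.
        "all_predictions": raw,
    }


binary = summarize_classification([[0.3, 0.7], [0.6, 0.4]], ["rejected", "approved"])
multiclass = summarize_classification([[0.2, 0.5, 0.3]], ["a", "b", "c"], None)
```

Here prediction holds labels, probabilities holds a single value per row, and all_predictions keeps the full probability matrix, matching the field descriptions above.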

save_model(local_path: str | Path, *args, **kwargs) None[source]#

Saves the wrapped model object.

Parameters:

local_path (Union[str, Path]) – Path to which the model should be saved.

classmethod load_model(local_dir, model_py_ver: Tuple[str, str, str] | None = None, *args, **kwargs)[source]#

Loads the wrapped model object.

Parameters:
  • local_dir (Union[str, Path]) – Path from which the model should be loaded.

  • model_py_ver (Optional[Tuple[str, str, str]]) – Python version used to save the model. It is used to check for a version mismatch if model loading fails.
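When the built-in serialization does not fit, the two methods can be overridden with any save/load pair that round-trips the model object. The sketch below uses the standard pickle module on a stand-in class. The class name, the file name and the dict used as a "model" are hypothetical; the method signatures mirror the ones documented above.

```python
import pickle
import tempfile
from pathlib import Path


class PickleBackedWrapper:
    """Stand-in for a giskard.Model subclass with custom serialization."""

    MODEL_FILENAME = "model.pkl"  # hypothetical file name

    def __init__(self, model):
        self.model = model

    def save_model(self, local_path, *args, **kwargs) -> None:
        # Write the wrapped model object into the target directory.
        directory = Path(local_path)
        directory.mkdir(parents=True, exist_ok=True)
        with open(directory / self.MODEL_FILENAME, "wb") as f:
            pickle.dump(self.model, f)

    @classmethod
    def load_model(cls, local_dir, model_py_ver=None, *args, **kwargs):
        # Read the wrapped model object back from the same directory.
        with open(Path(local_dir) / cls.MODEL_FILENAME, "rb") as f:
            return pickle.load(f)


original = {"weights": [0.1, 0.2], "bias": -0.3}
with tempfile.TemporaryDirectory() as tmp_dir:
    PickleBackedWrapper(original).save_model(tmp_dir)
    restored = PickleBackedWrapper.load_model(tmp_dir)
```

Whatever format is chosen, load_model should return an object equivalent to the one passed to save_model.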

upload(client: GiskardClient, project_key, validate_ds=None, *_args, **_kwargs) str[source]#

Uploads the model to a Giskard project using the provided Giskard client. Also validates the model using the given validation dataset, if any.

Parameters:
  • client (GiskardClient) – A Giskard client instance to use for uploading the model.

  • project_key (str) – The project key to use for the upload.

  • validate_ds (Optional[Dataset]) – A validation dataset to use for validating the model. Defaults to None.

Notes

This method saves the model to a temporary directory before uploading it. The temporary directory is deleted after the upload is completed.

classmethod download(client: GiskardClient, project_key, model_id, *_args, **_kwargs)[source]#

Downloads the specified model from the Giskard hub and loads it into memory.

Parameters:
  • client (GiskardClient) – The client instance that will connect to the Giskard hub.

  • project_key (str) – The key for the project that the model belongs to.

  • model_id (str) – The ID of the model to download.

Returns:

An instance of the class calling the method, with the specified model loaded into memory.

Raises:

AssertionError – If the local directory where the model should be saved does not exist.

Model Prediction#

class giskard.models.base.ModelPredictionResults(*, raw: Any = None, prediction: Any = None, raw_prediction: Any = None, probabilities: Any | None = None, all_predictions: Any | None = None)[source]#

Data structure for model predictions.

For regression models, the prediction field of the returned ModelPredictionResults object will contain the same values as the raw_prediction field.

For binary or multiclass classification models, the prediction field of the returned ModelPredictionResults object will contain the predicted class labels for each example in the input dataset. The probabilities field will contain the predicted probabilities for the predicted class label. The all_predictions field will contain the predicted probabilities for all class labels for each example in the input dataset.

raw#

The predicted probabilities.

Type:

Optional[Any]

prediction#

The predicted class labels for each example in the input dataset.

Type:

Optional[Any]

raw_prediction#

The predicted class label.

Type:

Optional[Any]

probabilities#

The predicted probabilities for the predicted class label.

Type:

Optional[Any]

all_predictions#

The predicted probabilities for all class labels for each example in the input dataset.

Type:

Optional[Any]

Creates a new instance by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid instance.

self is explicitly positional-only to allow self as a field name.