Models
To scan, test and debug your model, you need to wrap it into a Giskard Model.
Your model can use any ML library (sklearn, catboost, pytorch, tensorflow, huggingface and langchain) and can be any Python function that respects the right signature.
You can wrap your model in two different ways:
Wrap a prediction function that contains all your data pre-processing steps. A prediction function is any Python function that takes a raw pandas dataframe as input and returns the probabilities for each classification label (classification) or the predictions (regression or text_generation).
Make sure that:
- prediction_function encapsulates all the data pre-processing steps (categorical encoding, numerical scaling, etc.).
- prediction_function(df[feature_names]) does not return an error message.
Wrap a model object in addition to a data pre-processing function. Providing the model object to Model allows us to automatically infer the ML library of your model object and provide a suitable serialization method (provided by the save_model and load_model methods). This requires:
- Mandatory: Overriding the model_predict method, which should take a raw pandas dataframe as input and return the probabilities for each classification label (classification) or the predictions (regression or text_generation).
- Optional: Our pre-defined serialization and prediction methods cover the sklearn, catboost, pytorch, tensorflow, huggingface and langchain libraries. If none of these libraries is detected, cloudpickle is used as the default for serialization. If this fails, we will ask you to also override the save_model and load_model methods, where you provide your own serialization of the model object.
Integrations
The giskard.Model class
- class giskard.Model(model: Any, model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], data_preprocessing_function: Callable[[DataFrame], Any] | None = None, model_postprocessing_function: Callable[[Any], Any] | None = None, name: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, **kwargs)
- Parameters:
model (Any) – Could be any function or ML model. The standard model output required for Giskard is:
* if classification: an array (nxm) of probabilities corresponding to n data entries (rows of pandas.DataFrame) and m classification_labels. In the case of binary classification, an array of (nx1) probabilities is also accepted. Make sure that the probability provided is for the second label provided in classification_labels.
* if regression or text_generation: an array of predictions corresponding to data entries (rows of pandas.DataFrame) and outputs.
name (str, optional) – The name of the model.
model_type (ModelType) – The type of the model: regression, classification or text_generation.
data_preprocessing_function (Callable[[pd.DataFrame], Any], optional) – A function that takes a pandas.DataFrame as raw input, applies preprocessing and returns any object that can be fed directly to the model. You can also choose to include your preprocessing inside the model, in which case there is no need to provide this argument.
model_postprocessing_function (Callable[[Any], Any], optional) – A function that takes the model output as input, applies postprocessing and returns an object of the same type and shape as the model output.
feature_names (Optional[Iterable], optional) – List of feature names matching the column names in the data that correspond to the features the model was trained on. By default, feature_names are all the Dataset columns except the target.
classification_threshold (float, optional) – The classification threshold, used for binary classification models.
classification_labels (Optional[Iterable], optional) – The classification labels, if model_type is classification. Make sure the labels have the same order as the columns of the model output.
**kwargs – Additional keyword arguments.
model – The model that will be wrapped.
model_type – The type of the model. Must be a value from the ModelType enumeration.
data_preprocessing_function – A function that will be applied to incoming data. Default is None.
model_postprocessing_function – A function that will be applied to the model's predictions. Default is None.
name – A name for the wrapper. Default is None.
feature_names – A list of feature names. Default is None.
classification_threshold – The probability threshold for classification. Default is 0.5.
classification_labels – A list of classification labels. Default is None.
batch_size (Optional[int], optional) – The batch size to use for inference. Default is None, which means inference will be done on the full dataframe.
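The output shapes described for the model parameter can be illustrated with plain NumPy. This is a sketch under stated assumptions: the labels and the `to_probability_matrix` helper are hypothetical and are not part of Giskard. It shows the two accepted binary-classification forms side by side, plus the one-value-per-row form used for regression or text_generation.

```python
import numpy as np

# Hypothetical labels; the second one is the "positive" class in this sketch.
classification_labels = ["negative", "positive"]


def to_probability_matrix(p_second_label):
    """Expand probabilities of the second label into an (n, 2) array."""
    p = np.asarray(p_second_label, dtype=float).reshape(-1)
    # Column 0 is classification_labels[0], column 1 is classification_labels[1].
    return np.column_stack([1.0 - p, p])


# (n x 1) form: one probability per row, for classification_labels[1].
binary_output = np.array([[0.1], [0.7], [0.5]])

# Equivalent (n x m) form with one column per label, in label order.
full_output = to_probability_matrix(binary_output)

# Regression or text_generation: one prediction per row.
regression_output = np.array([12.5, 3.0, 7.25])
```

Keeping the column order aligned with classification_labels matters because the labels are matched to columns by position.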
- static __new__(cls, model: Any, model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], data_preprocessing_function: Callable[[DataFrame], Any] | None = None, model_postprocessing_function: Callable[[Any], Any] | None = None, name: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, **kwargs)
Used for dynamic inheritance and returns one of the following class instances: PredictionFunctionModel, SKLearnModel, CatboostModel, HuggingFaceModel, PyTorchModel, TensorFlowModel or LangchainModel, depending on the ML library detected in the model object. If the model object provided does not belong to one of these libraries, an instance of CloudpickleSerializableModel is returned, in which case:
- the default serialization method used will be cloudpickle
- you will be asked to provide your own model_predict method.
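The general Python pattern behind this kind of dispatch can be sketched as below. This is not Giskard's implementation: `BaseWrapper`, `FunctionWrapper`, `FallbackWrapper` and the detection rule in `_select_wrapper` are all hypothetical stand-ins. The sketch only shows how `__new__` can return an instance of a subclass chosen from the argument it receives.

```python
import types


class BaseWrapper:
    """Hypothetical base class that picks a concrete subclass at construction."""

    def __new__(cls, model, **kwargs):
        # Dispatch only when the base class itself is called; constructing
        # a subclass explicitly keeps the requested class.
        target = _select_wrapper(model) if cls is BaseWrapper else cls
        return super().__new__(target)

    def __init__(self, model, **kwargs):
        self.model = model


class FunctionWrapper(BaseWrapper):
    """Chosen when the wrapped object is a plain Python function."""


class FallbackWrapper(BaseWrapper):
    """Chosen when nothing more specific is recognised."""


def _select_wrapper(model):
    # Toy detection rule (assumption); a real wrapper would inspect
    # the library the model object comes from.
    if isinstance(model, types.FunctionType):
        return FunctionWrapper
    return FallbackWrapper


function_instance = BaseWrapper(lambda df: df)
fallback_instance = BaseWrapper(object())
```

Because the returned object is still an instance of the base class, Python calls `__init__` on it as usual, so callers only ever need to know the base class name.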
- is_classification()
Returns True if the model is of type classification, False otherwise.
- is_binary_classification()
Returns True if the model is of type binary classification, False otherwise.
- is_regression()
Returns True if the model is of type regression, False otherwise.
- is_text_generation()
Returns True if the model is of type text generation, False otherwise.
- abstract model_predict(data)
Performs the model inference/forward pass.
- Parameters:
data (Any) – The input data for making predictions. If you did not specify a data_preprocessing_function, this will be a pd.DataFrame, otherwise it will be whatever the data_preprocessing_function returns.
- Returns:
If the model is classification, it should return an array of probabilities of shape (num_entries, num_classes). If the model is regression or text_generation, it should return an array of num_entries predictions.
- Return type:
numpy.ndarray
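A possible shape for a model_predict override is sketched below. To stay runnable without Giskard installed, the method is shown on a small stand-alone class; in practice it would be defined on a subclass of the Giskard wrapper. `TinyLinearModel`, `ModelPredictOverride`, the column names, and the assumption that the wrapped object is available as `self.model` are all hypothetical.

```python
import numpy as np
import pandas as pd


class TinyLinearModel:
    """Hypothetical model object that returns unnormalised scores (logits)."""

    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)

    def forward(self, features):
        return features @ self.weights


def data_preprocessing_function(df: pd.DataFrame) -> np.ndarray:
    """Turn the raw dataframe into the array the model expects."""
    return df[["x1", "x2"]].to_numpy(dtype=float)


class ModelPredictOverride:
    """Holds only the override; assumes the wrapped object is self.model."""

    def __init__(self, model):
        self.model = model

    def model_predict(self, data):
        # data is whatever data_preprocessing_function returned.
        logits = self.model.forward(data)
        # Numerically stable softmax over the class axis.
        shifted = logits - logits.max(axis=1, keepdims=True)
        exp = np.exp(shifted)
        return exp / exp.sum(axis=1, keepdims=True)


df = pd.DataFrame({"x1": [0.0, 1.0], "x2": [1.0, -1.0]})
wrapper = ModelPredictOverride(TinyLinearModel([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]))
probs = wrapper.model_predict(data_preprocessing_function(df))
```

The result has shape (num_entries, num_classes), here (2, 3), which matches the return contract described above for classification models.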
- predict(dataset: Dataset) → ModelPredictionResults
Generates predictions for the input giskard dataset. This method uses the prepare_dataframe() method to preprocess the input dataset before making predictions. The predict_df() method is used to generate raw predictions for the preprocessed data. The type of predictions generated by this method depends on the model type:
* For regression models, the prediction field of the returned ModelPredictionResults object will contain the same values as the raw_prediction field.
* For binary or multiclass classification models, the prediction field of the returned ModelPredictionResults object will contain the predicted class labels for each example in the input dataset. The probabilities field will contain the predicted probabilities for the predicted class label. The all_predictions field will contain the predicted probabilities for all class labels for each example in the input dataset.
- Parameters:
dataset (Dataset) – The input dataset to make predictions on.
- Returns:
The prediction results for the input dataset.
- Return type:
ModelPredictionResults
- Raises:
ValueError – If the prediction task is not supported by the model.
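The relationship between the classification fields described for predict can be sketched with NumPy. This is an illustration of the description, not Giskard's code: the labels and probability values are made up, and the exact comparison used against classification_threshold (here `>=`) is an assumption.

```python
import numpy as np

# Hypothetical labels and per-class probabilities for two examples.
classification_labels = np.array(["cat", "dog", "bird"])
all_predictions = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])

# Multiclass: pick the most probable label for each row.
predicted_index = all_predictions.argmax(axis=1)
prediction = classification_labels[predicted_index]

# Probability of the predicted label only.
probabilities = all_predictions[np.arange(len(all_predictions)), predicted_index]

# Binary case: compare the second label's probability to the threshold.
binary_labels = np.array(["no", "yes"])
classification_threshold = 0.5
p_yes = np.array([0.2, 0.6, 0.9])
binary_prediction = np.where(
    p_yes >= classification_threshold, binary_labels[1], binary_labels[0]
)
```

For regression there is nothing to derive: as stated above, prediction simply carries the same values as raw_prediction.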
- save_model(local_path: str | Path) → None
Saves the wrapped model object.
- Parameters:
local_path (Union[str, Path]) – Path to which the model should be saved.
- classmethod load_model(local_dir)
Loads the wrapped model object.
- Parameters:
local_dir (Union[str, Path]) – Path from which the model should be loaded.
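When the default serialization does not work for a model object, save_model and load_model can be overridden with custom logic. The sketch below shows one possible pair of overrides that stores the model's parameters as JSON. It follows the signatures shown on this page but is otherwise hypothetical: `ScaleModel`, `CustomSerialization`, the file name, and the assumption that the wrapped object lives in `self.model` are illustrative, and the methods are placed on a stand-alone class so the example runs without Giskard.

```python
import json
import tempfile
from pathlib import Path

MODEL_FILENAME = "model_params.json"  # hypothetical file name


class ScaleModel:
    """Hypothetical model object defined by a single parameter."""

    def __init__(self, factor):
        self.factor = factor

    def predict(self, values):
        return [self.factor * v for v in values]


class CustomSerialization:
    """Holds only the overrides; assumes the wrapped object is self.model."""

    def __init__(self, model):
        self.model = model

    def save_model(self, local_path):
        target_dir = Path(local_path)
        target_dir.mkdir(parents=True, exist_ok=True)
        # Store only what is needed to rebuild the model.
        with open(target_dir / MODEL_FILENAME, "w") as f:
            json.dump({"factor": self.model.factor}, f)

    @classmethod
    def load_model(cls, local_dir):
        with open(Path(local_dir) / MODEL_FILENAME) as f:
            params = json.load(f)
        return ScaleModel(**params)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    CustomSerialization(ScaleModel(3)).save_model(tmp)
    restored = CustomSerialization.load_model(tmp)
```

Saving parameters rather than pickling the object is one design choice among several; it trades generality for a file format that does not depend on the class being importable at load time.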
- upload(client: GiskardClient, project_key, validate_ds=None) → str
Uploads the model to a Giskard project using the provided Giskard client. Also validates the model using the given validation dataset, if any.
- Parameters:
client (GiskardClient) – A Giskard client instance to use for uploading the model.
project_key (str) – The project key to use for the upload.
validate_ds (Dataset, optional) – A validation dataset to use for validating the model. Defaults to None.
Notes
This method saves the model to a temporary directory before uploading it. The temporary directory is deleted after the upload is completed.
- classmethod download(client: GiskardClient, project_key, model_id)
Downloads the specified model from the Giskard server and loads it into memory.
- Parameters:
client (GiskardClient) – The client instance that will connect to the Giskard server.
project_key (str) – The key for the project that the model belongs to.
model_id (str) – The ID of the model to download.
- Returns:
An instance of the class calling the method, with the specified model loaded into memory.
- Raises:
AssertionError – If the local directory where the model should be saved does not exist.
Model Prediction
- class giskard.models.base.ModelPredictionResults(*, raw: Any = None, prediction: Any = None, raw_prediction: Any = None, probabilities: Any | None = None, all_predictions: Any | None = None)
Data structure for model predictions.
For regression models, the prediction field of the returned ModelPredictionResults object will contain the same values as the raw_prediction field.
For binary or multiclass classification models, the prediction field of the returned ModelPredictionResults object will contain the predicted class labels for each example in the input dataset. The probabilities field will contain the predicted probabilities for the predicted class label. The all_predictions field will contain the predicted probabilities for all class labels for each example in the input dataset.
- raw
The predicted probabilities.
- Type:
Any, optional
- prediction
The predicted class labels for each example in the input dataset.
- Type:
Any, optional
- raw_prediction
The predicted class label.
- Type:
Any, optional
- probabilities
The predicted probabilities for the predicted class label.
- Type:
Any, optional
- all_predictions
The predicted probabilities for all class labels for each example in the input dataset.
- Type:
Any, optional
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.