Base model classes
- class giskard.models.base.BaseModel(model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], name: str | None = None, description: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, id: str | None = None, **kwargs)
The BaseModel class is an abstract base class that defines the common interface for all the models used in this project.
- model
Any function or ML model. The standard model output that Giskard requires is:
- if classification:
an array of shape (n, m) of probabilities, corresponding to n data entries (rows of the pandas.DataFrame) and m classification_labels. For binary classification, an array of shape (n, 1) is also accepted; in that case, make sure the probability provided is for the second label in classification_labels.
- if regression or text_generation:
an array of predictions corresponding to data entries (rows of pandas.DataFrame) and outputs.
- Type:
Any
- name
The name of the model.
- Type:
Optional[str]
- model_type
The type of the model: regression, classification or text_generation.
- Type:
ModelType
- feature_names
List of feature names matching the column names in the data, corresponding to the features the model was trained on. By default, feature_names are all the Dataset columns except the target.
- Type:
Optional[Iterable[str]]
- classification_threshold
The classification threshold, used for binary classification models.
- Type:
float
- classification_labels
The classification labels, if model_type is classification. Make sure the labels are in the same order as the output columns of the model.
- Type:
Optional[Iterable[str]]
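To make the relationship between the model output, classification_labels and classification_threshold concrete, here is a minimal sketch in plain Python. It assumes that, for a binary model, the threshold is compared against the probability of the second label. The helper name `labels_from_probabilities` and the `>=` comparison are illustrative assumptions, not Giskard's implementation.

```python
from typing import List, Sequence


def labels_from_probabilities(
    probabilities: Sequence[Sequence[float]],
    classification_labels: Sequence[str],
    classification_threshold: float = 0.5,
) -> List[str]:
    """Map rows of class probabilities to labels (hypothetical helper)."""
    predicted = []
    for row in probabilities:
        if len(classification_labels) == 2:
            # Binary case: a row holds either one probability (for the second
            # label) or two probabilities (one per label, in label order).
            # Either way, the last entry is the second label's probability.
            p_second = row[-1]
            index = 1 if p_second >= classification_threshold else 0
        else:
            # Multiclass case: pick the column with the highest probability.
            index = max(range(len(row)), key=lambda i: row[i])
        predicted.append(classification_labels[index])
    return predicted


# The column order of the probabilities must match classification_labels.
print(labels_from_probabilities([[0.8, 0.2], [0.3, 0.7]], ["no", "yes"]))
print(labels_from_probabilities([[0.2], [0.7]], ["no", "yes"], classification_threshold=0.6))
```

Swapping the order of the labels without swapping the probability columns would silently invert every prediction, which is why the label order matters.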
- Raises:
ValueError – If an invalid model type is specified, or if duplicate values are found in classification_labels.
Initialize a new instance of the BaseModel class.
- Parameters:
model_type (ModelType) – Type of the model: classification, regression or text_generation.
name (Optional[str]) – Name of the model. If not provided, defaults to the class name.
description (Optional[str]) – Description of the model's task. Mandatory for non-langchain text_generation models.
feature_names (Optional[Iterable]) – A list of names of the input features.
classification_threshold (Optional[float]) – Threshold value used for classification models. Defaults to 0.5.
classification_labels (Optional[Iterable]) – A list of labels for classification models.
- Raises:
ValueError – If an invalid model_type value is provided.
ValueError – If duplicate values are found in the classification_labels list.
Notes
This class uses the @configured_validate_arguments decorator to validate the input arguments. The initialized object contains the following attributes:
meta: a ModelMeta object containing metadata about the model.
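The two ValueError cases listed under Raises can be pictured with the following sketch. It is a standalone illustration in plain Python: the function name `check_model_arguments` and the `SUPPORTED_MODEL_TYPES` tuple are hypothetical, and the real checks performed by the library may differ in wording and detail.

```python
from collections import Counter
from typing import Iterable, Optional

# Model type names taken from the class signature above.
SUPPORTED_MODEL_TYPES = ("classification", "regression", "text_generation")


def check_model_arguments(
    model_type: str, classification_labels: Optional[Iterable] = None
) -> None:
    """Raise ValueError for the two documented failure cases (illustrative only)."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(
            f"Invalid model_type {model_type!r}; expected one of {SUPPORTED_MODEL_TYPES}"
        )
    if classification_labels is not None:
        counts = Counter(classification_labels)
        duplicates = [label for label, count in counts.items() if count > 1]
        if duplicates:
            raise ValueError(f"Duplicate classification_labels: {duplicates}")


# A valid combination passes silently.
check_model_arguments("classification", ["no", "yes"])
```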
- property is_binary_classification: bool
Compute if the model is of type binary classification.
- Returns:
True if the model is of type binary classification, False otherwise.
- Return type:
bool
- property is_classification: bool
Compute if the model is of type classification.
- Returns:
True if the model is of type classification, False otherwise.
- Return type:
bool
- property is_regression: bool
Compute if the model is of type regression.
- Returns:
True if the model is of type regression, False otherwise.
- Return type:
bool
- property is_text_generation: bool
Compute if the model is of type text generation.
- Returns:
True if the model is of type text generation, False otherwise.
- Return type:
bool
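As a rough picture of how these four properties relate to model_type, the sketch below uses a small stand-in class. `ModelInfo` is hypothetical, and treating "binary" as "classification with exactly two labels" is an assumption about the behaviour rather than something this page states.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelInfo:
    """Hypothetical stand-in holding only the fields the type checks need."""

    model_type: str
    classification_labels: List[str] = field(default_factory=list)

    @property
    def is_classification(self) -> bool:
        return self.model_type == "classification"

    @property
    def is_binary_classification(self) -> bool:
        # Assumption: binary means a classification model with exactly two labels.
        return self.is_classification and len(self.classification_labels) == 2

    @property
    def is_regression(self) -> bool:
        return self.model_type == "regression"

    @property
    def is_text_generation(self) -> bool:
        return self.model_type == "text_generation"


binary = ModelInfo("classification", ["no", "yes"])
print(binary.is_classification, binary.is_binary_classification)
```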
- predict(dataset: Dataset, *_args, **_kwargs) -> ModelPredictionResults
Generates predictions for the input Giskard dataset. This method uses the prepare_dataframe() method to preprocess the input dataset before making predictions. The predict_df() method is then used to generate raw predictions for the preprocessed data. The type of predictions generated depends on the model type:
- For regression models, the prediction field of the returned ModelPredictionResults object will contain the same values as the raw_prediction field.
- For binary or multiclass classification models, the prediction field of the returned ModelPredictionResults object will contain the predicted class labels for each example in the input dataset. The probabilities field will contain the predicted probabilities for the predicted class label. The all_predictions field will contain the predicted probabilities for all class labels for each example in the input dataset.
- Parameters:
dataset (Dataset) – The input dataset to make predictions on.
- Raises:
ValueError – If the prediction task is not supported by the model.
- Returns:
The prediction results for the input dataset.
- Return type:
ModelPredictionResults
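The way the result fields relate to the raw model output can be sketched as follows. `PredictionResultsSketch` and `build_results` are hypothetical stand-ins that only mirror the field names described for predict(). The sketch uses a plain argmax for classification and ignores classification_threshold, which is a simplification.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass
class PredictionResultsSketch:
    """Hypothetical stand-in mirroring the fields described for predict()."""

    raw_prediction: Any
    prediction: Any
    probabilities: Optional[List[float]] = None
    all_predictions: Optional[List[List[float]]] = None


def build_results(
    raw: Sequence,
    model_type: str,
    classification_labels: Optional[Sequence[str]] = None,
) -> PredictionResultsSketch:
    if model_type == "regression":
        # Regression: prediction holds the same values as raw_prediction.
        return PredictionResultsSketch(raw_prediction=raw, prediction=raw)
    if model_type == "classification":
        # Index of the most probable class for each row (simplified argmax).
        best = [max(range(len(row)), key=lambda i: row[i]) for row in raw]
        return PredictionResultsSketch(
            raw_prediction=raw,
            prediction=[classification_labels[i] for i in best],
            probabilities=[row[i] for row, i in zip(raw, best)],
            all_predictions=[list(row) for row in raw],
        )
    # This sketch covers only the two cases described in the text.
    raise ValueError(f"Prediction task not covered by this sketch: {model_type!r}")


results = build_results([[0.1, 0.9], [0.6, 0.4]], "classification", ["no", "yes"])
print(results.prediction, results.probabilities)
```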
- abstract predict_df(df: DataFrame, *args, **kwargs)
Inner method that performs the actual inference on a prepared dataframe.
- Parameters:
df (DataFrame) – The dataframe to predict on.
- prepare_dataframe(df, column_dtypes=None, target=None, *_args, **_kwargs)
Prepares a Pandas DataFrame for inference by ensuring the correct columns are present and have the correct data types.
- Parameters:
df (pd.DataFrame) – The dataframe to prepare.
- Returns:
The prepared Pandas DataFrame.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the target column is found in the dataset.
ValueError – If a specified feature name is not found in the dataset.
- talk(question: str, dataset: Dataset, scan_report: ScanReport = None, context: str = '')
Perform the "talk" to the model.
Given a question, lets you ask the model about prediction results, explanations, model performance, issues, etc.
- Parameters:
question (str) – User input query.
dataset (Dataset) – Giskard Dataset to be analysed by the "talk".
context (str) – Context of the previous "talk" results. Necessary to keep context between sequential "talk" calls.
scan_report (ScanReport) – Giskard Scan Report to be analysed by the "talk".
- Returns:
The response to the user's prompt.
- Return type:
TalkResult
- class giskard.models.base.CloudpickleSerializableModel(model: Any, model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], data_preprocessing_function: Callable[[DataFrame], Any] | None = None, model_postprocessing_function: Callable[[Any], Any] | None = None, name: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, id: str | None = None, batch_size: int | None = None, **kwargs)
A base class for models that are serializable by cloudpickle.
- Parameters:
model (Any) – The model that will be wrapped.
model_type (ModelType) – The type of the model. Must be a value from the ModelType enumeration.
data_preprocessing_function (Optional[Callable[[pd.DataFrame], Any]]) – A function that will be applied to incoming data. Default is None.
model_postprocessing_function (Optional[Callable[[Any], Any]]) – A function that will be applied to the model's predictions. Default is None.
name (Optional[str]) – A name for the wrapper. Default is None.
feature_names (Optional[Iterable]) – A list of feature names. Default is None.
classification_threshold (Optional[float]) – The probability threshold for classification. Default is 0.5.
classification_labels (Optional[Iterable]) – A list of classification labels. Default is None.
batch_size (Optional[int]) – The batch size to use for inference. Default is None, which means inference will be done on the full dataframe.
- classmethod load_model(local_dir, model_py_ver: Tuple[str, str, str] | None = None, *args, **kwargs)
Loads the wrapped model object.
- Parameters:
local_dir (Union[str, Path]) – Path from which the model should be loaded.
model_py_ver (Optional[Tuple[str, str, str]]) – Python version used to save the model, to validate if model loading failed.
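One way to read the model_py_ver parameter is that the recorded Python version is consulted only when loading fails, so that the error can point at a version mismatch. The sketch below illustrates that idea under stated assumptions: it uses the standard library pickle module as a stand-in for cloudpickle, and the function name `load_pickled_model` and the file name `model.pkl` are hypothetical.

```python
import pickle
import sys
import tempfile
from pathlib import Path
from typing import Any, Optional, Tuple


def load_pickled_model(
    path: Path, model_py_ver: Optional[Tuple[str, str, str]] = None
) -> Any:
    """Load a pickled object; mention a Python version mismatch if loading fails."""
    try:
        with open(path, "rb") as handle:
            return pickle.load(handle)
    except Exception as exc:
        current = tuple(str(part) for part in sys.version_info[:3])
        # Only blame the version when major.minor actually differs.
        if model_py_ver is not None and tuple(model_py_ver[:2]) != current[:2]:
            raise ValueError(
                f"Model was saved with Python {'.'.join(model_py_ver)} "
                f"but is being loaded with Python {'.'.join(current)}"
            ) from exc
        raise


with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model.pkl"  # hypothetical file name
    model_file.write_bytes(pickle.dumps({"weights": [0.1, 0.2]}))
    print(load_pickled_model(model_file, model_py_ver=("3", "10", "0")))
```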
- class giskard.models.base.MLFlowSerializableModel(model: Any, model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], data_preprocessing_function: Callable[[DataFrame], Any] | None = None, model_postprocessing_function: Callable[[Any], Any] | None = None, name: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, id: str | None = None, batch_size: int | None = None, **kwargs)
A base class to serialize models with MLFlow.
This class provides functionality for saving the model with MLFlow in addition to saving other metadata with the save method. Subclasses should implement the save_model method to provide their own MLFlow-specific model saving functionality.
- Parameters:
model (Any) – The model that will be wrapped.
model_type (ModelType) – The type of the model. Must be a value from the ModelType enumeration.
data_preprocessing_function (Optional[Callable[[pd.DataFrame], Any]]) – A function that will be applied to incoming data. Default is None.
model_postprocessing_function (Optional[Callable[[Any], Any]]) – A function that will be applied to the model's predictions. Default is None.
name (Optional[str]) – A name for the wrapper. Default is None.
feature_names (Optional[Iterable]) – A list of feature names. Default is None.
classification_threshold (Optional[float]) – The probability threshold for classification. Default is 0.5.
classification_labels (Optional[Iterable]) – A list of classification labels. Default is None.
batch_size (Optional[int]) – The batch size to use for inference. Default is None, which means inference will be done on the full dataframe.
- class giskard.models.base.WrapperModel(model: Any, model_type: SupportedModelTypes | Literal['classification', 'regression', 'text_generation'], data_preprocessing_function: Callable[[DataFrame], Any] | None = None, model_postprocessing_function: Callable[[Any], Any] | None = None, name: str | None = None, feature_names: Iterable | None = None, classification_threshold: float | None = 0.5, classification_labels: Iterable | None = None, id: str | None = None, batch_size: int | None = None, **kwargs)
Base class for model wrappers.
This is a subclass of BaseModel that wraps an existing model object and uses it for inference. This class introduces a data_preprocessing_function, which can be used to preprocess incoming data before it is passed to the underlying model.
- Parameters:
model (Any) – The model that will be wrapped.
model_type (ModelType) – The type of the model. Must be a value from the ModelType enumeration.
data_preprocessing_function (Optional[Callable[[pd.DataFrame], Any]]) – A function that will be applied to incoming data. Default is None.
model_postprocessing_function (Optional[Callable[[Any], Any]]) – A function that will be applied to the model's predictions. Default is None.
name (Optional[str]) – A name for the wrapper. Default is None.
feature_names (Optional[Iterable]) – A list of feature names. Default is None.
classification_threshold (Optional[float]) – The probability threshold for classification. Default is 0.5.
classification_labels (Optional[Iterable]) – A list of classification labels. Default is None.
batch_size (Optional[int]) – The batch size to use for inference. Default is None, which means inference will be done on the full dataframe.
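The interplay of data_preprocessing_function, model_postprocessing_function and batch_size can be pictured with the sketch below. It is a standalone illustration on plain lists: `run_inference` is a hypothetical helper, and applying the preprocessing and postprocessing functions once per batch is an assumption about the order of operations, not a statement about the library's internals.

```python
from typing import Any, Callable, List, Optional, Sequence


def run_inference(
    rows: Sequence[Any],
    model_predict: Callable[[Any], Sequence[Any]],
    data_preprocessing_function: Optional[Callable[[Any], Any]] = None,
    model_postprocessing_function: Optional[Callable[[Any], Sequence[Any]]] = None,
    batch_size: Optional[int] = None,
) -> List[Any]:
    """Preprocess, predict and postprocess, optionally in batches (illustrative)."""
    if not rows:
        return []
    # batch_size=None means a single batch covering all rows.
    step = batch_size if batch_size else len(rows)
    outputs: List[Any] = []
    for start in range(0, len(rows), step):
        batch: Any = rows[start:start + step]
        if data_preprocessing_function is not None:
            batch = data_preprocessing_function(batch)
        raw = model_predict(batch)
        if model_postprocessing_function is not None:
            raw = model_postprocessing_function(raw)
        outputs.extend(raw)
    return outputs


predictions = run_inference(
    rows=[1, 2, 3, 4, 5],
    model_predict=lambda batch: [value * 2 for value in batch],
    data_preprocessing_function=lambda batch: [value + 1 for value in batch],
    batch_size=2,
)
print(predictions)
```

With batch_size=2 and five rows, the model callable is invoked three times, on batches of two, two and one rows.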
- abstract classmethod load_model(path: str | Path, model_py_ver: Tuple[str, str, str] | None = None, *args, **kwargs)
Loads the wrapped model object.
- Parameters:
path (Union[str, Path]) – Path from which the model should be loaded.
model_py_ver (Optional[Tuple[str, str, str]]) – Python version used to save the model, to validate if model loading failed.
- abstract model_predict(data)
Performs the model inference/forward pass.
- Parameters:
data (Any) – The input data for making predictions. If you did not specify a data_preprocessing_function, this will be a pd.DataFrame, otherwise it will be whatever the data_preprocessing_function returns.
- Returns:
If the model is classification, it should return an array of probabilities of shape (num_entries, num_classes). If the model is regression or text_generation, it should return an array of num_entries predictions.
- Return type:
numpy.ndarray
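The output contract of model_predict for a classification model can be illustrated with a small stand-in hierarchy. `WrapperSketch` and `SoftmaxWrapper` are hypothetical classes that do not inherit from the real WrapperModel, and nested lists stand in for numpy.ndarray so that the sketch has no dependencies.

```python
import math
from abc import ABC, abstractmethod
from typing import Any, Callable, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


class WrapperSketch(ABC):
    """Hypothetical stand-in for a wrapper with an abstract model_predict."""

    def __init__(self, model: Callable[[Any], Sequence[float]]):
        self.model = model

    @abstractmethod
    def model_predict(self, data: Sequence[Any]) -> List[List[float]]:
        """Return one row of class probabilities per input entry."""


class SoftmaxWrapper(WrapperSketch):
    def model_predict(self, data: Sequence[Any]) -> List[List[float]]:
        # Result has num_entries rows and num_classes columns.
        return [softmax(self.model(row)) for row in data]


# The wrapped "model" is a toy scoring function producing three class scores.
wrapper = SoftmaxWrapper(lambda row: [row["x"], 0.0, -row["x"]])
probabilities = wrapper.model_predict([{"x": 1.0}, {"x": -2.0}])
print(len(probabilities), len(probabilities[0]))
```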