Automated model insights

This module defines various push classes used for model debugging.

The main classes are:

Push: Base push class
ExamplePush: Push based on example data
OverconfidencePush: Push for overconfidence cases
UnderconfidencePush: Push for underconfidence cases
ContributionPush: Push for high contribution features
PerturbationPush: Push for perturbation analysis

ExamplePush and its subclasses allow saving examples and generating one-sample tests. ContributionPush and PerturbationPush allow generating statistical tests and slicing functions.

The push classes allow converting to gRPC protobuf format via the to_grpc() method.

class giskard.push.Push

Base push class.

push_title: Title of the push

details: List of details/actions for the push

tests: List of tests to generate

pushkind: Enum of push kind

class giskard.push.ExamplePush

Push class based on example data.

Adds attributes:

saved_example: Example dataset row
training_label: Ground truth label
training_label_proba: Probability of ground truth label

Can convert to gRPC protobuf format.

class giskard.push.FeaturePush

Push related to a specific feature.

Adds attributes:

feature: Feature name
value: Feature value

Can convert to gRPC protobuf format.

class giskard.push.OverconfidencePush(training_label, training_label_proba, dataset_row, predicted_label, rate)

Recommend actions for overconfidence cases.

Description: Tag examples that are incorrect but were classified with a high probability as the wrong label. This may indicate that the model is overconfident on this example, possibly because of a spurious correlation or a data leak.

Triggering event: When we switch examples, the example is incorrect, and the example is classified as overconfident. We quantify overconfidence as the difference between the largest probability assigned to a label and the probability assigned to the correct label (this is 0 if the model made the correct prediction). If this difference is larger than a threshold (typically determined automatically depending on the number of classes), the prediction is considered overconfident.

Call to action description:

Create one-sample test: Adds a test to a test suite that checks whether the model is overconfident on this example.
Get similar examples: Filters the debugging session to show only examples with overconfidence.

Requirements:

Ground truth label
Classification models only
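The overconfidence measure described above can be sketched in a few lines of plain Python. This is an illustration only, not Giskard's implementation: the function names, the dictionary representation of class probabilities, and the fixed threshold of 0.5 are assumptions (the documentation says the real threshold is typically determined automatically).

```python
def overconfidence_score(probabilities, true_label):
    """Gap between the highest predicted probability and the true label's probability."""
    return max(probabilities.values()) - probabilities[true_label]


def is_overconfident(probabilities, true_label, threshold):
    """Flag an incorrect prediction whose gap exceeds the threshold."""
    predicted_label = max(probabilities, key=probabilities.get)
    if predicted_label == true_label:
        return False  # a correct prediction has a gap of 0
    return overconfidence_score(probabilities, true_label) > threshold


# Hypothetical three-class prediction where the true label is "cat".
probs = {"cat": 0.05, "dog": 0.90, "bird": 0.05}
score = overconfidence_score(probs, "cat")
# threshold=0.5 is an assumed value chosen for this example only.
flagged = is_overconfident(probs, "cat", threshold=0.5)
```

Here the model puts 0.90 on "dog" while the correct label "cat" gets 0.05, so the gap is about 0.85 and the example is flagged under the assumed threshold.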

class giskard.push.ContributionPush(value=None, feature=None, feature_type=None, bounds=None, model_type=None, correct_prediction=None)

Recommend actions for features that have high SHAP values.

Description: Tag features that have a high SHAP value for this example. This may indicate that the model is relying heavily on this feature to make the prediction.

Triggering event: When we switch examples and the largest Shapley value is much higher than those of the other features. We mark a feature as high contributing by computing SHAP values for all features, then calculating z-scores to find significant outliers. If the z-score is above a threshold (typically determined automatically depending on the number of features), the feature is considered high contributing.

Call to action description:

Save Slice: Saves the slice in the catalog so that you can create tests more efficiently.
Add Test to a test suite: Adds a test to a test suite that checks whether this slice performs better or worse than the rest of the dataset.
Get similar examples: Filters this debugging session to show only examples from this slice.

Requirements:

Numerical and categorical features only
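The z-score step of the triggering event can be illustrated as follows. This is a minimal sketch, not Giskard's code: the SHAP values are hard-coded rather than computed, and the function name, the use of the population standard deviation, and the threshold of 1.5 are all assumptions.

```python
from statistics import mean, pstdev


def high_contribution_features(contributions, threshold):
    """Return features whose contribution z-score exceeds the threshold."""
    values = list(contributions.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all contributions are equal, so nothing stands out
    return [name for name, v in contributions.items() if (v - mu) / sigma > threshold]


# Hypothetical absolute SHAP values for one example (not computed here).
shap_abs = {"age": 0.02, "income": 0.03, "city": 0.01, "tenure": 0.02, "balance": 0.60}
# threshold=1.5 is an assumed value chosen for this example only.
flagged = high_contribution_features(shap_abs, threshold=1.5)
```

With only a handful of features the largest attainable z-score is small, which is one plausible reason for a threshold that depends on the number of features.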

class giskard.push.PerturbationPush(value, feature, transformation_info: TransformationInfo)

Recommend actions for features whose perturbation changes the prediction.

Description: Tag features that, when perturbed, change the prediction. This may indicate that the model is sensitive to this feature.

Triggering event: When we switch examples and the prediction changes when we perturb a feature. We mark a feature as sensitive by applying supported perturbations to each feature in the dataset. For numerical columns, we add or subtract values based on the mean absolute deviation. For text columns, we apply predefined text transformations.

Call to action description:

Add to test suite: Adds a test to a test suite that checks whether a perturbation on this feature changes the prediction above a threshold.

Requirements:

Ground truth label
Numerical and text features only
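For numerical columns, the perturbation idea can be sketched like this. The code is a hedged illustration under stated assumptions: it shifts a value by exactly one mean absolute deviation in each direction, and `toy_predict`, `prediction_changes`, and the dictionary row format are hypothetical stand-ins rather than Giskard APIs.

```python
from statistics import mean


def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    mu = mean(values)
    return mean(abs(v - mu) for v in values)


def prediction_changes(predict, row, feature, column_values):
    """Check whether shifting `feature` by +/- one MAD changes the prediction."""
    mad = mean_absolute_deviation(column_values)
    original = predict(row)
    for shift in (mad, -mad):
        perturbed = {**row, feature: row[feature] + shift}
        if predict(perturbed) != original:
            return True
    return False


# Hypothetical rule-based classifier standing in for a real model.
def toy_predict(row):
    return "approve" if row["income"] >= 50 else "reject"


incomes = [30, 40, 50, 60, 70]
sensitive = prediction_changes(toy_predict, {"income": 55}, "income", incomes)
stable = prediction_changes(toy_predict, {"income": 90}, "income", incomes)
```

The row with income 55 sits close to the decision boundary, so a one-MAD shift flips the prediction, while the row with income 90 is unaffected.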

Automated model insights catalog

Test for difference in RMSE between a slice and full dataset.

Checks if the RMSE on a sliced subset of the data is significantly different from the full dataset based on a threshold and direction.

Can be used with pushes to test if problematic slices have worse RMSE.

Parameters:

model (BaseModel) – Model to test
dataset (Dataset) – Full dataset
slicing_function (Optional[SlicingFunction], optional) – Function to slice dataset, by default None
threshold (float, optional) – Allowed RMSE difference, by default 0.1
direction (Direction, optional) – Whether slice RMSE should increase or decrease, by default Direction.Decreasing

Returns:

Test result with pass/fail, slice sizes, RMSE difference, and debug dataset

Return type:

TestResult
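The slice-versus-full comparison can be sketched in plain Python. This is not the catalog test itself: the relative-difference formula, the pass rule, and the row format are assumptions made for illustration, and the real test's handling of `direction` may differ.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmse_slice_check(rows, in_slice, threshold=0.1):
    """Compare RMSE on a slice against the full dataset.

    `rows` holds (features, target, prediction) tuples. The relative-difference
    formula and the pass rule are assumptions made for this sketch.
    """
    sliced = [r for r in rows if in_slice(r[0])]
    full_rmse = rmse([r[1] for r in rows], [r[2] for r in rows])
    slice_rmse = rmse([r[1] for r in sliced], [r[2] for r in sliced])
    rel_diff = (slice_rmse - full_rmse) / full_rmse
    return {"passed": rel_diff <= threshold, "rel_diff": rel_diff, "slice_size": len(sliced)}


# Hypothetical data: predictions are worse when age > 60.
rows = [
    ({"age": 25}, 10.0, 10.0),
    ({"age": 40}, 12.0, 12.0),
    ({"age": 65}, 20.0, 16.0),
    ({"age": 70}, 22.0, 18.0),
]
result = rmse_slice_check(rows, lambda features: features["age"] > 60)
```

In this toy data the slice RMSE is 4 against a full-dataset RMSE of about 2.83, so the check fails under the assumed rule.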

Test for difference in F1 score between a slice and full dataset.

Checks if the F1 score on a sliced subset of the data is significantly different from the full dataset based on a threshold and direction.

Can be used with pushes to test if problematic slices have worse F1 score.

Parameters:

model (BaseModel) – Model to test
dataset (Dataset) – Full dataset
slicing_function (Optional[SlicingFunction], optional) – Function to slice dataset, by default None
threshold (float, optional) – Allowed F1 score difference, by default -0.1
direction (Direction, optional) – Whether slice F1 should increase or decrease, by default Direction.Increasing

Returns:

Test result with pass/fail, slice sizes, F1 difference, and debug dataset

Return type:

TestResult

Test if underconfidence rate decreases for model on dataset.

Checks if the underconfidence rate for the model on the provided dataset is lower than the specified rate.

Parameters:

model (BaseModel) – Model to test
dataset (Dataset) – Dataset to test on
rate (float) – Target underconfidence rate to decrease from

Returns:

Test result with pass/fail and underconfidence rate metric

Return type:

TestResult

Test if overconfidence rate decreases for model on dataset.

Checks if the overconfidence rate for the model on the provided dataset is lower than the specified rate.

Parameters:

model (BaseModel) – Model to test
dataset (Dataset) – Dataset to test on
rate (float) – Target overconfidence rate to decrease from

Returns:

Test result with pass/fail and overconfidence rate metric

Return type:

TestResult
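One way to picture a dataset-level overconfidence rate is shown below. It is a sketch under assumptions, not the catalog implementation: the rate is taken here as the share of misclassified examples whose probability gap exceeds `gap_threshold`, and both that definition and the default of 0.5 are hypothetical.

```python
def overconfidence_rate(predictions, gap_threshold):
    """Share of misclassified examples whose probability gap exceeds the threshold.

    `predictions` is a list of (probabilities, true_label) pairs.
    """
    gaps = []
    for probs, true_label in predictions:
        gap = max(probs.values()) - probs[true_label]
        if gap > 0:  # the gap is 0 when the prediction is correct
            gaps.append(gap)
    if not gaps:
        return 0.0
    return sum(g > gap_threshold for g in gaps) / len(gaps)


def check_overconfidence_rate(predictions, rate, gap_threshold=0.5):
    """Pass when the measured rate is lower than the target rate."""
    measured = overconfidence_rate(predictions, gap_threshold)
    return {"passed": measured < rate, "metric": measured}


# Hypothetical two-class predictions.
preds = [
    ({"a": 0.9, "b": 0.1}, "a"),    # correct
    ({"a": 0.95, "b": 0.05}, "b"),  # wrong, gap 0.90
    ({"a": 0.55, "b": 0.45}, "b"),  # wrong, gap 0.10
    ({"a": 0.2, "b": 0.8}, "b"),    # correct
]
result = check_overconfidence_rate(preds, rate=0.4)
```

Half of the misclassified examples exceed the assumed gap threshold, so the measured rate of 0.5 is not lower than the target of 0.4 and the check fails.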

Test if model correctly predicts example.

Checks if the model’s prediction on the saved example matches the provided training label ground truth.

Parameters:

model (BaseModel) – Model to test
saved_example (Dataset) – Example dataset
training_label (Any) – Ground truth label

Returns:

Test result with pass/fail

Return type:

TestResult

Test if model probability for ground truth label increases.

Checks if the predicted probability for the training label increases compared to the original training probability.

Parameters:

model (BaseModel) – Model to test
saved_example (Dataset) – Example dataset
training_label (Any) – Ground truth label
training_label_proba (Any) – Original probability

Returns:

Test result with pass/fail and probability metric

Return type:

TestResult
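The comparison this test performs can be sketched as follows. The function name, the dictionary-based example, and the `retrained_predict_proba` stand-in are hypothetical and exist only for illustration; they are not part of the Giskard API.

```python
def probability_increase_check(predict_proba, example, training_label, training_label_proba):
    """Pass when the model now assigns a higher probability to the ground truth label."""
    new_proba = predict_proba(example)[training_label]
    return {"passed": new_proba > training_label_proba, "metric": new_proba}


# Hypothetical retrained model that returns class probabilities for one example.
def retrained_predict_proba(example):
    return {"spam": 0.7, "ham": 0.3}


# The original model gave "spam" a probability of 0.4 on this saved example.
result = probability_increase_check(
    retrained_predict_proba,
    example={"text": "win a prize now"},
    training_label="spam",
    training_label_proba=0.4,
)
```

A check of this kind is useful after retraining: it passes only if the saved example is now handled with more confidence in the correct label than when the push was created.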

giskard.push.push_test_catalog.catalog.one_sample_overconfidence_test(model: SuiteInput | BaseModel | None = None, saved_example: SuiteInput | Dataset | None = None) → GiskardTestMethod

One-sample overconfidence test for example.

Checks if the overconfidence rate for the model on the provided example is below a threshold.

Parameters:

model (BaseModel) – Model to test
saved_example (Dataset) – Example dataset

Returns:

Test result with pass/fail and overconfidence rate metric

Return type:

TestResult

giskard.push.push_test_catalog.catalog.one_sample_underconfidence_test(model: SuiteInput | BaseModel | None = None, saved_example: SuiteInput | Dataset | None = None) → GiskardTestMethod

One-sample underconfidence test for example.

Checks if the underconfidence rate for the model on the provided example is below a threshold.

Parameters:

model (BaseModel) – Model to test
saved_example (Dataset) – Example dataset

Returns:

Test result with pass/fail and underconfidence rate metric

Return type:

TestResult

Metamorphic test for numerical invariance with mean absolute deviation.

Checks if adding a value to a numerical feature keeps the prediction approximately the same using mean absolute deviation.

Parameters:

model (BaseModel) – Model to test
dataset (Dataset) – Dataset to test
column_name (str) – Name of numerical feature column
value_added (float) – Value to add to column

Returns:

Test result with pass/fail

Return type:

TestResult
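A metamorphic invariance check of this kind can be sketched in plain Python. This is an assumption-laden illustration rather than the catalog test: "approximately the same" is interpreted here as the share of rows whose predicted label is unchanged, and `min_unchanged`, `toy_predict`, and the row format are hypothetical.

```python
def numerical_invariance_check(predict, rows, column_name, value_added, min_unchanged=1.0):
    """Add `value_added` to one column and measure how many predictions stay the same."""
    unchanged = 0
    for row in rows:
        perturbed = {**row, column_name: row[column_name] + value_added}
        if predict(perturbed) == predict(row):
            unchanged += 1
    share = unchanged / len(rows)
    return {"passed": share >= min_unchanged, "metric": share}


# Hypothetical classifier standing in for a real model.
def toy_predict(row):
    return "high" if row["price"] > 100 else "low"


rows = [{"price": 50}, {"price": 95}, {"price": 150}, {"price": 200}]
strict = numerical_invariance_check(toy_predict, rows, "price", value_added=10)
lenient = numerical_invariance_check(toy_predict, rows, "price", value_added=10, min_unchanged=0.7)
```

Only the row priced at 95 crosses the decision boundary after the shift, so three of four predictions are unchanged: the strict check fails and the lenient one passes.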