Push feature#
This module defines various push classes used for model debugging.
The main classes are:
Push: Base push class
ExamplePush: Push based on example data
OverconfidencePush: Push for overconfidence cases
BorderlinePush: Push for borderline/underconfidence cases
ContributionPush: Push for high contribution features
PerturbationPush: Push for perturbation analysis
ExamplePush and its subclasses allow saving examples and generating one-sample tests. ContributionPush and PerturbationPush allow generating statistical tests and slicing functions.
The push classes allow converting to gRPC protobuf format via the to_grpc() method.
- class giskard.push.Push#
Base push class.
- push_title#
Title of the push
- details#
List of details/actions for the push
- tests#
List of tests to generate
- pushkind#
Enum of push kind
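To make the shape of a push concrete, here is an illustrative sketch of these attributes as a plain dataclass. This is not the library's implementation; the field types and the PushKind members are assumptions mirroring the push classes documented on this page.

```python
# Illustrative sketch only: the real Push class is defined in giskard.push.
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, List


class PushKind(Enum):
    # Assumed members, mirroring the push classes documented on this page.
    OVERCONFIDENCE = auto()
    UNDERCONFIDENCE = auto()
    CONTRIBUTION = auto()
    PERTURBATION = auto()


@dataclass
class PushSketch:
    push_title: str                                     # title of the push
    details: List[dict] = field(default_factory=list)   # details/actions for the push
    tests: List[Any] = field(default_factory=list)      # tests to generate
    pushkind: PushKind = PushKind.CONTRIBUTION           # kind of push
```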
- class giskard.push.ExamplePush#
Push class based on example data.
- Adds attributes:
saved_example: Example dataset row
training_label: Ground truth label
training_label_proba: Probability of the ground truth label
Can convert to gRPC protobuf format.
- class giskard.push.FeaturePush#
Push related to a specific feature.
- Adds attributes:
feature: Feature name
value: Feature value
Can convert to gRPC protobuf format.
- class giskard.push.OverconfidencePush(training_label, training_label_proba, dataset_row, predicted_label, rate)#
Recommend actions for overconfidence cases.
- Description:
Tag examples that are misclassified but were assigned a high probability for the wrong label. This may indicate that the model is overconfident on this example, possibly because of a spurious correlation or a data leak.
- Triggering event:
When switching examples, the push triggers if the example is misclassified and the prediction is overconfident. We quantify overconfidence as the difference between the largest probability assigned to any label and the probability assigned to the correct label (this difference is 0 when the model predicts correctly). If it exceeds a threshold (typically determined automatically from the number of classes), the prediction is considered overconfident (see the sketch after this entry).
- Call to action description:
Create one-sample test: This adds a test to a test suite that checks whether the model is overconfident on this example.
Get similar examples: This filters the debugging session to show only examples with overconfidence.
- Requirements:
Ground truth label
Classification models only
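A minimal sketch of the overconfidence criterion described above, assuming it is computed directly from the predicted probability vector; the threshold value used in the example is illustrative, not the library's automatic one.

```python
import numpy as np

def is_overconfident(probas: np.ndarray, true_label_idx: int, threshold: float) -> bool:
    """Flag a prediction as overconfident when the gap between the highest
    predicted probability and the probability of the ground-truth label
    exceeds `threshold`. The gap is 0 when the prediction is correct."""
    gap = probas.max() - probas[true_label_idx]
    return gap > threshold

# 3-class example: the true class is index 2, but the model puts 0.85 on class 1.
print(is_overconfident(np.array([0.05, 0.85, 0.10]), true_label_idx=2, threshold=0.5))  # True
```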
- class giskard.push.BorderlinePush(training_label, training_label_proba, dataset_row, rate)#
Recommend actions for borderline/underconfidence cases.
- Description:
Tag examples that are classified with very low confidence, indicating the model is unsure about the prediction. This may be due to inconsistent patterns or insufficient data.
- Triggering event:
When switching examples, the push triggers if the prediction is underconfident. By default, a prediction is marked as underconfident when the probability of the second most likely label is less than 10% below the probability of the predicted label (see the sketch after this entry).
- Call to action description:
Create one-sample test: This adds a test to a test suite that checks whether the model is underconfident on this example.
Get similar examples: This filters the debugging session to show only examples with underconfidence.
- Requirements:
Classification models only
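A minimal sketch of the underconfidence rule described above; whether the 10% margin is applied relatively or absolutely is an assumption here.

```python
import numpy as np

def is_underconfident(probas: np.ndarray, margin: float = 0.10) -> bool:
    """Flag a prediction as underconfident when the runner-up probability is
    within `margin` (relative) of the top probability."""
    top, second = np.sort(probas)[::-1][:2]
    return second >= top * (1.0 - margin)

# Two nearly tied classes: 0.46 is within 10% of 0.48, so the prediction is underconfident.
print(is_underconfident(np.array([0.48, 0.46, 0.06])))  # True
```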
- class giskard.push.ContributionPush(value=None, feature=None, feature_type=None, bounds=None, model_type=None, correct_prediction=None)#
Recommend actions for features that have high SHAP values.
- Description:
Tag features that have a high SHAP value for this example. This may indicate that the model is relying heavily on this feature to make the prediction.
- Triggering event:
When switching examples, the push triggers if the largest Shapley (SHAP) value is much higher than those of the other features. We mark a feature as high-contributing by computing SHAP values for all features and then calculating z-scores to find significant outliers. If a z-score is above a threshold (typically determined automatically from the number of features), the feature is considered high-contributing (see the sketch after this entry).
- Call to action description:
Save slice: This will save the slice in the catalog and enable you to create tests more efficiently.
Add test to a test suite: This will add a test to a test suite to check if this slice performs better or worse than the rest of the dataset.
Get similar examples: This will filter this debugging session to show examples from this slice only.
- Requirements:
Numerical and Categorical features only
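A minimal sketch of the contribution criterion, assuming z-scores are computed over the absolute SHAP values of a single example; the feature names and the threshold of 2.0 are illustrative.

```python
import numpy as np

def high_contribution_features(shap_values, feature_names, z_threshold: float = 2.0):
    """Return the features whose |SHAP| value is an outlier (z-score above
    `z_threshold`) compared with the other features of the same example."""
    abs_vals = np.abs(np.asarray(shap_values, dtype=float))
    z_scores = (abs_vals - abs_vals.mean()) / abs_vals.std()
    return [name for name, z in zip(feature_names, z_scores) if z > z_threshold]

shap_one_example = [0.02, 0.01, 0.90, 0.03, 0.02, 0.01, 0.04, 0.02]
names = ["age", "fare", "sex", "pclass", "sibsp", "parch", "embarked", "cabin"]
print(high_contribution_features(shap_one_example, names))  # ['sex']
```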
- class giskard.push.PerturbationPush(value, feature, transformation_info: TransformationInfo)#
Recommend actions for features whose perturbation changes the prediction.
- Description:
Tag features that, when perturbed, change the prediction. This may indicate that the model is sensitive to this feature.
- Triggering event:
When switching examples, the push triggers if perturbing a feature changes the prediction. We mark a feature as sensitive by applying the supported perturbations to each feature in the dataset. For numerical columns, we add or subtract values based on the mean absolute deviation; for text columns, we apply predefined text transformations (see the sketch after this entry).
- Call to action description:
Add to test suite: This will add a test to a test suite to check if a perturbation on this feature changes the prediction above a threshold.
- Requirements:
Ground truth label
Numerical and Text features only
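A minimal sketch of the numerical perturbation check described above, assuming a +MAD shift on a single column; `predict_fn` stands in for the model's prediction function and is an assumption, not a library API.

```python
import pandas as pd

def prediction_changes_under_mad_shift(predict_fn, df: pd.DataFrame, row_idx: int, column: str) -> bool:
    """Shift `column` of one row by the column's mean absolute deviation (MAD)
    and report whether the predicted label changes."""
    mad = (df[column] - df[column].mean()).abs().mean()  # mean absolute deviation of the column
    original = df.iloc[[row_idx]].copy()
    perturbed = original.copy()
    perturbed[column] = perturbed[column] + mad
    return predict_fn(original)[0] != predict_fn(perturbed)[0]
```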
Push tests catalog#
- giskard.push.push_test_catalog.catalog.test_diff_rmse_push(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Decreasing, debug: SuiteInput | bool | None = False) GiskardTestMethod [source]#
Test for difference in RMSE between a slice and full dataset.
Checks if the RMSE on a sliced subset of the data is significantly different from the full dataset based on a threshold and direction.
Can be used with pushes to test whether problematic slices have worse RMSE (see the usage sketch after this entry).
- Parameters:
model – Model to test
dataset – Full dataset
slicing_function – Function to slice dataset
threshold – Allowed RMSE difference
direction – Whether slice RMSE should increase or decrease
debug – Whether to return debug dataset
- Returns:
TestResult with pass/fail, slice sizes, RMSE diff, debug dataset
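A hedged usage sketch for this test, assuming `model`, `dataset`, and `region_slice` (a registered slicing function) already exist in your session; calling the test and then `.execute()` follows the usual Giskard test-invocation pattern.

```python
from giskard.push.push_test_catalog.catalog import test_diff_rmse_push

result = test_diff_rmse_push(
    model=model,                    # assumed: an existing giskard.Model
    dataset=dataset,                # assumed: an existing giskard.Dataset
    slicing_function=region_slice,  # assumed: an existing slicing function
    threshold=0.1,
).execute()
print(result.passed, result.metric)
```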
- giskard.push.push_test_catalog.catalog.test_diff_f1_push(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = -0.1, direction: SuiteInput | Direction | None = Direction.Increasing, debug: SuiteInput | bool | None = False) GiskardTestMethod [source]#
Test for difference in F1 score between a slice and full dataset.
Checks if the F1 score on a sliced subset of the data is significantly different from the full dataset based on a threshold and direction.
Can be used with pushes to test if problematic slices have worse F1 score.
- Parameters:
model – Model to test
dataset – Full dataset
slicing_function – Function to slice dataset
threshold – Allowed F1 score difference
direction – Whether slice F1 should increase or decrease
debug – Whether to return debug dataset
- Returns:
TestResult with pass/fail, slice sizes, F1 diff, debug dataset
- giskard.push.push_test_catalog.catalog.if_underconfidence_rate_decrease(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, rate: SuiteInput | float | None = None) GiskardTestMethod [source]#
Test if underconfidence rate decreases for model on dataset.
Checks if the underconfidence rate for the model on the provided dataset is lower than the specified rate.
- Parameters:
model – Model to test
dataset – Dataset to test on
rate – Target underconfidence rate to decrease from
- Returns:
Test result with pass/fail and underconfidence rate metric
- giskard.push.push_test_catalog.catalog.if_overconfidence_rate_decrease(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, rate: SuiteInput | float | None = None) GiskardTestMethod [source]#
Test if overconfidence rate decreases for model on dataset.
Checks if the overconfidence rate for the model on the provided dataset is lower than the specified rate.
- Parameters:
model – Model to test
dataset – Dataset to test on
rate – Target overconfidence rate to decrease from
- Returns:
Test result with pass/fail and overconfidence rate metric
- giskard.push.push_test_catalog.catalog.correct_example(model: SuiteInput | BaseModel | None = None, saved_example: SuiteInput | Dataset | None = None, training_label: SuiteInput | Any | None = None) GiskardTestMethod [source]#
Test if model correctly predicts example.
Checks if the model's prediction on the saved example matches the provided training label ground truth.
- Parameters:
model – Model to test
saved_example – Example dataset
training_label – Ground truth label
- Returns:
Test result with pass/fail
- giskard.push.push_test_catalog.catalog.increase_probability(model: SuiteInput | BaseModel | None = None, saved_example: SuiteInput | Dataset | None = None, training_label: SuiteInput | Any | None = None, training_label_proba: SuiteInput | Any | None = None) GiskardTestMethod [source]#
Test if model probability for ground truth label increases.
Checks if the predicted probability for the training label increases compared to the original training probability.
- Parameters:
model – Model to test
saved_example – Example dataset
training_label – Ground truth label
training_label_proba – Original probability
- Returns:
Test result with pass/fail and probability metric
- giskard.push.push_test_catalog.catalog.one_sample_overconfidence_test(model: SuiteInput | BaseModel | None = None, saved_example: SuiteInput | Dataset | None = None) GiskardTestMethod [source]#
One-sample overconfidence test for example.
Checks if the overconfidence rate for the model on the provided example is below a threshold (see the usage sketch after this entry).
- Parameters:
model – Model to test
saved_example – Example dataset
- Returns:
Test result with pass/fail and overconfidence rate metric
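A hedged usage sketch, assuming `model` and `saved_example` (a one-row giskard.Dataset, e.g. taken from an ExamplePush) already exist.

```python
from giskard.push.push_test_catalog.catalog import one_sample_overconfidence_test

result = one_sample_overconfidence_test(
    model=model,                  # assumed: an existing giskard.Model
    saved_example=saved_example,  # assumed: a one-row giskard.Dataset
).execute()
print(result.passed, result.metric)
```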
- giskard.push.push_test_catalog.catalog.one_sample_underconfidence_test(model: SuiteInput | BaseModel | None = None, saved_example: SuiteInput | Dataset | None = None) GiskardTestMethod [source]#
One-sample underconfidence test for example.
Checks if the underconfidence rate for the model on the provided example is below a threshold.
- Parameters:
model – Model to test
saved_example – Example dataset
- Returns:
Test result with pass/fail and underconfidence rate metric
- giskard.push.push_test_catalog.catalog.test_metamorphic_invariance_with_mad(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, value_added: SuiteInput | float | None = None) GiskardTestMethod [source]#
Metamorphic test for numerical invariance with mean absolute deviation.
Checks if adding a value to a numerical feature keeps the prediction approximately the same, using the mean absolute deviation (see the usage sketch after this entry).
- Parameters:
model – Model to test
dataset – Dataset to test
column_name – Name of numerical feature column
value_added – Value to add to column
- Returns:
Test result with pass/fail
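A hedged usage sketch, assuming `model` and `dataset` already exist and that "age" is a numerical column in the dataset; the `value_added` shown is illustrative (for example, the column's mean absolute deviation).

```python
from giskard.push.push_test_catalog.catalog import test_metamorphic_invariance_with_mad

result = test_metamorphic_invariance_with_mad(
    model=model,        # assumed: an existing giskard.Model
    dataset=dataset,    # assumed: an existing giskard.Dataset
    column_name="age",  # assumed numerical column
    value_added=2.5,    # illustrative shift, e.g. the column's MAD
).execute()
print(result.passed)
```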