Automated model insights

This module defines various push classes used for model debugging.

The main classes are:

  • Push: Base push class

  • ExamplePush: Push based on example data

  • OverconfidencePush: Push for overconfidence cases

  • UnderconfidencePush: Push for underconfidence cases

  • ContributionPush: Push for high contribution features

  • PerturbationPush: Push for perturbation analysis

ExamplePush and its subclasses allow saving examples and generating one-sample tests. ContributionPush and PerturbationPush allow generating statistical tests and slicing functions.

Push objects can be converted to gRPC protobuf format with the to_grpc() method.

class giskard.push.Push

Base push class.

push_title

Title of the push

details

List of details/actions for the push

tests

List of tests to generate

pushkind

Enum value identifying the kind of push

class giskard.push.ExamplePush

Push class based on example data.

Adds the following attributes:

  • saved_example: Example dataset row

  • training_label: Ground truth label

  • training_label_proba: Probability of the ground truth label

Can be converted to gRPC protobuf format.

class giskard.push.FeaturePush

Push related to a specific feature.

Adds the following attributes:

  • feature: Feature name

  • value: Feature value

Can be converted to gRPC protobuf format.

class giskard.push.OverconfidencePush(training_label, training_label_proba, dataset_row, predicted_label, rate)

Recommend actions for overconfidence cases.

Description:

Tag examples that are misclassified, with a high probability assigned to the wrong label. This may indicate that the model is overconfident on this example, which can be caused by a spurious correlation or a data leak.

Triggering event:

Triggered when switching examples, if the current example is misclassified and the prediction is overconfident. Overconfidence is quantified as the difference between the largest probability assigned to any label and the probability assigned to the correct label (this difference is 0 when the prediction is correct). If the difference exceeds a threshold (typically determined automatically from the number of classes), the prediction is considered overconfident.
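A minimal sketch of the overconfidence score described above, in plain Python. The function names and the explicit `threshold` argument are hypothetical and are not part of Giskard's API; Giskard's automatic choice of threshold from the number of classes is not reproduced here.

```python
from typing import Sequence


def overconfidence_score(probas: Sequence[float], true_label_idx: int) -> float:
    """Largest class probability minus the probability of the true label.

    The score is 0.0 when the model's top prediction is the true label.
    """
    return max(probas) - probas[true_label_idx]


def is_overconfident(probas: Sequence[float], true_label_idx: int, threshold: float) -> bool:
    """Flag a misclassified example whose score exceeds `threshold` (hypothetical parameter)."""
    # A positive score already implies the top label differs from the true label,
    # so a non-negative threshold also enforces "the example is incorrect".
    return overconfidence_score(probas, true_label_idx) > threshold


# Three-class example: the model puts 0.75 on class 1, but the true class is 0.
probas = [0.125, 0.75, 0.125]
print(overconfidence_score(probas, 0))             # 0.625
print(is_overconfident(probas, 0, threshold=0.5))  # True
```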

Call to action description:

  • Create one-sample test: Adds a test to a test suite that checks whether the model is overconfident on this example.

  • Get similar examples: Filters the debugging session to show only overconfident examples.

Requirements:

  • Ground truth label

  • Classification models only

class giskard.push.ContributionPush(value=None, feature=None, feature_type=None, bounds=None, model_type=None, correct_prediction=None)

Recommend actions for features that have high SHAP values.

Description:

Tag features that have a high SHAP value for this example. This may indicate that the model is relying heavily on this feature to make the prediction.

Triggering event:

Triggered when switching examples, if the Shapley value of the most contributing feature is much higher than those of the other features. To identify high-contributing features, we compute SHAP values for all features and then calculate z-scores to find significant outliers. If a feature's z-score is above a threshold (typically determined automatically from the number of features), the feature is considered high contributing.
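The z-score check can be sketched as follows. This assumes the SHAP values for one example are already available as a dict mapping feature names to values, and that outliers are detected on absolute SHAP values using the population standard deviation; both choices, the function name and the `z_threshold` parameter are illustrative assumptions, not Giskard internals.

```python
import statistics
from typing import Dict, List


def high_contribution_features(shap_values: Dict[str, float], z_threshold: float) -> List[str]:
    """Return features whose absolute SHAP value is a z-score outlier (illustrative sketch)."""
    names = list(shap_values)
    # Assumption: magnitude matters, so a large negative contribution counts too.
    magnitudes = [abs(shap_values[name]) for name in names]
    mean = statistics.mean(magnitudes)
    stdev = statistics.pstdev(magnitudes)  # population standard deviation
    if stdev == 0:
        return []  # all contributions are equal: no outlier
    return [name for name, m in zip(names, magnitudes) if (m - mean) / stdev > z_threshold]


shap = {"age": 4.0, "income": -0.5, "city": 0.25, "tenure": 0.25, "score": 0.0}
print(high_contribution_features(shap, z_threshold=1.5))  # ['age']
```

With the population standard deviation, no z-score can exceed √(n − 1) for n features, so a fixed threshold that works for many features can become unreachable for few features.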

Call to action description:

  • Save Slice: Saves the slice in the catalog so that you can create tests more efficiently.

  • Add Test to a test suite: Adds a test to a test suite that checks whether this slice performs better or worse than the rest of the dataset.

  • Get similar examples: Filters the debugging session to show only examples from this slice.

Requirements:

Numerical and categorical features only

class giskard.push.PerturbationPush(value, feature, transformation_info: TransformationInfo)

Recommend actions for features whose perturbation changes the prediction.

Description:

Tag features that, when perturbed, change the prediction. This may indicate that the model is sensitive to this feature.

Triggering event:

Triggered when switching examples, if perturbing a feature changes the prediction. To identify sensitive features, we apply the supported perturbations to each feature in the dataset: for numerical columns, we add or subtract values based on the mean absolute deviation; for text columns, we apply predefined text transformations.
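For the numerical case, a sketch of a mean-absolute-deviation perturbation on a single row. The `predict` callable, the dict-based row representation and the choice of shifting by exactly one MAD in each direction are assumptions for illustration; the description only states that the added or subtracted values are based on the mean absolute deviation.

```python
import statistics
from typing import Callable, Dict, List

Row = Dict[str, float]


def mean_absolute_deviation(values: List[float]) -> float:
    """Average absolute distance of the values from their mean."""
    center = statistics.mean(values)
    return statistics.mean(abs(v - center) for v in values)


def perturbation_changes_prediction(
    predict: Callable[[Row], object], row: Row, column: List[float], feature: str
) -> bool:
    """Shift `feature` by +/- one MAD of `column` and report whether the prediction flips."""
    mad = mean_absolute_deviation(column)
    original = predict(row)
    for delta in (mad, -mad):
        perturbed = {**row, feature: row[feature] + delta}
        if predict(perturbed) != original:
            return True
    return False


def predict(row: Row) -> str:
    # Hypothetical toy classifier used only for this example.
    return "high" if row["income"] >= 30 else "low"


incomes = [10.0, 20.0, 30.0, 40.0]  # MAD = 10.0
print(perturbation_changes_prediction(predict, {"income": 25.0}, incomes, "income"))  # True
print(perturbation_changes_prediction(predict, {"income": 50.0}, incomes, "income"))  # False
```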

Call to action description:

Add to test suite: Adds a test to a test suite that checks whether perturbing this feature changes the prediction above a threshold.

Requirements:

  • Ground truth label

  • Numerical and text features only

Automated model insights catalog

giskard.push.push_test_catalog.catalog.test_diff_rmse_push(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Decreasing) → GiskardTestMethod

Test for difference in RMSE between a slice and full dataset.

Checks whether the RMSE on a slice of the data differs from the RMSE on the full dataset by more than the allowed threshold, in the specified direction.

Can be used with pushes to test if problematic slices have worse RMSE.

Parameters:
  • model (BaseModel) – Model to test

  • dataset (Dataset) – Full dataset

  • slicing_function (Optional[SlicingFunction], optional) – Function to slice dataset, by default None

  • threshold (float, optional) – Allowed RMSE difference, by default 0.1

  • direction (Direction, optional) – Whether slice RMSE should increase or decrease, by default Direction.Decreasing

Returns:

Test result with pass/fail status, slice sizes, RMSE difference and debug dataset

Return type:

TestResult
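To make the comparison concrete, a sketch that computes RMSE on the full data and on a slice and expresses their gap as a change relative to the full-dataset RMSE. The relative formulation and the helper names are assumptions; how test_diff_rmse_push applies threshold and direction to this quantity is not reproduced here.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_rmse_change(
    y_true: Sequence[float], y_pred: Sequence[float], in_slice: Sequence[bool]
) -> float:
    """(slice RMSE - full RMSE) / full RMSE; positive means the slice has larger errors."""
    full = rmse(y_true, y_pred)
    if full == 0:
        raise ValueError("full-dataset RMSE is 0; relative change is undefined")
    sliced = [(t, p) for t, p, keep in zip(y_true, y_pred, in_slice) if keep]
    slice_true = [t for t, _ in sliced]
    slice_pred = [p for _, p in sliced]
    return (rmse(slice_true, slice_pred) - full) / full


y_true = [0.0] * 8
y_pred = [0.0] * 6 + [3.0, 3.0]
in_slice = [False] * 6 + [True] * 2  # the slice holds the two badly predicted rows
print(rmse(y_true, y_pred))                            # 1.5
print(relative_rmse_change(y_true, y_pred, in_slice))  # 1.0
```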

giskard.push.push_test_catalog.catalog.test_diff_f1_push(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = -0.1, direction: SuiteInput | Direction | None = Direction.Increasing) → GiskardTestMethod

Test for difference in F1 score between a slice and full dataset.

Checks whether the F1 score on a slice of the data differs from the F1 score on the full dataset by more than the allowed threshold, in the specified direction.

Can be used with pushes to test if problematic slices have worse F1 score.

Parameters:
  • model (BaseModel) – Model to test

  • dataset (Dataset) – Full dataset

  • slicing_function (Optional[SlicingFunction], optional) – Function to slice dataset, by default None

  • threshold (float, optional) – Allowed F1 score difference, by default -0.1

  • direction (Direction, optional) – Whether slice F1 should increase or decrease, by default Direction.Increasing

Returns:

Test result with pass/fail status, slice sizes, F1 difference and debug dataset

Return type:

TestResult

giskard.push.push_test_catalog.catalog.if_underconfidence_rate_decrease(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, rate: SuiteInput | float | None = None) → GiskardTestMethod

Test if underconfidence rate decreases for model on dataset.

Checks if the underconfidence rate for the model on the provided dataset is lower than the specified rate.

Parameters:
  • model (BaseModel) – Model to test

  • dataset (Dataset) – Dataset to test on

  • rate (float) – Target underconfidence rate to decrease from

Returns:

Test result with pass/fail and underconfidence rate metric

Return type:

TestResult

giskard.push.push_test_catalog.catalog.if_overconfidence_rate_decrease(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, rate: SuiteInput | float | None = None) → GiskardTestMethod

Test if overconfidence rate decreases for model on dataset.

Checks if the overconfidence rate for the model on the provided dataset is lower than the specified rate.

Parameters:
  • model (BaseModel) – Model to test

  • dataset (Dataset) – Dataset to test on

  • rate (float) – Target overconfidence rate to decrease from

Returns:

Test result with pass/fail and overconfidence rate metric

Return type:

TestResult
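A sketch of the rate comparison, assuming the overconfidence rate is the fraction of rows whose overconfidence score (highest class probability minus the probability of the true label) exceeds a threshold. The per-row threshold, the denominator (all rows rather than misclassified rows only) and the function names are assumptions; only the final "lower than rate" comparison comes from the description above.

```python
from typing import Sequence


def overconfidence_rate(
    probas: Sequence[Sequence[float]], labels: Sequence[int], threshold: float
) -> float:
    """Fraction of rows whose top probability exceeds the true-label probability by more than `threshold`."""
    flagged = sum(1 for row, y in zip(probas, labels) if max(row) - row[y] > threshold)
    return flagged / len(labels)


def rate_decreased(current_rate: float, reference_rate: float) -> bool:
    """Pass when the new rate is strictly lower than the reference `rate`."""
    return current_rate < reference_rate


probas = [
    [0.125, 0.75, 0.125],  # gap 0.625 -> flagged
    [0.75, 0.125, 0.125],  # correct prediction, gap 0
    [0.5, 0.25, 0.25],     # correct prediction, gap 0
    [0.25, 0.5, 0.25],     # gap 0.25 -> flagged
]
labels = [0, 0, 0, 0]  # the true class is 0 for every row
current = overconfidence_rate(probas, labels, threshold=0.2)
print(current)                       # 0.5
print(rate_decreased(current, 0.6))  # True
```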

giskard.push.push_test_catalog.catalog.correct_example(model: SuiteInput | BaseModel | None = None, saved_example: SuiteInput | Dataset | None = None, training_label: SuiteInput | Any | None = None) → GiskardTestMethod

Test if model correctly predicts example.

Checks whether the model’s prediction on the saved example matches the ground truth label provided as training_label.

Parameters:
  • model (BaseModel) – Model to test

  • saved_example (Dataset) – Example dataset

  • training_label (Any) – Ground truth label

Returns:

Test result with pass/fail

Return type:

TestResult

giskard.push.push_test_catalog.catalog.increase_probability(model: SuiteInput | BaseModel | None = None, saved_example: SuiteInput | Dataset | None = None, training_label: SuiteInput | Any | None = None, training_label_proba: SuiteInput | Any | None = None) → GiskardTestMethod

Test if model probability for ground truth label increases.

Checks if the predicted probability for the training label increases compared to the original training probability.

Parameters:
  • model (BaseModel) – Model to test

  • saved_example (Dataset) – Example dataset

  • training_label (Any) – Ground truth label

  • training_label_proba (Any) – Original probability

Returns:

Test result with pass/fail and probability metric

Return type:

TestResult

giskard.push.push_test_catalog.catalog.one_sample_overconfidence_test(model: SuiteInput | BaseModel | None = None, saved_example: SuiteInput | Dataset | None = None) → GiskardTestMethod

One-sample overconfidence test for example.

Checks if the overconfidence rate for the model on the provided example is below a threshold.

Parameters:
  • model (BaseModel) – Model to test

  • saved_example (Dataset) – Example dataset

Returns:

Test result with pass/fail and overconfidence rate metric

Return type:

TestResult

giskard.push.push_test_catalog.catalog.one_sample_underconfidence_test(model: SuiteInput | BaseModel | None = None, saved_example: SuiteInput | Dataset | None = None) → GiskardTestMethod

One-sample underconfidence test for example.

Checks if the underconfidence rate for the model on the provided example is below a threshold.

Parameters:
  • model (BaseModel) – Model to test

  • saved_example (Dataset) – Example dataset

Returns:

Test result with pass/fail and underconfidence rate metric

Return type:

TestResult

giskard.push.push_test_catalog.catalog.test_metamorphic_invariance_with_mad(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, value_added: SuiteInput | float | None = None) → GiskardTestMethod

Metamorphic test for numerical invariance with mean absolute deviation.

Checks whether adding value_added to a numerical feature leaves the model’s predictions approximately unchanged, where value_added is based on the feature’s mean absolute deviation.

Parameters:
  • model (BaseModel) – Model to test

  • dataset (Dataset) – Dataset to test

  • column_name (str) – Name of numerical feature column

  • value_added (float) – Value to add to column

Returns:

Test result with pass/fail

Return type:

TestResult
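A sketch of the invariance check for a classifier: shift column_name by value_added on every row and measure the share of rows whose predicted label is unchanged. The `predict` callable, the dict rows and treating "approximately the same" as an identical label are assumptions; a regression model would need a tolerance instead, and the pass threshold applied to this share is not specified here.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def invariance_pass_rate(
    predict: Callable[[Row], object], rows: List[Row], column_name: str, value_added: float
) -> float:
    """Share of rows whose prediction is unchanged after adding `value_added` to `column_name`."""
    unchanged = 0
    for row in rows:
        perturbed = {**row, column_name: row[column_name] + value_added}
        if predict(perturbed) == predict(row):
            unchanged += 1
    return unchanged / len(rows)


def predict(row: Row) -> bool:
    # Hypothetical classifier: positive when x reaches 10.
    return row["x"] >= 10


rows = [{"x": 0.0}, {"x": 5.0}, {"x": 8.0}, {"x": 20.0}]
print(invariance_pass_rate(predict, rows, "x", value_added=3.0))  # 0.75 (only x=8.0 flips)
```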