Metamorphic tests
- giskard.testing.test_metamorphic_invariance(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, transformation_function: SuiteInput | TransformationFunction | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.5, output_sensitivity: SuiteInput | float | None = None, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Summary: Tests if the model prediction is invariant when the feature values are perturbed
Description: For classification: tests whether the predicted classification label remains the same after the feature values are perturbed. For regression: checks whether the predicted output stays within the output_sensitivity level after the feature values are perturbed.
The test passes when the ratio of invariant rows is higher than the threshold.
Example: The test passes when, after switching gender from male to female, more than 50% (threshold 0.5) of males have unchanged outputs.
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
transformation_function (TransformationFunction) – Function performing the perturbations to be applied on dataset.
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
threshold (float) – Threshold of the ratio of invariant rows
output_sensitivity (float) – Optional. For regression models, the threshold on the relative prediction difference, that is, the difference between the perturbed and the original prediction divided by the original prediction. A prediction counts as changed when this ratio is above output_sensitivity (for example, above 0.1 when output_sensitivity is 0.1).
debug (bool) – If True and the test fails, a dataset will be provided containing the non-invariant rows.
- Returns:
- actual_slices_size:
Length of dataset tested
- message:
Test result message
- metric:
The ratio of unchanged rows over the perturbed rows
- passed:
TRUE if metric > threshold
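As a rough illustration of how the metric and the pass condition described above fit together, the sketch below computes the ratio of invariant rows from two lists of predictions. It is not Giskard's implementation: the helper name `invariance_ratio` and the sample values are hypothetical, every row passed in is assumed to have been perturbed, and original regression predictions are assumed to be non-zero.

```python
def invariance_ratio(original, perturbed, output_sensitivity=None):
    """Share of rows whose prediction is unchanged after perturbation.

    With output_sensitivity=None the predictions are compared as labels
    (classification). Otherwise they are treated as regression outputs and a
    row is invariant when its relative change is not above output_sensitivity.
    """
    if len(original) != len(perturbed):
        raise ValueError("both prediction lists must have the same length")
    if not original:
        raise ValueError("no perturbed rows to evaluate")

    unchanged = 0
    for before, after in zip(original, perturbed):
        if output_sensitivity is None:
            unchanged += before == after
        else:
            # Relative difference over the original prediction (assumed non-zero).
            unchanged += abs(after - before) / abs(before) <= output_sensitivity
    return unchanged / len(original)


threshold = 0.5

# Classification: 3 of 4 labels survive the perturbation.
labels_ratio = invariance_ratio(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"])

# Regression: relative changes are 5%, 20% and 0%, so 2 of 3 rows are invariant.
values_ratio = invariance_ratio([100.0, 50.0, 10.0], [105.0, 60.0, 10.0], output_sensitivity=0.1)

print(labels_ratio > threshold, values_ratio > threshold)
```

Both hypothetical cases would pass with the default threshold of 0.5, because the comparison is a strict "greater than" on the ratio.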
- giskard.testing.test_metamorphic_increasing(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, transformation_function: SuiteInput | TransformationFunction | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.5, classification_label: SuiteInput | str | None = None, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Summary: Tests if the model probability increases when the feature values are perturbed
Description: For classification: tests whether the model probability of a given classification_label increases after the feature values are perturbed.
For regression: tests whether the model prediction increases after the feature values are perturbed.
The test passes when the share of rows that increase is higher than the threshold.
- Example: For a credit scoring model, the test passes when, after a 10% decrease in wage, the default probability increases for more than 50% of people in the dataset.
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
transformation_function (TransformationFunction) – Function performing the perturbations to be applied on dataset.
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
threshold (float) – Threshold of the ratio of increasing rows
classification_label (str) – Optional. One specific label value from the target column
debug (bool) – If True and the test fails, a dataset will be provided containing the non-increasing rows.
- Returns:
- actual_slices_size:
Length of dataset tested
- message:
Test result message
- metric:
The ratio of increasing rows over the perturbed rows
- passed:
TRUE if metric > threshold
- giskard.testing.test_metamorphic_decreasing(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, transformation_function: SuiteInput | TransformationFunction | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.5, classification_label: SuiteInput | str | None = None, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Summary: Tests if the model probability decreases when the feature values are perturbed
Description: For classification: tests whether the model probability of a given classification_label decreases after the feature values are perturbed.
For regression: tests whether the model prediction decreases after the feature values are perturbed.
The test passes when the share of rows that decrease is higher than the threshold.
- Example: For a credit scoring model, the test passes when, after a 10% increase in wage, the default probability decreases for more than 50% of people in the dataset.
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
transformation_function (TransformationFunction) – Function performing the perturbations to be applied on dataset.
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
threshold (float) – Threshold of the ratio of decreasing rows
classification_label (str) – Optional. One specific label value from the target column
debug (bool) – If True and the test fails, a dataset will be provided containing the non-decreasing rows.
- Returns:
- actual_slices_size:
Length of dataset tested
- message:
Test result message
- metric:
The ratio of decreasing rows over the perturbed rows
- passed:
TRUE if metric > threshold
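The increasing and decreasing tests described above share the same shape: count the rows that moved in the expected direction, divide by the number of perturbed rows, and compare with the threshold. The sketch below shows that logic under simplifying assumptions. The helper `directional_ratio` and the probability values are hypothetical and not part of Giskard; every row passed in is assumed to have been perturbed.

```python
def directional_ratio(original, perturbed, direction="increasing"):
    """Share of rows whose prediction moved strictly in the given direction."""
    if direction not in ("increasing", "decreasing"):
        raise ValueError("direction must be 'increasing' or 'decreasing'")
    if len(original) != len(perturbed) or not original:
        raise ValueError("need two non-empty prediction lists of equal length")

    if direction == "increasing":
        moved = sum(after > before for before, after in zip(original, perturbed))
    else:
        moved = sum(after < before for before, after in zip(original, perturbed))
    return moved / len(original)


# Hypothetical default probabilities before and after a 10% wage decrease.
proba_before = [0.20, 0.35, 0.10, 0.50]
proba_after = [0.25, 0.40, 0.10, 0.45]
threshold = 0.5

increasing_ratio = directional_ratio(proba_before, proba_after, "increasing")
decreasing_ratio = directional_ratio(proba_before, proba_after, "decreasing")

# The pass condition is strict: a ratio equal to the threshold does not pass.
print(increasing_ratio, increasing_ratio > threshold)
print(decreasing_ratio, decreasing_ratio > threshold)
```

In this made-up data exactly half of the rows increase, so the increasing check does not pass at a threshold of 0.5, since the documented condition is metric > threshold rather than metric >= threshold.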
- giskard.testing.test_metamorphic_decreasing_t_test(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, transformation_function: SuiteInput | TransformationFunction | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, critical_quantile: SuiteInput | float | None = 0.05, classification_label: SuiteInput | str | None = None, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Summary: Tests if the model probability decreases when the feature values are perturbed
Description: Calculates the t-test on two related samples. Sample (A) is the original probability predictions, while sample (B) is the probabilities after perturbation of one or more of the features. This test computes the decreasing test to study whether mean(B) < mean(A). The test passes when the p-value of the t-test between (A) and (B) is below the critical quantile.
- Example: For a credit scoring model, the test passes when a 10% decrease in wage causes a statistically significant decrease in probability.
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
transformation_function (TransformationFunction) – Function performing the perturbations to be applied on dataset.
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
critical_quantile (float) – Critical quantile above which the null hypothesis cannot be rejected
debug (bool) – If True and the test fails, a dataset will be provided containing the non-decreasing rows.
- Returns:
- actual_slices_size:
Length of dataset tested
- message:
Test result message
- metric:
The p-value of the t-test between the original predictions (A) and the perturbed predictions (B)
- passed:
TRUE if the p-value of the t-test between (A) and (B) is below the critical quantile
- giskard.testing.test_metamorphic_increasing_t_test(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, transformation_function: SuiteInput | TransformationFunction | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, critical_quantile: SuiteInput | float | None = 0.05, classification_label: SuiteInput | str | None = None, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Summary: Tests if the model probability increases when the feature values are perturbed
Description: Calculates the t-test on two related samples. Sample (A) is the original probability predictions, while sample (B) is the probabilities after perturbation of one or more of the features. This test computes the increasing test to study whether mean(A) < mean(B). The test passes when the p-value of the t-test between (A) and (B) is below the critical quantile.
- Example: For a credit scoring model, the test passes when a 10% decrease in wage causes a statistically significant increase in probability.
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
transformation_function (TransformationFunction) – Function performing the perturbations to be applied on dataset.
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
critical_quantile (float) – Critical quantile above which the null hypothesis cannot be rejected
debug (bool) – If True and the test fails, a dataset will be provided containing the non-increasing rows.
- Returns:
- actual_slices_size:
Length of dataset tested
- message:
Test result message
- metric:
The p-value of the t-test between the original predictions (A) and the perturbed predictions (B)
- passed:
TRUE if the p-value of the t-test between (A) and (B) is below the critical quantile
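To make the one-sided paired t-test behind the two directional tests above more concrete, here is a standard-library-only sketch. It is an illustration rather than Giskard's code: the helper names and sample probabilities are hypothetical, and the Student's t CDF is approximated by numerical integration (Simpson's rule) only to keep the example dependency-free. In practice a statistics library such as SciPy (`scipy.stats.ttest_rel` with its `alternative` argument) would be the more usual choice.

```python
import math
from statistics import mean, stdev


def student_t_cdf(t, dof, steps=2000):
    """CDF of Student's t distribution, integrated with Simpson's rule."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))

    def pdf(x):
        return math.exp(log_norm - (dof + 1) / 2 * math.log1p(x * x / dof))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density between 0 and |t|
    return min(1.0, max(0.0, 0.5 + math.copysign(area, t)))


def paired_t_pvalue_less(first, second):
    """One-sided paired t-test p-value for H1: mean(first) < mean(second)."""
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    # stdev() needs at least two pairs and non-constant differences.
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return student_t_cdf(t_stat, n - 1)


sample_a = [0.62, 0.48, 0.71, 0.55, 0.80, 0.66]  # original probabilities (A)
sample_b = [0.58, 0.45, 0.69, 0.50, 0.77, 0.61]  # probabilities after perturbation (B)
critical_quantile = 0.05

p_decreasing = paired_t_pvalue_less(sample_b, sample_a)  # H1: mean(B) < mean(A)
p_increasing = paired_t_pvalue_less(sample_a, sample_b)  # H1: mean(A) < mean(B)
print(p_decreasing < critical_quantile, p_increasing < critical_quantile)
```

With these made-up samples every perturbed probability is lower than the original, so the decreasing hypothesis is supported while the increasing one is not.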
- giskard.testing.test_metamorphic_invariance_t_test(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, transformation_function: SuiteInput | TransformationFunction | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, window_size: SuiteInput | float | None = 0.2, critical_quantile: SuiteInput | float | None = 0.05, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Summary: Tests if the model predictions are statistically invariant when the feature values are perturbed.
Description: Calculates the t-test on two related samples. Sample (A) is the original probability predictions, while sample (B) is the probabilities after perturbation of one or more of the features. This test computes the equivalence test to show that mean(B) - window_size/2 < mean(A) < mean(B) + window_size/2. The test passes when both of the following tests pass:
- the p-value of the t-test between (A) and (B)+window_size/2 is below the critical quantile
- the p-value of the t-test between (B)-window_size/2 and (A) is below the critical quantile
Example: The test passes when, after switching gender from male to female, the probability distribution remains statistically invariant. In other words, the test passes if the mean of the perturbed sample is statistically within a window determined by the user.
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
transformation_function (TransformationFunction) – Function performing the perturbations to be applied on dataset.
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
window_size (float) – Probability window within which the mean of the perturbed sample may lie
critical_quantile (float) – Critical quantile above which the null hypothesis cannot be rejected
- Returns:
- actual_slices_size:
Length of dataset tested
- message:
Test result message
- metric:
The p-value of the t-test between the original predictions (A) and the perturbed predictions (B)
- passed:
TRUE if both the p-value of the t-test between (A) and (B)+window_size/2 and the p-value of the t-test between (B)-window_size/2 and (A) are below critical_quantile
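The equivalence check described above amounts to two one-sided paired t-tests on shifted samples. The sketch below spells that out with the standard library only. All names and values are hypothetical and this is not Giskard's implementation: the t CDF is approximated with Simpson's rule, and returning the larger of the two p-values as the summary figure is an assumption borrowed from common equivalence-testing practice.

```python
import math
from statistics import mean, stdev


def paired_t_pvalue_less(first, second, steps=2000):
    """One-sided paired t-test p-value for H1: mean(first) < mean(second)."""
    diffs = [x - y for x, y in zip(first, second)]
    dof = len(diffs) - 1
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

    # Student's t CDF at t_stat, integrated with Simpson's rule (steps is even).
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    h = abs(t_stat) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.exp(log_norm - (dof + 1) / 2 * math.log1p((i * h) ** 2 / dof))
    cdf = 0.5 + math.copysign(total * h / 3, t_stat)
    return min(1.0, max(0.0, cdf))


def invariance_t_test(sample_a, sample_b, window_size=0.2, critical_quantile=0.05):
    """Return (largest p-value, passed) for the two shifted one-sided tests."""
    half = window_size / 2
    # H1: mean(A) < mean(B) + window_size/2
    p_upper = paired_t_pvalue_less(sample_a, [b + half for b in sample_b])
    # H1: mean(B) - window_size/2 < mean(A)
    p_lower = paired_t_pvalue_less([b - half for b in sample_b], sample_a)
    passed = p_upper < critical_quantile and p_lower < critical_quantile
    return max(p_upper, p_lower), passed


sample_a = [0.62, 0.48, 0.71, 0.55, 0.80, 0.66]  # original probabilities (A)
sample_b = [0.58, 0.45, 0.69, 0.50, 0.77, 0.61]  # probabilities after perturbation (B)

wide_p, wide_passed = invariance_t_test(sample_a, sample_b, window_size=0.2)
narrow_p, narrow_passed = invariance_t_test(sample_a, sample_b, window_size=0.02)
print(wide_passed, narrow_passed)
```

The same made-up samples count as invariant with the default window of 0.2 but not with a much narrower window of 0.02, which shows how window_size controls how much drift in the mean is tolerated.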
- giskard.testing.test_metamorphic_decreasing_wilcoxon(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, transformation_function: SuiteInput | TransformationFunction | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, critical_quantile: SuiteInput | float | None = 0.05, classification_label: SuiteInput | str | None = None, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Summary: Tests if the model probability decreases when the feature values are perturbed
Description: Calculates the Wilcoxon signed-rank test on two related samples. Sample (A) is the original probability predictions, while sample (B) is the probabilities after perturbation of one or more of the features. This test computes the decreasing test to study whether mean(B) < mean(A). The test passes when the p-value of the Wilcoxon signed-rank test between (A) and (B) is below the critical quantile.
- Example: For a credit scoring model, the test passes when a 10% decrease in wage causes a statistically significant decrease in probability.
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
transformation_function (TransformationFunction) – Function performing the perturbations to be applied on dataset.
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
critical_quantile (float) – Critical quantile above which the null hypothesis cannot be rejected
debug (bool) – If True and the test fails, a dataset will be provided containing the non-decreasing rows.
- Returns:
- actual_slices_size:
Length of dataset tested
- message:
Test result message
- metric:
The p-value of the Wilcoxon signed-rank test between the original predictions (A) and the perturbed predictions (B)
- passed:
TRUE if the p-value of the Wilcoxon signed-rank test between (A) and (B) is below the critical quantile
- giskard.testing.test_metamorphic_increasing_wilcoxon(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, transformation_function: SuiteInput | TransformationFunction | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, critical_quantile: SuiteInput | float | None = 0.05, classification_label: SuiteInput | str | None = None, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Summary: Tests if the model probability increases when the feature values are perturbed
Description: Calculates the Wilcoxon signed-rank test on two related samples. Sample (A) is the original probability predictions, while sample (B) is the probabilities after perturbation of one or more of the features. This test computes the increasing test to study whether mean(A) < mean(B). The test passes when the p-value of the Wilcoxon signed-rank test between (A) and (B) is below the critical quantile.
- Example: For a credit scoring model, the test passes when a 10% decrease in wage causes a statistically significant increase in probability.
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
transformation_function (TransformationFunction) – Function performing the perturbations to be applied on dataset.
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
critical_quantile (float) – Critical quantile above which the null hypothesis cannot be rejected
debug (bool) – If True and the test fails, a dataset will be provided containing the non-increasing rows.
- Returns:
- actual_slices_size:
Length of dataset tested
- message:
Test result message
- metric:
The p-value of the Wilcoxon signed-rank test between the original predictions (A) and the perturbed predictions (B)
- passed:
TRUE if the p-value of the Wilcoxon signed-rank test between (A) and (B) is below the critical quantile
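For readers less familiar with the Wilcoxon signed-rank test used by the two directional tests above, the following sketch computes an exact one-sided p-value with the standard library only. It is a simplified illustration, not Giskard's implementation: the helper name and sample values are hypothetical, and the code assumes no zero differences and no tied absolute differences, so that the ranks are exactly 1..n. A library routine such as `scipy.stats.wilcoxon` with its `alternative` argument handles the general case.

```python
def wilcoxon_pvalue_greater(sample_a, sample_b):
    """Exact one-sided p-value for H1: paired values in sample_b exceed sample_a.

    Assumes no zero differences and no tied absolute differences.
    """
    diffs = [b - a for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # W+ is the sum of the ranks attached to the positive differences.
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

    # counts[s] = number of subsets of the ranks {1..n} that sum to s.
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]

    # Under the null hypothesis all 2**n sign patterns are equally likely.
    return sum(counts[w_plus:]) / 2 ** n


critical_quantile = 0.05

# Every probability rises after the perturbation: W+ takes its maximum value.
before_all_up = [0.20, 0.35, 0.10, 0.50, 0.42, 0.28]
after_all_up = [0.21, 0.37, 0.13, 0.54, 0.47, 0.34]
p_all_up = wilcoxon_pvalue_greater(before_all_up, after_all_up)

# One of five probabilities drops slightly, which weakens the evidence.
before_mixed = [0.20, 0.35, 0.10, 0.50, 0.42]
after_mixed = [0.25, 0.34, 0.18, 0.62, 0.57]
p_mixed = wilcoxon_pvalue_greater(before_mixed, after_mixed)

print(p_all_up < critical_quantile, p_mixed < critical_quantile)
```

Swapping the two arguments gives the p-value for the decreasing direction. The second sample shows that with very few rows even a single contrary pair can keep the p-value above a critical quantile of 0.05.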
- giskard.testing.test_metamorphic_invariance_wilcoxon(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, transformation_function: SuiteInput | TransformationFunction | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, window_size: SuiteInput | float | None = 0.2, critical_quantile: SuiteInput | float | None = 0.05, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Summary: Tests if the model predictions are statistically invariant when the feature values are perturbed.
Description: Calculates the Wilcoxon signed-rank test on two related samples. Sample (A) is the original probability predictions, while sample (B) is the probabilities after perturbation of one or more of the features. This test computes the equivalence test to show that mean(B) - window_size/2 < mean(A) < mean(B) + window_size/2. The test passes when both of the following tests pass:
- the p-value of the Wilcoxon signed-rank test between (A) and (B)+window_size/2 is below the critical quantile
- the p-value of the Wilcoxon signed-rank test between (B)-window_size/2 and (A) is below the critical quantile
Example: The test passes when, after switching gender from male to female, the probability distribution remains statistically invariant. In other words, the test passes if the mean of the perturbed sample is statistically within a window determined by the user.
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
transformation_function (TransformationFunction) – Function performing the perturbations to be applied on dataset.
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
window_size (float) – Probability window within which the mean of the perturbed sample may lie
critical_quantile (float) – Critical quantile above which the null hypothesis cannot be rejected
debug (bool) – If True and the test fails, a dataset will be provided containing the non-invariant rows.
- Returns:
- actual_slices_size:
Length of dataset tested
- message:
Test result message
- metric:
The p-value of the Wilcoxon signed-rank test between the original predictions (A) and the perturbed predictions (B)
- passed:
TRUE if both the p-value of the Wilcoxon signed-rank test between (A) and (B)+window_size/2 and the p-value of the Wilcoxon signed-rank test between (B)-window_size/2 and (A) are below critical_quantile