Drift tests
- giskard.testing.test_drift_psi(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.2, max_categories: SuiteInput | int | None = 20, psi_contribution_percent: SuiteInput | float | None = 0.2, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the PSI (Population Stability Index) score between the actual and reference datasets is below the threshold for a given categorical feature.
Example: The test passes when the PSI score of gender between the reference and actual sets is below 0.2.
- Parameters:
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
column_name (str) – Name of the column containing the categorical feature
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (float) – Threshold value for PSI
max_categories (int) – Maximum number of categories used to compute the PSI score
psi_contribution_percent (float) – Ratio of the PSI score of a given category to the total PSI score of the categorical variable. If there is drift, the test reports all categories whose PSI contribution is greater than this ratio.
debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than psi_contribution_percent of the total PSI score).
- Returns:
- actual_slices_size:
Length of rows with given categorical feature in actual slice
- reference_slices_size:
Length of rows with given categorical feature in reference slice
- metric:
The total PSI score between the actual and reference datasets
- passed:
True if metric <= threshold
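To make the pass rule concrete, here is a minimal sketch of the textbook PSI formula, the sum over categories of (actual − reference) × ln(actual / reference) on relative frequencies. It is not the library's implementation: the helper name `total_psi_score`, the `eps` floor for empty categories, and the sample data are all assumptions for illustration, and details such as `max_categories` handling are ignored.

```python
import math
from collections import Counter


def total_psi_score(actual, reference, eps=1e-6):
    """Total PSI between two lists of categorical values (hypothetical helper)."""
    actual_counts = Counter(actual)
    reference_counts = Counter(reference)
    total = 0.0
    for category in set(actual_counts) | set(reference_counts):
        # Relative frequency in each dataset; eps avoids log(0) for unseen categories.
        a = max(actual_counts[category] / len(actual), eps)
        r = max(reference_counts[category] / len(reference), eps)
        total += (a - r) * math.log(a / r)
    return total


# Made-up "gender" column: balanced in the reference set, 70/30 in the actual set.
reference = ["F"] * 50 + ["M"] * 50
actual = ["F"] * 70 + ["M"] * 30

total_psi = total_psi_score(actual, reference)
passed = total_psi <= 0.2  # same rule as the "passed" field above
print(f"PSI = {total_psi:.4f}, passed = {passed}")
```

With this data the shift is noticeable but the total stays under the default threshold of 0.2, so the rule evaluates to a pass.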
- giskard.testing.test_drift_chi_square(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.05, max_categories: SuiteInput | int | None = 20, chi_square_contribution_percent: SuiteInput | float | None = 0.2, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the p-value of the chi-square test between the actual and reference datasets is above the threshold for a given categorical feature.
Example: The test passes when the p-value of the chi-square test of the categorical variable between the reference and actual sets is higher than 0.05. This means that the null hypothesis of the chi-square test cannot be rejected at the 5% level, so we cannot assume drift for this variable.
- Parameters:
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
column_name (str) – Name of the column containing the categorical feature
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (float) – Threshold for the p-value of the chi-square test
max_categories (int) – Maximum number of categories used to compute the chi-square statistic
chi_square_contribution_percent (float) – Ratio of the chi-square value of a given category to the total chi-square value of the categorical variable. If there is drift, the test reports all categories whose chi-square contribution is greater than this ratio.
debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than chi_square_contribution_percent of the total chi-square score).
- Returns:
- actual_slices_size:
Length of rows with given categorical feature in actual slice
- reference_slices_size:
Length of rows with given categorical feature in reference slice
- metric:
The p-value of the chi-square test
- passed:
True if metric > threshold
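The sketch below shows one common way to obtain such a p-value: a goodness-of-fit chi-square statistic of the actual counts against reference frequencies scaled to the actual sample size, with (number of categories − 1) degrees of freedom. This is an assumption about the formulation, not a copy of the library's code. The helper names are hypothetical, and the closed-form survival function is only used to keep the example free of third-party dependencies.

```python
import math
from collections import Counter


def chi2_sf(x, dof):
    """P(X > x) for a chi-square variable with integer dof (closed-form series)."""
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total, term = 0.0, math.sqrt(x)
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def chi_square_drift(actual, reference):
    """Return (statistic, p_value); assumes every category occurs in the reference."""
    actual_counts, reference_counts = Counter(actual), Counter(reference)
    categories = set(actual_counts) | set(reference_counts)
    scale = len(actual) / len(reference)
    statistic = 0.0
    for category in categories:
        expected = reference_counts[category] * scale
        if expected == 0:
            raise ValueError(f"category {category!r} is absent from the reference data")
        statistic += (actual_counts[category] - expected) ** 2 / expected
    return statistic, chi2_sf(statistic, len(categories) - 1)


# Made-up categorical column: balanced reference, 70/30 actual.
reference = ["F"] * 50 + ["M"] * 50
actual = ["F"] * 70 + ["M"] * 30

statistic, p_value = chi_square_drift(actual, reference)
passed = p_value > 0.05  # same rule as the "passed" field above
print(f"chi2 = {statistic:.2f}, p-value = {p_value:.2e}, passed = {passed}")
```

Note the direction of the rule: unlike PSI, a small metric (p-value) signals drift, so this example fails the test.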
- giskard.testing.test_drift_ks(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.05) → GiskardTestMethod
Test if the p-value of the KS (Kolmogorov-Smirnov) test between the actual and reference datasets is above the threshold for a given numerical feature.
Example: The test passes when the p-value of the KS test of the numerical variable between the actual and reference datasets is higher than 0.05. This means that the null hypothesis of the KS test cannot be rejected at the 5% level, so we cannot assume drift for this variable.
- Parameters:
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
column_name (str) – Name of the column containing the numerical feature
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (float) – Threshold for the p-value of the KS test
- Returns:
- actual_slices_size:
Length of rows with given numerical feature in actual slice
- reference_slices_size:
Length of rows with given numerical feature in reference slice
- metric:
The p-value of the KS test
- passed:
True if metric >= threshold
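As an illustration of what the metric measures, the following sketch computes the two-sample KS statistic (the largest gap between the two empirical CDFs) and an asymptotic p-value from the Kolmogorov distribution. The function names and data are hypothetical. The asymptotic formula is an approximation chosen to keep the example dependency-free, and a statistics library may use an exact method for small samples and give a somewhat different p-value.

```python
import bisect
import math


def ks_statistic(actual, reference):
    """Largest absolute difference between the two empirical CDFs."""
    a, r = sorted(actual), sorted(reference)
    d = 0.0
    for x in a + r:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_r = bisect.bisect_right(r, x) / len(r)
        d = max(d, abs(cdf_a - cdf_r))
    return d


def ks_pvalue_asymptotic(d, n, m):
    """Asymptotic two-sided p-value: 2 * sum((-1)^(k-1) * exp(-2 k^2 lambda^2))."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The Kolmogorov distribution has negligible mass below 0.2.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


# Made-up numerical column: the actual values are the reference values shifted by 30.
reference = list(range(100))
actual = [x + 30 for x in range(100)]

d = ks_statistic(actual, reference)
p_value = ks_pvalue_asymptotic(d, len(actual), len(reference))
passed = p_value >= 0.05  # same rule as the "passed" field above
print(f"D = {d:.2f}, p-value = {p_value:.2e}, passed = {passed}")
```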
- giskard.testing.test_drift_earth_movers_distance(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.2) → GiskardTestMethod
Test if the earth mover's distance between the actual and reference datasets is below the threshold for a given numerical feature.
Example: The test passes when the earth mover's distance of the numerical variable between the actual and reference datasets is lower than 0.1. This means that we cannot assume drift for this variable.
- Parameters:
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
column_name (str) – Name of the column containing the numerical feature
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (float) – Threshold for the earth mover's distance
- Returns:
- actual_slices_size:
Length of rows with given numerical feature in actual slice
- reference_slices_size:
Length of rows with given numerical feature in reference slice
- metric:
The earth mover's distance
- passed:
True if metric <= threshold
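For intuition, the one-dimensional earth mover's distance between two samples equals the area between their empirical CDFs. The sketch below computes it that way. Rescaling both samples to [0, 1] over their combined range is an assumption made here so that a threshold such as 0.2 does not depend on the feature's units. The function name and data are hypothetical, and the library's own preprocessing should be checked in its source.

```python
import bisect


def scaled_earth_movers_distance(actual, reference):
    """1-D earth mover's distance after min-max scaling over the combined range."""
    low = min(min(actual), min(reference))
    high = max(max(actual), max(reference))
    if high == low:
        return 0.0  # all values identical: no distance to move
    a = sorted((x - low) / (high - low) for x in actual)
    r = sorted((x - low) / (high - low) for x in reference)
    points = sorted(set(a) | set(r))
    distance = 0.0
    # Integrate |CDF_actual - CDF_reference| over each interval between sample points.
    for left, right in zip(points, points[1:]):
        cdf_a = bisect.bisect_right(a, left) / len(a)
        cdf_r = bisect.bisect_right(r, left) / len(r)
        distance += abs(cdf_a - cdf_r) * (right - left)
    return distance


# Made-up numerical column: the actual values are the reference values shifted by 1.
reference = [0, 1, 2, 3]
actual = [1, 2, 3, 4]

emd = scaled_earth_movers_distance(actual, reference)
passed = emd <= 0.2  # same rule as the "passed" field above
print(f"EMD = {emd:.2f}, passed = {passed}")
```

Here every value moves by 1 on a combined range of 4, so the scaled distance is 0.25 and the example fails against a threshold of 0.2.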
- giskard.testing.test_drift_prediction_psi(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, max_categories: SuiteInput | int | None = 10, threshold: SuiteInput | float | None = 0.2, psi_contribution_percent: SuiteInput | float | None = 0.2, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the PSI score between the reference and actual datasets is below the threshold for the predicted classification labels.
Example: The test passes when the PSI score of the predicted classification labels for females between the reference and actual sets is below 0.2.
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (float) – Threshold value for PSI
max_categories (int) – Maximum number of categories used to compute the PSI score
psi_contribution_percent (float) – Ratio of the PSI score of a given category to the total PSI score of the categorical variable. If there is drift, the test reports all categories whose PSI contribution is greater than this ratio.
debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than psi_contribution_percent of the total PSI score).
- Returns:
- actual_slices_size:
Length of actual slice tested
- reference_slices_size:
Length of reference slice tested
- passed:
True if metric <= threshold
- metric:
Total PSI value
- messages:
PSI result message
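The role of psi_contribution_percent can be sketched as follows: each predicted label contributes a share of the total PSI, and labels whose share exceeds the ratio are the ones reported as drifting. The predicted labels below are made up (in practice they would come from running the model on each dataset), the helper name is hypothetical, and the per-label formula is the textbook PSI term rather than a confirmed copy of the library's code.

```python
import math
from collections import Counter


def psi_contributions(actual_labels, reference_labels, eps=1e-6):
    """Per-label PSI terms for two lists of predicted labels (hypothetical helper)."""
    actual_counts = Counter(actual_labels)
    reference_counts = Counter(reference_labels)
    contributions = {}
    for label in set(actual_counts) | set(reference_counts):
        # eps avoids log(0) when a label is missing from one dataset.
        a = max(actual_counts[label] / len(actual_labels), eps)
        r = max(reference_counts[label] / len(reference_labels), eps)
        contributions[label] = (a - r) * math.log(a / r)
    return contributions


# Made-up model predictions on the reference and actual datasets.
reference_predictions = ["A"] * 60 + ["B"] * 30 + ["C"] * 10
actual_predictions = ["A"] * 30 + ["B"] * 30 + ["C"] * 40

contributions = psi_contributions(actual_predictions, reference_predictions)
total_psi = sum(contributions.values())
psi_contribution_percent = 0.2

# Labels whose share of the total PSI exceeds the ratio are flagged as drifting.
drifting_labels = sorted(
    label
    for label, value in contributions.items()
    if total_psi > 0 and value / total_psi > psi_contribution_percent
)
passed = total_psi <= 0.2
print(f"PSI = {total_psi:.4f}, passed = {passed}, drifting labels = {drifting_labels}")
```

Label B keeps the same frequency in both sets, so it contributes nothing and only A and C are flagged.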
- giskard.testing.test_drift_prediction_chi_square(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, max_categories: SuiteInput | int | None = 10, threshold: SuiteInput | float | None = 0.05, chi_square_contribution_percent: SuiteInput | float | None = 0.2, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the p-value of the chi-square test between the reference and actual datasets is above the threshold for the predicted classification labels for a given slice.
Example: The test passes when the p-value of the chi-square test of the predicted classification labels for females between the reference and actual sets is higher than 0.05.
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (float) – Threshold for the p-value of the chi-square test
max_categories (int) – Maximum number of categories used to compute the chi-square statistic
chi_square_contribution_percent (float) – Ratio of the chi-square value of a given category to the total chi-square value of the categorical variable. If there is drift, the test reports all categories whose chi-square contribution is greater than this ratio.
debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than chi_square_contribution_percent of the total chi-square score).
- Returns:
- actual_slices_size:
Length of actual slice tested
- reference_slices_size:
Length of reference slice tested
- passed:
True if metric > threshold
- metric:
Calculated p-value of the chi-square test
- messages:
Message describing whether the prediction is drifting or not
- giskard.testing.test_drift_prediction_ks(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, classification_label: SuiteInput | str | None = None, threshold: SuiteInput | float | None = None) → GiskardTestMethod
Test if the p-value of the KS test for the prediction between the reference and actual datasets for a given subpopulation is above the threshold.
Example: The test passes when the p-value of the KS test for the prediction for females between the reference and actual datasets is higher than 0.05. This means that the null hypothesis of the KS test cannot be rejected at the 5% level, so we cannot assume drift for this variable.
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (Optional[float]) – Threshold for the p-value of the Kolmogorov-Smirnov test
classification_label (Optional[str]) – One specific label value from the target column, for classification models
- Returns:
- actual_slices_size:
Length of actual slice tested
- reference_slices_size:
Length of reference slice tested
- passed:
True if metric >= threshold
- metric:
The calculated p-value of the Kolmogorov-Smirnov test
- messages:
Kolmogorov-Smirnov result message
- giskard.testing.test_drift_prediction_earth_movers_distance(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, classification_label: SuiteInput | str | None = None, threshold: SuiteInput | float | None = 0.2) → GiskardTestMethod
Test if the Earth Mover's Distance between the reference and actual datasets is below the threshold, computed on the predicted label probabilities for classification models and on the predictions for regression models.
Example: Classification: The test passes when the Earth Mover's Distance of the predicted classification label probabilities for females between the reference and actual sets is below 0.2.
Regression: The test passes when the Earth Mover's Distance of the predictions for females between the reference and actual sets is below 0.2.
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
classification_label (Optional[str]) – One specific label value from the target column, for classification models
threshold (float) – Threshold for the Earth Mover's Distance
- Returns:
- passed:
True if metric <= threshold
- metric:
Earth Mover's Distance value