Drift tests

giskard.testing.test_drift_psi(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.2, max_categories: SuiteInput | int | None = 20, psi_contribution_percent: SuiteInput | float | None = 0.2, debug: SuiteInput | bool | None = False) → GiskardTestMethod

Test if the PSI score between the actual and reference datasets is below the threshold for a given categorical feature.

Example: The test passes when the PSI score of gender between the reference and actual sets is below 0.2.

Parameters:
  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • column_name (str) – Name of column with categorical feature

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold value for PSI

  • max_categories (int) – Maximum number of categories used to compute the PSI score

  • psi_contribution_percent (float) – Ratio of the PSI score of a given category to the total PSI score of the categorical variable. If there is drift, the test reports all the categories whose PSI contribution is greater than this ratio.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than psi_contribution_percent of the total PSI score).

Returns:
  • actual_slices_size – Length of rows with given categorical feature in actual slice

  • reference_slices_size – Length of rows with given categorical feature in reference slice

  • metric – The total PSI score between the actual and reference datasets

  • passed – TRUE if total_psi <= threshold
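The PSI score and the role of psi_contribution_percent can be illustrated with a minimal standard-library sketch. This is not Giskard's implementation: the helper name psi_by_category, the epsilon floor used for empty categories, and the sample data are all assumptions made for illustration.

```python
import math
from collections import Counter


def psi_by_category(actual, reference, eps=1e-6):
    """Return each category's contribution to the PSI (hypothetical helper)."""
    actual_counts, reference_counts = Counter(actual), Counter(reference)
    contributions = {}
    for category in set(actual) | set(reference):
        # Proportions are floored at eps (an assumption) to avoid log(0).
        a = max(actual_counts[category] / len(actual), eps)
        r = max(reference_counts[category] / len(reference), eps)
        contributions[category] = (a - r) * math.log(a / r)
    return contributions


# Illustrative data: the share of "F" moves from 50% to 80%.
reference = ["F"] * 50 + ["M"] * 50
actual = ["F"] * 80 + ["M"] * 20

contributions = psi_by_category(actual, reference)
total_psi = sum(contributions.values())
passed = total_psi <= 0.2  # same comparison as the documented `passed` rule

# Categories whose share of the total PSI exceeds a 0.2 contribution ratio.
drifting = sorted(c for c, v in contributions.items() if v / total_psi > 0.2)
```

With this data the total PSI is about 0.42, so the comparison against a threshold of 0.2 fails and both categories are reported as contributing more than 20% of the score.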

giskard.testing.test_drift_chi_square(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.05, max_categories: SuiteInput | int | None = 20, chi_square_contribution_percent: SuiteInput | float | None = 0.2, debug: SuiteInput | bool | None = False) → GiskardTestMethod

Test if the p-value of the chi-square test between the actual and reference datasets is above the threshold for a given categorical feature.

Example: The test passes when the p-value of the chi-square test of the categorical variable between the reference and actual sets is higher than 0.05. This means that the null hypothesis of the chi-square test (no difference between the two distributions) cannot be rejected at the 5% level, so we cannot assume drift for this variable.

Parameters:
  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • column_name (str) – Name of column with categorical feature

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold for p-value of chi-square

  • max_categories (int) – Maximum number of categories used to compute the chi-square test

  • chi_square_contribution_percent (float) – Ratio of the chi-square value of a given category to the total chi-square value of the categorical variable. If there is drift, the test reports all the categories whose chi-square contribution is greater than this ratio.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than chi_square_contribution_percent of the total chi squared score).

Returns:
  • actual_slices_size – Length of rows with given categorical feature in actual slice

  • reference_slices_size – Length of rows with given categorical feature in reference slice

  • metric – The p-value of the chi-square test

  • passed – TRUE if metric > threshold
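The per-category contributions behind chi_square_contribution_percent can be sketched with the standard library alone. The sketch assumes a goodness-of-fit formulation in which expected counts are the reference counts rescaled to the size of the actual sample; whether Giskard computes the statistic exactly this way is not stated here, and the helper name chi_square_by_category is hypothetical. Categories absent from the reference sample are skipped, since their expected count would be zero.

```python
import math
from collections import Counter


def chi_square_by_category(actual, reference):
    """Per-category chi-square contributions (hypothetical helper)."""
    actual_counts, reference_counts = Counter(actual), Counter(reference)
    # Rescale reference counts so they are comparable to the actual sample.
    scale = len(actual) / len(reference)
    contributions = {}
    for category, reference_count in reference_counts.items():
        expected = reference_count * scale
        observed = actual_counts[category]
        contributions[category] = (observed - expected) ** 2 / expected
    return contributions


# Illustrative data: the share of "F" moves from 50% to 80%.
reference = ["F"] * 50 + ["M"] * 50
actual = ["F"] * 80 + ["M"] * 20

contributions = chi_square_by_category(actual, reference)
chi_square = sum(contributions.values())

# Two categories give one degree of freedom, for which the chi-square
# survival function reduces to erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))
passed = p_value > 0.05  # same comparison as the documented `passed` rule
```

The closed form used for the p-value only holds for one degree of freedom. For more categories, the p-value comes from the chi-square distribution with (number of categories - 1) degrees of freedom, for example via scipy.stats.chi2.sf.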

giskard.testing.test_drift_ks(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.05) → GiskardTestMethod

Test if the p-value of the Kolmogorov-Smirnov (KS) test between the actual and reference datasets is above the threshold for a given numerical feature.

Example: The test passes when the p-value of the KS test of the numerical variable between the actual and reference datasets is higher than 0.05. This means that the null hypothesis of the KS test (both samples come from the same distribution) cannot be rejected at the 5% level, so we cannot assume drift for this variable.

Parameters:
  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • column_name (str) – Name of column with numerical feature

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold for p-value of KS test

Returns:
  • actual_slices_size – Length of rows with given numerical feature in actual slice

  • reference_slices_size – Length of rows with given numerical feature in reference slice

  • metric – The p-value of the KS test

  • passed – TRUE if metric >= threshold
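To make the p-value comparison concrete, here is a minimal standard-library sketch of a two-sample KS test. It is not Giskard's implementation: the helper names are hypothetical, and the p-value uses the classical asymptotic Kolmogorov series, which is only an approximation for small samples. A library routine such as scipy.stats.ks_2samp would normally be preferred.

```python
import math
from bisect import bisect_right


def ks_statistic(actual, reference):
    """Largest gap between the two empirical CDFs (hypothetical helper)."""
    a_sorted, r_sorted = sorted(actual), sorted(reference)
    d = 0.0
    # The supremum of |F_actual - F_reference| is reached at a data point.
    for x in a_sorted + r_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_r = bisect_right(r_sorted, x) / len(r_sorted)
        d = max(d, abs(cdf_a - cdf_r))
    return d


def ks_p_value(d, n, m, terms=100):
    """Asymptotic two-sided p-value (approximation, hypothetical helper)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is ~1 anyway.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2 * total))


# Illustrative data: the actual sample is the reference shifted by 25.
reference = list(range(50))
actual = [x + 25 for x in reference]

statistic = ks_statistic(actual, reference)
metric = ks_p_value(statistic, len(actual), len(reference))
passed = metric >= 0.05  # same comparison as the documented `passed` rule
```

Here the KS statistic is 0.5 and the approximate p-value is far below 0.05, so the comparison fails, which is the outcome expected for a drifted feature.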

giskard.testing.test_drift_earth_movers_distance(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.2) → GiskardTestMethod

Test if the earth mover's distance between the actual and reference datasets is below the threshold for a given numerical feature.

Example: With a threshold of 0.1, the test passes when the earth mover's distance of the numerical variable between the actual and reference datasets is lower than 0.1. This means that we cannot assume drift for this variable.

Parameters:
  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • column_name (str) – Name of column with numerical feature

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold for earth movers distance

Returns:
  • actual_slices_size – Length of rows with given numerical feature in actual slice

  • reference_slices_size – Length of rows with given numerical feature in reference slice

  • metric – The earth mover's distance

  • passed – TRUE if metric <= threshold
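For one-dimensional samples, the earth mover's distance equals the area between the two empirical CDFs. The sketch below computes it with the standard library only. It is an illustration, not Giskard's implementation: the helper name is hypothetical, and the values are used as-is, since whether and how Giskard rescales them before computing the distance is not stated here. scipy.stats.wasserstein_distance offers a library alternative.

```python
from bisect import bisect_right


def earth_movers_distance(actual, reference):
    """1-D earth mover's distance between two samples (hypothetical helper)."""
    a_sorted, r_sorted = sorted(actual), sorted(reference)
    points = sorted(set(a_sorted) | set(r_sorted))
    distance = 0.0
    # Integrate |F_actual - F_reference| over each interval between points.
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a_sorted, left) / len(a_sorted)
        cdf_r = bisect_right(r_sorted, left) / len(r_sorted)
        distance += abs(cdf_a - cdf_r) * (right - left)
    return distance


# Illustrative data: the actual sample is the reference shifted by 0.5.
reference = [0.0, 1.0, 2.0, 3.0]
actual = [0.5, 1.5, 2.5, 3.5]

metric = earth_movers_distance(actual, reference)
passed = metric <= 0.2  # same comparison as the documented `passed` rule
```

Shifting every value by 0.5 gives a distance of 0.5, which exceeds a threshold of 0.2, so the comparison fails.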

giskard.testing.test_drift_prediction_psi(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, max_categories: SuiteInput | int | None = 10, threshold: SuiteInput | float | None = 0.2, psi_contribution_percent: SuiteInput | float | None = 0.2, debug: SuiteInput | bool | None = False) → GiskardTestMethod

Test if the PSI score between the reference and actual datasets is below the threshold for the classification label predictions.

Example: The test passes when the PSI score of the classification label predictions for females between the reference and actual sets is below 0.2.

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold value for PSI

  • max_categories (int) – Maximum number of categories used to compute the PSI score

  • psi_contribution_percent (float) – Ratio of the PSI score of a given category to the total PSI score of the categorical variable. If there is drift, the test reports all the categories whose PSI contribution is greater than this ratio.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than psi_contribution_percent of the total PSI score).

Returns:
  • actual_slices_size – Length of actual slice tested

  • reference_slices_size – Length of reference slice tested

  • passed – TRUE if metric <= threshold

  • metric – Total PSI value

  • messages – PSI result message
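The prediction drift tests follow the same pattern as the feature drift tests, except that the compared values are model predictions computed on an optional slice of each dataset. The standard-library sketch below illustrates that flow for the PSI case. Every name in it is hypothetical: predict_label stands in for a model, only_females for a slicing function, and psi for the score. None of it reflects Giskard's internal code.

```python
import math
from collections import Counter


def predict_label(row):
    """Toy stand-in for a classification model (hypothetical)."""
    return "approved" if row["income"] >= 50 else "rejected"


def only_females(rows):
    """Toy stand-in for a slicing function (hypothetical)."""
    return [row for row in rows if row["gender"] == "F"]


def psi(actual_labels, reference_labels, eps=1e-6):
    """Total PSI between two label samples (hypothetical helper)."""
    a_counts, r_counts = Counter(actual_labels), Counter(reference_labels)
    total = 0.0
    for label in set(actual_labels) | set(reference_labels):
        # Proportions are floored at eps (an assumption) to avoid log(0).
        a = max(a_counts[label] / len(actual_labels), eps)
        r = max(r_counts[label] / len(reference_labels), eps)
        total += (a - r) * math.log(a / r)
    return total


reference = [{"gender": g, "income": i} for g in "FM" for i in (30, 40, 60, 70)]
actual = [{"gender": g, "income": i} for g in "FM" for i in (30, 60, 70, 80)]

# Slice both datasets, predict on each slice, then compare the predictions.
reference_slice = only_females(reference)
actual_slice = only_females(actual)
reference_predictions = [predict_label(row) for row in reference_slice]
actual_predictions = [predict_label(row) for row in actual_slice]

total_psi = psi(actual_predictions, reference_predictions)
passed = total_psi <= 0.2  # same comparison as the documented `passed` rule
```

In this toy data the share of "approved" predictions among females rises from 50% to 75%, giving a PSI of about 0.27 and a failed comparison against 0.2. The slice lengths correspond to the actual_slices_size and reference_slices_size values described above.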

giskard.testing.test_drift_prediction_chi_square(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, max_categories: SuiteInput | int | None = 10, threshold: SuiteInput | float | None = 0.05, chi_square_contribution_percent: SuiteInput | float | None = 0.2, debug: SuiteInput | bool | None = False) → GiskardTestMethod

Test if the p-value of the chi-square test between the reference and actual datasets is above the threshold for the classification label predictions for a given slice.

Example: The test passes when the p-value of the chi-square test of the classification label predictions for females between the reference and actual sets is higher than 0.05.

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold value of p-value of Chi-Square

  • max_categories (int) – Maximum number of categories used to compute the chi-square test

  • chi_square_contribution_percent (float) – Ratio of the chi-square value of a given category to the total chi-square value of the categorical variable. If there is drift, the test reports all the categories whose chi-square contribution is greater than this ratio.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than chi_square_contribution_percent of the total chi squared score).

Returns:
  • actual_slices_size – Length of actual slice tested

  • reference_slices_size – Length of reference slice tested

  • passed – TRUE if metric > threshold

  • metric – Calculated p-value of the chi-square test

  • messages – Message describing if prediction is drifting or not

giskard.testing.test_drift_prediction_ks(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, classification_label: SuiteInput | str | None = None, threshold: SuiteInput | float | None = None) → GiskardTestMethod

Test if the p-value of the KS test for the prediction between the reference and actual datasets for a given subpopulation is above the threshold.

Example: The test passes when the p-value of the KS test for the prediction for females between the reference and actual datasets is higher than 0.05. This means that the null hypothesis of the KS test cannot be rejected at the 5% level, so we cannot assume drift for this variable.

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (Optional[float]) – Threshold for p-value of Kolmogorov-Smirnov test

  • classification_label (Optional[str]) – One specific label value from the target column for classification model

Returns:
  • actual_slices_size – Length of actual slice tested

  • reference_slices_size – Length of reference slice tested

  • passed – TRUE if metric >= threshold

  • metric – The calculated p-value of the Kolmogorov-Smirnov test

  • messages – Kolmogorov-Smirnov result message

giskard.testing.test_drift_prediction_earth_movers_distance(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, classification_label: SuiteInput | str | None = None, threshold: SuiteInput | float | None = 0.2) → GiskardTestMethod

Test if the Earth Mover’s Distance between the reference and actual datasets is below the threshold. The distance is computed on the classification label predictions for classification models and on the predictions for regression models.

Example:

Classification: The test passes when the Earth Mover’s Distance of the classification label probabilities for females between the reference and actual sets is below 0.2.

Regression: The test passes when the Earth Mover’s Distance of the predictions for females between the reference and actual sets is below 0.2.

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • classification_label (Optional[str]) – One specific label value from the target column for classification model

  • threshold (float) – Threshold for Earth Mover’s Distance

Returns:
  • passed – TRUE if metric <= threshold

  • metric – Earth Mover’s Distance value