Drift tests

giskard.testing.test_drift_psi(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.2, max_categories: SuiteInput | int | None = 20, psi_contribution_percent: SuiteInput | float | None = 0.2) → GiskardTestMethod

Test categorical drift by PSI score.

It tests that the PSI score between the actual and reference datasets is below the threshold for a given categorical feature.

For example, the test passes when the PSI score of the gender column between the reference and actual datasets is below the threshold of 0.2.

Parameters:
  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • column_name (str) – Name of column with categorical feature

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets.

  • threshold (float) – Threshold value for PSI. Default is 0.2.

  • max_categories (int) – Maximum number of categories used to compute the PSI score.

  • psi_contribution_percent (float) – The ratio of the PSI score of a given category to the total PSI score of the categorical variable. If there is drift, the test reports all the categories whose PSI contribution is greater than this ratio.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than psi_contribution_percent of the total PSI score). Default is False.

Returns:

The test result.

Return type:

TestResult
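
To make the pass rule and the role of psi_contribution_percent concrete, here is a minimal pure-Python sketch of a PSI computation on a categorical column. It is an illustration under assumptions, not Giskard's implementation: the helper name psi_by_category, the eps floor used for empty categories, and the toy gender data are all hypothetical.

```python
import math
from collections import Counter


def psi_by_category(reference, actual, eps=1e-6):
    """Return (total_psi, {category: contribution}) for two lists of labels."""
    ref_counts, act_counts = Counter(reference), Counter(actual)
    contributions = {}
    for cat in sorted(set(ref_counts) | set(act_counts)):
        # Floor the proportions so an unseen category does not produce log(0).
        # The floor value is an assumption of this sketch.
        p_ref = max(ref_counts[cat] / len(reference), eps)
        p_act = max(act_counts[cat] / len(actual), eps)
        contributions[cat] = (p_act - p_ref) * math.log(p_act / p_ref)
    return sum(contributions.values()), contributions


# Toy data (hypothetical): the share of "F" moves from 50% to 70%.
reference = ["F"] * 50 + ["M"] * 50
actual = ["F"] * 70 + ["M"] * 30

total, parts = psi_by_category(reference, actual)
threshold = 0.2
passed = total < threshold  # mirrors the documented "below the threshold" rule

# Categories whose share of the total PSI exceeds psi_contribution_percent (0.2).
main_drivers = [c for c, v in parts.items() if total > 0 and v / total > 0.2]
print(round(total, 4), passed, main_drivers)
```

With these toy samples the total PSI is about 0.17, so a threshold of 0.2 would let the test pass, and both categories contribute more than 20% of the total.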

giskard.testing.test_drift_chi_square(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.05, max_categories: SuiteInput | int | None = 20, chi_square_contribution_percent: SuiteInput | float | None = 0.2) → GiskardTestMethod

Test categorical drift with a chi-square test.

The test checks if the p-value of the chi-square test between the actual and reference datasets is above a threshold for a given categorical feature.

Parameters:
  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • column_name (str) – Name of column with categorical feature

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold for p-value of chi-square. Default is 0.05.

  • max_categories (int) – Maximum number of categories used to compute the chi-square value.

  • chi_square_contribution_percent (float) – The ratio of the chi-square value of a given category to the total chi-square value of the categorical variable. If there is drift, the test reports all the categories whose chi-square contribution is greater than this ratio.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than chi_square_contribution_percent of the total chi squared score).

Returns:

The test result.

Return type:

TestResult
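
The p-value rule and the per-category contributions can be sketched in pure Python as follows. This is a sketch under assumptions, not Giskard's code: it assumes expected counts are the reference proportions scaled to the size of the actual dataset, that every category appears in the reference data, and that there are at least two categories. The helper names chi2_sf and chi_square_drift are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Survival function of the chi-square distribution, i.e. Q(df/2, x/2)."""
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)            # Q(1, t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))  # Q(1/2, t)
    # Recurrence: Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / gamma(a + 1)
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1)
        a += 1.0
    return q


def chi_square_drift(reference, actual):
    """Return (statistic, p_value, {category: contribution})."""
    ref_counts, act_counts = Counter(reference), Counter(actual)
    contributions = {}
    for cat in sorted(ref_counts):
        # Expected count if the actual data followed the reference proportions.
        expected = ref_counts[cat] / len(reference) * len(actual)
        contributions[cat] = (act_counts[cat] - expected) ** 2 / expected
    statistic = sum(contributions.values())
    p_value = chi2_sf(statistic, df=len(contributions) - 1)
    return statistic, p_value, contributions


# Toy data (hypothetical): 10% of the rows move from category B to category A.
reference = ["A"] * 40 + ["B"] * 40 + ["C"] * 20
actual = ["A"] * 50 + ["B"] * 30 + ["C"] * 20

statistic, p_value, parts = chi_square_drift(reference, actual)
threshold = 0.05
passed = p_value > threshold  # mirrors the documented "above a threshold" rule

# Categories whose share of the total statistic exceeds 0.2.
drivers = [c for c, v in parts.items() if statistic > 0 and v / statistic > 0.2]
print(statistic, round(p_value, 4), passed, drivers)
```

In this toy case the statistic is 5.0 with two degrees of freedom, giving a p-value of about 0.082. That is above 0.05, so the test would pass, with categories A and B each accounting for half of the statistic.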

giskard.testing.test_drift_ks(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.05) → GiskardTestMethod

Test drift with a Kolmogorov-Smirnov test.

Test if the p-value of the Kolmogorov-Smirnov test between the actual and reference datasets is above a threshold for a given numerical feature.

For example, if the threshold is set to 0.05, the test passes when the p-value of the KS test for the numerical variable between the actual and reference datasets is higher than 0.05. In that case the null hypothesis (no drift) cannot be rejected at the 5% significance level, so we cannot conclude that this variable has drifted.

Parameters:
  • actual_dataset (Dataset) – Actual dataset used to compute the test.

  • reference_dataset (Dataset) – Reference dataset used to compute the test.

  • column_name (str) – Name of column with numerical feature.

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets.

  • threshold (float) – Threshold for p-value of Kolmogorov-Smirnov test. Default is 0.05.

Returns:

The test result.

Return type:

TestResult
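
As an illustration of the decision rule, the sketch below computes a two-sample KS statistic and an asymptotic p-value using only the standard library. It rests on assumptions: the p-value comes from the asymptotic Kolmogorov series, whereas statistical libraries may use exact methods for small samples, so values can differ. The helper name ks_two_sample and the toy samples are hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(reference, actual):
    """Return (D statistic, asymptotic p-value) for two numeric samples."""
    ref, act = sorted(reference), sorted(actual)
    n, m = len(ref), len(act)
    # The largest gap between the two empirical CDFs is reached at an observed value.
    d = max(abs(bisect_right(ref, x) / n - bisect_right(act, x) / m) for x in ref + act)
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.05:
        return d, 1.0  # the series below converges slowly here; the p-value is ~1
    # Kolmogorov series: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(2 * series, 0.0), 1.0)


# Toy data (hypothetical): the actual sample is the reference shifted by 30.
reference = [float(v) for v in range(100)]
actual = [float(v + 30) for v in range(100)]

d, p_value = ks_two_sample(reference, actual)
threshold = 0.05
passed = p_value > threshold  # pass only if "no drift" cannot be rejected
print(round(d, 3), p_value, passed)
```

Here the shift gives D = 0.3 and a p-value far below 0.05, so the no-drift hypothesis is rejected and the test would fail. Comparing a sample with itself gives D = 0 and a p-value of 1.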

giskard.testing.test_drift_earth_movers_distance(actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.2) → GiskardTestMethod

Test drift by earth mover’s distance.

Test if the earth mover’s distance between the actual and reference datasets is below a threshold for a given numerical feature.

For example, if the threshold is set to 0.1, the test passes when the earth mover’s distance of the numerical variable between the actual and reference datasets is lower than 0.1. In that case we cannot conclude that this variable has drifted.

Parameters:
  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • column_name (str) – Name of column with numerical feature

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold for earth mover’s distance. Default is 0.2.

Returns:

The test result.

Return type:

TestResult
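
The earth mover’s distance between two one-dimensional samples equals the area between their empirical CDFs, which the following standard-library sketch computes. It is a sketch under assumptions: the raw distance is expressed in the units of the feature, so a fixed threshold such as 0.2 is only meaningful on a known scale. Rescaling both samples to [0, 1] over their joint range, as done below, is one way to handle that and is an assumption of this example rather than documented behaviour. The helper names are hypothetical.

```python
from bisect import bisect_right


def earth_movers_distance(reference, actual):
    """Area between the empirical CDFs of two numeric samples."""
    ref, act = sorted(reference), sorted(actual)
    points = sorted(set(ref + act))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right), so the area is gap * width.
        gap = abs(bisect_right(ref, left) / len(ref) - bisect_right(act, left) / len(act))
        total += gap * (right - left)
    return total


def min_max_scale(values, lo, hi):
    """Rescale values to [0, 1] given the joint minimum and maximum."""
    return [(v - lo) / (hi - lo) for v in values]


# Toy data (hypothetical): the actual sample is the reference shifted by 0.5.
reference = [0.0, 1.0, 2.0, 3.0]
actual = [0.5, 1.5, 2.5, 3.5]

raw = earth_movers_distance(reference, actual)

lo, hi = min(reference + actual), max(reference + actual)
scaled = earth_movers_distance(min_max_scale(reference, lo, hi),
                               min_max_scale(actual, lo, hi))

threshold = 0.2
print(raw, round(scaled, 4), scaled < threshold)
```

The raw distance equals the shift, 0.5, while the rescaled distance is 0.5 / 3.5, roughly 0.143. The same data therefore fails or passes a 0.2 threshold depending on the scale on which the distance is measured.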

giskard.testing.test_drift_prediction_psi(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, max_categories: SuiteInput | int | None = 10, threshold: SuiteInput | float | None = 0.2, psi_contribution_percent: SuiteInput | float | None = 0.2) → GiskardTestMethod

Tests drift of predictions by PSI score.

Test if the PSI score between the reference and actual datasets is below the threshold for the predicted classification labels.

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • max_categories (int) – Maximum number of categories used to compute the PSI score.

  • threshold (float) – Threshold value for PSI. Default is 0.2.

  • psi_contribution_percent (float) – The ratio of the PSI score of a given category to the total PSI score of the categorical variable. If there is drift, the test reports all the categories whose PSI contribution is greater than this ratio.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than psi_contribution_percent of the total PSI score).

Returns:

The test result.

Return type:

TestResult
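
Prediction drift applies the same statistic to the model's outputs instead of an input column. The sketch below shows the idea with a rule-based stand-in for a classifier. Everything here is hypothetical (toy_model, the income field, the eps floor) and it does not use or reproduce Giskard's API.

```python
import math
from collections import Counter


def toy_model(row):
    # Hypothetical stand-in for a classifier: approve when income exceeds 50.
    return "approved" if row["income"] > 50 else "rejected"


def label_shares(rows):
    """Proportion of each predicted label over the given rows."""
    counts = Counter(toy_model(row) for row in rows)
    return {label: n / len(rows) for label, n in counts.items()}


def prediction_psi(reference_rows, actual_rows, eps=1e-6):
    """PSI between the predicted-label distributions of two datasets."""
    ref, act = label_shares(reference_rows), label_shares(actual_rows)
    psi = 0.0
    for label in set(ref) | set(act):
        # Floor the shares so a label missing on one side does not give log(0).
        p_ref, p_act = max(ref.get(label, 0.0), eps), max(act.get(label, 0.0), eps)
        psi += (p_act - p_ref) * math.log(p_act / p_ref)
    return psi


# Toy data (hypothetical): incomes shift upward, so more rows are approved.
reference_rows = [{"income": v} for v in range(0, 100)]   # 49% approved
actual_rows = [{"income": v} for v in range(40, 140)]     # 89% approved

psi = prediction_psi(reference_rows, actual_rows)
threshold = 0.2
passed = psi < threshold
print(round(psi, 4), passed)
```

The model itself is unchanged between the two datasets, yet the share of approved predictions moves from 49% to 89%. That yields a PSI of about 0.85, well above 0.2, so the test would fail.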

giskard.testing.test_drift_prediction_chi_square(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, max_categories: SuiteInput | int | None = 10, threshold: SuiteInput | float | None = 0.05, chi_square_contribution_percent: SuiteInput | float | None = 0.2) → GiskardTestMethod

Tests drift of predictions by chi-squared test.

Test if the p-value of the chi-square test between the reference and actual datasets is above the threshold for the predicted classification labels on a given slice.

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • max_categories (int) – Maximum number of categories used to compute the chi-square value.

  • threshold (float) – Threshold for the p-value of the chi-square test. Default is 0.05.

  • chi_square_contribution_percent (float) – The ratio of the chi-square value of a given category to the total chi-square value of the categorical variable. If there is drift, the test reports all the categories whose chi-square contribution is greater than this ratio.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the actual_dataset rows with the categories that have drifted the most (more than chi_square_contribution_percent of the total chi squared score).

Returns:

The test result.

Return type:

TestResult

giskard.testing.test_drift_prediction_ks(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, classification_label: SuiteInput | str | None = None, threshold: SuiteInput | float | None = None) → GiskardTestMethod

Tests drift of predictions by Kolmogorov-Smirnov test.

Test if the p-value of the KS test for prediction between the reference and actual datasets for a given subpopulation is above the threshold.

Example: The test passes when the p-value of the KS test for the predictions for females between the reference and actual datasets is higher than 0.05. In that case the null hypothesis (no drift) cannot be rejected at the 5% significance level, so we cannot conclude that the predictions have drifted.

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • classification_label (Optional[str]) – One specific label value from the target column, for classification models

  • threshold (Optional[float]) – Threshold for p-value of Kolmogorov-Smirnov test

Returns:

The test result.

Return type:

TestResult

giskard.testing.test_drift_prediction_earth_movers_distance(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, classification_label: SuiteInput | str | None = None, threshold: SuiteInput | float | None = 0.2) → GiskardTestMethod

Tests drift of predictions by earth mover’s distance.

Test if the earth mover’s distance between the reference and actual datasets is below the threshold. The distance is computed on the classification label probabilities for classification models and on the predictions for regression models.

Examples:

For classification: the test passes when the earth mover’s distance of the classification label probabilities for females between the reference and actual datasets is below 0.2.

For regression: the test passes when the earth mover’s distance of the predictions for females between the reference and actual datasets is below 0.2.

Parameters:
  • model (BaseModel) – Model to test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • classification_label (Optional[str]) – One specific label value from the target column, for classification models

  • threshold (float) – Threshold for earth mover’s distance. Default is 0.2.

Returns:

The test result.

Return type:

TestResult