Performance tests

giskard.testing.test_mae(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3) → GiskardTestMethod

Test if the model Mean Absolute Error is lower than a threshold

Example: The test is passed when the MAE is lower than 10

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)

  • threshold (float) – Threshold value for MAE (Default value = 1.0)

  • debug_percent_rows (float) – Fraction of rows, ranked by absolute error from highest to lowest, to include for debugging (Default value = 0.3, i.e. 30%)

  • debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data). (Default value = False)

Returns:

The test result.

Return type:

TestResult
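
The pass condition and the debug_percent_rows selection described above can be illustrated in plain Python. This is a minimal sketch, not Giskard's implementation: the helper names, the sample values, the strict "<" comparison, and the rounding used to turn the fraction into a row count are all assumptions made for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |prediction - target| over all rows."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def top_error_rows(y_true, y_pred, fraction=0.3):
    """Indices of the rows with the largest absolute error, worst first.

    Assumption: the row count is the rounded fraction of the dataset size.
    """
    errors = [abs(p - t) for t, p in zip(y_true, y_pred)]
    ranked = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical targets and predictions for ten rows.
y_true = [10.0, 12.0, 8.0, 15.0, 9.0, 11.0, 14.0, 7.0, 13.0, 10.0]
y_pred = [11.0, 12.0, 5.0, 15.0, 9.0, 16.0, 14.0, 7.0, 13.0, 10.0]

mae = mean_absolute_error(y_true, y_pred)
passed = mae < 1.0  # threshold = 1.0, the documented default
worst = top_error_rows(y_true, y_pred, fraction=0.3)

print(mae, passed, worst)
```

Here three rows carry all of the error, so the debug selection returns exactly those rows, ordered from the largest error down.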

giskard.testing.test_rmse(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3) → GiskardTestMethod

Test if the model RMSE is lower than a threshold

Example: The test is passed when the RMSE is lower than 10

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)

  • threshold (float) – Threshold value for RMSE (Default value = 1.0)

  • debug_percent_rows (float) – Fraction of rows, ranked by absolute error from highest to lowest, to include for debugging (Default value = 0.3, i.e. 30%)

  • debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data). (Default value = False)

Returns:

The test result.

Return type:

TestResult
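
As a rough illustration of the metric behind this test, RMSE is the square root of the mean squared error. The sketch below uses only the standard library; the function name and sample values are hypothetical and the code is not taken from Giskard.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean of squared prediction errors."""
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

rmse = root_mean_squared_error(y_true, y_pred)  # sqrt((1 + 0 + 4 + 0) / 4)
passed = rmse < 10  # threshold from the example above

print(rmse, passed)
```

Because errors are squared before averaging, a single large miss raises RMSE more than it raises MAE on the same data.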

giskard.testing.test_recall(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod

Test if the model Recall is higher than a threshold for a given slice

Example: The test is passed when the Recall for females is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Actual dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)

  • threshold (float) – Threshold value for Recall (Default value = 1.0)

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows. (Default value = False)

Returns:

The test result.

Return type:

TestResult
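
The "Recall for females" example can be sketched as a slice followed by a metric computation. The row layout, the column names, and the binary positive label are assumptions for illustration; in Giskard the slice would come from the slicing_function argument rather than a list comprehension.

```python
# Hypothetical rows: a slicing feature, the true label, and the prediction.
rows = [
    {"sex": "female", "label": 1, "pred": 1},
    {"sex": "female", "label": 1, "pred": 0},
    {"sex": "female", "label": 1, "pred": 1},
    {"sex": "female", "label": 0, "pred": 0},
    {"sex": "male", "label": 1, "pred": 0},
    {"sex": "male", "label": 0, "pred": 1},
]


def recall(rows, positive=1):
    """True positives divided by all actual positives."""
    tp = sum(1 for r in rows if r["label"] == positive and r["pred"] == positive)
    fn = sum(1 for r in rows if r["label"] == positive and r["pred"] != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Stand-in for a slicing function: keep only the rows of interest.
female_rows = [r for r in rows if r["sex"] == "female"]

female_recall = recall(female_rows)  # 2 of 3 positives found
passed = female_recall > 0.7

print(female_recall, passed)
```

The male rows do not affect the result: the metric is computed on the slice only, so the test fails here even though the threshold might be met on another slice.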

giskard.testing.test_auc(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod

Test if the model AUC performance is higher than a threshold for a given slice

Example: The test is passed when the AUC for females is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Actual dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)

  • threshold (float) – Threshold value for AUC (Default value = 1.0)

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows. (Default value = False)

Returns:

  • actual_slices_size – Length of dataset tested

  • metric – The AUC performance metric

  • passed – True if the AUC metric >= threshold
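
For intuition, binary AUC equals the probability that a randomly chosen positive row receives a higher score than a randomly chosen negative row, with ties counted as one half. The sketch below computes it by comparing all pairs and packs the result into a dictionary whose keys mirror the fields listed above. The function name and sample data are hypothetical, and Giskard's own computation may differ.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities of the positive class.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]

threshold = 0.7
auc = roc_auc(labels, scores)  # 5 of 6 pairs are ordered correctly
result = {
    "actual_slices_size": len(labels),
    "metric": auc,
    "passed": auc >= threshold,
}

print(result)
```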

giskard.testing.test_accuracy(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod

Test if the model Accuracy is higher than a threshold for a given slice

Example: The test is passed when the Accuracy for females is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Actual dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)

  • threshold (float) – Threshold value for Accuracy (Default value = 1.0)

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows. (Default value = False)

Returns:

The test result.

Return type:

TestResult

giskard.testing.test_precision(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod

Test if the model Precision is higher than a threshold for a given slice

Example: The test is passed when the Precision for females is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Actual dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)

  • threshold (float) – Threshold value for Precision (Default value = 1.0)

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows. (Default value = False)

Returns:

The test result.

Return type:

TestResult

giskard.testing.test_f1(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod

Test if the model F1 score is higher than a defined threshold for a given slice

Example: The test is passed when F1 score for females is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Actual dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)

  • threshold (float) – Threshold value for F1 Score (Default value = 1.0)

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows. (Default value = False)

Returns:

  • actual_slices_size – Length of dataset tested

  • metric – The F1 score metric

  • passed – True if the F1 score >= threshold
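
Accuracy, Precision, and F1 can all be derived from the same confusion counts, which the following sketch makes explicit for a binary problem. The helper name and sample data are hypothetical; how Giskard averages these metrics for multiclass models is not covered here.

```python
def confusion_counts(labels, preds, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for y, p in zip(labels, preds):
        if p == positive and y == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif y == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical true labels and predicted labels.
labels = [1, 1, 1, 0, 0, 1, 0, 1]
preds = [1, 0, 1, 0, 1, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(labels, preds)
accuracy = (tp + tn) / len(labels)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

passed = f1 >= 0.7  # comparison as stated in the Returns list above

print(accuracy, precision, recall, f1, passed)
```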

giskard.testing.test_r2(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3) → GiskardTestMethod

Test if the model R-Squared is higher than a threshold

Example: The test is passed when the R-Squared is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)

  • threshold (float) – Threshold value for R-Squared (Default value = 1.0)

  • debug_percent_rows (float) – Fraction of rows, ranked by absolute error from highest to lowest, to include for debugging (Default value = 0.3, i.e. 30%)

  • debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data). (Default value = False)

Returns:

The test result.

Return type:

TestResult
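
R-Squared compares the model's squared error with that of a baseline that always predicts the mean target. A minimal sketch under that standard definition follows; the function name and sample values are hypothetical, and this is not Giskard's code.

```python
def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.5, 8.5]

r2 = r_squared(y_true, y_pred)  # 1 - 1.0 / 20.0
passed = r2 > 0.7  # threshold from the example above

print(r2, passed)
```

Unlike MAE and RMSE, higher is better here, which is why this test checks that the metric is above the threshold rather than below it.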

giskard.testing.test_diff_recall(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant) → GiskardTestMethod

Test if the absolute percentage change of model Recall between two samples is lower than a threshold

Example: The test is passed when the Recall for females has a difference lower than 10% from the Recall for males. For example, if the Recall for males is 0.8 (dataset) and the Recall for females is 0.6 (reference_dataset) then the absolute percentage Recall change is 0.2 / 0.8 = 0.25 and the test will fail

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets (Default value = None)

  • threshold (float) – Threshold value for Recall difference (Default value = 0.1)

  • direction (Direction) – (Default value = Direction.Invariant)

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset. (Default value = False)

Returns:

The test result.

Return type:

TestResult
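
The arithmetic in the example above, which the other test_diff_* entries share, can be reproduced in a few lines. This sketch mirrors the worked example only: the function name is hypothetical, the denominator follows the example (the first score), and which dataset Giskard actually uses as the denominator, as well as how the direction argument changes the comparison, is not stated in this reference.

```python
def absolute_relative_change(first_score, second_score):
    """|first - second| / first, following the worked example above."""
    if first_score == 0:
        raise ZeroDivisionError("relative change is undefined for a zero score")
    return abs(first_score - second_score) / first_score


recall_males = 0.8
recall_females = 0.6
threshold = 0.1  # documented default

change = absolute_relative_change(recall_males, recall_females)  # about 0.25
passed = change < threshold

print(round(change, 4), passed)
```

A relative change of about 0.25 exceeds the 0.1 threshold, so the test fails, matching the example.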

giskard.testing.test_diff_accuracy(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant) → GiskardTestMethod

Test if the absolute percentage change of model Accuracy between two samples is lower than a threshold

Example: The test is passed when the Accuracy for females has a difference lower than 10% from the Accuracy for males. For example, if the Accuracy for males is 0.8 (dataset) and the Accuracy for females is 0.6 (reference_dataset) then the absolute percentage Accuracy change is 0.2 / 0.8 = 0.25 and the test will fail

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets (Default value = None)

  • threshold (float) – Threshold value for Accuracy Score difference (Default value = 0.1)

  • direction (Direction) – (Default value = Direction.Invariant)

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset. (Default value = False)

Returns:

The test result.

Return type:

TestResult

giskard.testing.test_diff_precision(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant) → GiskardTestMethod

Test if the absolute percentage change of model Precision between two samples is lower than a threshold

Example: The test is passed when the Precision for females has a difference lower than 10% from the Precision for males. For example, if the Precision for males is 0.8 (dataset) and the Precision for females is 0.6 (reference_dataset) then the absolute percentage Precision change is 0.2 / 0.8 = 0.25 and the test will fail

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets (Default value = None)

  • threshold (float) – Threshold value for Precision difference (Default value = 0.1)

  • direction (Direction) – (Default value = Direction.Invariant)

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset. (Default value = False)

Returns:

The test result.

Return type:

TestResult

giskard.testing.test_diff_rmse(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug_percent_rows: SuiteInput | float | None = 0.3) → GiskardTestMethod

Test if the absolute percentage change of model RMSE between two samples is lower than a threshold

Example: The test is passed when the RMSE for females has a difference lower than 10% from the RMSE for males. For example, if the RMSE for males is 0.8 (dataset) and the RMSE for females is 0.6 (reference_dataset) then the absolute percentage RMSE change is 0.2 / 0.8 = 0.25 and the test will fail

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets (Default value = None)

  • threshold (float) – Threshold value for RMSE difference (Default value = 0.1)

  • direction (Direction) – (Default value = Direction.Invariant)

  • debug_percent_rows (float) – Fraction of rows, ranked by absolute error from highest to lowest, to include for debugging (Default value = 0.3, i.e. 30%)

  • debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data) from both actual_dataset and reference_dataset. (Default value = False)

Returns:

The test result.

Return type:

TestResult

giskard.testing.test_diff_f1(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant) → GiskardTestMethod

Test if the absolute percentage change in model F1 Score between two samples is lower than a threshold

Example: The test is passed when the F1 Score for females has a difference lower than 10% from the F1 Score for males. For example, if the F1 Score for males is 0.8 (dataset) and the F1 Score for females is 0.6 (reference_dataset) then the absolute percentage F1 Score change is 0.2 / 0.8 = 0.25 and the test will fail

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets (Default value = None)

  • threshold (float) – Threshold value for F1 Score difference (Default value = 0.1)

  • direction (Direction) – (Default value = Direction.Invariant)

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset. (Default value = False)

Returns:

The test result.

Return type:

TestResult

giskard.testing.test_brier(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod

Test if the model Brier score is lower than a threshold for a given slice

Example: The test is passed when the Brier score for females is lower than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Actual dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)

  • threshold (float) – Threshold value for Brier score (Default value = 1.0)

Returns:

The test result.

Return type:

TestResult
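
For a binary classifier, the Brier score is the mean squared difference between the predicted probability of the positive class and the observed outcome (0 or 1), so lower values indicate better calibrated predictions. The sketch below assumes that binary definition; the function name and sample data are hypothetical, and Giskard's handling of multiclass models is not shown.

```python
def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and outcome."""
    squared = [(p - y) ** 2 for y, p in zip(labels, probabilities)]
    return sum(squared) / len(squared)


# Hypothetical outcomes and predicted probabilities of the positive class.
labels = [1, 0, 1, 1]
probabilities = [0.9, 0.2, 0.7, 0.4]

brier = brier_score(labels, probabilities)  # (0.01 + 0.04 + 0.09 + 0.36) / 4
passed = brier < 0.7  # threshold from the example above

print(round(brier, 4), passed)
```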