Performance tests#

giskard.testing.test_mae(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the model Mean Absolute Error is lower than a threshold

Example: The test is passed when the MAE is lower than 10

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset

  • threshold (float) – Threshold value for MAE

  • debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data).

Returns:

  • actual_slices_size – Length of dataset tested

  • metric – The MAE metric

  • passed – TRUE if MAE metric <= threshold

Return type:

TestResult
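
A minimal usage sketch, assuming my_model is a giskard.Model wrapping a regression model and my_dataset is a giskard.Dataset containing the ground-truth target (both names are placeholders):

    from giskard import testing

    # Build the test with bound inputs, then execute it to obtain a TestResult.
    result = testing.test_mae(
        model=my_model,          # assumed: giskard.Model (regression)
        dataset=my_dataset,      # assumed: giskard.Dataset with the target column
        threshold=10,            # pass when MAE <= 10
        debug=True,              # on failure, also collect the worst rows
        debug_percent_rows=0.3,  # keep the top 30% of rows by absolute error
    ).execute()

    print(result.passed, result.metric)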

giskard.testing.test_rmse(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the model RMSE is lower than a threshold

Example: The test is passed when the RMSE is lower than 10

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset

  • threshold (float) – Threshold value for RMSE

  • debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data).

Returns:

  • actual_slices_size – Length of dataset tested

  • metric – The RMSE metric

  • passed – TRUE if RMSE metric <= threshold

Return type:

TestResult

giskard.testing.test_recall(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the model Recall is higher than a threshold for a given slice

Example: The test is passed when the Recall for females is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Actual dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset

  • threshold (float) – Threshold value for Recall

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows.

Returns:

  • actual_slices_size – Length of dataset tested

  • metric – The Recall metric

  • passed – TRUE if Recall metric >= threshold

Return type:

TestResult
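
A sketch of the slice-level usage, assuming the dataset has a "sex" column; the slicing function and the model/dataset variables are placeholders:

    import pandas as pd
    from giskard import slicing_function, testing

    # Dataset-level slicing function: keep only the rows of the slice of interest.
    @slicing_function(row_level=False)
    def female_slice(df: pd.DataFrame) -> pd.DataFrame:
        return df[df["sex"] == "female"]

    result = testing.test_recall(
        model=my_model,                # assumed: giskard.Model (classification)
        dataset=my_dataset,            # assumed: giskard.Dataset
        slicing_function=female_slice,
        threshold=0.7,
    ).execute()

    print(result.passed, result.metric)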

giskard.testing.test_auc(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the model AUC performance is higher than a threshold for a given slice

Example: The test is passed when the AUC for females is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Actual dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset

  • threshold (float) – Threshold value for AUC

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows.

Returns:

  • actual_slices_size – Length of dataset tested

  • metric – The AUC performance metric

  • passed – TRUE if AUC metric >= threshold

Return type:

TestResult
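
The SuiteInput union in the signatures lets a test defer an input until the suite is run; a sketch under the assumption that Suite.run resolves the shared inputs by name (model, dataset and suite variables are placeholders):

    from giskard import Dataset, Suite, SuiteInput, testing

    # Declare a shared, named input that will be provided at run time.
    shared_dataset = SuiteInput("dataset", Dataset)

    suite = (
        Suite()
        .add_test(testing.test_auc(dataset=shared_dataset, threshold=0.7))
        .add_test(testing.test_recall(dataset=shared_dataset, threshold=0.7))
    )

    # The model and the shared dataset are passed by keyword when running.
    suite_results = suite.run(model=my_model, dataset=my_dataset)
    print(suite_results.passed)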

giskard.testing.test_accuracy(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the model Accuracy is higher than a threshold for a given slice

Example: The test is passed when the Accuracy for females is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Actual dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset

  • threshold (float) – Threshold value for Accuracy

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows.

Returns:

  • actual_slices_size – Length of dataset tested

  • metric – The Accuracy metric

  • passed – TRUE if Accuracy metric >= threshold

Return type:

TestResult

giskard.testing.test_precision(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the model Precision is higher than a threshold for a given slice

Example: The test is passed when the Precision for females is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Actual dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset

  • threshold (float) – Threshold value for Precision

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows.

Returns:

  • actual_slices_size – Length of dataset tested

  • metric – The Precision metric

  • passed – TRUE if Precision metric >= threshold

Return type:

TestResult

giskard.testing.test_f1(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the model F1 score is higher than a defined threshold for a given slice

Example: The test is passed when F1 score for females is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Actual dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset

  • threshold (float) – Threshold value for F1 Score

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows.

Returns:

  • actual_slices_size – Length of dataset tested

  • metric – The F1 score metric

  • passed – TRUE if F1 score metric >= threshold

Return type:

TestResult
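
As with the other tests, the bound test can be executed directly or collected into a suite; a sketch with placeholder model/dataset names and fully bound inputs (in contrast to the SuiteInput sketch above):

    from giskard import Suite, testing

    suite = (
        Suite()
        .add_test(testing.test_f1(model=my_model, dataset=my_dataset, threshold=0.7))
        .add_test(testing.test_precision(model=my_model, dataset=my_dataset, threshold=0.7))
    )

    # Every input is already bound, so no run-time arguments are needed.
    suite_results = suite.run()
    print(suite_results.passed)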

giskard.testing.test_r2(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the model R-Squared is higher than a threshold

Example: The test is passed when the R-Squared is higher than 0.7

Parameters:
  • model (BaseModel) – Model used to compute the test

  • dataset (Dataset) – Dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset

  • threshold (float) – Threshold value for R-Squared

  • debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data).

Returns:

  • actual_slices_size – Length of dataset tested

  • metric – The R-Squared metric

  • passed – TRUE if R-Squared metric >= threshold

Return type:

TestResult

giskard.testing.test_diff_recall(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the absolute percentage change of model Recall between two samples is lower than a threshold

Example: The test is passed when the Recall for females differs by less than 10% from the Recall for males. For example, if the Recall for males is 0.8 (actual_dataset) and the Recall for females is 0.6 (reference_dataset), then the absolute percentage change in Recall is 0.2 / 0.8 = 0.25 and the test will fail

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold value for Recall difference

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset

Returns:

  • actual_slices_size – Length of actual_dataset tested

  • reference_slices_size – Length of reference_dataset tested

  • metric – The Recall difference metric

  • passed – TRUE if Recall difference < threshold

Return type:

TestResult

giskard.testing.test_diff_accuracy(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the absolute percentage change of model Accuracy between two samples is lower than a threshold

Example: The test is passed when the Accuracy for females differs by less than 10% from the Accuracy for males. For example, if the Accuracy for males is 0.8 (actual_dataset) and the Accuracy for females is 0.6 (reference_dataset), then the absolute percentage change in Accuracy is 0.2 / 0.8 = 0.25 and the test will fail

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold value for Accuracy Score difference

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset

Returns:

  • actual_slices_size – Length of actual_dataset tested

  • reference_slices_size – Length of reference_dataset tested

  • metric – The Accuracy difference metric

  • passed – TRUE if Accuracy difference < threshold

Return type:

TestResult
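
A sketch of the two-sample usage, assuming dataset_females and dataset_males are giskard.Dataset slices of the same data (all names are placeholders):

    from giskard import testing

    result = testing.test_diff_accuracy(
        model=my_model,                   # assumed: giskard.Model (classification)
        actual_dataset=dataset_females,   # sample under test
        reference_dataset=dataset_males,  # baseline sample
        threshold=0.1,                    # fail when the relative Accuracy change is >= 10%
    ).execute()

    print(result.metric, result.passed)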

giskard.testing.test_diff_precision(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the absolute percentage change of model Precision between two samples is lower than a threshold

Example: The test is passed when the Precision for females differs by less than 10% from the Precision for males. For example, if the Precision for males is 0.8 (actual_dataset) and the Precision for females is 0.6 (reference_dataset), then the absolute percentage change in Precision is 0.2 / 0.8 = 0.25 and the test will fail

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold value for Precision difference

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset

Returns:

  • actual_slices_size – Length of actual_dataset tested

  • reference_slices_size – Length of reference_dataset tested

  • metric – The Precision difference metric

  • passed – TRUE if Precision difference < threshold

Return type:

TestResult

giskard.testing.test_diff_rmse(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug_percent_rows: SuiteInput | float | None = 0.3, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the absolute percentage change of model RMSE between two samples is lower than a threshold

Example: The test is passed when the RMSE for females differs by less than 10% from the RMSE for males. For example, if the RMSE for males is 0.8 (actual_dataset) and the RMSE for females is 0.6 (reference_dataset), then the absolute percentage change in RMSE is 0.2 / 0.8 = 0.25 and the test will fail

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold value for RMSE difference

  • debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.

  • debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data) from both actual_dataset and reference_dataset.

Returns:

  • actual_slices_size – Length of actual_dataset tested

  • reference_slices_size – Length of reference_dataset tested

  • metric – The RMSE difference metric

  • passed – TRUE if RMSE difference < threshold

Return type:

TestResult
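
A sketch combining a shared slicing function with the debug options, using placeholder names; the slicing function is applied to both samples before the two RMSE values are compared:

    import pandas as pd
    from giskard import slicing_function, testing

    # Hypothetical slice applied to both the actual and the reference datasets.
    @slicing_function(row_level=False)
    def high_price_slice(df: pd.DataFrame) -> pd.DataFrame:
        return df[df["price"] > 100]

    result = testing.test_diff_rmse(
        model=my_model,                    # assumed: giskard.Model (regression)
        actual_dataset=new_data,           # assumed: current sample
        reference_dataset=training_data,   # assumed: baseline sample
        slicing_function=high_price_slice,
        threshold=0.1,
        debug=True,                        # on failure, collect the worst rows
        debug_percent_rows=0.3,
    ).execute()

    print(result.metric, result.passed)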

giskard.testing.test_diff_f1(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug: SuiteInput | bool | None = False) GiskardTestMethod[source]#

Test if the absolute percentage change in model F1 Score between two samples is lower than a threshold

Example: The test is passed when the F1 Score for females differs by less than 10% from the F1 Score for males. For example, if the F1 Score for males is 0.8 (actual_dataset) and the F1 Score for females is 0.6 (reference_dataset), then the absolute percentage change in F1 Score is 0.2 / 0.8 = 0.25 and the test will fail

Parameters:
  • model (BaseModel) – Model used to compute the test

  • actual_dataset (Dataset) – Actual dataset used to compute the test

  • reference_dataset (Dataset) – Reference dataset used to compute the test

  • slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets

  • threshold (float) – Threshold value for F1 Score difference

  • debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset

Returns:

  • actual_slices_size – Length of actual_dataset tested

  • reference_slices_size – Length of reference_dataset tested

  • metric – The F1 Score difference metric

  • passed – TRUE if F1 Score difference < threshold

Return type:

TestResult