Performance tests
- giskard.testing.test_mae(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the model Mean Absolute Error is lower than a threshold
Example: The test is passed when the MAE is lower than 10
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
threshold (float) – Threshold value for MAE
debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.
debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data).
- Returns:
actual_slices_size – Length of dataset tested
metric – The MAE metric
passed – TRUE if MAE metric <= threshold
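A minimal usage sketch for this test, assuming a scikit-learn regressor wrapped with giskard.Model and giskard.Dataset; the toy data, column names and the 1.0 threshold below are illustrative, not part of the API:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

import giskard
from giskard import testing

# Toy regression data; column names and values are purely illustrative.
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [0.5, 1.5, 2.5, 3.5, 4.5],
    "y":  [2.1, 4.2, 5.9, 8.1, 9.8],
})

reg = LinearRegression().fit(df[["x1", "x2"]], df["y"])

# Wrap the raw estimator and data so the test can drive them uniformly.
wrapped_model = giskard.Model(
    model=lambda batch: reg.predict(batch[["x1", "x2"]]),
    model_type="regression",
    feature_names=["x1", "x2"],
)
wrapped_dataset = giskard.Dataset(df, target="y")

# Calling the test builds a configured test object; execute() runs it and
# returns a result exposing `passed` and `metric`.
result = testing.test_mae(model=wrapped_model, dataset=wrapped_dataset, threshold=1.0).execute()
print(result.passed, result.metric)
```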
- giskard.testing.test_rmse(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the model RMSE is lower than a threshold
Example: The test is passed when the RMSE is lower than 10
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
threshold (float) – Threshold value for RMSE
debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.
debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data).
- Returns:
actual_slices_size – Length of dataset tested
metric – The RMSE metric
passed – TRUE if RMSE metric <= threshold
- giskard.testing.test_recall(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the model Recall is higher than a threshold for a given slice
Example: The test is passed when the Recall for females is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Actual dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
threshold (float) – Threshold value for Recall
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows.
- Returns:
actual_slices_size – Length of dataset tested
metric – The Recall metric
passed – TRUE if Recall metric >= threshold
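A minimal sketch of running this test on a single slice, combining a row-level slicing function with the same wrapping pattern as above; the "gender" column, the slice definition and the 0.7 threshold are illustrative assumptions:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

import giskard
from giskard import testing, slicing_function

# Toy binary-classification data; column names are illustrative.
df = pd.DataFrame({
    "age":      [22, 35, 58, 41, 30, 27, 63, 19],
    "gender":   ["female", "male", "female", "male", "female", "male", "female", "male"],
    "survived": [1, 0, 1, 0, 1, 0, 1, 0],
})

def encode(frame: pd.DataFrame) -> pd.DataFrame:
    # Manual encoding so train-time and predict-time columns always match.
    out = frame[["age"]].copy()
    out["is_female"] = (frame["gender"] == "female").astype(int)
    return out

clf = LogisticRegression().fit(encode(df), df["survived"])

wrapped_model = giskard.Model(
    model=lambda batch: clf.predict_proba(encode(batch)),  # classification wrappers return probabilities
    model_type="classification",
    classification_labels=[0, 1],
    feature_names=["age", "gender"],
)
wrapped_dataset = giskard.Dataset(df, target="survived", cat_columns=["gender"])

# Row-level slicing function restricting the test to one subgroup.
@slicing_function(name="Female rows")
def female_slice(row: pd.Series) -> bool:
    return row["gender"] == "female"

result = testing.test_recall(
    model=wrapped_model,
    dataset=wrapped_dataset,
    slicing_function=female_slice,
    threshold=0.7,
).execute()
print(result.passed, result.metric)
```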
- giskard.testing.test_auc(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the model AUC performance is higher than a threshold for a given slice
Example: The test is passed when the AUC for females is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Actual dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
threshold (float) – Threshold value of the AUC metric
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows.
- Returns:
actual_slices_size – Length of dataset tested
metric – The AUC performance metric
passed – TRUE if AUC metric >= threshold
- giskard.testing.test_accuracy(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the model Accuracy is higher than a threshold for a given slice
Example: The test is passed when the Accuracy for females is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Actual dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
threshold (float) – Threshold value for Accuracy
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows.
- Returns:
actual_slices_size – Length of dataset tested
metric – The Accuracy metric
passed – TRUE if Accuracy metric >= threshold
- giskard.testing.test_precision(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the model Precision is higher than a threshold for a given slice
Example: The test is passed when the Precision for females is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Actual dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
threshold (float) – Threshold value for Precision
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows.
- Returns:
actual_slices_size – Length of dataset tested
metric – The Precision metric
passed – TRUE if Precision metric >= threshold
- giskard.testing.test_f1(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the model F1 score is higher than a defined threshold for a given slice
Example: The test is passed when F1 score for females is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Actual dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
threshold (float) – Threshold value for F1 Score
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows.
- Returns:
actual_slices_size – Length of dataset tested
metric – The F1 score metric
passed – TRUE if F1 Score metric >= threshold
- giskard.testing.test_r2(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the model R-Squared is higher than a threshold
Example: The test is passed when the R-Squared is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset
threshold (float) – Threshold value for R-Squared
debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.
debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data).
- Returns:
actual_slices_size – Length of dataset tested
metric – The R-Squared metric
passed – TRUE if R-Squared metric >= threshold
- giskard.testing.test_diff_recall(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the absolute percentage change of model Recall between two samples is lower than a threshold
Example: The test is passed when the Recall for females has a difference lower than 10% from the Recall for males. For example, if the Recall for males is 0.8 (dataset) and the Recall for females is 0.6 (reference_dataset), then the absolute percentage Recall change is 0.2 / 0.8 = 0.25 and the test will fail
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (float) – Threshold value for Recall difference
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset
- Returns:
actual_slices_size – Length of actual_dataset tested
reference_slices_size – Length of reference_dataset tested
metric – The Recall difference metric
passed – TRUE if Recall difference < threshold
- giskard.testing.test_diff_accuracy(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the absolute percentage change of model Accuracy between two samples is lower than a threshold
Example: The test is passed when the Accuracy for females has a difference lower than 10% from the Accuracy for males. For example, if the Accuracy for males is 0.8 (dataset) and the Accuracy for females is 0.6 (reference_dataset), then the absolute percentage Accuracy change is 0.2 / 0.8 = 0.25 and the test will fail
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (float) – Threshold value for Accuracy Score difference
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset
- Returns:
actual_slices_size – Length of actual_dataset tested
reference_slices_size – Length of reference_dataset tested
metric – The Accuracy difference metric
passed – TRUE if Accuracy difference < threshold
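A hedged sketch for the diff tests, which compare the same model on two datasets; the iris data, the train/test split and the 0.1 threshold are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

import giskard
from giskard import testing

# Illustrative setup: compare accuracy on a held-out split against the training split.
X, y = load_iris(return_X_y=True, as_frame=True)
df = X.assign(target=y)
train_df, test_df = train_test_split(df, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(train_df[X.columns], train_df["target"])

wrapped_model = giskard.Model(
    model=lambda batch: clf.predict_proba(batch[X.columns]),
    model_type="classification",
    classification_labels=list(clf.classes_),
    feature_names=list(X.columns),
)
reference_ds = giskard.Dataset(train_df, target="target")
actual_ds = giskard.Dataset(test_df, target="target")

# Fails if accuracy on actual_ds deviates from accuracy on reference_ds by more
# than 10% in relative terms (the threshold value is a placeholder).
result = testing.test_diff_accuracy(
    model=wrapped_model,
    actual_dataset=actual_ds,
    reference_dataset=reference_ds,
    threshold=0.1,
).execute()
print(result.passed, result.metric)
```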
- giskard.testing.test_diff_precision(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the absolute percentage change of model Precision between two samples is lower than a threshold
Example: The test is passed when the Precision for females has a difference lower than 10% from the Precision for males. For example, if the Precision for males is 0.8 (dataset) and the Precision for females is 0.6 (reference_dataset), then the absolute percentage Precision change is 0.2 / 0.8 = 0.25 and the test will fail
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (float) – Threshold value for Precision difference
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset
- Returns:
actual_slices_size – Length of actual_dataset tested
reference_slices_size – Length of reference_dataset tested
metric – The Precision difference metric
passed – TRUE if Precision difference < threshold
- giskard.testing.test_diff_rmse(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug_percent_rows: SuiteInput | float | None = 0.3, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the absolute percentage change of model RMSE between two samples is lower than a threshold
Example: The test is passed when the RMSE for females has a difference lower than 10% from the RMSE for males. For example, if the RMSE for males is 0.8 (dataset) and the RMSE for females is 0.6 (reference_dataset), then the absolute percentage RMSE change is 0.2 / 0.8 = 0.25 and the test will fail
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (float) – Threshold value for RMSE difference
debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.
debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data) from both actual_dataset and reference_dataset.
- Returns:
actual_slices_size – Length of actual_dataset tested
reference_slices_size – Length of reference_dataset tested
metric – The RMSE difference metric
passed – TRUE if RMSE difference < threshold
- giskard.testing.test_diff_f1(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug: SuiteInput | bool | None = False) → GiskardTestMethod
Test if the absolute percentage change in model F1 Score between two samples is lower than a threshold
Example: The test is passed when the F1 Score for females has a difference lower than 10% from the F1 Score for males. For example, if the F1 Score for males is 0.8 (dataset) and the F1 Score for females is 0.6 (reference_dataset), then the absolute percentage F1 Score change is 0.2 / 0.8 = 0.25 and the test will fail
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets
threshold (float) – Threshold value for F1 Score difference
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset
- Returns:
actual_slices_size – Length of actual_dataset tested
reference_slices_size – Length of reference_dataset tested
metric – The F1 Score difference metric
passed – TRUE if F1 Score difference < threshold
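Finally, a sketch of composing several of the tests above into one reusable suite; the suite name and thresholds are placeholders, and wrapped_model / wrapped_dataset stand for a giskard.Model and giskard.Dataset built as in the earlier sketches:

```python
from giskard import Suite, testing

# Assumed to exist from the earlier sketches: wrapped_model (giskard.Model)
# and wrapped_dataset (giskard.Dataset). Thresholds below are placeholders.
suite = (
    Suite(name="Performance checks")
    .add_test(testing.test_accuracy(threshold=0.8))
    .add_test(testing.test_f1(threshold=0.7))
    .add_test(testing.test_recall(threshold=0.7))
)

# Inputs left unset when the tests were added are supplied once at run time
# and shared across all tests in the suite.
results = suite.run(model=wrapped_model, dataset=wrapped_dataset)
print(results.passed)  # overall pass/fail for the suite
```

Here the model and dataset are deliberately left unset on each test and passed once to run(), so all three checks share the same inputs.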