Performance tests
- giskard.testing.test_mae(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3) → GiskardTestMethod
Test if the model Mean Absolute Error is lower than a threshold
Example: The test is passed when the MAE is lower than 10
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)
threshold (float) – Threshold value for MAE (Default value = 1.0)
debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.
debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data). (Default value = False)
- Returns:
The test result.
- Return type:
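The check described above can be illustrated without the library. The following is a minimal sketch in plain Python, not giskard's implementation: the helper names `mean_absolute_error` and `worst_rows` are hypothetical, and the way the debug rows are selected (top fraction ranked by absolute error) is an assumption based on the parameter descriptions.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |prediction - target| over all rows."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def worst_rows(y_true, y_pred, debug_percent_rows=0.3):
    """Indices of the top fraction of rows, ranked by absolute error (largest first)."""
    errors = [abs(p - t) for t, p in zip(y_true, y_pred)]
    ranked = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    keep = max(1, round(len(ranked) * debug_percent_rows))
    return ranked[:keep]


# Illustrative data only.
y_true = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0, 8.0, 13.0, 10.0, 12.0]
y_pred = [11.0, 12.0, 6.0, 15.0, 16.0, 13.0, 8.0, 13.0, 12.0, 12.0]

mae = mean_absolute_error(y_true, y_pred)
passed = mae < 1.0  # plays the role of the `threshold` parameter
# Only collect rows to inspect when the check fails.
debug_rows = [] if passed else worst_rows(y_true, y_pred)
```

With the default `debug_percent_rows` of 0.3 and ten rows, three row indices are kept for inspection.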
- giskard.testing.test_rmse(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3) → GiskardTestMethod
Test if the model RMSE is lower than a threshold
Example: The test is passed when the RMSE is lower than 10
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)
threshold (float) – Threshold value for RMSE (Default value = 1.0)
debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.
debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data). (Default value = False)
- Returns:
The test result.
- Return type:
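For reference, RMSE is the square root of the mean squared error, so a single large error weighs more heavily than it would in an average of absolute errors. A small sketch with made-up numbers (the function name `root_mean_squared_error` is hypothetical and this is not the library's code):

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean of squared prediction errors."""
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative data: three near-perfect rows and one large miss.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 7.0, 13.0]

rmse = root_mean_squared_error(y_true, y_pred)
passed = rmse < 1.0  # plays the role of the `threshold` parameter
```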
- giskard.testing.test_recall(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod
Test if the model Recall is higher than a threshold for a given slice
Example: The test is passed when the Recall for females is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Actual dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)
threshold (float) – Threshold value for Recall (Default value = 1.0)
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows. (Default value = False)
- Returns:
The test result.
- Return type:
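The "Recall for females" example amounts to filtering the data to a slice and computing recall on what remains. The sketch below shows that idea in plain Python; the row layout, the `is_female` predicate and the `recall` helper are all hypothetical stand-ins, not giskard's `Dataset` or `SlicingFunction` objects.

```python
# Illustrative rows: ground-truth label and model prediction per person.
rows = [
    {"sex": "female", "label": 1, "prediction": 1},
    {"sex": "female", "label": 1, "prediction": 0},
    {"sex": "male", "label": 1, "prediction": 0},
    {"sex": "female", "label": 1, "prediction": 1},
    {"sex": "female", "label": 0, "prediction": 0},
    {"sex": "male", "label": 0, "prediction": 1},
    {"sex": "female", "label": 1, "prediction": 1},
]


def is_female(row):
    """Stand-in for a slicing function: keep only rows in the slice."""
    return row["sex"] == "female"


def recall(rows, positive=1):
    """Share of actual positives that the model also predicted as positive."""
    actual_positives = [r for r in rows if r["label"] == positive]
    if not actual_positives:
        raise ValueError("recall is undefined without positive rows")
    hits = sum(1 for r in actual_positives if r["prediction"] == positive)
    return hits / len(actual_positives)


female_rows = [r for r in rows if is_female(r)]
metric = recall(female_rows)
passed = metric > 0.7  # plays the role of the `threshold` parameter
# Rows a debug dataset would plausibly contain: wrong predictions in the slice.
misclassified = [r for r in female_rows if r["label"] != r["prediction"]]
```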
- giskard.testing.test_auc(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod
Test if the model AUC performance is higher than a threshold for a given slice
Example: The test is passed when the AUC for females is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Actual dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)
threshold (float) – Threshold value of AUC metrics (Default value = 1.0)
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows. (Default value = False)
- Returns:
actual_slices_size – Length of dataset tested
metric – The AUC performance metric
passed – TRUE if AUC metrics >= threshold
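As a rough illustration of the three returned fields, the sketch below computes a binary ROC AUC by pairwise comparison (the probability that a random positive row is scored above a random negative row, ties counting half) and packs the result into a plain dictionary. The `roc_auc` helper and the dictionary layout are assumptions for illustration; the library's actual result object may differ.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative labels and predicted probabilities for the positive class.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]

threshold = 0.7
metric = roc_auc(labels, scores)
result = {
    "actual_slices_size": len(labels),
    "metric": metric,
    "passed": metric >= threshold,
}
```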
- giskard.testing.test_accuracy(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod
Test if the model Accuracy is higher than a threshold for a given slice
Example: The test is passed when the Accuracy for females is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Actual dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)
threshold (float) – Threshold value for Accuracy (Default value = 1.0)
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows. (Default value = False)
- Returns:
The test result.
- Return type:
- giskard.testing.test_precision(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod
Test if the model Precision is higher than a threshold for a given slice
Example: The test is passed when the Precision for females is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Actual dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)
threshold (float) – Threshold value for Precision (Default value = 1.0)
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows. (Default value = False)
- Returns:
The test result.
- Return type:
- giskard.testing.test_f1(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod
Test if the model F1 score is higher than a defined threshold for a given slice
Example: The test is passed when F1 score for females is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Actual dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)
threshold (float) – Threshold value for F1 Score (Default value = 1.0)
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows. (Default value = False)
- Returns:
actual_slices_size – Length of dataset tested
metric – The F1 score metric
passed – TRUE if F1 Score metrics >= threshold
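The F1 score is the harmonic mean of precision and recall, so it can fall below the threshold even when one of the two looks acceptable on its own. A minimal binary-classification sketch follows (the `f1_score` helper is hypothetical, and how the library averages over classes is not covered here):

```python
def f1_score(labels, predictions, positive=1):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative data: precision is 0.75 but recall is only 0.6.
labels = [1, 1, 1, 0, 0, 1, 0, 1]
predictions = [1, 0, 1, 1, 0, 1, 0, 0]

threshold = 0.7
metric = f1_score(labels, predictions)
passed = metric >= threshold
```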
- giskard.testing.test_r2(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0, debug_percent_rows: SuiteInput | float | None = 0.3) → GiskardTestMethod
Test if the model R-Squared is higher than a threshold
Example: The test is passed when the R-Squared is higher than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)
threshold (float) – Threshold value for R-Squared (Default value = 1.0)
debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.
debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data). (Default value = False)
- Returns:
The test result.
- Return type:
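R-Squared compares the model's squared error with that of a baseline that always predicts the mean target. The sketch below uses the standard definition, 1 - SS_res / SS_tot; the `r_squared` helper is hypothetical and the data is made up.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-Squared is undefined when all targets are equal")
    return 1 - ss_res / ss_tot


# Illustrative data only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

r2 = r_squared(y_true, y_pred)
passed = r2 > 0.7  # plays the role of the `threshold` parameter
```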
- giskard.testing.test_diff_recall(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant) → GiskardTestMethod
Test if the absolute percentage change of model Recall between two samples is lower than a threshold
Example: The test is passed when the Recall for females has a difference lower than 10% from the Recall for males. For example, if the Recall for males is 0.8 (dataset) and the Recall for females is 0.6 (reference_dataset) then the absolute percentage Recall change is 0.2 / 0.8 = 0.25 and the test will fail
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets (Default value = None)
threshold (float) – Threshold value for Recall difference (Default value = 0.1)
direction (Direction) – (Default value = Direction.Invariant)
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset. (Default value = False)
- Returns:
The test result.
- Return type:
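The arithmetic in the example can be checked directly. This sketch reproduces the worked numbers exactly as the example states them (0.2 / 0.8); which of the two metrics the library actually uses as the denominator is not stated here, so the choice below is an assumption, as is the helper name `absolute_percentage_change`. Only the invariant case is shown.

```python
def absolute_percentage_change(metric, other_metric):
    """|metric - other_metric| relative to `metric`, as in the worked example."""
    if metric == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return abs(metric - other_metric) / metric


recall_males = 0.8
recall_females = 0.6
threshold = 0.1  # the default value of the `threshold` parameter

change = absolute_percentage_change(recall_males, recall_females)
passed = change < threshold  # 0.25 is not below 0.1, so the test fails
```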
- giskard.testing.test_diff_accuracy(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant) → GiskardTestMethod
Test if the absolute percentage change of model Accuracy between two samples is lower than a threshold
Example: The test is passed when the Accuracy for females has a difference lower than 10% from the Accuracy for males. For example, if the Accuracy for males is 0.8 (dataset) and the Accuracy for females is 0.6 (reference_dataset) then the absolute percentage Accuracy change is 0.2 / 0.8 = 0.25 and the test will fail
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets (Default value = None)
threshold (float) – Threshold value for Accuracy Score difference (Default value = 0.1)
direction (Direction) – (Default value = Direction.Invariant)
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset. (Default value = False)
- Returns:
The test result.
- Return type:
- giskard.testing.test_diff_precision(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant) → GiskardTestMethod
Test if the absolute percentage change of model Precision between two samples is lower than a threshold
Example: The test is passed when the Precision for females has a difference lower than 10% from the Precision for males. For example, if the Precision for males is 0.8 (dataset) and the Precision for females is 0.6 (reference_dataset) then the absolute percentage Precision change is 0.2 / 0.8 = 0.25 and the test will fail
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets (Default value = None)
threshold (float) – Threshold value for Precision difference (Default value = 0.1)
direction (Direction) – (Default value = Direction.Invariant)
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset. (Default value = False)
- Returns:
The test result.
- Return type:
- giskard.testing.test_diff_rmse(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant, debug_percent_rows: SuiteInput | float | None = 0.3) → GiskardTestMethod
Test if the absolute percentage change of model RMSE between two samples is lower than a threshold
Example: The test is passed when the RMSE for females has a difference lower than 10% from the RMSE for males. For example, if the RMSE for males is 0.8 (dataset) and the RMSE for females is 0.6 (reference_dataset) then the absolute percentage RMSE change is 0.2 / 0.8 = 0.25 and the test will fail
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets (Default value = None)
threshold (float) – Threshold value for RMSE difference (Default value = 0.1)
direction (Direction) – (Default value = Direction.Invariant)
debug_percent_rows (float) – Percentage of rows (sorted by their highest absolute error) to debug. By default 30%.
debug (bool) – If True and the test fails, a dataset will be provided containing the top debug_percent_rows of the rows with the highest absolute error (difference between prediction and data) from both actual_dataset and reference_dataset. (Default value = False)
- Returns:
The test result.
- Return type:
- giskard.testing.test_diff_f1(model: SuiteInput | BaseModel | None = None, actual_dataset: SuiteInput | Dataset | None = None, reference_dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 0.1, direction: SuiteInput | Direction | None = Direction.Invariant) → GiskardTestMethod
Test if the absolute percentage change in model F1 Score between two samples is lower than a threshold
Example: The test is passed when the F1 Score for females has a difference lower than 10% from the F1 Score for males. For example, if the F1 Score for males is 0.8 (dataset) and the F1 Score for females is 0.6 (reference_dataset) then the absolute percentage F1 Score change is 0.2 / 0.8 = 0.25 and the test will fail
- Parameters:
model (BaseModel) – Model used to compute the test
actual_dataset (Dataset) – Actual dataset used to compute the test
reference_dataset (Dataset) – Reference dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on both actual and reference datasets (Default value = None)
threshold (float) – Threshold value for F1 Score difference (Default value = 0.1)
direction (Direction) – (Default value = Direction.Invariant)
debug (bool) – If True and the test fails, a dataset will be provided containing all the incorrectly predicted rows from both actual_dataset and reference_dataset. (Default value = False)
- Returns:
The test result.
- Return type:
- giskard.testing.test_brier(model: SuiteInput | BaseModel | None = None, dataset: SuiteInput | Dataset | None = None, slicing_function: SuiteInput | SlicingFunction | None = None, threshold: SuiteInput | float | None = 1.0) → GiskardTestMethod
Test if the model Brier score is lower than a threshold for a given slice
Example: The test is passed when the Brier score for females is lower than 0.7
- Parameters:
model (BaseModel) – Model used to compute the test
dataset (Dataset) – Actual dataset used to compute the test
slicing_function (Optional[SlicingFunction]) – Slicing function to be applied on dataset (Default value = None)
threshold (float) – Threshold value for Brier score (Default value = 1.0)
- Returns:
The test result.
- Return type:
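For a binary target, the Brier score is the mean squared difference between the predicted probability of the positive class and the observed outcome (0 or 1), so lower is better. A minimal sketch with made-up values (the `brier_score` helper is hypothetical; the library's handling of multiclass targets is not covered):

```python
def brier_score(outcomes, probabilities):
    """Mean squared gap between predicted probability and observed 0/1 outcome."""
    gaps = [(p - y) ** 2 for y, p in zip(outcomes, probabilities)]
    return sum(gaps) / len(gaps)


# Illustrative data: observed outcomes and predicted positive-class probabilities.
outcomes = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.6, 0.1]

brier = brier_score(outcomes, probabilities)
passed = brier < 0.7  # plays the role of the `threshold` parameter
```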