Data quality testsΒΆ
- giskard.testing.test_data_uniqueness(dataset: SuiteInput | Dataset | None = None, column: SuiteInput | str | None = None, threshold: SuiteInput | float | None = 0.8) GiskardTestMethod [source]ΒΆ
Test for checking the uniqueness of data in a column.
- Parameters:
dataset (Dataset) β The dataset to test.
column (str) β The column to check for uniqueness.
threshold (float, optional) β The minimum uniqueness ratio for the test to pass., by default 0.8
- Returns:
The result of the test.
- Return type:
- giskard.testing.test_data_completeness(dataset: SuiteInput | Dataset | None = None, column_name: SuiteInput | str | None = None, threshold: SuiteInput | float | None = None) GiskardTestMethod [source]ΒΆ
Test for checking the completeness of data in a dataset.
- Parameters:
dataset (Dataset) β The dataset to test.
column_name (str) β The name of the column to test.
threshold (float) β The minimum completeness ratio for the test to pass.
- Returns:
A TestResult object indicating whether the test passed and the completeness ratio.
- Return type:
- giskard.testing.test_valid_range(dataset: SuiteInput | Dataset | None = None, column: SuiteInput | str | None = None, min_value: SuiteInput | float | None = None, max_value: SuiteInput | float | None = None) GiskardTestMethod [source]ΒΆ
Test for checking if data in a column falls within a specified range.
- Parameters:
dataset (Dataset) β The dataset to test
column (str) β The column to check
min_value (float, optional) β The minimum valid value, by default None
max_value (float, optional) β The maximum valid value, by default None
- Returns:
The result of the test
- Return type:
- giskard.testing.test_valid_values(dataset: SuiteInput | Dataset | None = None, column: SuiteInput | str | None = None, valid_values: SuiteInput | List | None = None) GiskardTestMethod [source]ΒΆ
Test for checking if data in a column is in a set of valid values.
- Parameters:
dataset (Dataset) β The dataset to test
column (str) β The column to check
valid_values (Optional[List], optional) β A list of valid values, by default None
- Returns:
The result of the test
- Return type:
- giskard.testing.test_data_correlation(dataset: SuiteInput | Dataset | None = None, column1: SuiteInput | str | None = None, column2: SuiteInput | str | None = None, should_correlate: SuiteInput | bool | None = True, correlation_threshold: SuiteInput | float | None = 0.0) GiskardTestMethod [source]ΒΆ
Test for analyzing correlations between two specific features.
- Parameters:
dataset (Dataset) β The dataset to test
column1 (str, optional) β The first column to check, by default None
column2 (str, optional) β The second column to check, by default None
should_correlate (bool, optional) β Whether the two columns should correlate, by default True
correlation_threshold (float, optional) β The minimum absolute correlation that is considered significant, by default 0.0
- Returns:
The result of the test, containing the correlation between the two columns
- Return type:
- giskard.testing.test_outlier_value(dataset: SuiteInput | Dataset | None = None, column: SuiteInput | str | None = None, eps: SuiteInput | float | None = 0.5, min_samples: SuiteInput | int | None = 5) GiskardTestMethod [source]ΒΆ
Test for identifying outliers or anomalies in a column of the dataset using DBSCAN.
- Parameters:
dataset (Dataset) β The dataset to test
column (str) β The column to check for anomalies
eps (float, optional) β The maximum distance between two samples for one to be considered as in the neighborhood of the other, by default 0.5
min_samples (int, optional) β The number of samples in a neighborhood for a point to be considered as a core point, by default 5
- Returns:
The result of the test, containing the indices of the anomalies
- Return type:
- giskard.testing.test_foreign_constraint(dataset: SuiteInput | Dataset | None = None, column: SuiteInput | str | None = None, target_dataset: SuiteInput | Dataset | None = None, target_column: SuiteInput | str | None = None, threshold: SuiteInput | float | None = 0.0) GiskardTestMethod [source]ΒΆ
Ensure that all data in a column of one dataset are present in a column of another dataset.
- Parameters:
dataset (Dataset) β The dataset to check
column (str) β The column in the dataset to check
target_dataset (Dataset) β The dataset to compare against
target_column (str) β The column in the target dataset to compare against
threshold (float, optional) β The maximum allowed ratio of missing values, by default 0.0
- Returns:
The result of the test, indicating whether the test passed and the ratio of missing values
- Return type:
- giskard.testing.test_label_consistency(dataset: SuiteInput | Dataset | None = None, label_column: SuiteInput | str | None = None) GiskardTestMethod [source]ΒΆ
Test for checking the consistency of datatype across each label throughout dataset.
- Parameters:
dataset (Dataset) β The dataset to test
label_column (str) β The column containing the labels
- Returns:
The result of the test
- Return type:
- giskard.testing.test_mislabeling(dataset: SuiteInput | Dataset | None = None, labelled_column: SuiteInput | str | None = None, reference_columns: SuiteInput | Iterable[str] | None = None) GiskardTestMethod [source]ΒΆ
Test for detecting mislabelled data
- Parameters:
dataset (Dataset) β The dataset to test
labelled_column (str) β The column containing the labels
reference_columns (Iterable[str]) β The columns containing the data to check for consistency
- Returns:
The result of the test, containing the indices of the mislabelled data
- Return type:
- giskard.testing.test_feature_importance(dataset: SuiteInput | Dataset | None = None, feature_columns: SuiteInput | Iterable[str] | None = None, target_column: SuiteInput | str | None = None, importance_threshold: SuiteInput | float | None = 0) GiskardTestMethod [source]ΒΆ
Test for analyzing the importance of features in a classification problem
- Parameters:
dataset (Dataset) β The dataset to test
feature_columns (Iterable[str]) β The columns containing the features
target_column (str) β The column containing the target variable
importance_threshold (float, optional) β The minimum importance that is considered significant, by default 0
- Returns:
The result of the test, containing the feature importances
- Return type:
- giskard.testing.test_class_imbalance(dataset: SuiteInput | Dataset | None = None, target_column: SuiteInput | str | None = None, lower_threshold: SuiteInput | float | None = None, upper_threshold: SuiteInput | float | None = None) GiskardTestMethod [source]ΒΆ
Test for assessing the distribution of classes in classification problems.
- Parameters:
dataset (Dataset) β The dataset to test
target_column (str) β The column containing the target variable
lower_threshold (float) β The minimum allowed class proportion
upper_threshold (float) β The maximum allowed class proportion
- Returns:
The result of the test, containing the class proportions
- Return type: