Advanced scan usage

You can customize the scan by passing specific configuration options to the scan method at runtime.

The following examples show the different options available.

Limiting to a specific group of detectors

If you want to run only a specific detector (or a group of detectors), you can use the only argument. This argument accepts either a tag or a list of tags:

import giskard as gsk

report = gsk.scan(my_model, my_dataset, only="robustness")

or with multiple tags:

report = gsk.scan(my_model, my_dataset, only=["robustness", "performance"])

Limiting to a selection of model features

If your model has a large number of features and you want to limit the scan to a specific subset, you can use the features argument:

import giskard as gsk

report = gsk.scan(my_model, my_dataset, features=["feature_1", "feature_2"])

This will produce scan results only for the features feature_1 and feature_2.

Advanced detector configuration

If you want to customize the configuration of a specific detector, you can use the params argument. It accepts a dictionary mapping each detector identifier to a dictionary of configuration options that will be passed to the detector upon initialization:

import giskard as gsk

params = {
    "performance_bias": dict(threshold=0.04, metrics=["accuracy", "f1"]),
    "ethical_bias": dict(output_sensitivity=0.5),
}

report = gsk.scan(my_model, my_dataset, params=params)

Check the reference documentation of each detector to see which options are available for customization.
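
If you prefer to explore the available options directly from a Python session, you can also inspect a detector class with the built-in help() function. The import path below is an assumption based on recent versions of the library and may differ in your installed version:

from giskard.scanner.performance import PerformanceBiasDetector  # path may vary by version

# Prints the constructor signature and the supported configuration options.
help(PerformanceBiasDetector)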

How to make the scan faster

If you are dealing with a large dataset, the scan may take a long time to analyze all the vulnerabilities. In this case, you may want to limit the scan to the subset of features and detectors that are most relevant to your use case (see above for detailed instructions):

report = gsk.scan(
    my_model,
    my_dataset,
    only=["robustness", "performance"],
    features=["feature_1", "feature_2"],
)

Moreover, certain detectors perform a full scan of the dataset, which can be very slow. In this case, we recommend using the following configuration so that these detectors only analyze a random sample of your dataset:

params = {
    "performance_bias": dict(max_dataset_size=100_000),
    "overconfidence": dict(max_dataset_size=100_000),
    "underconfidence": dict(max_dataset_size=100_000),
}

report = gsk.scan(my_model, my_dataset, params=params)

This will limit the scan to 100,000 samples of your dataset. You can adjust this number to your needs.

Note: for classification models, we will make sure that the sample is balanced between the different classes via stratified sampling.
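
For intuition, the snippet below is a rough sketch of what stratified sampling means for a classification dataset held in a pandas DataFrame. It is only an illustration of the idea, not the scanner's internal code, and you do not need to pre-sample your data yourself; the DataFrame and label column names are placeholders:

import pandas as pd

def stratified_sample(df: pd.DataFrame, target: str, max_size: int) -> pd.DataFrame:
    # Sample each class with the same fraction so that the class
    # proportions of the original dataset are preserved in the sample.
    frac = min(max_size / len(df), 1.0)
    return df.groupby(target).sample(frac=frac, random_state=42)

# Example: keep at most 100,000 rows while preserving the label distribution.
sample_df = stratified_sample(full_df, target="label", max_size=100_000)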