
W&B Example - Tabular

Detecting tabular ML model vulnerabilities in W&B with Giskard

This example demonstrates how to efficiently scan two tabular ML models for hidden vulnerabilities using Giskard, log the results and interpret them within the W&B framework in just a few lines of code. We will use the following two tabular ML models:

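| Model  | Description                                        | Training data   |
|--------|----------------------------------------------------|-----------------|
| model1 | An LGBMClassifier model trained for only 5 epochs. | Titanic dataset |
| model2 | An LGBMClassifier model trained for 100 epochs.    | Titanic dataset |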

import wandb

from giskard import Model, Dataset, demo, explain_with_shap, scan

model1, df = demo.titanic(model="LGBMClassifier", max_iter=5)
model2, __ = demo.titanic(model="LGBMClassifier", max_iter=100)  # Datasets are identical.
models = {"titanic-model_lgbm_max_iter=5": model1, "titanic-model_lgbm_max_iter=100": model2}

wrapped_data = Dataset(df=df,
                       target="Survived",
                       cat_columns=['Pclass', 'Sex', "SibSp", "Parch", "Embarked"])

wandb.login(key="key to retrieve from https://wandb.ai/authorize")
for model_name, model in models.items():
    wrapped_model = Model(model=model.predict_proba,
                          model_type="classification",
                          feature_names=['PassengerId', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked'],
                          classification_labels=model.classes_)

    run = wandb.init(project="titanic_demo", name=model_name)

    # Log results to the new W&B run.
    wrapped_data.to_wandb()

    shap_explanation_result = explain_with_shap(wrapped_model, wrapped_data)
    shap_explanation_result.to_wandb()

    scan_results = scan(wrapped_model, wrapped_data)
    scan_results.to_wandb()

    test_suite = scan_results.generate_test_suite()
    test_suite.run().to_wandb()

    # Finish the current run.
    run.finish()
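Pasting the API key directly into the notebook makes it easy to leak. A minimal sketch of one alternative, assuming the key has been exported in the `WANDB_API_KEY` environment variable (which the W&B client also reads on its own). The helper name `resolve_wandb_key` is hypothetical and is not part of wandb or Giskard:

```python
import os


def resolve_wandb_key(env=None):
    """Return the W&B API key from the environment, or None if it is unset.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    return key or None


# Usage (sketch): pass the result to wandb.login(). With key=None, wandb
# falls back to its own credential lookup.
# wandb.login(key=resolve_wandb_key())
```

This keeps the secret out of the notebook source, so the file can be shared or committed without editing it first.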

After logging the results, you can explore them in the W&B user interface. If you run a local W&B server with wandb server start, the interface is served at http://localhost:8080. You will be able to visualise the following:

The dataset


The SHAP bar plots for categorical features


The SHAP scatter plots for numerical features


The SHAP global feature importance plot

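As a rough illustration of what a global feature importance ranking summarises, the sketch below ranks features by their mean absolute SHAP value, which is the common SHAP convention. It assumes the plot follows that convention. The SHAP values are made up for illustration, and the function `global_importance` is hypothetical, not Giskard's implementation:

```python
def global_importance(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, largest first."""
    n_rows = len(shap_values)
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = {name: total / n_rows for name, total in zip(feature_names, totals)}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up SHAP values: three passengers (rows) by three features (columns).
toy_shap = [
    [0.50, -0.25, 0.000],
    [-0.50, 0.25, 0.250],
    [0.50, -0.25, -0.125],
]
ranking = global_importance(toy_shap, ["Sex", "Pclass", "Age"])
print(ranking)
```

Taking the absolute value first means that features pushing predictions strongly in either direction rank high, even when their positive and negative contributions would cancel out in a plain average.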

The Giskard scan results


The Giskard test-suite results
