
MLflow Example - Tabular

Detecting vulnerabilities in tabular ML models in MLflow with Giskard

This example shows how to scan two tabular ML models for hidden vulnerabilities with Giskard and inspect the results in MLflow, using only a few lines of code. The two models are:

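| Model  | Description                                                                                          | Training data   |
|--------|------------------------------------------------------------------------------------------------------|-----------------|
| model1 | A simple scikit-learn LogisticRegression model limited to 5 solver iterations (max_iter=5), so its solver may not converge | Titanic dataset |
| model2 | A simple scikit-learn LogisticRegression model allowed up to 100 solver iterations (max_iter=100)   | Titanic dataset |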

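import mlflow
from giskard import demo

# Two Titanic classifiers that differ only in the solver's max_iter budget
model1, df = demo.titanic(max_iter=5)
model2, df = demo.titanic(max_iter=100)

models = {"model1": model1, "model2": model2}

for model_name, model in models.items():
    # One MLflow run per model, so the runs can be compared side by side
    with mlflow.start_run(run_name=model_name) as run:
        # Log the model; its pyfunc predictions return class probabilities
        model_uri = mlflow.sklearn.log_model(
            model, model_name, pyfunc_predict_fn="predict_proba"
        ).model_uri
        # Run the Giskard evaluator (vulnerability scan) on the logged model
        mlflow.evaluate(
            model=model_uri,
            model_type="classifier",
            data=df,
            targets="Survived",
            evaluators="giskard",
            evaluator_config={"model_config": {"classification_labels": ["no", "yes"]}},
        )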

After running the code above, launch `mlflow ui` from the directory that contains the `mlruns` folder and open http://127.0.0.1:5000 in your browser. The two models appear there as separate runs, ready for comparison and analysis.

[Figure: the two runs in the MLflow UI]

The Giskard scan results:

[Figure: Giskard scan results in the MLflow UI]

The metrics generated by the scan:

[Figure: scan metrics in the MLflow UI]
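You can also read the same metrics back in Python for a side-by-side comparison. The sketch below is an illustration and does not come from the Giskard or MLflow docs. It assumes both runs were logged to the default tracking location (the local `mlruns` folder) and the default experiment. The helper names `extract_metrics` and `compare_runs` are hypothetical. It relies on `mlflow.search_runs`, which returns a pandas DataFrame sorted newest first by default, with metric columns prefixed by `metrics.` and the run name stored in `tags.mlflow.runName`. It does not assume any particular Giskard metric names.

```python
from typing import Any, Dict, Iterable, List


def extract_metrics(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Map each run name to its logged metrics.

    `records` are rows of an mlflow.search_runs() DataFrame, e.g. from
    df.to_dict("records"). Only the first row per run name is kept, which is
    the most recent run when search_runs() uses its default ordering.
    """
    result: Dict[str, Dict[str, float]] = {}
    for row in records:
        run_name = row.get("tags.mlflow.runName")
        if run_name in result:
            continue  # an older run with the same name (e.g. notebook re-run)
        metrics = {}
        for column, value in row.items():
            # Metrics missing for a run appear as NaN; NaN != NaN, so skip it
            if column.startswith("metrics.") and value == value:
                metrics[column[len("metrics."):]] = value
        result[run_name] = metrics
    return result


def compare_runs(run_names: List[str]) -> Dict[str, Dict[str, float]]:
    """Fetch runs of the active experiment and keep only `run_names`."""
    import mlflow  # imported here so extract_metrics() works without MLflow

    runs = mlflow.search_runs()  # pandas DataFrame, one row per run
    runs = runs[runs["tags.mlflow.runName"].isin(run_names)]
    return extract_metrics(runs.to_dict("records"))
```

For example, `compare_runs(["model1", "model2"])` returns one metrics dictionary per model. Keeping the DataFrame parsing in a pure function makes it easy to test without a tracking server.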

A scan summary: after each model evaluation, a scan-summary.json file is created, which lets you compare the vulnerabilities and metrics of each model in the Artifact view.

[Figure: scan-summary.json in the MLflow Artifact view]
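Because `scan-summary.json` is a regular run artifact, you can also download and diff it outside the UI. The sketch below assumes the file sits at the root of each run's artifact directory, as the Artifact view suggests, and that you know the two run IDs (for example, collected from `run.info.run_id` inside the loop above). The helpers `flatten`, `diff_summaries` and `load_scan_summary` are hypothetical. The JSON schema is not documented here, so the diff is deliberately schema-agnostic: it flattens nested objects into dotted keys and reports every key whose value differs between the two models.

```python
import json
from typing import Any, Dict, Tuple


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b.0": value} pairs.

    Empty dicts and lists produce no keys.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat: Dict[str, Any] = {}
    for key, value in items:
        child = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, child))
    return flat


def diff_summaries(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted_key: (value_in_a, value_in_b)} for keys that differ.

    Keys present on only one side are reported too, with None for the
    missing side.
    """
    flat_a, flat_b = flatten(a), flatten(b)
    return {
        k: (flat_a.get(k), flat_b.get(k))
        for k in sorted(set(flat_a) | set(flat_b))
        if k not in flat_a or k not in flat_b or flat_a[k] != flat_b[k]
    }


def load_scan_summary(run_id: str) -> Dict[str, Any]:
    """Download scan-summary.json from a run (artifact path is an assumption)."""
    import mlflow.artifacts  # imported here so the pure helpers need no MLflow

    local_path = mlflow.artifacts.download_artifacts(
        run_id=run_id, artifact_path="scan-summary.json"
    )
    with open(local_path, encoding="utf-8") as f:
        return json.load(f)
```

A typical call is `diff_summaries(load_scan_summary(run_id_1), load_scan_summary(run_id_2))`, where `run_id_1` and `run_id_2` are placeholders for the IDs of the two runs.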