
MLflow Example - Tabular

Detecting tabular ML model vulnerabilities in MLflow with Giskard

This example demonstrates how to efficiently scan two tabular ML models for hidden vulnerabilities using Giskard and interpret the results within MLflow through just a few lines of code. The two tabular ML models used are:

| Model | Description | Training data |
|---|---|---|
| model1 | A simple sklearn LogisticRegression model trained with max_iter=5 (at most 5 solver iterations). | Titanic dataset |
| model2 | A simple sklearn LogisticRegression model trained with max_iter=100 (at most 100 solver iterations). | Titanic dataset |

import mlflow
import giskard
from giskard import demo

# Same model type and dataset; only the solver iteration cap (max_iter) differs.
model1, df = demo.titanic(max_iter=5)
model2, df = demo.titanic(max_iter=100)

models = {"model1": model1, "model2": model2}

for model_name, model in models.items():
    with mlflow.start_run(run_name=model_name) as run:
        # Log the model so that its pyfunc flavor predicts with predict_proba.
        model_uri = mlflow.sklearn.log_model(
            model, model_name, pyfunc_predict_fn="predict_proba"
        ).model_uri
        # Evaluate the logged model with the Giskard evaluator.
        mlflow.evaluate(
            model=model_uri,
            model_type="classifier",
            data=df,
            targets="Survived",
            evaluators="giskard",
            evaluator_config={"model_config": {"classification_labels": ["no", "yes"]}},
        )

After completing the previous steps, run mlflow ui from the directory that contains the mlruns folder to visualize the results. The interface is then available at http://127.0.0.1:5000, where the two models are logged as separate runs for comparison and analysis.
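If changing into that directory first is inconvenient, the location of the tracking store can be passed to the command explicitly. The following is a minimal sketch, assuming the mlflow executable is on the PATH and that the runs were logged to a local mlruns folder. The helper names build_mlflow_ui_command and launch_mlflow_ui are hypothetical and are not part of MLflow or Giskard.

```python
import subprocess
from pathlib import Path


def build_mlflow_ui_command(mlruns_dir, host="127.0.0.1", port=5000):
    """Return the argument list for `mlflow ui` pointed at a local mlruns folder.

    Hypothetical helper for illustration; not part of MLflow or Giskard.
    """
    # A file:// URI avoids ambiguity with relative paths and drive letters.
    store_uri = Path(mlruns_dir).resolve().as_uri()
    return [
        "mlflow", "ui",
        "--backend-store-uri", store_uri,
        "--host", host,
        "--port", str(port),
    ]


def launch_mlflow_ui(mlruns_dir="mlruns"):
    """Start the MLflow UI; this call blocks until the server is stopped."""
    subprocess.run(build_mlflow_ui_command(mlruns_dir), check=True)
```

Calling launch_mlflow_ui("path/to/mlruns") from any working directory should then serve the same interface at http://127.0.0.1:5000.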

[Figure: the Giskard scan results]

[Figure: the metrics generated by the scan]

A scan summary: After each model evaluation, a scan-summary.json file is created, enabling a comparison of vulnerabilities and metrics for each model in the Artifact view.
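The same comparison can also be made outside the UI by reading the summary files directly. The sketch below makes no assumption about the JSON schema: it flattens each file into path/value pairs and reports the leaves that differ. It does assume that the runs were logged to a local mlruns folder and that each artifact file name contains "scan-summary" and ends in ".json". The function names are hypothetical helpers, not Giskard or MLflow APIs.

```python
import json
from pathlib import Path

MISSING = "<missing>"


def flatten(value, prefix=""):
    """Flatten nested JSON into {"a.b[0].c": leaf} pairs without assuming a schema."""
    if isinstance(value, dict) and value:
        items = ((f"{prefix}.{k}" if prefix else str(k), v) for k, v in value.items())
    elif isinstance(value, list) and value:
        items = ((f"{prefix}[{i}]", v) for i, v in enumerate(value))
    else:
        # Scalars and empty containers are treated as leaves.
        return {prefix: value}
    flat = {}
    for path, child in items:
        flat.update(flatten(child, path))
    return flat


def find_scan_summaries(mlruns_dir):
    """Return the sorted paths of scan summary files found under mlruns_dir."""
    # Assumption: the artifact file name contains "scan-summary" and ends in ".json".
    return sorted(Path(mlruns_dir).rglob("*scan-summary*.json"))


def diff_summaries(path_a, path_b):
    """Return {json_path: (value_a, value_b)} for every leaf that differs."""
    flat_a = flatten(json.loads(Path(path_a).read_text(encoding="utf-8")))
    flat_b = flatten(json.loads(Path(path_b).read_text(encoding="utf-8")))
    return {
        key: (flat_a.get(key, MISSING), flat_b.get(key, MISSING))
        for key in sorted(flat_a.keys() | flat_b.keys())
        if flat_a.get(key, MISSING) != flat_b.get(key, MISSING)
    }
```

With two runs logged, something like `a, b = find_scan_summaries("mlruns")` followed by `diff_summaries(a, b)` would list the entries on which model1 and model2 disagree.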