🐝 Weights & Biases#
Giskard’s automated vulnerability detection, combined with W&B’s tracing tools, helps you build and debug ML apps, from tabular models to LLMs.
Why Weights & Biases?#
Weights and Biases, often referred to as wandb or even simply W&B, is an MLOps platform that helps AI developers streamline their ML workflow from end to end. With W&B, developers have access to:
Experiment Tracking: W&B logs hyperparameters, metrics, and visuals for easy experiment comparison and change impact analysis.
Visualizations: Create interactive performance visuals for seamless collaboration and communication.
Collaboration: W&B facilitates team collaboration by sharing experiments, insights, and results.
Model Versioning: Easily manage and version machine learning models for reproducibility.
Hyperparameter Tuning: Streamline hyperparameter search with W&B to find optimal model configurations.
Integration: Seamlessly incorporate W&B into your ML workflow, integrating with popular frameworks like TensorFlow and PyTorch.
In the context of LLMs, W&B also introduced a debugging tool, “W&B Traces”, designed to support ML practitioners working on prompt engineering. It lets you visualize and drill down into every component and activity in the trace of an LLM pipeline execution. It also lets you review past results, identify and debug errors, gather insights about the LLM’s behavior, and share those insights.
Tracing is invaluable, but how do we measure the quality of the outputs throughout the pipeline? Could there be hidden vulnerabilities that our carefully crafted prompts have inadvertently failed to counter? Is there a way to detect such vulnerabilities automatically? Would it be possible to log these issues into W&B to complement the tracing? In a nutshell, the answer to all of these questions is yes. That’s precisely the capability that Giskard brings to the table.
Why integrate Giskard?#
Giskard is an open-source testing framework dedicated to ML models, covering any Python model, from tabular to LLMs.
Testing Machine Learning applications can be tedious: Where to start testing? Which tests to implement? What issues to cover? How to implement the tests?
Giskard offers several compelling reasons to use it in conjunction with W&B for your ML projects:
Automated Vulnerability Detection: Giskard’s scan feature ensures the identification of hidden vulnerabilities in ML models by generating a comprehensive report that can be logged into W&B.
Tabular and NLP models: wherein some of the most important vulnerabilities revolve around performance biases, data leakage, lack of robustness, and more.
LLMs: wherein some of the most critical vulnerabilities are Prompt Injection (when LLMs are manipulated to behave as the attacker wishes), Sensitive Information Disclosure (when LLMs inadvertently leak confidential information), Hallucination (when LLMs generate inaccurate or inappropriate content), and more. In conjunction with the tracing, the scan report creates the ideal combination for building and debugging LLM apps.
Customizable Tests: Giskard generates tailored tests based on the detected vulnerabilities. You can further customize these tests by defining domain-specific data slicers and transformers.
Interpretability plots: For tabular models, Giskard logs interpretability plots generated by SHAP into W&B. These plots provide a detailed analysis of feature importance.
To use Giskard with Weights and Biases, you need to follow these steps:
Logging from Giskard to Weights and Biases#
To get the most out of this integration, follow these three steps to diagnose your ML model:
wrap your dataset by following this guide.
wrap your ML model by following this guide.
scan your ML model for vulnerabilities by following this guide.
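The three steps above can be sketched roughly as follows. This is a minimal sketch, assuming giskard 2.x, pandas and NumPy are installed. The toy data, the column names (`age`, `plan`, `churn`), the rule-based `churn_probability` and the helper `build_and_scan` are all hypothetical stand-ins for your own data and trained model, not part of Giskard.

```python
def churn_probability(age, plan):
    """Hypothetical rule-based score standing in for a trained model."""
    score = 0.2
    if plan == "basic":
        score += 0.5
    if age < 30:
        score += 0.2
    return min(score, 1.0)


def build_and_scan():
    """Wrap a toy dataset and model with Giskard, then scan the model."""
    # Third-party imports live inside the function so the toy model
    # above can be used without them.
    import numpy as np
    import pandas as pd
    import giskard

    # Illustrative data only; a real scan needs a realistic sample.
    df = pd.DataFrame({
        "age": [22, 35, 47, 29, 61, 38],
        "plan": ["basic", "premium", "basic", "premium", "basic", "premium"],
        "churn": ["yes", "no", "yes", "no", "no", "no"],
    })

    # Step 1: wrap the dataset.
    giskard_dataset = giskard.Dataset(df=df, target="churn", cat_columns=["plan"])

    # Step 2: wrap the model. The prediction function takes a DataFrame of
    # features and returns one row of class probabilities per input row.
    def predict_proba(features: pd.DataFrame) -> np.ndarray:
        p_yes = np.array([
            churn_probability(age, plan)
            for age, plan in zip(features["age"], features["plan"])
        ])
        return np.column_stack([1.0 - p_yes, p_yes])

    giskard_model = giskard.Model(
        model=predict_proba,
        model_type="classification",
        classification_labels=["no", "yes"],  # same order as the columns above
        feature_names=["age", "plan"],
    )

    # Step 3: scan the model for vulnerabilities.
    scan_results = giskard.scan(giskard_model, giskard_dataset)
    return giskard_model, giskard_dataset, scan_results
```

Calling `build_and_scan()` returns the wrapped model, the wrapped dataset and the scan results. On a dataset this small the scan may find little or emit warnings, so treat the data as a placeholder.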
Once the above steps are done, you can now log the results into Weights and Biases by doing the following:
import giskard
import wandb
# [...] wrap model and dataset with giskard
scan_results = giskard.scan(giskard_model, giskard_dataset) # works for tabular, NLP and LLMs
test_suite_results = scan_results.generate_test_suite().run() # works for tabular, NLP and LLMs
shap_results = giskard.explain_with_shap(giskard_model, giskard_dataset) # only works for tabular models
wandb.login(key="key to retrieve from https://wandb.ai/authorize")
run = wandb.init(project="my_project", name="my_run")
giskard_dataset.to_wandb(run) # log your dataset as a table
scan_results.to_wandb(run) # log scan results as an HTML report
test_suite_results.to_wandb(run) # log test suite results as a table
shap_results.to_wandb(run) # log shap results as plots
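Since SHAP results only exist for tabular models, it can be convenient to route all logging through one helper that skips missing artifacts and closes the run when it is done. The sketch below is one possible way to do that, not part of either library. `log_giskard_artifacts` and `log_run` are hypothetical helper names, and the only assumption about each artifact is that it exposes the `to_wandb(run)` method used above.

```python
def log_giskard_artifacts(run, *artifacts):
    """Call .to_wandb(run) on every artifact that is not None.

    Returns the number of artifacts that were logged.
    """
    logged = 0
    for artifact in artifacts:
        if artifact is None:
            # For example, shap_results for an NLP model or an LLM.
            continue
        artifact.to_wandb(run)
        logged += 1
    return logged


def log_run(giskard_dataset, scan_results, test_suite_results, shap_results=None):
    """Open a W&B run, log the Giskard artifacts, and always close the run."""
    import wandb  # imported here so the helper above has no hard dependency

    run = wandb.init(project="my_project", name="my_run")
    try:
        return log_giskard_artifacts(
            run, giskard_dataset, scan_results, test_suite_results, shap_results
        )
    finally:
        run.finish()  # marks the run as finished even if logging fails
```

For an LLM or NLP model, leave `shap_results` unset and the helper logs only the dataset, the scan report and the test suite results.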