Upload your ML model & data
How to upload your data and Machine Learning model to Giskard using Python

Prerequisites

To upload the model you want to inspect, you need:
  • A model. For example, a scikit-learn, TensorFlow, HuggingFace, CatBoost, or PyTorch model, or any Python function
  • A pandas dataframe composed of the examples you want to inspect. For example, it could be your test dataset or a dataset of examples that your model predicts incorrectly
  • The Giskard platform. To install it, see Installation & upgrade

Steps to upload your data & model

1. Install Giskard library

To upload models and datasets to Giskard, install the giskard library:
pip install giskard

2. Create a new project or load an existing project

To create a new project or load an existing one, run the code below in your Python environment:
from giskard.giskard_client import GiskardClient
url = "http://localhost:19000"  # if Giskard is installed locally (for installation, see: https://docs.giskard.ai/start/guides/installation)
token = "YOUR GENERATED TOKEN"  # generate your API token in the Admin tab of the Giskard application
client = GiskardClient(url, token)
# Choose the arguments you want, but "project_key" must be unique and in lower case
project = client.create_project("project_key", "PROJECT_NAME", "DESCRIPTION")
# If your project already exists, load it instead:
# project = client.get_project("existing_project_key")
If you want to use an existing project, use project = client.get_project("EXISTING_PROJECT_KEY") to load it, then use:
  • upload_model to upload a new version of the model you want to inspect/test
  • upload_dataset to upload a new dataset that you want to apply to your existing model
For more details about the arguments of these functions, see our GitHub repo.

3. Upload a model and a dataset

Call upload_model_and_df on the project with the following arguments:
Each argument is listed below with its type in parentheses, followed by its description.

prediction_function (Callable[[pd.DataFrame], Iterable[Union[str, float, int]]])
The model you want to inspect. It can be any Python function that takes a pandas dataframe as input and returns:
  • the probabilities for all the classification labels if model_type="classification"
  • the prediction if model_type="regression"
If you have preprocessing steps, wrap the whole prediction pipeline: all the preprocessing steps (categorical encoding, scaling, etc.) + ML predict_proba function. Click here for more information.

model_type (str)
  • "classification" for a classification model
  • "regression" for a regression model

df (pandas dataframe)
A pandas dataframe that contains the data examples you want to inspect (test set, train set, production data). Some important remarks:
  • df can contain more columns than the features of the model, such as the actual ground truth variable, sample_id, metadata, etc.
  • df should be raw data that comes before all the preprocessing steps

column_types (Dict[str, str])
A dictionary that maps each column name of df to its type ("numeric", "category" or "text"). It must cover all columns of df.

target (str)
The column name in df corresponding to the actual target variable (ground truth).

feature_names (Optional[List[str]])
An optional list of the feature names of prediction_function. By default, feature_names are all the keys from column_types except target. Some important remarks:
  • Make sure that prediction_function(df[feature_names]) does not raise an error
  • Make sure these features have the same order as in your train set

classification_labels (Optional[List[str]] = None)
The classification labels of your prediction when model_type="classification". Some important remarks:
  • If classification_labels is a list of n elements, make sure prediction_function also returns n probabilities
  • Make sure the labels have the same order as the output of prediction_function

classification_threshold (Optional[float] = 0.5)
The probability threshold in the case of a binary classification model. By default, it is equal to 0.5.

model_name (Optional[str])
The name of the model you uploaded.

dataset_name (Optional[str])
The name of the dataset you uploaded.
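Writing column_types by hand can be tedious for wide dataframes. The sketch below guesses a starting dictionary from the pandas dtypes. Note that infer_column_types is a hypothetical helper written for this page, not part of the giskard library, and that text columns have to be named explicitly because pandas cannot tell free text from categories. Treat the result as a draft and review it before uploading.

```python
import pandas as pd


def infer_column_types(df: pd.DataFrame, text_columns=()) -> dict:
    """Guess a column_types dict ("numeric", "category" or "text") from dtypes.

    Hypothetical helper for illustration; not part of the giskard library.
    """
    column_types = {}
    for name in df.columns:
        if name in text_columns:
            # Free-text columns must be listed by the caller
            column_types[name] = "text"
        elif pd.api.types.is_bool_dtype(df[name]):
            # Booleans count as numeric in pandas, so test them first
            column_types[name] = "category"
        elif pd.api.types.is_numeric_dtype(df[name]):
            column_types[name] = "numeric"
        else:
            # Strings and anything else default to "category"
            column_types[name] = "category"
    return column_types


# Made-up rows that reuse the column names of the credit scoring example
example_df = pd.DataFrame({
    "credit_id": ["a1", "b2"],
    "credit_amount": [1200.0, 560.5],
    "credit_category": ["car", "home"],
    "credit_application": ["I need a loan for a car", "Renovating my kitchen"],
    "Is_default": ["Not default", "Default"],
})
column_types = infer_column_types(example_df, text_columns=["credit_application"])
print(column_types)
```

Numeric identifiers (for example an integer credit_id) would be guessed as "numeric" by this heuristic, so such columns are worth correcting by hand.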
It's better to upload prediction_function as a function that wraps the whole prediction pipeline: all the preprocessing steps (categorical encoding, etc.) + ML prediction. This is key for a robust and interpretable inspection stage! Click here for examples with and without pipelines.
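As an illustration of wrapping the whole pipeline, here is a minimal sketch using scikit-learn. The data, the column names, and the choice of StandardScaler, OneHotEncoder, and LogisticRegression are assumptions made for this example only; your own preprocessing and model will differ. The point is that prediction_function receives raw rows and performs every step itself.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy raw data, made up for illustration
train_df = pd.DataFrame({
    "credit_amount": [1000.0, 250.0, 4000.0, 800.0, 3200.0, 150.0],
    "credit_category": ["car", "home", "car", "travel", "home", "travel"],
    "Is_default": ["Not default", "Not default", "Default",
                   "Not default", "Default", "Not default"],
})
feature_names = ["credit_amount", "credit_category"]

# Preprocessing: scale the numeric column, one-hot encode the categorical one
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["credit_amount"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["credit_category"]),
])

# Preprocessing and model chained in a single scikit-learn pipeline
clf = Pipeline([("preprocess", preprocessor), ("model", LogisticRegression())])
clf.fit(train_df[feature_names], train_df["Is_default"])


def prediction_function(raw_df: pd.DataFrame):
    """Take raw rows, run preprocessing + model, return class probabilities."""
    return clf.predict_proba(raw_df[feature_names])


probabilities = prediction_function(train_df)
print(clf.classes_)         # column order of the returned probabilities
print(probabilities.shape)  # one row per example, one column per class
```

With scikit-learn classifiers, clf.classes_ gives the order of the columns returned by predict_proba, which is the order that classification_labels should follow.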
Make sure that prediction_function(df[feature_names]) gets executed without an error. This is the only requirement to upload a model on Giskard!
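One way to verify this requirement before uploading is a small local check. The sketch below is an assumption about what is worth checking, based only on the remarks on this page: check_before_upload is a hypothetical helper, not a giskard function, and dummy_predict_proba is a stand-in for a real model.

```python
import numpy as np
import pandas as pd

VALID_COLUMN_TYPES = {"numeric", "category", "text"}


def check_before_upload(prediction_function, df, column_types, feature_names,
                        classification_labels=None):
    """Hypothetical pre-upload check; returns a list of problems (empty if none)."""
    problems = []
    missing = [c for c in df.columns if c not in column_types]
    if missing:
        problems.append(f"columns missing from column_types: {missing}")
    invalid = {c: t for c, t in column_types.items() if t not in VALID_COLUMN_TYPES}
    if invalid:
        problems.append(f"invalid column types: {invalid}")
    unknown = [f for f in feature_names if f not in df.columns]
    if unknown:
        problems.append(f"feature_names not found in df: {unknown}")
        return problems
    try:
        # The key requirement: this call must run without an error
        output = np.asarray(prediction_function(df[feature_names]))
    except Exception as exc:
        problems.append(f"prediction_function raised {type(exc).__name__}: {exc}")
        return problems
    if len(output) != len(df):
        problems.append(f"expected {len(df)} predictions, got {len(output)}")
    if classification_labels is not None:
        # n labels require n probabilities per row
        if output.ndim != 2 or output.shape[1] != len(classification_labels):
            problems.append(
                f"expected {len(classification_labels)} probabilities per row, "
                f"got output of shape {output.shape}"
            )
    return problems


def dummy_predict_proba(features: pd.DataFrame):
    # Stand-in model: the same two probabilities for every row
    return np.tile([0.7, 0.3], (len(features), 1))


sample_df = pd.DataFrame({
    "credit_amount": [100.0, 2500.0],
    "Is_default": ["Not default", "Default"],
})
problems = check_before_upload(
    dummy_predict_proba,
    sample_df,
    column_types={"credit_amount": "numeric", "Is_default": "category"},
    feature_names=["credit_amount"],
    classification_labels=["Not default", "Default"],
)
print(problems)
```

An empty list means the checks passed; each string in the list otherwise describes one issue to fix before calling upload_model_and_df.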

Examples

from giskard.giskard_client import GiskardClient
url = "http://localhost:19000"  # if Giskard is installed locally
token = "YOUR GENERATED TOKEN"  # generated in the Admin tab of the Giskard application
client = GiskardClient(url, token)
# If you're creating your project for the first time
credit_scoring = client.create_project("credit_scoring", "Credit scoring project", "Predict the default probabilities of a credit demand")
# If your project is already created, use
# credit_scoring = client.get_project("credit_scoring")
# clf is your trained classifier and test_data is your raw pandas dataframe
credit_scoring.upload_model_and_df(
    prediction_function=clf.predict_proba,
    model_type='classification',
    df=test_data,
    column_types={
        'credit_id': 'category',
        'credit_amount': 'numeric',
        'credit_category': 'category',
        'credit_application': 'text',
        'Is_default': 'category'
    },
    target='Is_default',
    feature_names=['credit_amount', 'credit_category', 'credit_application'],
    classification_labels=['Not default', 'Default']
)
Example notebooks:
  • You can download an example notebook here to execute it in your working environment
  • To get started with Giskard as fast as possible, we've included a demo Python notebook in the platform with all the requirements at http://localhost:19000/jupyter (accessible after the Installation & upgrade). Feel free to modify it to adapt it to your case!
Now that you have uploaded your model, the next step is to Evaluate your ML model.

Troubleshooting

If you encounter any issues, join our Discord and ask in the #support channel. Our community will help!