Upload your ML model & data

How to upload your data and Machine Learning model to Giskard using Python


To upload the model you want to inspect, you need:
  • A model. For example, a scikit-learn, TensorFlow, HuggingFace, CatBoost, PyTorch, ... Python function
  • A pandas dataframe composed of the examples you want to inspect. For example, it could be your test dataset or a dataset composed of some wrong predictions of your model
  • The Giskard platform. To install it, check Installation & upgrade

Steps to upload your data & model

1. Install the Giskard library

In order to upload models and datasets to Giskard, you'll need to install the giskard library:
pip install giskard
If you run into installation errors related to the giskard library, it's sometimes a good idea to remove it with:
pip uninstall giskard
and then reinstall it.

2. Start ML Worker

ML Worker is the component in Giskard that connects your Python environment to the Giskard server you just installed. For more technical information, have a look at this page. To start the ML Worker, execute the following command in the terminal of the machine where your model was created:
giskard worker start -h [GISKARD IP ADDRESS]
If ML Worker manages to connect to the Giskard instance, you should see the following message in the worker logs: "Connected to Giskard server."
  • If you work from your notebook, you will need to start the ML Worker as a daemon with:
giskard worker start -d -h [GISKARD IP ADDRESS]
  • If Giskard is installed locally, simply run giskard worker start. That will establish a connection to the Giskard instance installed on localhost:40051.
  • If Giskard is not installed locally, specify the IP address (and the port, in case a custom port is used). For example: giskard worker start -h [GISKARD IP ADDRESS]
For more information, see this page.

3. Create a new Giskard project or load an existing project

To create a new project or load an existing one, run the code below in your Python environment:
from giskard import GiskardClient
url = "http://localhost:19000" #if Giskard is installed locally (for installation, see Installation & upgrade)
token = "YOUR GENERATED TOKEN" #you can generate your API token in the Admin tab of the Giskard application
client = GiskardClient(url, token)
project = client.create_project("project_key", "PROJECT_NAME", "DESCRIPTION") #Choose the arguments you want, but "project_key" must be unique and lowercase
#If your project is already created use project = client.get_project("existing_project_key")
If you want to use an existing project, load it with project = client.get_project("EXISTING_PROJECT_KEY"), then use:
  • upload_model to upload a new version of the model you want to inspect/test
  • upload_dataset to upload a new dataset that you want to apply to your existing model
For more details about the arguments of these functions, see our GitHub repo.

4. Upload a model and a dataset

Call upload_model_and_df on the project with the following arguments:

  • prediction_function (Callable[[pd.DataFrame], Iterable[Union[str, float, int]]]): The prediction function of the model you want to inspect. It can be any Python function that takes a pandas dataframe as input and returns:
      • the probabilities for all the classification labels if model_type=classification
      • the prediction if model_type=regression
    If you have preprocessing steps, wrap the whole prediction pipeline: all the preprocessing steps (categorical encoding, scaling, etc.) + the ML predict_proba function. Click here for more information.
  • model_type (str): classification for a classification model, regression for a regression model.
  • df (pd.DataFrame): A pandas dataframe containing data examples you might want to inspect (test set, train set, production data). Some important remarks:
      • df can contain more columns than the features of the model, such as the actual ground truth variable, sample_id, metadata, etc.
      • df should be the raw data that comes before all the preprocessing steps
  • column_types (Dict[str, str]): A dictionary of column names and their types (numeric, category or text) for all columns of df.
  • target (str): The column name in df corresponding to the actual target variable (ground truth).
  • feature_names (Optional[List[str]] = None): An optional list of the feature names of prediction_function. By default, feature_names are all the keys of column_types except target. Some important remarks:
      • Make sure that prediction_function(df[feature_names]) does not return an error message
      • Make sure these features are in the same order as in your train set
  • classification_labels (Optional[List[str]] = None): The classification labels of your prediction when model_type="classification". Some important remarks:
      • If classification_labels is a list of n elements, make sure prediction_function also returns n probabilities
      • Make sure the labels are in the same order as the output of prediction_function
  • classification_threshold (Optional[float] = 0.5): The probability threshold in the case of a binary classification model. By default, it's equal to 0.5.
  • model_name (str): The name of the model you upload.
  • dataset_name (str): The name of the dataset you upload.

It's better to upload prediction_function as a function that wraps the whole prediction pipeline: all the preprocessing steps (categorical encoding, etc.) + the ML prediction. This is key for a robust and interpretable inspection stage! Click here for examples with and without pipelines.
Make sure that prediction_function(df[feature_names]) executes without an error. This is the only requirement to upload a model on Giskard!
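The pipeline-wrapping advice above can be sketched with scikit-learn. The column names, data, and model below are hypothetical; the point is the contract: prediction_function takes the raw dataframe and returns one probability per classification label, in the same order as classification_labels.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy raw data -- remember that df should be raw, pre-preprocessing data.
df = pd.DataFrame({
    "income": [1200.0, 3400.0, 2500.0, 800.0],
    "employment": ["salaried", "self-employed", "salaried", "unemployed"],
    "Is_default": ["Default", "Not default", "Not default", "Default"],
})
feature_names = ["income", "employment"]

# One Pipeline holds the preprocessing (scaling + categorical encoding)
# and the classifier, so a single object goes from raw data to probabilities.
pipeline = Pipeline([
    ("preprocess", ColumnTransformer([
        ("num", StandardScaler(), ["income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["employment"]),
    ])),
    ("clf", LogisticRegression()),
])
pipeline.fit(df[feature_names], df["Is_default"])

def prediction_function(input_df: pd.DataFrame):
    # Takes the raw dataframe, returns probabilities for every label.
    return pipeline.predict_proba(input_df[feature_names])

# classification_labels must match the order of the probability columns;
# for scikit-learn that order is given by pipeline.classes_.
classification_labels = list(pipeline.classes_)
```

Note that prediction_function(df[feature_names]) runs without error here, which is the one hard requirement for the upload.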


Putting it all together for a credit scoring example (prediction_function and df are assumed to be defined as in step 4):
!pip install giskard
!giskard worker start
from giskard import GiskardClient
client = GiskardClient(url, token)
#If you're creating your project for the first time
credit_scoring = client.create_project("credit_scoring", "Credit scoring project", "Predict the default probabilities of a credit demand")
#If your project is already created use
#credit_scoring = client.get_project("credit_scoring")
credit_scoring.upload_model_and_df(
    prediction_function=prediction_function, #the wrapped prediction pipeline
    model_type="classification",
    df=df, #raw data, before any preprocessing
    column_types={
        #..., the other columns of df
        'Is_default': 'category'
    },
    target='Is_default',
    classification_labels=['Not default', 'Default']
)
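As a side note on classification_threshold: for a binary model it is the cut-off applied to the probability of the positive label. The probabilities below are made up, and treating 'Default' as the positive label is an assumption of this sketch, not a statement about Giskard's internals:

```python
import numpy as np

# Hypothetical output of prediction_function for a binary model, with
# columns ordered as classification_labels = ['Not default', 'Default'].
labels = ['Not default', 'Default']
probabilities = np.array([
    [0.80, 0.20],
    [0.35, 0.65],
    [0.52, 0.48],
])

classification_threshold = 0.5  # the default value

# A row is predicted 'Default' when its probability reaches the threshold.
predictions = [
    labels[1] if p >= classification_threshold else labels[0]
    for p in probabilities[:, 1]
]
print(predictions)  # ['Not default', 'Default', 'Not default']
```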
Example notebooks:
  • You can download an example notebook here and execute it in your working environment
  • To get started with Giskard as fast as possible, we've included a demo Python notebook in the platform with all the requirements at http://localhost:19000/jupyter (accessible after Installation & upgrade). Feel free to modify it to adapt it to your case!
Now that you've uploaded your model, let's Collect feedback on your ML model


If you encounter any issues, join the #support channel on our Discord. Our community will help!