Upload your ML model & data
How to upload your data and Machine Learning model to Giskard using Python
To upload the model you want to inspect, you need:
- A model. For example, a scikit-learn, TensorFlow, HuggingFace, CatBoost, or PyTorch model wrapped in a Python prediction function
- A pandas dataframe composed of the examples you want to inspect. For example, it could be your test dataset or a dataset of wrong predictions made by your model
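For illustration, here is a minimal sketch of these two ingredients. The toy data and column names are invented and simply mirror the credit-scoring example further down this page:
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data, for illustration only
data = pd.DataFrame({
    "credit_id": ["c1", "c2", "c3", "c4", "c5", "c6"],
    "credit_amount": [1000, 2500, 400, 7800, 320, 5400],
    "credit_category": ["car", "house", "car", "house", "other", "car"],
    "credit_application": ["new car", "home renovation", "used car",
                           "house purchase", "small appliance", "family car"],
    "Is_default": ["Not default", "Default", "Not default",
                   "Default", "Not default", "Default"],
})

# test_data is the dataframe you would upload for inspection
train_data, test_data = train_test_split(
    data, test_size=0.33, random_state=42, stratify=data["Is_default"]
)
The model itself can be anything that exposes a prediction function on such a dataframe; a full scikit-learn pipeline sketch is given further down this page.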
In order to upload models and datasets to Giskard, you'll need to install the giskard library:
pip install giskard
In case of installation errors related to the giskard library, it's sometimes a good idea to remove it with:
pip uninstall giskard
and then install it again.
Then start the ML Worker so that it can connect to your Giskard instance:
giskard worker start -h [GISKARD IP ADDRESS]
If the ML Worker manages to connect to the Giskard instance, you should see the following message in the worker logs: "Connected to Giskard server."
- If you work from your notebook, you will need to start the worker as a daemon with:
giskard worker start -d -h [GISKARD IP ADDRESS]
- If Giskard is installed locally, simply run:
giskard worker start
This will establish a connection to the Giskard instance installed on localhost:40051.
- If Giskard is not installed locally, specify the IP address (and a port in case a custom port is used). For example:
giskard worker start -h 192.158.1.38
To create a new project or load an existing one, run the code below in your Python environment:
from giskard import GiskardClient
url = "http://localhost:19000" #if Giskard is installed locally (for installation, see: https://docs.giskard.ai/start/guides/installation)
token = "YOUR GENERATED TOKEN" #you can generate your API token in the Admin tab of the Giskard application (for installation, see: https://docs.giskard.ai/start/guides/installation)
client = GiskardClient(url, token)
project = client.create_project("project_key", "PROJECT_NAME", "DESCRIPTION") #Choose the arguments you want. But "project_key" should be unique and in lower case
#If your project is already created use project = client.get_project("existing_project_key")
If you want to use an existing project, load it with project = client.get_project("EXISTING_PROJECT_KEY"), then use:
- upload_model to upload a new version of the model you want to inspect/test
- upload_dataset to upload a new dataset that you want to apply to your existing model
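Both calls look roughly like the sketch below. The argument names are assumed to mirror those of upload_model_and_df (documented in the table that follows); the exact signatures of upload_model and upload_dataset may differ depending on your giskard version, so check the API reference if a call fails.
# Hedged sketch only: argument names are assumed to mirror upload_model_and_df
project = client.get_project("EXISTING_PROJECT_KEY")

project.upload_model(
    prediction_function=clf.predict_proba,   # clf: your trained model or full pipeline (see example below)
    model_type="classification",
    feature_names=["credit_amount", "credit_category", "credit_application"],
    classification_labels=["Not default", "Default"],
)

project.upload_dataset(
    df=test_data,                # any pandas dataframe with the same columns
    column_types=column_types,   # same dictionary as for upload_model_and_df
    target="Is_default",
)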
Apply the upload_model_and_df method to the project using the following arguments:
Argument | Description | Type |
---|---|---|
prediction_function | The prediction function of the model you want to inspect. It can be any Python function that takes a Pandas dataframe as input and returns the predictions: class probabilities for classification, predicted values for regression. If you have preprocessing steps, wrap the whole prediction pipeline: all the preprocessing steps (categorical encoding, scaling, etc.) + the ML predict_proba function. Click here for more information. | Callable[[pd.DataFrame], Iterable[Union[str, float, int]]] |
model_type | The type of task performed by the model: "classification" or "regression". | str |
df | A Pandas dataframe that contains the data examples you want to inspect (test set, train set, production data, ...). Since prediction_function wraps the preprocessing, df should contain the raw, unprocessed data and can include the target column. | Pandas dataframe |
column_types | A dictionary of column names and their types ( numeric , category or text ) for all columns of df . | Dict[str, str] |
target | The column name in df corresponding to the actual target variable (ground truth). | Optional[str] |
feature_names | An optional list of the feature names used by prediction_function. By default, feature_names are all the keys from column_types except target . | Optional[List[str]] |
classification_labels | The classification labels of your prediction when model_type ="classification". They must be given in the same order as the outputs of prediction_function. | Optional[List[str]] = None |
classification_threshold | The probability threshold in the case of a binary classification model. By default, it's equal to 0.5 | Optional[float] = 0.5 |
model_name | The name of the model you upload | Optional[str] |
dataset_name | The name of the dataset you upload | Optional[str] |
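If you are unsure how to fill column_types, one possible starting point (the helper below is hypothetical, not part of the giskard API) is to derive a first draft from the pandas dtypes and then correct it by hand, since free-text columns in particular need a manual decision:
import pandas as pd

def guess_column_types(df: pd.DataFrame) -> dict:
    # numeric dtypes -> "numeric", everything else -> "category" as a first guess
    return {
        col: "numeric" if pd.api.types.is_numeric_dtype(df[col]) else "category"
        for col in df.columns
    }

column_types = guess_column_types(test_data)
column_types["credit_application"] = "text"   # manual fix for the free-text column
# -> matches the column_types dictionary used in the example below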
It's better to upload prediction_function as a function that wraps the whole prediction pipeline: all the preprocessing steps (categorical encoding, etc.) + the ML prediction. This is key for a robust and interpretable inspection stage! Click here for examples with and without pipelines.
Make sure that prediction_function(df[feature_names]) gets executed without an error. This is the only requirement to upload a model on Giskard!
Example:
!pip install giskard
!giskard worker start
from giskard import GiskardClient

url = "http://localhost:19000"    # or the URL of your Giskard instance
token = "YOUR GENERATED TOKEN"    # generated in the Admin tab of the Giskard application
client = GiskardClient(url, token)
#If you're creating your project for the first time
credit_scoring = client.create_project("credit_scoring", "Credit scoring project", "Predict the default probabilities of a credit demand")
#If your project is already created use
#project = client.get_project("credit_scoring")
credit_scoring.upload_model_and_df(
    prediction_function=clf.predict_proba,  # clf: your trained model or full prediction pipeline
    model_type='classification',
    df=test_data,                           # the dataframe you want to inspect
    column_types={
        'credit_id': 'category',
        'credit_amount': 'numeric',
        'credit_category': 'category',
        'credit_application': 'text',
        'Is_default': 'category'
    },
    target='Is_default',
    feature_names=['credit_amount', 'credit_category', 'credit_application'],
    classification_labels=['Not default', 'Default']
)
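If your model needs preprocessing before predict_proba, one way (a minimal sketch, assuming scikit-learn and the toy train_data introduced earlier on this page) is to put the preprocessing and the estimator in a single Pipeline, so that the prediction function accepts the raw dataframe:
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

feature_names = ['credit_amount', 'credit_category', 'credit_application']

# Preprocessing + model in a single scikit-learn Pipeline
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['credit_amount']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['credit_category']),
    ('txt', TfidfVectorizer(), 'credit_application'),  # TfidfVectorizer takes a single column name
])
clf = Pipeline([('prep', preprocessor), ('model', LogisticRegression())])
clf.fit(train_data[feature_names], train_data['Is_default'])

# clf.predict_proba now accepts the raw dataframe, so it can be passed directly
# as prediction_function; an explicit wrapper is equivalent and useful when the
# preprocessing is not a scikit-learn Pipeline:
def prediction_function(df):
    return clf.predict_proba(df[feature_names])
You could then pass either clf.predict_proba or prediction_function to upload_model_and_df.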
Example notebooks:
- To get started with Giskard as fast as possible, we've included a demo Python notebook with all the requirements in the platform at http://localhost:19000/jupyter (accessible after Installation & upgrade). Feel free to modify it to adapt it to your case!