Wrap your ML model
To scan, test and debug your model, you need to wrap it into a Giskard Model. Your model can use any ML library (HuggingFace, PyTorch, TensorFlow, Sklearn, etc.) and can be any Python function that respects the right signature. You can wrap your model in two different ways:
Wrap a prediction function that contains all your data pre-processing steps.
Wrap a model object in addition to a data pre-processing function.
Hint
Choose "Wrap a model object" if your model is not serializable by cloudpickle (e.g. TensorFlow models).
A prediction function is any Python function that takes a raw pandas DataFrame as input and returns the probabilities for each classification label.
Make sure that:
prediction_function encapsulates all the data pre-processing steps (categorical encoding, numerical scaling, etc.).
prediction_function(df[feature_names]) runs without raising an error.
from giskard import demo, Model

demo_data_processing_function, demo_sklearn_model = demo.titanic_pipeline()

def prediction_function(df):
    # The pre-processor can be a pipeline of one-hot encoding, imputer, scaler, etc.
    preprocessed_df = demo_data_processing_function(df)
    return demo_sklearn_model.predict_proba(preprocessed_df)

wrapped_model = Model(
    model=prediction_function,
    model_type="classification",
    classification_labels=demo_sklearn_model.classes_,  # Their order MUST be identical to the prediction_function's output order
    feature_names=['PassengerId', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked'],  # Default: all columns of your dataset
    # name="titanic_model",  # Optional
    # classification_threshold=0.5,  # Default: 0.5
)
Mandatory parameters
model: A prediction function that takes a pandas.DataFrame as input and returns an array (\(n\times m\)) of probabilities corresponding to \(n\) data entries (rows of the pandas.DataFrame) and \(m\) classification_labels. In the case of binary classification, an array (\(n\times 1\)) of probabilities is also accepted.
model_type: The type of model, either regression, classification or text_generation.
classification_labels: The list of unique categories contained in your dataset target variable. If classification_labels is a list of \(m\) elements, make sure that:
  prediction_function returns an (\(n\times m\)) array of probabilities.
  classification_labels have the same order as the output of prediction_function.
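The shape and ordering requirements above can be checked before wrapping. The following is a minimal sketch, not part of Giskard: the labels, the column x, the toy prediction function and the helper check_classification_output are all hypothetical names used for illustration. It only covers the (\(n\times m\)) case, not the (\(n\times 1\)) binary shortcut.

```python
import numpy as np
import pandas as pd

# Hypothetical labels and prediction function, for illustration only.
classification_labels = ["no", "yes"]

def prediction_function(df: pd.DataFrame) -> np.ndarray:
    # Toy "model": the probability of "yes" grows with the hypothetical column x.
    p_yes = 1.0 / (1.0 + np.exp(-df["x"].to_numpy(dtype=float)))
    # Column order follows classification_labels: ["no", "yes"].
    return np.column_stack([1.0 - p_yes, p_yes])

def check_classification_output(predict, df, labels):
    """Assert that predict(df) looks like an (n x m) array of probabilities."""
    probs = np.asarray(predict(df))
    assert probs.shape == (len(df), len(labels)), probs.shape
    assert np.all((probs >= 0) & (probs <= 1))
    assert np.allclose(probs.sum(axis=1), 1.0)
    return probs

df = pd.DataFrame({"x": [-2.0, 0.5, 3.0]})
probs = check_classification_output(prediction_function, df, classification_labels)

# Column j of probs is read as the probability of classification_labels[j],
# which is why the two orders have to agree.
predicted = [classification_labels[i] for i in probs.argmax(axis=1)]
```

If the columns of the output were in a different order than classification_labels, the shape check would still pass but predicted would silently name the wrong class, so the ordering is worth verifying by hand on a few rows.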
Optional parameters
name: Name of the wrapped model.
feature_names: An optional list of the feature names. By default, feature_names are all the columns in your dataset. Make sure these features are in the same order as they are in your training dataset.
classification_threshold: Model threshold for binary classification problems.
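To illustrate what a binary classification threshold generally does, here is a small sketch. It is not Giskard's implementation: the helper apply_threshold, the choice of the second label as the positive class, and the use of >= for the comparison are assumptions made for this example.

```python
import numpy as np

def apply_threshold(probs, labels, threshold=0.5):
    """Turn binary probabilities into labels (hypothetical helper).

    Assumes labels[1] is the positive class and that a probability equal
    to the threshold counts as positive.
    """
    probs = np.asarray(probs, dtype=float)
    if probs.ndim == 2 and probs.shape[1] == 2:
        p_positive = probs[:, 1]        # (n, 2): take the positive-class column
    else:
        p_positive = probs.reshape(-1)  # (n, 1) or (n,): already positive-class
    return [labels[1] if p >= threshold else labels[0] for p in p_positive]

labels = ["no", "yes"]
probs = np.array([[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]])

default_preds = apply_threshold(probs, labels)                # threshold 0.5
strict_preds = apply_threshold(probs, labels, threshold=0.7)  # stricter cut-off
```

Raising the threshold moves borderline rows (here the second one) from the positive to the negative class without changing the probabilities themselves.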
A prediction function is any Python function that takes a raw pandas DataFrame as input and returns the predictions for your regression task.
Make sure that:
prediction_function encapsulates all the data pre-processing steps (categorical encoding, numerical scaling, etc.).
prediction_function(df[feature_names]) runs without raising an error.
import numpy as np
from giskard import demo, Model

demo_data_processing_function, reg = demo.linear_pipeline()

def prediction_function(df):
    preprocessed_df = demo_data_processing_function(df)
    return np.squeeze(reg.predict(preprocessed_df))

wrapped_model = Model(
    model=prediction_function,
    model_type="regression",
    feature_names=['x'],  # Default: all columns of your dataset
    # name="linear_model",  # Optional
)
Mandatory parameters
model: A prediction function that takes a pandas.DataFrame as input and returns an array of \(n\) predictions corresponding to \(n\) data entries (rows of the pandas.DataFrame).
model_type: The type of model, either regression, classification or text_generation.
Optional parameters
name: Name of the wrapped model.
feature_names: An optional list of the feature names. By default, feature_names are all the columns in your dataset. Make sure these features are in the same order as they are in your training dataset.
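The regression example uses np.squeeze to turn the model output into a flat array of \(n\) predictions. The sketch below shows what that does to array shapes, using made-up numbers. It also shows one edge case: on a single-row input, np.squeeze returns a 0-dimensional array, and reshape(-1) is one possible alternative that always keeps one dimension. Whether single-row inputs matter in your setup is not stated in this guide, so treat this as a precaution rather than a requirement.

```python
import numpy as np

# Some regressors return a column vector of shape (n, 1).
raw = np.array([[1.5], [2.0], [3.25]])
flat = np.squeeze(raw)  # shape (3,): one prediction per row

# With a single row, squeeze removes every axis and yields a 0-d array.
single_raw = np.array([[1.5]])
single_squeezed = np.squeeze(single_raw)  # shape ()

# reshape(-1) always produces a 1-D array, whatever the number of rows.
single_flat = single_raw.reshape(-1)  # shape (1,)
```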
A prediction function is any Python function that takes a raw pandas DataFrame as input and returns the predictions for your text generation task.
Make sure that:
prediction_function encapsulates all the data pre-processing steps (categorical encoding, numerical scaling, etc.).
prediction_function(df[feature_names]) runs without raising an error.
from langchain.chains import LLMChain
from langchain.llms.fake import FakeListLLM
from langchain.prompts import PromptTemplate
from giskard import Model

responses = [
    "\n\nHueFoots.", "\n\nEcoDrive Motors.",
    "\n\nRainbow Socks.", "\n\nNoOil Motors."]
llm = FakeListLLM(responses=responses)
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=llm, prompt=prompt)

def prediction_function(df):
    return [chain.predict(**data) for data in df.to_dict('records')]

wrapped_model = Model(prediction_function, model_type='text_generation')
Mandatory parameters
model: A prediction function that takes a pandas.DataFrame as input and returns an array of \(n\) predictions corresponding to \(n\) data entries (rows of the pandas.DataFrame).
model_type: The type of model, either regression, classification or text_generation.
Optional parameters
name: Name of the wrapped model.
feature_names: An optional list of the feature names. By default, feature_names are all the columns in your dataset. Make sure these features are in the same order as they are in your training dataset.
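The text generation example relies on df.to_dict('records') to turn each DataFrame row into keyword arguments for the chain. The sketch below shows that mechanism in isolation. FakeChain is a hypothetical stand-in that only fills in the prompt template, so the example runs without langchain; the product values are made up.

```python
import pandas as pd

template = "What is a good name for a company that makes {product}?"

class FakeChain:
    """Hypothetical stand-in for the chain: it only formats the prompt."""

    def predict(self, **kwargs):
        return template.format(**kwargs)

chain = FakeChain()

def prediction_function(df):
    # Each row becomes a dict such as {"product": "colorful socks"},
    # which is unpacked into keyword arguments.
    return [chain.predict(**data) for data in df.to_dict("records")]

df = pd.DataFrame({"product": ["colorful socks", "electric cars"]})
records = df.to_dict("records")
outputs = prediction_function(df)
```

Because the rows are unpacked by name, the DataFrame column names need to match the input variables of the prompt (here product), and the function returns one string per row.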
Providing the model object to Model allows us to automatically infer the ML library of your model object and provide a suitable serialization method (provided by the save_model and load_model methods).
This requires:
Mandatory: Overriding the model_predict method, which should take a raw pandas DataFrame as input and return the probabilities for each classification label (classification) or the predictions (regression or text_generation).
Optional: Our pre-defined serialization and prediction methods cover the sklearn, catboost, pytorch, tensorflow, huggingface and langchain libraries. If none of these libraries are detected, cloudpickle is used as the default for serialization. If this fails, we will ask you to also override the save_model and load_model methods, where you provide your own serialization of the model object.
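As a rough idea of what a custom serialization pair can look like, here is a standalone sketch that does not use Giskard at all. ToyModel and SerializableWrapper are hypothetical, and the signatures of save_model and load_model shown here (a single directory path, with load_model as a classmethod) are assumptions; check the Giskard API reference for the exact signatures to override.

```python
import json
import tempfile
from pathlib import Path

class ToyModel:
    """Hypothetical model whose whole state is a single weight."""

    def __init__(self, weight):
        self.weight = weight

    def predict(self, xs):
        return [self.weight * x for x in xs]

class SerializableWrapper:
    """Hypothetical wrapper; method names mirror save_model/load_model only."""

    def __init__(self, model):
        self.model = model

    def save_model(self, local_path):
        # Write the model state to a file inside the given directory.
        path = Path(local_path)
        path.mkdir(parents=True, exist_ok=True)
        (path / "model.json").write_text(json.dumps({"weight": self.model.weight}))

    @classmethod
    def load_model(cls, local_path):
        # Rebuild the model object from the saved state.
        params = json.loads((Path(local_path) / "model.json").read_text())
        return ToyModel(**params)

with tempfile.TemporaryDirectory() as tmp:
    wrapper = SerializableWrapper(ToyModel(weight=2.0))
    wrapper.save_model(tmp)
    restored = SerializableWrapper.load_model(tmp)
    roundtrip = restored.predict([1, 2, 3])
```

The point of the round trip is that the restored model gives the same predictions as the original, which is a useful check for any hand-written serialization.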
from giskard import demo, Model

demo_data_processing_function, demo_sklearn_model = demo.titanic_pipeline()

class MyCustomModel(Model):
    def model_predict(self, df):
        preprocessed_df = demo_data_processing_function(df)
        return self.model.predict_proba(preprocessed_df)

wrapped_model = MyCustomModel(
    model=demo_sklearn_model,
    model_type="classification",
    classification_labels=demo_sklearn_model.classes_,  # Their order MUST be identical to the model_predict output order
    # The target column ('Survived') is not a feature, so it is not listed here.
    feature_names=['PassengerId', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare',
                   'Embarked'],  # Default: all columns of your dataset
    # name="titanic_model",  # Optional
    # classification_threshold=0.5,  # Default: 0.5
    # model_postprocessing_function=None,  # Optional
    # **kwargs  # Additional model-specific arguments
)
Mandatory parameters
model: Could be any model from sklearn, catboost, pytorch, tensorflow, huggingface or langchain (check the tutorials). If none of these libraries apply to you, we try to serialize your model with cloudpickle. If that also does not work, we ask you to provide us with your own serialization method.
model_type: The type of the model, either regression, classification or text_generation.
classification_labels: The list of unique categories contained in your dataset target variable. If classification_labels is a list of \(m\) elements, make sure that:
  model_predict returns an (\(n\times m\)) array of probabilities.
  classification_labels have the same order as the output of model_predict.
Optional parameters
name: Name of the wrapped model.
feature_names: An optional list of the feature names. By default, feature_names are all the columns in your dataset. Make sure these features are in the same order as they are in your training dataset.
classification_threshold: Model threshold for binary classification problems.
data_preprocessing_function: A function that takes a pandas.DataFrame as raw input, applies pre-processing and returns any object that could be directly fed to model.
model_postprocessing_function: A function that takes a model output as input, applies post-processing and returns an object of the same type and shape as the model output.
**kwargs: Additional model-specific arguments (See Models).
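The descriptions of data_preprocessing_function and model_postprocessing_function imply a three-step flow: raw DataFrame, then model input, then model output of unchanged type and shape. The sketch below spells out that contract with toy functions. It is not Giskard's internal code: ToyModel, run_pipeline and the column x are hypothetical.

```python
import numpy as np
import pandas as pd

def data_preprocessing_function(df):
    # Raw DataFrame in, object the model accepts out (here a scaled array).
    return df[["x"]].to_numpy(dtype=float) / 10.0

class ToyModel:
    """Hypothetical model: a fixed linear function of the first column."""

    def predict(self, X):
        return 3.0 * X[:, 0] + 1.0

def model_postprocessing_function(output):
    # Same type and shape as the model output: negative values clipped to 0.
    return np.clip(output, 0.0, None)

def run_pipeline(model, df, pre=None, post=None):
    """Hypothetical helper applying pre-processing, the model, then post-processing."""
    data = pre(df) if pre is not None else df
    output = model.predict(data)
    return post(output) if post is not None else output

df = pd.DataFrame({"x": [-10.0, 0.0, 20.0]})
result = run_pipeline(ToyModel(), df,
                      pre=data_preprocessing_function,
                      post=model_postprocessing_function)
```

Keeping the post-processing output in the same shape as the model output (one value per row here) means the rest of the wrapper can treat both cases identically.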
import numpy as np
from giskard import demo, Model

demo_data_processing_function, reg = demo.linear_pipeline()

class MyCustomModel(Model):
    def model_predict(self, df):
        preprocessed_df = demo_data_processing_function(df)
        return np.squeeze(self.model.predict(preprocessed_df))

wrapped_model = MyCustomModel(
    model=reg,
    model_type="regression",
    feature_names=['x'],  # Default: all columns of your dataset
    # name="my_regression_model",  # Optional
    # model_postprocessing_function=None,  # Optional
    # **kwargs  # Additional model-specific arguments
)
Mandatory parameters
model: Could be any model from sklearn, catboost, pytorch, tensorflow, huggingface or langchain (check the tutorials). If none of these libraries apply to you, we try to serialize your model with cloudpickle. If that also does not work, we ask you to provide us with your own serialization method.
model_type: The type of the model, either regression, classification or text_generation.
Optional parameters
name: Name of the wrapped model.
feature_names: An optional list of the feature names. By default, feature_names are all the columns in your dataset. Make sure these features are in the same order as your training dataset.
data_preprocessing_function: A function that takes a pandas.DataFrame as raw input, applies pre-processing and returns any object that could be directly fed to model.
model_postprocessing_function: A function that takes a model output as input, applies post-processing and returns an object of the same type and shape as the model output.
**kwargs: Additional model-specific arguments (See Models).
from langchain.chains import LLMChain
from langchain.llms.fake import FakeListLLM
from langchain.prompts import PromptTemplate
from giskard import Model

responses = [
    "\n\nHueFoots.", "\n\nEcoDrive Motors.",
    "\n\nRainbow Socks.", "\n\nNoOil Motors."]
llm = FakeListLLM(responses=responses)
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=llm, prompt=prompt)

class MyCustomModel(Model):
    def model_predict(self, df):
        return [self.model.predict(**data) for data in df.to_dict('records')]

wrapped_model = MyCustomModel(chain, model_type='text_generation')
Mandatory parameters
model: Could be any model from sklearn, catboost, pytorch, tensorflow, huggingface or langchain (check the tutorials). If none of these libraries apply to you, we try to serialize your model with cloudpickle. If that also does not work, we ask you to provide us with your own serialization method.
model_type: The type of the model, either regression, classification or text_generation.
Optional parameters
name: Name of the wrapped model.
feature_names: An optional list of the feature names. By default, feature_names are all the columns in your dataset. Make sure these features are in the same order as your training dataset.
data_preprocessing_function: A function that takes a pandas.DataFrame as raw input, applies pre-processing and returns any object that could be directly fed to model.
model_postprocessing_function: A function that takes a model output as input, applies post-processing and returns an object of the same type and shape as the model output.
**kwargs: Additional model-specific arguments (See Models).
Model-specific tutorials
For examples of model wrapping, have a look at our tutorial section. It contains notebooks covering:
ML libraries: HuggingFace, LangChain, REST API, PyTorch, Scikit-learn, LightGBM, TensorFlow
ML tasks: Classification, Regression and Text generation
Data types: Tabular, Text and Text generation
Upload your model to the Giskard server
Uploading the model to the Giskard server enables you to:
Compare your model with others using a test suite.
Gather feedback from your colleagues regarding your model.
Debug your model effectively in case of test failures.
Develop new tests that incorporate additional domain knowledge.
To upload your model to the Giskard server, go to Upload an object to the Giskard server.