Slicing functions
- giskard.slicing_function(_fn=None, row_level=True, name=None, tags: List[str] | None = None, cell_level=False)
Decorator that registers a function as a slicing function and returns a SlicingFunction instance. It can be used for slicing datasets in a specific way during testing.
- Parameters:
_fn – The function to decorate. You do not need to pass this argument explicitly; the decorator receives the decorated function automatically.
name – Optional name to use for the function when registering it.
tags – Optional list of tags to use when registering the function.
row_level – Whether to apply the slicing function row-wise (default) or on the full dataframe. If row_level is True, the slicing function will receive a row (either a Series or DataFrame), and if False, it will receive the entire dataframe.
cell_level – Whether to apply the slicing function on the cell level. If True, the slicing function will be applied to individual cells instead of rows or the entire dataframe.
- Returns:
The wrapped function or a new instance of SlicingFunction.
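The difference between the three modes (row_level, full dataframe, cell_level) can be illustrated without giskard or pandas. The following is a minimal plain-Python sketch under stated assumptions, not giskard's implementation: rows are modelled as dictionaries, and `slice_rows` and its `column` parameter are hypothetical names introduced only for this illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def slice_rows(rows: List[Row], fn: Callable, row_level: bool = True,
               cell_level: bool = False, column: str = "") -> List[Row]:
    """Plain-Python stand-in for the three modes (hypothetical helper)."""
    if cell_level:
        # fn sees one cell value at a time; `column` is a hypothetical
        # parameter naming the column whose cells are inspected.
        return [r for r in rows if fn(r[column])]
    if row_level:
        # fn sees one row at a time and returns True to keep it.
        return [r for r in rows if fn(r)]
    # fn sees the whole dataset and returns the subset itself.
    return fn(rows)


rows = [
    {"text": "great product", "price": 10.0},
    {"text": "bad", "price": 250.0},
    {"text": "works fine", "price": 40.0},
]

# Row level: keep rows cheaper than 100.
cheap = slice_rows(rows, lambda r: r["price"] < 100)

# Whole dataset: keep the two cheapest rows, which needs a global view.
two_cheapest = slice_rows(
    rows, lambda rs: sorted(rs, key=lambda r: r["price"])[:2], row_level=False
)

# Cell level: keep rows whose "text" cell has at most 3 characters.
tiny_text = slice_rows(rows, lambda v: len(v) <= 3, cell_level=True, column="text")
```

Row-level functions are the simplest to write and reason about; the whole-dataset mode is only needed when the decision for one row depends on other rows, as in the "two cheapest" case.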
- class giskard.ml_worker.testing.registry.slicing_function.SlicingFunction(func: Callable[[...], bool] | None, row_level=True, cell_level=False)
A slicing function used to subset data.
- func
The function used to slice the data.
- Type:
SlicingFunctionType
- row_level
Whether the slicing function should operate on rows or on the whole dataframe.
- Type:
bool
- cell_level
Whether the slicing function should operate at the cell level.
- Type:
bool
- params
Additional parameters for the slicing function.
- Type:
Dict
- is_initialized
Indicates if the slicing function has been initialized.
- Type:
bool
Initializes a new instance of the SlicingFunction class.
- Parameters:
func (SlicingFunctionType) – The function used to slice the data.
row_level (bool) – Whether the slicing function should operate on rows or the whole dataframe. Defaults to True.
cell_level (bool) – Whether the slicing function should operate at the cell level. Defaults to False.
- execute(data: Series | DataFrame)
Slices the data using the slicing function.
- Parameters:
data (Union[pd.Series, pd.DataFrame]) – The data to slice.
- Returns:
The sliced data.
- Return type:
Union[pd.Series, pd.DataFrame]
- upload(client: GiskardClient, project_key: str | None = None) -> str
Uploads the slicing function and its metadata to the Giskard server.
- Parameters:
client (GiskardClient) – The Giskard client instance used for communication with the server.
project_key (str, optional) – The project key where the slicing function will be uploaded. If None, the function will be uploaded to the global scope. Defaults to None.
- Returns:
The UUID of the uploaded slicing function.
- Return type:
str
- classmethod download(uuid: str, client: GiskardClient | None, project_key: str | None) -> Artifact
Downloads the artifact from the Giskard server or retrieves it from the local cache.
- Parameters:
uuid (str) – The UUID of the artifact to download.
client (GiskardClient, optional) – The Giskard client instance used for communication with the server. If None, the artifact will be retrieved from the local cache if available. Defaults to None.
project_key (str, optional) – The project key where the artifact is located. If None, the artifact will be retrieved from the global scope. Defaults to None.
- Returns:
The downloaded artifact.
- Return type:
Artifact
- Raises:
AssertionError – If the artifact metadata cannot be retrieved.
AssertionError – If the artifact is not found in the cache and the Giskard client is None.
Textual slicing
- giskard.ml_worker.testing.functions.slicing.short_comment_slicing_fn(max_words: SuiteInput | int | None = 5) -> SlicingFunction
Filter the rows where the specified ‘column_name’ contains a short comment, defined as one with at most ‘max_words’ words.
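The "short comment" rule can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: words are taken to be whitespace-separated tokens, rows are dictionaries, and `short_comments` is a hypothetical helper name.

```python
from typing import Dict, List


def short_comments(rows: List[Dict[str, str]], column_name: str,
                   max_words: int = 5) -> List[Dict[str, str]]:
    """Keep rows whose text has at most `max_words` whitespace-separated words."""
    return [r for r in rows if len(r[column_name].split()) <= max_words]


reviews = [
    {"review": "Too slow"},
    {"review": "Arrived late and the box was damaged"},
    {"review": "Five words exactly right here"},
]

# The second review has seven words and is dropped; the third has exactly five.
selected = short_comments(reviews, "review", max_words=5)
```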
- giskard.ml_worker.testing.functions.slicing.keyword_lookup_slicing_fn(keywords: SuiteInput | List[str] | None = None) -> SlicingFunction
Filter the rows where the specified ‘column_name’ contains at least one of the specified ‘keywords’.
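A keyword lookup of this kind can be sketched as follows. The matching policy here (case-insensitive, whole words only) is an assumption made for the illustration; the description above does not state how the library matches keywords. `keyword_lookup` is a hypothetical helper, and rows are modelled as dictionaries.

```python
import re
from typing import Dict, List


def keyword_lookup(rows: List[Dict[str, str]], column_name: str,
                   keywords: List[str]) -> List[Dict[str, str]]:
    """Keep rows whose text contains at least one keyword as a whole word."""
    if not keywords:
        return []  # nothing to look for
    # Escape each keyword so punctuation in it is matched literally.
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [r for r in rows if pattern.search(r[column_name])]


reviews = [
    {"review": "Refund please"},
    {"review": "Loved it"},
    {"review": "refunded already"},
]

# "refunded" is not matched because only whole words count in this sketch.
matches = keyword_lookup(reviews, "review", ["refund", "broken"])
```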
- giskard.ml_worker.testing.functions.slicing.positive_sentiment_analysis(column_name: SuiteInput | str | None = None, threshold: SuiteInput | float | None = 0.9) -> SlicingFunction
Filter the rows where the specified ‘column_name’ has a positive sentiment, as determined by a pre-trained sentiment analysis model.
- giskard.ml_worker.testing.functions.slicing.offensive_sentiment_analysis(column_name: SuiteInput | str | None = None, threshold: SuiteInput | float | None = 0.9) -> SlicingFunction
Filter the rows where the specified ‘column_name’ has an offensive sentiment, as determined by a pre-trained sentiment analysis model.
- giskard.ml_worker.testing.functions.slicing.irony_sentiment_analysis(column_name: SuiteInput | str | None = None, threshold: SuiteInput | float | None = 0.9) -> SlicingFunction
Filter the rows where the specified ‘column_name’ has an ironic sentiment, as determined by a pre-trained sentiment analysis model.
- giskard.ml_worker.testing.functions.slicing.hate_sentiment_analysis(column_name: SuiteInput | str | None = None, threshold: SuiteInput | float | None = 0.9) -> SlicingFunction
Filter the rows where the specified ‘column_name’ has a hateful sentiment, as determined by a pre-trained sentiment analysis model.
- giskard.ml_worker.testing.functions.slicing.emotion_sentiment_analysis(column_name: SuiteInput | str | None = None, emotion: SuiteInput | str | None = None, threshold: SuiteInput | float | None = 0.9) -> SlicingFunction
Filter the rows where the specified ‘column_name’ has an emotion matching ‘emotion’, as determined by a pre-trained sentiment analysis model. Possible emotions are: ‘optimism’, ‘anger’, ‘sadness’, ‘joy’.
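The role of ‘threshold’ in the model-based functions above can be sketched without loading a model. The predictions below are hand-written, hypothetical values standing in for classifier output, and the inclusive comparison (score at least equal to the threshold) is an assumption; `rows_matching_label` is a hypothetical helper name.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def rows_matching_label(predictions: List[Prediction], label: str,
                        threshold: float = 0.9) -> List[int]:
    """Return the indices of rows predicted as `label` with enough confidence."""
    return [
        i for i, p in enumerate(predictions)
        if p["label"] == label and p["score"] >= threshold
    ]


# Hypothetical classifier output, one prediction per dataset row.
predictions = [
    {"label": "joy", "score": 0.97},
    {"label": "joy", "score": 0.62},    # right label, not confident enough
    {"label": "anger", "score": 0.95},  # confident, but a different label
    {"label": "joy", "score": 0.9},
]

joyful_rows = rows_matching_label(predictions, "joy", threshold=0.9)
```

A higher threshold yields a smaller, cleaner slice; a lower one trades precision for coverage.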
Numerical slicing functions
- giskard.ml_worker.testing.functions.slicing.outlier_filter(lower_bound: SuiteInput | float | None = None, upper_bound: SuiteInput | float | None = None) -> SlicingFunction
Filter the rows where the value of the specified column falls outside the range defined by ‘lower_bound’ and ‘upper_bound’.
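One reading of this description is that the slice keeps the outliers, that is, the rows whose value lies below the lower bound or above the upper bound. The sketch below follows that reading in plain Python. Treating a bound of None as "no limit on that side" is an assumption, and `outlier_rows` is a hypothetical helper, not the library function.

```python
from typing import Dict, List, Optional


def outlier_rows(rows: List[Dict[str, float]], column_name: str,
                 lower_bound: Optional[float] = None,
                 upper_bound: Optional[float] = None) -> List[Dict[str, float]]:
    """Keep rows whose value is below `lower_bound` or above `upper_bound`."""
    def is_outlier(value: float) -> bool:
        below = lower_bound is not None and value < lower_bound
        above = upper_bound is not None and value > upper_bound
        return below or above

    return [r for r in rows if is_outlier(r[column_name])]


people = [{"age": 5.0}, {"age": 30.0}, {"age": 120.0}]

# Values outside 18..99 are kept; the bounds themselves count as inside.
outside = outlier_rows(people, "age", lower_bound=18, upper_bound=99)

# With only a lower bound, nothing is ever "too high".
too_low = outlier_rows(people, "age", lower_bound=18)
```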