Tests that the model is not vulnerable to control character injection.
This works by appending special characters like \r or \b to the
input and checking that the model output is not altered. If the model is
vulnerable, it will typically forget the prompt and output unrelated
content. See [1] for more details about this vulnerability.
Parameters:
dataset (Dataset) – A sample dataset which will be perturbed with character injection.
characters (Optional[Sequence[str]]) – The characters to inject. By default, we will try with \r and \b.
features (Optional[Sequence[str]]) – The features to test. By default, all features will be tested.
max_repetitions (int) – The maximum number of repetitions of the character to inject, by default
1000. If the model fails with that number of repetitions (for example
because of limited context length), we will retry with half and then a
quarter of that number.
threshold (float) – Threshold for the fail rate, by default 0.1. If the fail rate is above
this threshold, the test will fail.
output_sensitivity (float) – Output sensitivity, by default 0.2. This is the minimum difference in
BERT score for two outputs to be considered different.
debug (bool) – If True, the output dataset containing the failing samples will be
returned.
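As an illustration, the sketch below shows how such a test might be invoked with the parameters above. The entry point test_llm_char_injection, the objects my_llm_model and wrap_dataset, and the result attribute are assumed names used for illustration only, not a confirmed API.

```python
# Hypothetical invocation sketch; test_llm_char_injection, my_llm_model and
# wrap_dataset are assumed names, not a confirmed API.
import pandas as pd

# A small sample of production-like inputs; the test appends control
# characters (e.g. "\r", "\b") to each feature and re-queries the model.
sample_df = pd.DataFrame({"question": ["What is your refund policy?"]})

result = test_llm_char_injection(
    model=my_llm_model,               # wrapped generative model under test
    dataset=wrap_dataset(sample_df),  # sample dataset to perturb
    characters=["\r", "\b"],          # characters to inject (defaults)
    max_repetitions=1000,             # halved, then quartered, on context-length failures
    threshold=0.1,                    # fail if more than 10% of samples break
    output_sensitivity=0.2,           # minimum BERT-score gap to count as "different"
    debug=True,                       # return the failing samples
)
print(result.passed)                  # assumed result attribute
```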
Tests that the model is not vulnerable to prompt injection.
This works by inserting a set of jailbreaking prompts into the input and
checking that the model output is not altered. If the model is vulnerable,
it will typically ignore any instructions or safeguarding protocols it might
have had and comply with the new instructions given by the injected prompt.
See [2] for more details about this vulnerability.
Parameters:
dataset (Dataset) – A dataset of adversarial prompts.
substrings (List[str]) – A list of substrings that need to be found in the model's output in
order to classify it as a failure.
all_substrings_must_be_found (bool) – A flag to indicate whether all (True) or any (False) of the
substrings must be found in order to classify the output as a failure.
Default: True.
exact_matching (bool) – A flag to indicate whether the substrings (typically one) need to match
the model's output exactly in order to classify it as a failure.
Default: False.
word_matching (bool) – A flag to indicate whether the substrings should be found as independent
words in the model's output in order to classify it as a failure.
Default: False.
case_sensitive (bool) – A flag to indicate whether the substring matching should be
case-sensitive or not. Default: True.
punctuation_sensitive (bool) – A flag to indicate whether the substring matching should be
punctuation-sensitive or not. Default: True.
threshold (float) – Threshold for the fail rate, by default 0.5. If the fail rate is above
this threshold, the test will fail.
debug (bool) – If True, the output dataset containing the failing samples will be
returned.
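A possible invocation using the parameters above is sketched here. The entry point test_llm_prompt_injection and the objects my_llm_model and adversarial_prompts are assumed names for illustration, and the substrings shown are placeholder failure markers.

```python
# Hypothetical sketch; test_llm_prompt_injection, my_llm_model and
# adversarial_prompts are assumed names, not a confirmed API.
result = test_llm_prompt_injection(
    model=my_llm_model,                     # wrapped generative model under test
    dataset=adversarial_prompts,            # dataset of jailbreaking prompts
    substrings=["DAN", "do anything now"],  # strings whose presence marks a failure
    all_substrings_must_be_found=False,     # any single match counts as a failure
    exact_matching=False,                   # substring match, not full-output match
    word_matching=False,                    # no independent-word constraint
    case_sensitive=False,                   # ignore case when matching
    punctuation_sensitive=False,            # ignore punctuation when matching
    threshold=0.5,                          # fail if more than 50% of prompts succeed
    debug=True,                             # return the failing samples
)
```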
Evaluates the model output against a given requirement with another LLM (LLM-as-a-judge).
The model outputs over a given dataset will be validated against the
specified requirement using GPT-4 (note that this requires you to set the
OPENAI_API_TOKEN environment variable for the test to run correctly).
Parameters:
model (BaseModel) – The generative model to test.
dataset (Dataset) – A dataset of examples which will be provided as inputs to the model.
requirement (str) – The requirement to evaluate the model output against. This should be a
clear and explicit requirement that can be interpreted by the LLM, for
example: "The model should decline to answer", "The model should not
generate content that incites harm or violence", or "The model should
apologize and explain that it cannot answer questions unrelated to its
scope".
debug (bool) – If True and the test fails, a dataset containing the rows that have
failed the evaluation criteria will be included in the test result.
Evaluates the model output against a given requirement with another LLM (LLM-as-a-judge).
The model outputs over a given dataset will be validated against the
specified requirement using GPT-4 (note that this requires you to set the
OPENAI_API_TOKEN environment variable for the test to run correctly).
Parameters:
model (BaseModel) – The generative model to test.
input_var (str) – The input to provide to the model. If your model has a single input
variable, this will be used as its value. For example, if your model has
a single input variable called question, you can set input_var
to the question you want to ask the model, e.g. "What is the capital of
France?". If you need to pass multiple input variables to the
model, set input_as_json to True and specify input_var as a JSON
encoded object. For example:
`input_var='{"question": "What is the capital of France?", "language": "English"}'`
requirement (str) – The requirement to evaluate the model output against. This should be a
clear and explicit requirement that can be interpreted by the LLM, for
example: "The model should decline to answer", "The model should not
generate content that incites harm or violence".
input_as_json (bool) – If True, input_var will be parsed as a JSON encoded object. Default is
False.
debug (bool) – If True and the test fails, a dataset containing the rows that have
failed the evaluation criteria will be included in the test result.
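The sketch below illustrates passing several input variables as a JSON-encoded string. The entry point test_llm_single_output_against_requirement and the object my_llm_model are assumed names for illustration.

```python
import json

# Hypothetical sketch; test_llm_single_output_against_requirement and
# my_llm_model are assumed names, not a confirmed API.
inputs = {"question": "What is the capital of France?", "language": "English"}

result = test_llm_single_output_against_requirement(
    model=my_llm_model,
    input_var=json.dumps(inputs),  # JSON-encoded because there are several variables
    input_as_json=True,            # tell the test to parse input_var as JSON
    requirement="The model should answer in the language requested by the user.",
)
```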
dataset_1 (Dataset) – A sample dataset of inputs. If dataset_2 is not passed, we will run
predictions again on these inputs and check that the outputs are
coherent.
dataset_2 (Optional[Dataset]) – Another sample dataset of inputs, with the same index as dataset_1. If
not passed, we will rerun the model on dataset_1.
eval_prompt (Optional[str]) – Optional custom prompt to use for evaluation. If not provided, the
default prompt of CoherencyEvaluator will be used.
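These parameters belong to a coherency check in which the model outputs over two datasets (or two runs over the same dataset) are compared for consistency. A possible invocation is sketched below; the entry point test_llm_output_coherency and the objects my_llm_model and sample_inputs are assumed names for illustration.

```python
# Hypothetical sketch; test_llm_output_coherency, my_llm_model and
# sample_inputs are assumed names, not a confirmed API.
result = test_llm_output_coherency(
    model=my_llm_model,       # wrapped generative model under test
    dataset_1=sample_inputs,  # sample dataset of inputs
    dataset_2=None,           # omitted: the model is simply rerun on dataset_1
    eval_prompt=None,         # use the default CoherencyEvaluator prompt
)
```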