Checks
Ready-to-use validation checks for common testing scenarios, including function-based checks, string matching, comparisons, policy evaluation, check composition, JSON validation, and LLM-powered semantic validation.
Function-based Checks
FnCheck
Create a check from a callable function. Perfect for quick prototyping and simple validation logic. Not reliably serializable; intended for programmatic and test use only.
Module: giskard.checks.builtin.fn
fn Callable Required Function taking a Trace, returning bool or CheckResult.
name str | None Default: None Optional check name.
description str | None Default: None Optional description.
success_message str | None Default: None Message when check passes.
failure_message str | None Default: None Message when check fails.
details dict[str, Any] Default: {} Additional details to include in result.
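Since fn may be a plain or an async callable, the following minimal sketch shows one way a runner could treat both uniformly. It is an illustration using only the standard library, not FnCheck's actual implementation: run_fn and the SimpleNamespace trace are hypothetical stand-ins, and only boolean return values are handled.

```python
import asyncio
import inspect
from types import SimpleNamespace


async def run_fn(fn, trace):
    """Call fn(trace), await the result if it is awaitable, coerce to bool."""
    result = fn(trace)
    if inspect.isawaitable(result):
        result = await result
    return bool(result)


# Hypothetical stand-in for a trace whose last interaction has outputs.
trace = SimpleNamespace(last=SimpleNamespace(outputs="hello"))


def has_output(trace):
    return trace.last.outputs is not None


async def has_greeting(trace):
    await asyncio.sleep(0)  # stands in for an awaited external call
    return "hello" in trace.last.outputs


sync_passed = asyncio.run(run_fn(has_output, trace))
async_passed = asyncio.run(run_fn(has_greeting, trace))
```

Both callables go through the same code path, which is why a check author can switch from a lambda to an async function without changing how the check is declared.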
from giskard.checks import FnCheck

# Simple boolean check
check = FnCheck(
    fn=lambda trace: trace.last.outputs is not None,
    name="has_output",
    success_message="Output was provided",
    failure_message="No output found",
)

# Async check
# external_validator is a placeholder for your own async validation function.
async def validate_response(trace):
    is_valid = await external_validator(trace.last.outputs)
    return is_valid

check = FnCheck(fn=validate_response, name="async_validation")
String Matching
StringMatching
Check that validates string patterns (substring matching) in trace values.
Module: giskard.checks.builtin.text_matching
keyword str | None Substring to search for (or use keyword_key to extract from trace).
keyword_key str | None JSONPath to extract keyword from trace.
text str | None JSONPath-resolved text to search in. If unset, falls back to text_key.
text_key str Default: "trace.last.outputs" JSONPath to extract text to search in.
normalization_form Literal["NFC", "NFD", "NFKC", "NFKD"] | None Default: "NFKC" Unicode normalization applied before matching.
case_sensitive bool Default: True Whether matching is case-sensitive.
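To make the effect of normalization_form and case_sensitive concrete, here is a minimal standard-library sketch of substring matching with both options. It is an assumption-laden illustration rather than the library's code: the contains helper is hypothetical, and using casefold() for case-insensitive comparison is a choice made for this sketch.

```python
import unicodedata


def contains(keyword, text, normalization_form="NFKC", case_sensitive=True):
    """Substring test after optional Unicode normalization and case folding."""
    if normalization_form is not None:
        keyword = unicodedata.normalize(normalization_form, keyword)
        text = unicodedata.normalize(normalization_form, text)
    if not case_sensitive:
        # Assumption for this sketch: casefold() is used for case-insensitivity.
        keyword, text = keyword.casefold(), text.casefold()
    return keyword in text


# U+FB01 is the "fi" ligature; NFKC expands it to "fi", so "file" is found.
ligature_match = contains("file", "Open the \ufb01le")
# Without normalization the ligature stays a single code point.
raw_match = contains("file", "Open the \ufb01le", normalization_form=None)
# Case-insensitive matching finds "error" inside "ERROR: disk full".
folded_match = contains("error", "ERROR: disk full", case_sensitive=False)
```

Compatibility forms such as NFKC are useful when model output contains ligatures or full-width characters that look identical to the keyword but differ at the code-point level.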
from giskard.checks import StringMatching
check = StringMatching(keyword="success", text_key="trace.last.outputs")
# Case-insensitive
check = StringMatching(
    keyword="error",
    text_key="trace.last.outputs",
    case_sensitive=False,
)
RegexMatching
Check with regex pattern matching.
Module: giskard.checks.builtin.text_matching
pattern str | None Regular expression pattern.
pattern_key str | None JSONPath to extract the regex pattern from the trace. Provide exactly one of pattern or pattern_key.
text str | None JSONPath-resolved text to match against (alternative to text_key).
text_key str Default: "trace.last.outputs" JSONPath to extract text to match against.
match_timeout_seconds float Default: 1.0 Upper bound on how long regex matching may take before the check errors.
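As a quick illustration of what a pattern-based check decides, the sketch below runs a phone-number pattern with Python's re module. It assumes search semantics (a match anywhere in the text), which this page does not state, and it does not model match_timeout_seconds; the matches helper is hypothetical.

```python
import re

# Three digits, a dash, three digits, a dash, four digits.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


def matches(pattern, text):
    """Return True if the pattern occurs anywhere in text (assumed semantics)."""
    return pattern.search(text) is not None


found = matches(PHONE, "Call 555-123-4567 today")
missing = matches(PHONE, "Call 555-1234 today")
```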
from giskard.checks import RegexMatching
check = RegexMatching(
    pattern=r"\d{3}-\d{3}-\d{4}",
    text_key="trace.last.outputs.phone",
)
Comparison Checks
Validate numeric and comparable values against expected thresholds.
Module: giskard.checks.builtin.comparison
All comparison checks share these parameters:
expected_value Any | None Static expected value.
expected_value_key JSONPathStr | NotProvided JSONPath to extract expected value from trace.
key str Required JSONPath to extract the actual value from the trace.
normalization_form str | None Default: "NFKC" Unicode normalization: "NFC", "NFD", "NFKC", "NFKD".
Provide exactly one of expected_value or expected_value_key.
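The "exactly one of" rule and the difference between a static and a trace-derived expectation can be sketched as follows. This is a simplified illustration, not the library's code: resolve handles only dotted paths over plain dicts (real JSONPath is richer), and equals, resolve, and the dict-shaped trace are hypothetical.

```python
_MISSING = object()  # sentinel, so that None remains a valid expected value


def resolve(path, trace):
    """Follow a dotted path such as 'trace.last.outputs.count' through dicts."""
    parts = path.split(".")
    if parts[0] != "trace":
        raise ValueError("path must start with 'trace'")
    value = trace
    for part in parts[1:]:
        value = value[part]
    return value


def equals(trace, key, expected_value=_MISSING, expected_value_key=_MISSING):
    """Compare the value at key with a static or trace-derived expectation."""
    if (expected_value is _MISSING) == (expected_value_key is _MISSING):
        raise ValueError(
            "provide exactly one of expected_value or expected_value_key"
        )
    if expected_value is _MISSING:
        expected_value = resolve(expected_value_key, trace)
    return resolve(key, trace) == expected_value


trace = {"last": {"outputs": {"count": 42, "baseline": 42}}}
static_ok = equals(trace, "trace.last.outputs.count", expected_value=42)
dynamic_ok = equals(
    trace,
    "trace.last.outputs.count",
    expected_value_key="trace.last.outputs.baseline",
)
```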
Equals
Check that extracted values equal an expected value.
from giskard.checks import Equals

check = Equals(expected_value=42, key="trace.last.outputs.count")
check = Equals(expected_value="success", key="trace.last.outputs.status")

# Compare against another trace value
check = Equals(
    expected_value_key="trace.interactions[0].outputs.baseline",
    key="trace.last.outputs.result",
)
NotEquals
Check that extracted values do not equal an expected value.
from giskard.checks import NotEquals
check = NotEquals(expected_value="error", key="trace.last.outputs.status")
GreaterThan / GreaterEquals
from giskard.checks import GreaterThan, GreaterEquals

check = GreaterThan(
    expected_value=0.8, key="trace.last.metadata.confidence_score"
)
check = GreaterEquals(expected_value=100, key="trace.last.outputs.user_count")
LesserThan / LesserThanEquals
from giskard.checks import LesserThan, LesserThanEquals

check = LesserThan(expected_value=500, key="trace.last.metadata.latency_ms")
check = LesserThanEquals(
    expected_value=1000, key="trace.last.metadata.token_count"
)
JSON Validation
JsonValid
Check that a trace value is valid JSON. Accepts a JSON string or an already-parsed JSON-compatible value (dict, list, string, number, boolean, or None). Optionally validates against a JSON Schema.
Module: giskard.checks.builtin.json_valid
key str Default: "trace.last.outputs" JSONPath expression to extract the value to validate.
expected_schema dict[str, Any] | None Default: None Optional JSON Schema for the parsed value. Serialized as schema in JSON.
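The two accepted input shapes (JSON text versus an already-parsed value) can be illustrated with a small standard-library sketch. It is a hedged approximation: parse_json_value is hypothetical, it treats every string as JSON text to be parsed, and it leaves out JSON Schema validation, which would need a schema library.

```python
import json


def parse_json_value(value):
    """Return (ok, parsed). Strings are parsed; other values are checked as-is."""
    if isinstance(value, str):
        try:
            return True, json.loads(value)
        except json.JSONDecodeError:
            return False, None
    try:
        json.dumps(value)  # raises TypeError for non-JSON types such as set
    except (TypeError, ValueError):
        return False, None
    return True, value


string_ok, parsed = parse_json_value('{"answer": "Paris"}')
dict_ok, _ = parse_json_value({"answer": "Paris"})
broken_ok, _ = parse_json_value('{"answer": ')
set_ok, _ = parse_json_value({"answer"})
```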
from giskard.checks import JsonValid
check = JsonValid(key="trace.last.outputs")
check = JsonValid(
    key="trace.last.outputs",
    expected_schema={
        "type": "object",
        "required": ["answer"],
        "properties": {"answer": {"type": "string"}},
    },
)
Policy Checks
RegoPolicy
Evaluate an inline Rego policy against trace data using Regorus, an OPA-compatible Rego engine. The check extracts a value from the trace via JSONPath and exposes it to the policy as input, merges optional static data, and evaluates a boolean rule.
Module: giskard.checks.builtin.rego_policy
policy str Required Inline Rego source loaded into the engine.
rule str Required Fully qualified boolean rule path (e.g. data.giskard.allow). A boolean result decides pass or fail, an undefined rule counts as a failure, and any other result type makes the check error.
key str Default: "trace.last.outputs" JSONPath into the trace for the JSON value exposed to the policy as input.
data dict[str, Any] Default: {} Static data document merged into the policy engine via engine.add_data (separate from input).
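The three possible outcomes described for rule can be sketched as a small mapping function. This does not call Regorus; UNDEFINED and interpret_rule are hypothetical names used only to illustrate the decision table.

```python
UNDEFINED = object()  # hypothetical marker for a rule with no defined value


def interpret_rule(value):
    """Map a rule's evaluated value to 'pass', 'fail', or 'error'."""
    if value is UNDEFINED:
        return "fail"
    if isinstance(value, bool):
        return "pass" if value else "fail"
    return "error"  # non-boolean results are treated as a check error


outcomes = [interpret_rule(v) for v in (True, False, UNDEFINED, "admin")]
```

Treating an undefined rule as a failure gives deny-by-default behavior even when the policy omits a default statement.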
from giskard.checks import RegoPolicy
check = RegoPolicy(
    policy="""package giskard

default allow = false

allow if {
    input.role == "admin"
}
""",
    rule="data.giskard.allow",
)
Check Composition
Combine built-in or custom checks with logical operators. All composition checks are in giskard.checks.builtin.composition.
AllOf
Passes only when all inner checks pass. Short-circuits on the first failure or error.
checks Ordered list of checks to evaluate. All must pass.
from giskard.checks import AllOf, LesserThan, Equals
check = AllOf(
    checks=[
        LesserThan(expected_value=10, key="trace.last.outputs"),
        Equals(expected_value=5, key="trace.last.outputs"),
    ]
)
AnyOf
Passes when at least one inner check passes. Short-circuits on the first pass.
checks Ordered list of checks to evaluate. At least one must pass.
from giskard.checks import AnyOf, StringMatching
check = AnyOf(
    checks=[
        StringMatching(keyword="yes", text_key="trace.last.outputs"),
        StringMatching(keyword="approved", text_key="trace.last.outputs"),
    ]
)
Not
Inverts the result of an inner check. Pass becomes fail and fail becomes pass. Error and skip results pass through unchanged.
check The inner check whose result will be inverted.
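The short-circuit and inversion rules for the three composition checks can be sketched with plain functions that return status strings. This is a simplified model, not the library's implementation: the status strings, all_of, any_of, negate, and recording are hypothetical, and returning "fail" from any_of when nothing passes (even if an inner check errored) is an assumption of the sketch.

```python
def all_of(checks, trace):
    """Pass only if every check passes; stop at the first non-pass result."""
    for check in checks:
        status = check(trace)
        if status != "pass":
            return status  # short-circuit: later checks never run
    return "pass"


def any_of(checks, trace):
    """Pass as soon as one check passes."""
    for check in checks:
        if check(trace) == "pass":
            return "pass"  # short-circuit on the first pass
    return "fail"  # assumption: nothing passed, so report failure


def negate(check, trace):
    """Swap pass and fail; let error and skip through unchanged."""
    status = check(trace)
    return {"pass": "fail", "fail": "pass"}.get(status, status)


calls = []


def recording(name, status):
    """Build a fake check that logs its name and returns a fixed status."""
    def check(trace):
        calls.append(name)
        return status
    return check


all_result = all_of([recording("a", "fail"), recording("b", "pass")], None)
all_calls = list(calls)
calls.clear()
any_result = any_of([recording("c", "pass"), recording("d", "fail")], None)
any_calls = list(calls)
not_results = [
    negate(recording("e", s), None) for s in ("pass", "fail", "error", "skip")
]
```

Because evaluation stops early, ordering matters: placing cheap deterministic checks before LLM-based ones avoids unnecessary model calls.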
from giskard.checks import Not, StringMatching
check = Not(
    check=StringMatching(keyword="forbidden", text_key="trace.last.outputs")
)
LLM-based Checks
Validation checks powered by Large Language Models for semantic understanding.
BaseLLMCheck
Abstract base class for creating custom LLM-powered checks. It handles LLM interaction, prompt rendering, and result parsing; subclasses only need to define the evaluation prompt.
Module: giskard.checks.judges.base
generator BaseGenerator Default: get_default_generator() LLM generator for evaluation. Falls back to the global default if not provided.
name str | None Default: None Optional check name.
description str | None Default: None Optional description.
.get_prompt() → str | Message | MessageTemplate | TemplateReference Returns the prompt to send to the LLM. Subclasses must implement this method.
.get_inputs() → dict[str, Any] Provides template variables for prompt rendering. Override to customize available variables. Default: {"trace": trace}.
.run() → CheckResult Execute the LLM-based check (inherited, usually doesn’t need overriding).
from giskard.checks.judges.base import BaseLLMCheck

@BaseLLMCheck.register("custom_llm_check")
class CustomLLMCheck(BaseLLMCheck):
    custom_instruction: str

    def get_prompt(self):
        # Inside an f-string, {{{{ ... }}}} renders as {{ ... }}, which leaves
        # the template placeholders intact for prompt rendering.
        return f"""
        Evaluate based on: {self.custom_instruction}

        Input: {{{{ trace.last.inputs }}}}
        Output: {{{{ trace.last.outputs }}}}

        Return passed=true if criteria are met, passed=false otherwise.
        """

check = CustomLLMCheck(
    custom_instruction="Response must be concise and helpful"
)
LLMCheckResult
Module: giskard.checks.judges.base
Default result model for LLM-based checks. This is the structured output format expected from the LLM.
passed bool Whether the check passed.
reason str | None Optional explanation for the result.
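To show what a structured result with these two fields looks like in practice, the sketch below parses a judge reply into a small dataclass. It assumes the reply arrives as JSON text, which this page does not specify; JudgeResult and parse_judge_reply are hypothetical stand-ins, not the library's LLMCheckResult.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgeResult:  # hypothetical stand-in mirroring the two documented fields
    passed: bool
    reason: Optional[str] = None


def parse_judge_reply(raw):
    """Parse JSON text into a JudgeResult, insisting on a boolean 'passed'."""
    data = json.loads(raw)
    passed = data["passed"]
    if not isinstance(passed, bool):
        raise ValueError("'passed' must be a boolean")
    return JudgeResult(passed=passed, reason=data.get("reason"))


ok = parse_judge_reply('{"passed": true, "reason": "Concise and helpful."}')
bare = parse_judge_reply('{"passed": false}')
```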
Groundedness
Validates that answers are grounded in provided context documents. Essential for RAG systems to ensure responses don't hallucinate information.
Module: giskard.checks.judges.groundedness
answer str | None The answer text to evaluate (static).
answer_key str Default: "trace.last.outputs" JSONPath to extract answer from trace.
context str | list[str] | None Context document(s) that should support the answer (static).
context_key str Default: "trace.last.metadata.context" JSONPath to extract context from trace.
generator BaseGenerator Default: get_default_generator() LLM generator for evaluation.
from giskard.checks import Groundedness
# Static values
check = Groundedness(
    answer="The Eiffel Tower is in Paris.",
    context=[
        "Paris is the capital of France.",
        "The Eiffel Tower is a famous landmark.",
    ],
)

# Extract from trace
check = Groundedness(
    answer_key="trace.last.outputs.answer",
    context_key="trace.last.metadata.retrieved_docs",
)
AnswerRelevance
LLM-based check that evaluates whether the model's answer is relevant to the question. Uses conversation history for context but scores only the current turn.
Module: giskard.checks.judges.answer_relevance
question str | None Default: None Question to evaluate against. Takes priority over question_key when set.
question_key str Default: "trace.last.inputs" JSONPath to extract the question from the trace.
answer str | None Default: None Answer to evaluate. Takes priority over answer_key when set.
answer_key str Default: "trace.last.outputs" JSONPath to extract the answer from the trace.
context str | None Default: None Optional domain context describing the chatbot’s purpose (not extracted from the trace).
generator BaseGenerator Default: get_default_generator() LLM generator for evaluation (falls back to default).
from giskard.checks import AnswerRelevance, Scenario
scenario = (
    Scenario(name="rag_relevance_multi_turn")
    .interact(inputs="What is the best language?", outputs="Python")
    .interact(inputs="What's Python?", outputs="A snake.")
    .check(AnswerRelevance())
)
Toxicity
LLM-based check that detects toxic, harmful, or offensive content across configurable categories.
Module: giskard.checks.judges.toxicity
output str | None Default: None Text to evaluate. If omitted, extracted from the trace using output_key.
output_key str Default: "trace.last.outputs" JSONPath to extract the output from the trace.
categories list[str] Default: ["hate_speech", "harassment", "threats", "self_harm", "sexual_content", "violence"] Toxicity categories to evaluate. Restrict the list to focus the judge.
generator BaseGenerator Default: get_default_generator() LLM generator for evaluation (falls back to default).
from giskard.checks import Toxicity, Scenario
scenario = (
    Scenario(name="safety_check")
    .interact(inputs="Tell me a joke", outputs="Here is a clean joke: ...")
    .check(Toxicity())
)

check = Toxicity(
    output="This is a safe response.",
    categories=["hate_speech", "harassment"],
)
Conformity
Validates that a trace conforms to a given rule. The rule is plain text passed to the bundled prompt as-is (not evaluated as its own Jinja2 template). The full trace is supplied to the model for judgment.
Module: giskard.checks.judges.conformity
rule str Required The rule statement to evaluate against the trace (literal text).
generator BaseGenerator Default: get_default_generator() LLM generator for evaluation (falls back to default).
from giskard.checks import Conformity
check = Conformity(rule="The response must be professional and polite")
check = Conformity(rule="The last response should be polite.")
LLMJudge
General-purpose LLM-based validation with custom prompts. This is the most flexible LLM check; use it when specialized checks (Groundedness, Conformity) don't fit your needs.
Module: giskard.checks.judges.judge
prompt str | None Inline prompt content with Jinja2 templating support.
prompt_path str | None Path to a template file (e.g. "checks::my_template.j2").
generator BaseGenerator Default: get_default_generator() LLM generator for evaluation.
Exactly one of prompt or prompt_path must be provided.
Template variables available in prompts:
| Variable | Description |
|---|---|
| trace | Full trace object with all interactions |
| trace.interactions | List of all interactions in order |
| trace.last | Most recent interaction |
| trace.last.inputs | Inputs from the most recent interaction |
| trace.last.outputs | Outputs from the most recent interaction |
| trace.last.metadata | Metadata from the most recent interaction |
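The relationship between the variables in the table can be pictured with a small data model. The Interaction and Trace classes below are hypothetical stand-ins written for illustration only (the real classes live in the library and may differ); the sketch shows that trace.last is simply the final entry of trace.interactions and how a multi-turn transcript could be assembled from it.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Interaction:  # illustrative stand-in, not the library class
    inputs: Any
    outputs: Any
    metadata: dict = field(default_factory=dict)


@dataclass
class Trace:  # illustrative stand-in, not the library class
    interactions: list

    @property
    def last(self):
        """Most recent interaction, i.e. the final list entry."""
        return self.interactions[-1]


trace = Trace([
    Interaction("What is the best language?", "Python"),
    Interaction("What's Python?", "A snake.", {"latency_ms": 120}),
])

# Roughly what a template loop over trace.interactions would produce.
transcript = "\n".join(
    f"User: {i.inputs}\nAssistant: {i.outputs}" for i in trace.interactions
)
```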
from giskard.checks import LLMJudge
# Inline prompt
check = LLMJudge(
    prompt="""
    Evaluate if the response is helpful and accurate.

    User Input: {{ trace.last.inputs }}
    AI Response: {{ trace.last.outputs }}

    Return passed=true if helpful and accurate, passed=false otherwise.
    """,
)

# Multi-turn evaluation
check = LLMJudge(
    prompt="""
    Evaluate the multi-turn conversation quality.

    {% for interaction in trace.interactions %}
    User: {{ interaction.inputs }}
    Assistant: {{ interaction.outputs }}
    {% endfor %}

    Criteria: consistency, relevance, professional tone.
    Return passed=true if all criteria are met.
    """,
)
SemanticSimilarity
Validate semantic similarity between outputs and expected content using embeddings.
Module: giskard.checks.builtin.semantic_similarity
reference_text str | None Reference text to compare against (static).
reference_text_key str Default: "trace.last.metadata.reference_text" JSONPath to extract reference text from trace.
actual_answer_key str Default: "trace.last.outputs" JSONPath to extract actual value from trace.
threshold float Default: 0.95 Similarity threshold (0.0 to 1.0).
embedding_model BaseEmbeddingModel Default: get_default_embedding_model() Embedding model used to compute similarity scores.
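How a threshold turns a similarity score into a pass or fail decision can be sketched with toy vectors. The sketch assumes cosine similarity and an inclusive comparison (>=), neither of which is stated on this page; the two-dimensional vectors stand in for real embeddings, and cosine_similarity and passes are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def passes(reference_vec, actual_vec, threshold=0.95):
    """Assumed decision rule: pass when similarity reaches the threshold."""
    return cosine_similarity(reference_vec, actual_vec) >= threshold


reference = [1.0, 0.0]
close = [0.9, 0.1]   # nearly the same direction as the reference
far = [0.0, 1.0]     # orthogonal to the reference

close_passes = passes(reference, close)
far_passes = passes(reference, far)
```

With the default of 0.95, only near-paraphrases are likely to pass; lowering the threshold, as the 0.8 in the example does, tolerates looser rewordings.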
from giskard.checks import SemanticSimilarity
check = SemanticSimilarity(
    reference_text="The capital of France is Paris.",
    actual_answer_key="trace.last.outputs",
    threshold=0.8,
)
Common patterns
Combining multiple checks
Chain checks on a scenario, or wrap them with AllOf, AnyOf, and Not (see Check Composition).
from giskard.checks import Groundedness, Conformity, LLMJudge, Scenario
scenario = (
    Scenario()
    .interact(
        inputs="What is the capital of France?",
        outputs=lambda inputs: "Paris is the capital of France.",
    )
    .check(
        Groundedness(
            context=["France is a country in Europe.", "Paris is the capital."]
        )
    )
    .check(Conformity(rule="The response must be a complete sentence"))
    .check(
        LLMJudge(
            prompt="Is the response educational? Return passed=true/false."
        )
    )
)
Reusing generators
from giskard.agents.generators import Generator
from giskard.checks import Conformity, Groundedness, LLMJudge, set_default_generator

# Set once, use everywhere
set_default_generator(Generator(model="openai/gpt-5", temperature=0.1))

# No need to pass generator anymore
check1 = Groundedness(answer="...", context=["..."])
check2 = Conformity(rule="...")
check3 = LLMJudge(prompt="...")
Creating custom checks
from giskard.checks import Check, CheckResult, Trace

@Check.register("custom_business_logic")
class CustomBusinessCheck(Check):
    threshold: float = 0.9
    allowed_categories: list[str] = []

    async def run(self, trace: Trace) -> CheckResult:
        output = trace.last.outputs
        category = output.get("category")
        confidence = output.get("confidence", 0)

        # Reject categories outside the allow-list first.
        if category not in self.allowed_categories:
            return CheckResult.failure(
                message=f"Invalid category: {category}",
                details={
                    "category": category,
                    "allowed": self.allowed_categories,
                },
            )

        # Then enforce the minimum confidence.
        if confidence < self.threshold:
            return CheckResult.failure(
                message=f"Confidence {confidence} below threshold {self.threshold}",
            )

        return CheckResult.success(message="Validation passed")

check = CustomBusinessCheck(
    threshold=0.85, allowed_categories=["sports", "news"]
)