# Checks

Ready-to-use validation checks for common testing scenarios, including function-based checks, string matching, comparisons, and LLM-powered semantic validation.
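Many of the checks below locate values in a trace with dotted, JSONPath-style keys such as `trace.last.outputs.status`. As a rough mental model (a simplified sketch, not the library's actual key resolver), extraction walks the path one segment at a time:

```python
# Illustration only: how a dotted key conceptually resolves against nested data.
def extract(obj, path):
    for part in path.split("."):
        # Descend via dict lookup for mappings, attribute access otherwise
        obj = obj[part] if isinstance(obj, dict) else getattr(obj, part)
    return obj

data = {"trace": {"last": {"outputs": {"status": "success"}}}}
value = extract(data, "trace.last.outputs.status")  # "success"
```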
## Function-based Checks

### FnCheck

Create a check from a callable function. Perfect for quick prototyping and simple validation logic. Not reliably serializable; intended for programmatic and test use only.
Module: `giskard.checks.builtin.fn`

- **fn** (`Callable`, required): Function taking a `Trace` and returning `bool` or `CheckResult`.
- **name** (`str | None`, default `None`): Optional check name.
- **description** (`str | None`, default `None`): Optional description.
- **success_message** (`str | None`, default `None`): Message when the check passes.
- **failure_message** (`str | None`, default `None`): Message when the check fails.
- **details** (`dict | None`, default `None`): Additional details to include in the result.
```python
from giskard.checks import FnCheck

# Simple boolean check
check = FnCheck(
    fn=lambda trace: trace.last.outputs is not None,
    name="has_output",
    success_message="Output was provided",
    failure_message="No output found",
)

# Async check
async def validate_response(trace):
    is_valid = await external_validator(trace.last.outputs)
    return is_valid

check = FnCheck(fn=validate_response, name="async_validation")
```

## String Matching
### StringMatching

Check that validates string patterns (substring matching) in trace values.
Module: `giskard.checks.builtin.text_matching`

- **keyword** (`str`): Substring to search for (or use `keyword_key` to extract it from the trace).
- **keyword_key** (`str`): JSONPath to extract the keyword from the trace.
- **text_key** (`str`, default `"trace.last.outputs"`): JSONPath to extract the text to search in.
- **case_sensitive** (`bool`, default `True`): Whether matching is case-sensitive.
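The substring semantics can be sketched in plain Python (an illustration of the assumed behavior, not the check's actual implementation):

```python
def keyword_match(text: str, keyword: str, case_sensitive: bool = True) -> bool:
    # Lowercase both sides when matching is case-insensitive
    if not case_sensitive:
        text, keyword = text.lower(), keyword.lower()
    return keyword in text

a = keyword_match("An ERROR occurred", "error")                        # case-sensitive: no match
b = keyword_match("An ERROR occurred", "error", case_sensitive=False)  # matches
```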
```python
from giskard.checks import StringMatching

check = StringMatching(keyword="success", text_key="trace.last.outputs")

# Case-insensitive
check = StringMatching(
    keyword="error",
    text_key="trace.last.outputs",
    case_sensitive=False,
)
```

### RegexMatching

Check with regex pattern matching.
Module: `giskard.checks.builtin.text_matching`

- **pattern** (`str`, required): Regular expression pattern.
- **text_key** (`str`, default `"trace.last.outputs"`): JSONPath to extract the text to match against.
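The pattern below can be tried directly with Python's `re` module (assuming search semantics, i.e. the pattern may occur anywhere in the extracted text):

```python
import re

pattern = r"\d{3}-\d{3}-\d{4}"  # US-style phone number

found = re.search(pattern, "Call us at 555-123-4567 today") is not None
missing = re.search(pattern, "No phone number here") is not None
```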
```python
from giskard.checks import RegexMatching

check = RegexMatching(
    pattern=r"\d{3}-\d{3}-\d{4}",
    text_key="trace.last.outputs.phone",
)
```

## Comparison Checks

Validate numeric and comparable values against expected thresholds.
Module: `giskard.checks.builtin.comparison`

All comparison checks share these parameters:

- **key** (`str`, required): JSONPath to extract the actual value from the trace.
- **expected_value** (`Any | None`): Static expected value.
- **expected_value_key** (`str | None`): JSONPath to extract the expected value from the trace.
- **normalization_form** (`str | None`): Unicode normalization: `"NFC"`, `"NFD"`, `"NFKC"`, `"NFKD"`.

Provide exactly one of `expected_value` or `expected_value_key`.
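Why `normalization_form` matters: the same visible string can have several Unicode encodings, e.g. "é" as one precomposed codepoint (NFC) or as "e" plus a combining accent (NFD), and those compare unequal byte-for-byte. A quick demonstration with the standard library:

```python
import unicodedata

precomposed = "caf\u00e9"   # é as a single codepoint
decomposed = "cafe\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

raw_equal = precomposed == decomposed  # False without normalization

# Normalizing both sides to the same form makes them compare equal
normalized_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)
```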
### Equals

Check that extracted values equal an expected value.
```python
from giskard.checks import Equals

check = Equals(expected_value=42, key="trace.last.outputs.count")
check = Equals(expected_value="success", key="trace.last.outputs.status")

# Compare against another trace value
check = Equals(
    expected_value_key="trace.interactions[0].outputs.baseline",
    key="trace.last.outputs.result",
)
```

### NotEquals

Check that extracted values do not equal an expected value.
```python
from giskard.checks import NotEquals

check = NotEquals(expected_value="error", key="trace.last.outputs.status")
```

### GreaterThan / GreaterEquals

```python
from giskard.checks import GreaterThan, GreaterEquals

check = GreaterThan(
    expected_value=0.8,
    key="trace.last.metadata.confidence_score",
)
check = GreaterEquals(expected_value=100, key="trace.last.outputs.user_count")
```

### LesserThan / LesserThanEquals

```python
from giskard.checks import LesserThan, LesserThanEquals

check = LesserThan(expected_value=500, key="trace.last.metadata.latency_ms")
check = LesserThanEquals(
    expected_value=1000,
    key="trace.last.metadata.token_count",
)
```

## LLM-based Checks

Validation checks powered by Large Language Models for semantic understanding.
### BaseLLMCheck

Abstract base class for creating custom LLM-powered checks. Handles LLM interaction, prompt rendering, and result parsing; subclasses only need to define the evaluation prompt.
Module: `giskard.checks.judges.base`

- **generator** (`BaseGenerator | None`, default `None`): LLM generator for evaluation. Falls back to the global default if not provided.
- **name** (`str | None`, default `None`): Optional check name.
- **description** (`str | None`, default `None`): Optional description.
Methods:

- **`.get_prompt() → str | Message | MessageTemplate | TemplateReference`**: Returns the prompt to send to the LLM. Subclasses must implement this method.
- **`.get_inputs() → dict[str, Any]`**: Provides template variables for prompt rendering. Override to customize the available variables. Default: `{"trace": trace}`.
- **`.run(trace: Trace) → CheckResult`**: Executes the LLM-based check against the required `trace` (inherited; usually does not need overriding).

```python
from giskard.checks.judges.base import BaseLLMCheck

@BaseLLMCheck.register("custom_llm_check")
class CustomLLMCheck(BaseLLMCheck):
    custom_instruction: str

    def get_prompt(self):
        return f"""
        Evaluate based on: {self.custom_instruction}

        Input: {{{{ trace.last.inputs }}}}
        Output: {{{{ trace.last.outputs }}}}

        Return passed=true if criteria are met, passed=false otherwise.
        """

check = CustomLLMCheck(
    custom_instruction="Response must be concise and helpful"
)
```

### LLMCheckResult

Module: `giskard.checks.judges.base`
Default result model for LLM-based checks. This is the structured output format expected from the LLM.
- **passed** (`bool`): Whether the check passed.
- **reason** (`str | None`): Optional explanation for the result.
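Conceptually, the judge's structured output is equivalent to a small JSON payload with these two fields; the raw string below is a made-up example of such a response, parsed with the standard library:

```python
import json

# Hypothetical raw LLM response conforming to the LLMCheckResult shape
raw = '{"passed": true, "reason": "The answer matches the context."}'
parsed = json.loads(raw)

passed = parsed["passed"]
reason = parsed["reason"]
```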
### Groundedness

Validates that answers are grounded in provided context documents. Essential for RAG systems to ensure responses don't hallucinate information.
Module: `giskard.checks.judges.groundedness`

- **answer** (`str | None`): The answer text to evaluate (static).
- **answer_key** (`str`, default `"trace.last.outputs"`): JSONPath to extract the answer from the trace.
- **context** (`str | list[str] | None`): Context document(s) that should support the answer (static).
- **context_key** (`str`, default `"trace.last.metadata.context"`): JSONPath to extract context from the trace.
- **generator** (`BaseGenerator | None`): LLM generator for evaluation.
```python
from giskard.checks import Groundedness

# Static values
check = Groundedness(
    answer="The Eiffel Tower is in Paris.",
    context=[
        "Paris is the capital of France.",
        "The Eiffel Tower is a famous landmark.",
    ],
)

# Extract from trace
check = Groundedness(
    answer_key="trace.last.outputs.answer",
    context_key="trace.last.metadata.retrieved_docs",
)
```

### Conformity

Validates that interactions conform to a specified rule or requirement. The rule supports Jinja2 templating, allowing dynamic rules that reference trace data.
Module: `giskard.checks.judges.conformity`

- **rule** (`str`, required): The conformity rule to evaluate. Supports Jinja2 templating with access to the `trace` object.
- **generator** (`BaseGenerator | None`): LLM generator for evaluation (falls back to the default).
```python
from giskard.checks import Conformity

# Static rule
check = Conformity(rule="The response must be professional and polite")

# Dynamic rule with templating
check = Conformity(
    rule="The response must contain the keywords '{{ trace.last.inputs.required_keywords }}'"
)

# Reference metadata
check = Conformity(
    rule="Use a {{ trace.last.metadata.tone }} tone in the response"
)
```

### LLMJudge

General-purpose LLM-based validation with custom prompts. The most flexible LLM check; use it when the specialized checks (Groundedness, Conformity) don't fit your needs.
Module: `giskard.checks.judges.judge`

- **prompt** (`str | None`): Inline prompt content with Jinja2 templating support.
- **prompt_path** (`str | None`): Path to a template file (e.g. `"checks::my_template.j2"`).
- **generator** (`BaseGenerator | None`): LLM generator for evaluation.

Exactly one of `prompt` or `prompt_path` must be provided.
Template variables available in prompts:
| Variable | Description |
|---|---|
| `trace` | Full trace object with all interactions |
| `trace.interactions` | List of all interactions in order |
| `trace.last` | Most recent interaction |
| `trace.last.inputs` | Inputs from the most recent interaction |
| `trace.last.outputs` | Outputs from the most recent interaction |
| `trace.last.metadata` | Metadata from the most recent interaction |
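To see how these variables end up in the prompt, here is a deliberately minimal stand-in for the Jinja2 rendering step (real Jinja2 also supports loops and filters; this toy version handles only `{{ dotted.variables }}`):

```python
import re

def lookup(variables, dotted):
    # Resolve a dotted name like "trace.last.inputs" against nested dicts
    obj = variables
    for part in dotted.split("."):
        obj = obj[part]
    return obj

def render(template, variables):
    # Substitute every {{ dotted.name }} occurrence with its looked-up value
    return re.sub(
        r"\{\{\s*([\w.]+)\s*\}\}",
        lambda m: str(lookup(variables, m.group(1))),
        template,
    )

prompt = render(
    "User Input: {{ trace.last.inputs }}",
    {"trace": {"last": {"inputs": "What is 2+2?"}}},
)
```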
```python
from giskard.checks import LLMJudge

# Inline prompt
check = LLMJudge(
    prompt="""
    Evaluate if the response is helpful and accurate.

    User Input: {{ trace.last.inputs }}
    AI Response: {{ trace.last.outputs }}

    Return passed=true if helpful and accurate, passed=false otherwise.
    """,
)

# Multi-turn evaluation
check = LLMJudge(
    prompt="""
    Evaluate the multi-turn conversation quality.

    {% for interaction in trace.interactions %}
    User: {{ interaction.inputs }}
    Assistant: {{ interaction.outputs }}
    {% endfor %}

    Criteria: consistency, relevance, professional tone.
    Return passed=true if all criteria are met.
    """,
)
```

## SemanticSimilarity

Validate semantic similarity between outputs and expected content using embeddings.
Module: `giskard.checks.builtin.semantic_similarity`

- **reference_text** (`str | None`): Reference text to compare against (static).
- **reference_text_key** (`str`, default `"trace.last.metadata.reference_text"`): JSONPath to extract the reference text from the trace.
- **actual_answer_key** (`str`, default `"trace.last.outputs"`): JSONPath to extract the actual value from the trace.
- **threshold** (`float`, default `0.8`): Similarity threshold (0.0 to 1.0).
- **generator** (`BaseGenerator | None`): LLM generator for evaluation.
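The thresholding step can be sketched with cosine similarity over embedding vectors (the toy vectors below stand in for real embeddings, which the generator would produce; the actual similarity metric is an assumption):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot product over norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

reference = [0.9, 0.1, 0.4]  # made-up embedding of the reference text
actual = [0.8, 0.2, 0.5]     # made-up embedding of the actual answer

score = cosine_similarity(reference, actual)
passed = score >= 0.8  # check passes when similarity meets the threshold
```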
```python
from giskard.checks import SemanticSimilarity

check = SemanticSimilarity(
    reference_text="The capital of France is Paris.",
    actual_answer_key="trace.last.outputs",
    threshold=0.8,
)
```

## Common patterns

### Combining multiple checks

```python
from giskard.checks import Groundedness, Conformity, LLMJudge, Scenario

scenario = (
    Scenario()
    .interact(
        inputs="What is the capital of France?",
        outputs=lambda inputs: "Paris is the capital of France.",
    )
    .check(
        Groundedness(
            context=["France is a country in Europe.", "Paris is the capital."]
        )
    )
    .check(Conformity(rule="The response must be a complete sentence"))
    .check(
        LLMJudge(
            prompt="Is the response educational? Return passed=true/false."
        )
    )
)
```

### Reusing generators
Section titled “Reusing generators”from giskard.agents.generators import Generatorfrom giskard.checks import set_default_generator
# Set once, use everywhereset_default_generator(Generator(model="openai/gpt-5", temperature=0.1))
# No need to pass generator anymorecheck1 = Groundedness(answer="...", context=["..."])check2 = Conformity(rule="...")check3 = LLMJudge(prompt="...")Creating custom checks
Section titled “Creating custom checks”from giskard.checks import Check, CheckResult, Trace
@Check.register("custom_business_logic")class CustomBusinessCheck(Check): threshold: float = 0.9 allowed_categories: list[str] = []
async def run(self, trace: Trace) -> CheckResult: output = trace.last.outputs category = output.get("category") confidence = output.get("confidence", 0)
if category not in self.allowed_categories: return CheckResult.failure( message=f"Invalid category: {category}", details={ "category": category, "allowed": self.allowed_categories, }, )
if confidence < self.threshold: return CheckResult.failure( message=f"Confidence {confidence} below threshold {self.threshold}", )
return CheckResult.success(message="Validation passed")
check = CustomBusinessCheck( threshold=0.85, allowed_categories=["sports", "news"])