
Checks

Ready-to-use validation checks for common testing scenarios, including function-based checks, string matching, comparisons, and LLM-powered semantic validation.


FnCheck

Create a check from a callable function. Ideal for quick prototyping and simple validation logic. Not reliably serializable; intended for programmatic and test use only.

Module: giskard.checks.builtin.fn

fn Callable Required

Function taking a Trace, returning bool or CheckResult.

name str | None Default: None

Optional check name.

description str | None Default: None

Optional description.

success_message str | None Default: None

Message when check passes.

failure_message str | None Default: None

Message when check fails.

details dict | None Default: None

Additional details to include in result.

from giskard.checks import FnCheck

# Simple boolean check
check = FnCheck(
    fn=lambda trace: trace.last.outputs is not None,
    name="has_output",
    success_message="Output was provided",
    failure_message="No output found",
)

# Async check (external_validator is your own async callable)
async def validate_response(trace):
    is_valid = await external_validator(trace.last.outputs)
    return is_valid

check = FnCheck(fn=validate_response, name="async_validation")

StringMatching

Check that performs substring matching on values extracted from the trace.

Module: giskard.checks.builtin.text_matching

keyword str

Substring to search for (or use keyword_key to extract from trace).

keyword_key str

JSONPath to extract keyword from trace.

text_key str Default: "trace.last.outputs"

JSONPath to extract text to search in.

case_sensitive bool Default: True

Whether matching is case-sensitive.

from giskard.checks import StringMatching

check = StringMatching(keyword="success", text_key="trace.last.outputs")

# Case-insensitive matching
check = StringMatching(
    keyword="error", text_key="trace.last.outputs", case_sensitive=False
)
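The *_key parameters are JSONPath-style dotted lookups into the trace. Purely as an illustration (not the library's implementation), such an extractor can be sketched as:

```python
from functools import reduce

def extract(obj, path):
    """Resolve a dotted path like 'trace.last.outputs' against nested
    dict keys or attributes. Illustrative only; the real implementation
    supports fuller JSONPath syntax (e.g. indexing like interactions[0])."""
    def step(current, part):
        if isinstance(current, dict):
            return current[part]
        return getattr(current, part)
    return reduce(step, path.split("."), obj)

# Toy trace-like structure
trace = {"last": {"outputs": {"status": "success"}}}
extract({"trace": trace}, "trace.last.outputs.status")  # "success"
```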

RegexMatching

Check that matches extracted text against a regular expression.

Module: giskard.checks.builtin.text_matching

pattern str Required

Regular expression pattern.

text_key str Default: "trace.last.outputs"

JSONPath to extract text to match against.

from giskard.checks import RegexMatching

check = RegexMatching(
    pattern=r"\d{3}-\d{3}-\d{4}",
    text_key="trace.last.outputs.phone",
)
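The pattern above matches US-style phone numbers. For reference, the matching step is equivalent to a plain stdlib search (a sketch, not the check's actual code):

```python
import re

PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

def matches(text: str) -> bool:
    # re.search scans anywhere in the string, so a match mid-sentence counts
    return PHONE.search(text) is not None

matches("Call 555-123-4567 today")  # True
matches("no phone here")            # False
```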

Comparison Checks

Validate numeric and comparable values against expected thresholds.

Module: giskard.checks.builtin.comparison

All comparison checks share these parameters:

expected_value Any | None

Static expected value.

expected_value_key str | None

JSONPath to extract expected value from trace.

key str Required

JSONPath to extract the actual value from the trace.

normalization_form str | None

Unicode normalization: "NFC", "NFD", "NFKC", "NFKD".

Provide exactly one of expected_value or expected_value_key.
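The normalization_form option matters when comparing text that can be encoded in multiple Unicode forms; a minimal stdlib illustration of why:

```python
import unicodedata

composed = "caf\u00e9"     # é as a single code point (U+00E9)
decomposed = "cafe\u0301"  # e followed by a combining acute accent

# The strings render identically but compare unequal byte-for-byte
composed == decomposed                                # False

# NFC normalization collapses the combining sequence into one code point
unicodedata.normalize("NFC", decomposed) == composed  # True
```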

Equals

Check that extracted values equal an expected value.

from giskard.checks import Equals

check = Equals(expected_value=42, key="trace.last.outputs.count")
check = Equals(expected_value="success", key="trace.last.outputs.status")

# Compare against another trace value
check = Equals(
    expected_value_key="trace.interactions[0].outputs.baseline",
    key="trace.last.outputs.result",
)

NotEquals

Check that extracted values do not equal an expected value.

from giskard.checks import NotEquals

check = NotEquals(expected_value="error", key="trace.last.outputs.status")

GreaterThan / GreaterEquals

Check that extracted values are strictly greater than (or at least equal to) an expected value.

from giskard.checks import GreaterThan, GreaterEquals

check = GreaterThan(
    expected_value=0.8, key="trace.last.metadata.confidence_score"
)
check = GreaterEquals(expected_value=100, key="trace.last.outputs.user_count")

LesserThan / LesserThanEquals

Check that extracted values are strictly less than (or at most equal to) an expected value.

from giskard.checks import LesserThan, LesserThanEquals

check = LesserThan(expected_value=500, key="trace.last.metadata.latency_ms")
check = LesserThanEquals(
    expected_value=1000, key="trace.last.metadata.token_count"
)

LLM-Based Checks

Validation checks powered by Large Language Models for semantic understanding.

BaseLLMCheck

Abstract base class for creating custom LLM-powered checks. Handles LLM interaction, prompt rendering, and result parsing; subclasses only need to define the evaluation prompt.

Module: giskard.checks.judges.base

generator BaseGenerator | None Default: None

LLM generator for evaluation. Falls back to the global default if not provided.

name str | None Default: None

Optional check name.

description str | None Default: None

Optional description.

.get_prompt() str | Message | MessageTemplate | TemplateReference

Returns the prompt to send to the LLM. Subclasses must implement this method.

.get_inputs() dict[str, Any]

Provides template variables for prompt rendering. Override to customize available variables. Default: {"trace": trace}.

trace Trace Required

The trace containing interaction history.

.run() CheckResult

Execute the LLM-based check (inherited; usually does not need overriding).

trace Trace Required

The trace to evaluate.
from giskard.checks.judges.base import BaseLLMCheck

@BaseLLMCheck.register("custom_llm_check")
class CustomLLMCheck(BaseLLMCheck):
    custom_instruction: str

    def get_prompt(self):
        return f"""
        Evaluate based on: {self.custom_instruction}
        Input: {{{{ trace.last.inputs }}}}
        Output: {{{{ trace.last.outputs }}}}
        Return passed=true if criteria are met, passed=false otherwise.
        """

check = CustomLLMCheck(
    custom_instruction="Response must be concise and helpful"
)

CheckResult

Module: giskard.checks.judges.base

Default result model for LLM-based checks. This is the structured output format expected from the LLM.

passed bool

Whether the check passed.

reason str | None

Optional explanation for the result.
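Conceptually the model is a two-field record; a rough stand-alone sketch (the real class lives in giskard.checks.judges.base, and the names here mirror the fields documented above):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CheckResultSketch:
    """Illustrative stand-in for the LLM judge's structured output."""
    passed: bool
    reason: Optional[str] = None

# An LLM response such as {"passed": false, "reason": "..."} parses into:
result = CheckResultSketch(passed=False, reason="Answer not supported by context")
result.passed  # False
```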


Groundedness

Validates that answers are grounded in provided context documents. Essential for RAG systems to ensure responses don't hallucinate information.

Module: giskard.checks.judges.groundedness

answer str | None

The answer text to evaluate (static).

answer_key str Default: "trace.last.outputs"

JSONPath to extract answer from trace.

context str | list[str] | None

Context document(s) that should support the answer (static).

context_key str Default: "trace.last.metadata.context"

JSONPath to extract context from trace.

generator BaseGenerator | None

LLM generator for evaluation.

from giskard.checks import Groundedness

# Static values
check = Groundedness(
    answer="The Eiffel Tower is in Paris.",
    context=[
        "Paris is the capital of France.",
        "The Eiffel Tower is a famous landmark.",
    ],
)

# Extract from trace
check = Groundedness(
    answer_key="trace.last.outputs.answer",
    context_key="trace.last.metadata.retrieved_docs",
)

Conformity

Validates that interactions conform to a specified rule or requirement. The rule supports Jinja2 templating, allowing dynamic rules that reference trace data.

Module: giskard.checks.judges.conformity

rule str Required

The conformity rule to evaluate. Supports Jinja2 templating with access to the trace object.

generator BaseGenerator | None

LLM generator for evaluation (falls back to default).

from giskard.checks import Conformity

# Static rule
check = Conformity(rule="The response must be professional and polite")

# Dynamic rule with templating
check = Conformity(
    rule="The response must contain the keywords '{{ trace.last.inputs.required_keywords }}'"
)

# Reference metadata
check = Conformity(
    rule="Use a {{ trace.last.metadata.tone }} tone in the response"
)
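Before evaluation, the rule is rendered against the trace (via Jinja2 in the real library). A toy stdlib-only renderer conveying the idea, illustrative only:

```python
import re

def render(rule: str, variables: dict) -> str:
    """Tiny stand-in for Jinja2: substitute {{ dotted.path }} placeholders
    by walking nested dicts. The real check uses full Jinja2 templating."""
    def lookup(match):
        value = variables
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, rule)

trace = {"last": {"metadata": {"tone": "formal"}}}
render("Use a {{ trace.last.metadata.tone }} tone", {"trace": trace})
# "Use a formal tone"
```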

LLMJudge

General-purpose LLM-based validation with custom prompts. The most flexible LLM check; use it when the specialized checks (Groundedness, Conformity) don't fit your needs.

Module: giskard.checks.judges.judge

prompt str | None

Inline prompt content with Jinja2 templating support.

prompt_path str | None

Path to a template file (e.g. "checks::my_template.j2").

generator BaseGenerator | None

LLM generator for evaluation.

Exactly one of prompt or prompt_path must be provided.

Template variables available in prompts:

trace: Full trace object with all interactions
trace.interactions: List of all interactions in order
trace.last: Most recent interaction
trace.last.inputs: Inputs from the most recent interaction
trace.last.outputs: Outputs from the most recent interaction
trace.last.metadata: Metadata from the most recent interaction
from giskard.checks import LLMJudge

# Inline prompt
check = LLMJudge(
    prompt="""
    Evaluate if the response is helpful and accurate.
    User Input: {{ trace.last.inputs }}
    AI Response: {{ trace.last.outputs }}
    Return passed=true if helpful and accurate, passed=false otherwise.
    """,
)

# Multi-turn evaluation
check = LLMJudge(
    prompt="""
    Evaluate the multi-turn conversation quality.
    {% for interaction in trace.interactions %}
    User: {{ interaction.inputs }}
    Assistant: {{ interaction.outputs }}
    {% endfor %}
    Criteria: consistency, relevance, professional tone.
    Return passed=true if all criteria are met.
    """,
)

SemanticSimilarity

Validate semantic similarity between outputs and expected content using embeddings.

Module: giskard.checks.builtin.semantic_similarity

reference_text str | None

Reference text to compare against (static).

reference_text_key str Default: "trace.last.metadata.reference_text"

JSONPath to extract reference text from trace.

actual_answer_key str Default: "trace.last.outputs"

JSONPath to extract actual value from trace.

threshold float Default: 0.8

Similarity threshold (0.0 to 1.0).

generator BaseGenerator | None

LLM generator for evaluation.

from giskard.checks import SemanticSimilarity

check = SemanticSimilarity(
    reference_text="The capital of France is Paris.",
    actual_answer_key="trace.last.outputs",
    threshold=0.8,
)
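The threshold is compared against an embedding-similarity score, typically cosine similarity. A toy pure-Python sketch on hypothetical vectors (the real check embeds the texts first):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embedding vectors for two near-identical sentences
v1 = [0.2, 0.8, 0.1]
v2 = [0.25, 0.75, 0.15]

cosine_similarity(v1, v2) >= 0.8  # True: clears the default threshold
```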

Combining Checks

Checks can be attached to a Scenario and evaluated together:

from giskard.checks import Groundedness, Conformity, LLMJudge, Scenario

scenario = (
    Scenario()
    .interact(
        inputs="What is the capital of France?",
        outputs=lambda inputs: "Paris is the capital of France.",
    )
    .check(
        Groundedness(
            context=["France is a country in Europe.", "Paris is the capital."]
        )
    )
    .check(Conformity(rule="The response must be a complete sentence"))
    .check(
        LLMJudge(
            prompt="Is the response educational? Return passed=true/false."
        )
    )
)

Setting a Default Generator

from giskard.agents.generators import Generator
from giskard.checks import set_default_generator

# Set once, use everywhere
set_default_generator(Generator(model="openai/gpt-5", temperature=0.1))

# No need to pass a generator anymore
check1 = Groundedness(answer="...", context=["..."])
check2 = Conformity(rule="...")
check3 = LLMJudge(prompt="...")
Custom Checks

Subclass Check and implement run() for bespoke validation logic:

from giskard.checks import Check, CheckResult, Trace

@Check.register("custom_business_logic")
class CustomBusinessCheck(Check):
    threshold: float = 0.9
    allowed_categories: list[str] = []

    async def run(self, trace: Trace) -> CheckResult:
        output = trace.last.outputs
        category = output.get("category")
        confidence = output.get("confidence", 0)
        if category not in self.allowed_categories:
            return CheckResult.failure(
                message=f"Invalid category: {category}",
                details={
                    "category": category,
                    "allowed": self.allowed_categories,
                },
            )
        if confidence < self.threshold:
            return CheckResult.failure(
                message=f"Confidence {confidence} below threshold {self.threshold}",
            )
        return CheckResult.success(message="Validation passed")

check = CustomBusinessCheck(
    threshold=0.85, allowed_categories=["sports", "news"]
)

  • Core API — Base classes and fundamental types
  • Scenarios — Multi-step workflow testing