Checks

Ready-to-use validation checks for common testing scenarios, including function-based checks, string matching, comparisons, policy evaluation, check composition, JSON validation, and LLM-powered semantic validation.

Function-based Checks

`FnCheck`

Create a check from a callable. Useful for quick prototyping and simple validation logic. `FnCheck` is not reliably serializable, so it is intended for programmatic and test use only.

Module: giskard.checks.builtin.fn

fn Callable Required

Function that takes a Trace and returns a bool or a CheckResult. Both regular and async functions are accepted.

name str | None Default: None

Optional check name.

description str | None Default: None

Optional description.

success_message str | None Default: None

Message when check passes.

failure_message str | None Default: None

Message when check fails.

details dict[str, Any] Default: {}

Additional details to include in result.

```python
from giskard.checks import FnCheck

# Simple boolean check
check = FnCheck(
    fn=lambda trace: trace.last.outputs is not None,
    name="has_output",
    success_message="Output was provided",
    failure_message="No output found",
)


# Placeholder for your own async validation logic (hypothetical helper).
async def external_validator(outputs) -> bool:
    return bool(outputs)


# Async check
async def validate_response(trace):
    is_valid = await external_validator(trace.last.outputs)
    return is_valid


check = FnCheck(fn=validate_response, name="async_validation")
```
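Because `fn` is a plain callable, the predicate can be exercised on its own before it is wrapped in `FnCheck`. The sketch below uses `types.SimpleNamespace` as a hypothetical stand-in for a `Trace` that exposes only `trace.last.outputs`; it is not the library's `Trace` class, just enough structure for the predicate to run.

```python
from types import SimpleNamespace


def make_trace(outputs):
    """Hypothetical Trace stand-in exposing only trace.last.outputs."""
    return SimpleNamespace(last=SimpleNamespace(outputs=outputs))


# Same logic as the lambda in the "has_output" example.
def has_output(trace) -> bool:
    return trace.last.outputs is not None


print(has_output(make_trace("Hello!")))  # True
print(has_output(make_trace(None)))  # False
```

Note that an empty string still counts as "provided" here, since the predicate only tests for `None`.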

String Matching

`StringMatching`

Check that a keyword appears as a substring of a text value, either static or extracted from the trace.

Module: giskard.checks.builtin.text_matching

keyword str | None

Substring to search for (or use keyword_key to extract from trace).

keyword_key str | None

JSONPath to extract keyword from trace.

text str | None

Static text to search in. If unset, the text is extracted from the trace via text_key.

text_key str Default: "trace.last.outputs"

JSONPath to extract text to search in.

normalization_form Literal["NFC", "NFD", "NFKC", "NFKD"] | None Default: "NFKC"

Unicode normalization applied before matching.

case_sensitive bool Default: True

Whether matching is case-sensitive.

```python
from giskard.checks import StringMatching

check = StringMatching(keyword="success", text_key="trace.last.outputs")

# Case-insensitive
check = StringMatching(
    keyword="error", text_key="trace.last.outputs", case_sensitive=False
)
```
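The interaction between `normalization_form` and `case_sensitive` can be approximated in plain Python. This sketch assumes both the keyword and the text are normalized before a plain substring test, and it uses `str.casefold()` for case-insensitive matching; the library's exact steps may differ. It illustrates why the NFKC default matters: fullwidth characters such as "Ｅｒｒｏｒ" only match "error" after normalization.

```python
import unicodedata
from typing import Optional


def contains_keyword(
    text: str,
    keyword: str,
    normalization_form: Optional[str] = "NFKC",
    case_sensitive: bool = True,
) -> bool:
    """Approximate substring match: normalize, optionally fold case, then search."""
    if normalization_form is not None:
        text = unicodedata.normalize(normalization_form, text)
        keyword = unicodedata.normalize(normalization_form, keyword)
    if not case_sensitive:
        # Assumption: case folding via str.casefold(); the library may differ.
        text, keyword = text.casefold(), keyword.casefold()
    return keyword in text


# NFKC maps the fullwidth letters to ASCII before matching.
print(contains_keyword("Ｅｒｒｏｒ: timeout", "error", case_sensitive=False))  # True
# Without normalization the fullwidth text never matches.
print(contains_keyword("Ｅｒｒｏｒ: timeout", "error", None, case_sensitive=False))  # False
# Case-sensitive by default.
print(contains_keyword("Error: timeout", "error"))  # False
```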

`RegexMatching`

Check that a regular expression matches a text value, either static or extracted from the trace.

Module: giskard.checks.builtin.text_matching

pattern str | None

Regular expression pattern.

pattern_key str | None

JSONPath to extract the regex pattern from the trace. Provide exactly one of pattern or pattern_key.

text str | None

Static text to match against (alternative to text_key).

text_key str Default: "trace.last.outputs"

JSONPath to extract text to match against.

match_timeout_seconds float Default: 1.0

Upper bound on how long regex matching may take before the check errors.

```python
from giskard.checks import RegexMatching

check = RegexMatching(
    pattern=r"\d{3}-\d{3}-\d{4}",
    text_key="trace.last.outputs.phone",
)
```
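It can help to try a pattern with Python's `re` module before putting it in a check. This sketch assumes Python regex syntax and search semantics (the pattern may match anywhere in the text); whether `RegexMatching` searches or requires a full match is not stated here. The stdlib `re` module has no built-in timeout, which is one reason a bound like `match_timeout_seconds` is useful against pathological patterns.

```python
import re

# The pattern from the example above.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

text = "Call us at 555-123-4567 today."

# Assumption: search semantics, so the pattern may match anywhere in the text.
match = PHONE.search(text)
print(match.group(0) if match else None)  # 555-123-4567

# A full match would require the entire value to be a phone number.
print(PHONE.fullmatch(text))  # None

# For str patterns, \d also matches non-ASCII decimal digits unless re.ASCII is set.
arabic_indic = "\u0661\u0662\u0663"
print(re.search(r"\d{3}", arabic_indic) is not None)  # True
print(re.search(r"\d{3}", arabic_indic, re.ASCII) is not None)  # False
```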

Comparison Checks

Validate numeric and comparable values against expected thresholds.

Module: giskard.checks.builtin.comparison

All comparison checks share these parameters:

expected_value Any | None

Static expected value.

expected_value_key JSONPathStr | NotProvided

JSONPath to extract expected value from trace.

key str Required

JSONPath to extract the actual value from the trace.

normalization_form str | None Default: "NFKC"

Unicode normalization: "NFC", "NFD", "NFKC", "NFKD".

Provide exactly one of expected_value or expected_value_key.

`Equals`

Check that extracted values equal an expected value.

```python
from giskard.checks import Equals

check = Equals(expected_value=42, key="trace.last.outputs.count")
check = Equals(expected_value="success", key="trace.last.outputs.status")

# Compare against another trace value
check = Equals(
    expected_value_key="trace.interactions[0].outputs.baseline",
    key="trace.last.outputs.result",
)
```

`NotEquals`

Check that extracted values do not equal an expected value.

```python
from giskard.checks import NotEquals

check = NotEquals(expected_value="error", key="trace.last.outputs.status")
```

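`GreaterThan` / `GreaterEquals`

Check that the extracted value is strictly greater than (`GreaterThan`) or greater than or equal to (`GreaterEquals`) the expected value.

```python
from giskard.checks import GreaterThan, GreaterEquals

check = GreaterThan(
    expected_value=0.8, key="trace.last.metadata.confidence_score"
)
check = GreaterEquals(expected_value=100, key="trace.last.outputs.user_count")
```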

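`LesserThan` / `LesserThanEquals`

Check that the extracted value is strictly less than (`LesserThan`) or less than or equal to (`LesserThanEquals`) the expected value.

```python
from giskard.checks import LesserThan, LesserThanEquals

check = LesserThan(expected_value=500, key="trace.last.metadata.latency_ms")
check = LesserThanEquals(
    expected_value=1000, key="trace.last.metadata.token_count"
)
```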
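The comparison checks read as `actual <op> expected`, where `actual` comes from `key`. For example, `GreaterThan(expected_value=0.8, ...)` passes when the confidence score exceeds 0.8. The sketch below models that with the stdlib `operator` module and applies Unicode normalization to string operands first. Both the name-to-operator mapping and the "strings only" normalization rule are assumptions made for illustration, not the library's code.

```python
import operator
import unicodedata
from typing import Any, Callable, Dict, Optional

# Illustrative mapping from check name to the assumed comparison: actual <op> expected.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "Equals": operator.eq,
    "NotEquals": operator.ne,
    "GreaterThan": operator.gt,
    "GreaterEquals": operator.ge,
    "LesserThan": operator.lt,
    "LesserThanEquals": operator.le,
}


def compare(
    name: str,
    actual: Any,
    expected: Any,
    normalization_form: Optional[str] = "NFKC",
) -> bool:
    # Assumption: normalization only applies to string values.
    if normalization_form is not None:
        if isinstance(actual, str):
            actual = unicodedata.normalize(normalization_form, actual)
        if isinstance(expected, str):
            expected = unicodedata.normalize(normalization_form, expected)
    return COMPARATORS[name](actual, expected)


print(compare("LesserThan", 420, 500))  # True: 420 < 500
# "e" + combining acute accent vs. precomposed "é": equal only after normalization.
print(compare("Equals", "cafe\u0301", "caf\u00e9"))  # True
print(compare("Equals", "cafe\u0301", "caf\u00e9", normalization_form=None))  # False
```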

JSON Validation

`JsonValid`

Check that a trace value is valid JSON. Accepts a JSON string or an already-parsed JSON-compatible value (dict, list, string, number, boolean, or None). Optionally validates against a JSON Schema.

Module: giskard.checks.builtin.json_valid

key str Default: "trace.last.outputs"

JSONPath expression to extract the value to validate.

expected_schema dict[str, Any] | None Default: None

Optional JSON Schema that the parsed value must satisfy. Serialized under the key schema in JSON.

```python
from giskard.checks import JsonValid

check = JsonValid(key="trace.last.outputs")

check = JsonValid(
    key="trace.last.outputs",
    expected_schema={
        "type": "object",
        "required": ["answer"],
        "properties": {"answer": {"type": "string"}},
    },
)
```
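To see what the example schema accepts, the sketch below hand-checks the three JSON Schema keywords it uses (`type`, `required`, `properties`). It is an illustration only: it handles just those keywords, and it assumes string values are parsed as JSON text before validation. A complete validator, such as the third-party `jsonschema` package, covers the full specification.

```python
import json
from typing import Any, Dict

SCHEMA = {
    "type": "object",
    "required": ["answer"],
    "properties": {"answer": {"type": "string"}},
}

# Only the JSON Schema types used in SCHEMA are handled here.
_TYPES = {"object": dict, "string": str}


def matches_schema(value: Any, schema: Dict[str, Any]) -> bool:
    if isinstance(value, str):
        # Assumption: string values are treated as JSON text and parsed first.
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            return False
    expected_type = _TYPES.get(schema.get("type"))
    if expected_type is not None and not isinstance(value, expected_type):
        return False
    if any(key not in value for key in schema.get("required", [])):
        return False
    for key, sub_schema in schema.get("properties", {}).items():
        sub_type = _TYPES.get(sub_schema.get("type"))
        if key in value and sub_type is not None and not isinstance(value[key], sub_type):
            return False
    return True


print(matches_schema({"answer": "Paris"}, SCHEMA))  # True
print(matches_schema('{"answer": "Paris"}', SCHEMA))  # True
print(matches_schema({"answer": 42}, SCHEMA))  # False: wrong type
print(matches_schema({}, SCHEMA))  # False: "answer" is required
```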

Policy Checks

`RegoPolicy`

Evaluate an inline Rego policy against trace data using Regorus, an OPA-compatible Rego engine. Extracts the value passed as input from the trace via JSONPath, merges optional static data, and evaluates a boolean rule.

Module: giskard.checks.builtin.rego_policy

policy str Required

Inline Rego source loaded into the engine.

rule str Required

Fully qualified path of a boolean rule (e.g. data.giskard.allow). A true value passes and false fails; an undefined rule also counts as a failure, and any non-boolean value results in an error.

key str Default: "trace.last.outputs"

JSONPath to the trace value that is exposed to the policy as input.

data dict[str, Any] Default: {}

Static data document merged into the policy engine via engine.add_data (separate from input).

```python
from giskard.checks import RegoPolicy

check = RegoPolicy(
    policy="""
package giskard

default allow = false

allow if {
    input.role == "admin"
}
""",
    rule="data.giskard.allow",
)
```
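The `rule` parameter distinguishes three outcomes. As a plain-Python illustration (not how `RegoPolicy` is implemented), a rule value can be mapped to a check outcome as shown below; `UNDEFINED` is a hypothetical marker for a rule that produced no value. With `default allow = false` in the example policy, `data.giskard.allow` is always defined, and it is true when `trace.last.outputs` is a mapping such as `{"role": "admin"}`.

```python
from typing import Any

# Hypothetical sentinel standing in for an undefined Rego rule.
UNDEFINED = object()


def rule_outcome(value: Any) -> str:
    """Map a rule value to a check outcome, following the `rule` description."""
    if value is UNDEFINED:
        return "fail"  # undefined rule counts as a failure
    if isinstance(value, bool):
        return "pass" if value else "fail"
    return "error"  # any non-boolean value is an error


print(rule_outcome(True))  # pass
print(rule_outcome(UNDEFINED))  # fail
print(rule_outcome("true"))  # error: a string, not a boolean
```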

Check Composition

Combine built-in or custom checks with logical operators. All composition checks are in giskard.checks.builtin.composition.

`AllOf`

Passes only when all inner checks pass. Short-circuits on the first failure or error.

checks list[Check] Required

Ordered list of checks to evaluate. All must pass.

```python
from giskard.checks import AllOf, LesserThan, Equals

check = AllOf(
    checks=[
        LesserThan(expected_value=10, key="trace.last.outputs"),
        Equals(expected_value=5, key="trace.last.outputs"),
    ]
)
```

`AnyOf`

Passes when at least one inner check passes. Short-circuits on the first pass.

checks list[Check] Required

Ordered list of checks to evaluate. At least one must pass.

```python
from giskard.checks import AnyOf, StringMatching

check = AnyOf(
    checks=[
        StringMatching(keyword="yes", text_key="trace.last.outputs"),
        StringMatching(keyword="approved", text_key="trace.last.outputs"),
    ]
)
```

`Not`

Inverts the result of an inner check. Pass becomes fail and fail becomes pass. Error and skip results pass through unchanged.

check Check Required

The inner check whose result will be inverted.

```python
from giskard.checks import Not, StringMatching

check = Not(
    check=StringMatching(keyword="forbidden", text_key="trace.last.outputs")
)
```
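The evaluation order described for `AllOf`, `AnyOf`, and `Not` can be modeled with plain functions. In this hypothetical sketch each inner check is a zero-argument callable returning `"pass"`, `"fail"`, `"error"`, or `"skip"`, and a recorder shows which checks actually run. It is not the library's code; in particular, the result of `AnyOf` when nothing passes, and how `skip` inside `AllOf`/`AnyOf` is treated, are assumptions.

```python
from typing import Callable, List

# Hypothetical model: a check is a callable returning a status string.
CheckFn = Callable[[], str]


def all_of(checks: List[CheckFn]) -> str:
    for check in checks:
        status = check()
        if status != "pass":
            return status  # short-circuit on the first non-pass (e.g. fail or error)
    return "pass"


def any_of(checks: List[CheckFn]) -> str:
    for check in checks:
        if check() == "pass":
            return "pass"  # short-circuit on the first pass
    return "fail"  # assumption: nothing passed is reported as a failure


def invert(status: str) -> str:
    # Not: pass and fail swap; error and skip pass through unchanged.
    return {"pass": "fail", "fail": "pass"}.get(status, status)


calls: List[str] = []


def recorded(status: str) -> CheckFn:
    """Build a fake inner check that records when it is evaluated."""
    def check() -> str:
        calls.append(status)
        return status
    return check


print(all_of([recorded("pass"), recorded("fail"), recorded("pass")]), calls)
# fail ['pass', 'fail']  (the third check is never evaluated)
```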

LLM-based Checks

Validation checks powered by Large Language Models for semantic understanding.

`BaseLLMCheck`

Abstract base class for creating custom LLM-powered checks. It handles LLM interaction, prompt rendering, and result parsing, so subclasses only need to define the evaluation prompt.

Module: giskard.checks.judges.base


generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation. Falls back to the global default if not provided.

name str | None Default: None

Optional check name.

description str | None Default: None

Optional description.

.get_prompt() → str | Message | MessageTemplate | TemplateReference

Returns the prompt to send to the LLM. Subclasses must implement this method.

.get_inputs() → dict[str, Any]

Provides template variables for prompt rendering. Override to customize available variables. Default: {"trace": trace}.

trace Trace Required

The trace containing interaction history.

.run() → CheckResult

Execute the LLM-based check (inherited, usually doesn’t need overriding).

trace Trace Required

The trace to evaluate.

```python
from giskard.checks.judges.base import BaseLLMCheck


@BaseLLMCheck.register("custom_llm_check")
class CustomLLMCheck(BaseLLMCheck):
    custom_instruction: str

    def get_prompt(self):
        return f"""
        Evaluate based on: {self.custom_instruction}

        Input: {{{{ trace.last.inputs }}}}
        Output: {{{{ trace.last.outputs }}}}

        Return passed=true if criteria are met, passed=false otherwise.
        """


check = CustomLLMCheck(
    custom_instruction="Response must be concise and helpful"
)
```
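The quadruple braces in `get_prompt()` are f-string escaping: each `{{` in an f-string produces a literal `{`, so `{{{{` becomes `{{` and the Jinja2 placeholders survive for later rendering, while `{self.custom_instruction}` is substituted immediately. The sketch reproduces that string outside the class to show the result. One consequence, assuming the returned string is then rendered as a Jinja2 template: an instruction that itself contains `{{` would be interpreted by Jinja2 too.

```python
custom_instruction = "Response must be concise and helpful"

# Same construction as get_prompt(): "{{{{" renders as "{{" in an f-string,
# while {custom_instruction} is filled in right away.
prompt = f"""
Evaluate based on: {custom_instruction}

Input: {{{{ trace.last.inputs }}}}
Output: {{{{ trace.last.outputs }}}}
"""
print(prompt)
```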

`LLMCheckResult`

Module: giskard.checks.judges.base

Default result model for LLM-based checks. This is the structured output format expected from the LLM.


passed bool

Whether the check passed.

reason str | None

Optional explanation for the result.
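As an illustration of that shape, a reply matching this model is a JSON object with a boolean `passed` and an optional `reason`. The sketch parses one into a hypothetical dataclass that mirrors `LLMCheckResult`'s fields. The library does its own parsing, and whether the generator exchanges JSON exactly like this is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerdictSketch:
    """Hypothetical stand-in mirroring LLMCheckResult's fields."""
    passed: bool
    reason: Optional[str] = None


def parse_verdict(raw: str) -> VerdictSketch:
    data = json.loads(raw)
    if not isinstance(data.get("passed"), bool):
        # Reject values such as the string "false", which bool() would treat as True.
        raise ValueError("'passed' must be a JSON boolean")
    return VerdictSketch(passed=data["passed"], reason=data.get("reason"))


verdict = parse_verdict(
    '{"passed": false, "reason": "The answer cites a date not in the context."}'
)
print(verdict.passed, verdict.reason)
```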

`Groundedness`

Validates that answers are grounded in the provided context documents. This is especially useful for RAG systems, where it flags answers that contain information not supported by the retrieved context.

Module: giskard.checks.judges.groundedness

answer str | None

The answer text to evaluate (static).

answer_key str Default: "trace.last.outputs"

JSONPath to extract answer from trace.

context str | list[str] | None

Context document(s) that should support the answer (static).

context_key str Default: "trace.last.metadata.context"

JSONPath to extract context from trace.

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation.

```python
from giskard.checks import Groundedness

# Static values
check = Groundedness(
    answer="The Eiffel Tower is in Paris.",
    context=[
        "Paris is the capital of France.",
        "The Eiffel Tower is a famous landmark.",
    ],
)

# Extract from trace
check = Groundedness(
    answer_key="trace.last.outputs.answer",
    context_key="trace.last.metadata.retrieved_docs",
)
```

`AnswerRelevance`

LLM-based check that evaluates whether the model's answer is relevant to the question. Uses conversation history for context but scores only the current turn.

Module: giskard.checks.judges.answer_relevance

question str | None Default: None

Question to evaluate against. Takes priority over question_key when set.

question_key str Default: "trace.last.inputs"

JSONPath to extract the question from the trace.

answer str | None Default: None

Answer to evaluate. Takes priority over answer_key when set.

answer_key str Default: "trace.last.outputs"

JSONPath to extract the answer from the trace.

context str | None Default: None

Optional domain context describing the chatbot’s purpose (not extracted from the trace).

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation (falls back to default).

```python
from giskard.checks import AnswerRelevance, Scenario

scenario = (
    Scenario(name="rag_relevance_multi_turn")
    .interact(inputs="What is the best language?", outputs="Python")
    .interact(inputs="What's Python?", outputs="A snake.")
    .check(AnswerRelevance())
)
```

`Toxicity`

LLM-based check that detects toxic, harmful, or offensive content across configurable categories.

Module: giskard.checks.judges.toxicity

output str | None Default: None

Text to evaluate. If omitted, extracted from the trace using output_key.

output_key str Default: "trace.last.outputs"

JSONPath to extract the output from the trace.

categories list[str] Default: ["hate_speech", "harassment", "threats", "self_harm", "sexual_content", "violence"]

Toxicity categories to evaluate. Restrict the list to focus the judge.

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation (falls back to default).

```python
from giskard.checks import Toxicity, Scenario

scenario = (
    Scenario(name="safety_check")
    .interact(inputs="Tell me a joke", outputs="Here is a clean joke: ...")
    .check(Toxicity())
)

check = Toxicity(
    output="This is a safe response.",
    categories=["hate_speech", "harassment"],
)
```

`Conformity`

Validates that a trace conforms to a given rule. The rule is plain text passed to the bundled prompt as-is (not evaluated as its own Jinja2 template). The full trace is supplied to the model for judgment.

Module: giskard.checks.judges.conformity

rule str Required

The rule statement to evaluate against the trace (literal text).

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation (falls back to default).

```python
from giskard.checks import Conformity

check = Conformity(rule="The response must be professional and polite")
check = Conformity(rule="The last response should be polite.")
```

`LLMJudge`

General-purpose LLM-based validation with custom prompts. This is the most flexible LLM check; use it when the specialized checks (such as Groundedness or Conformity) don't fit your needs.

Module: giskard.checks.judges.judge

prompt str | None

Inline prompt content with Jinja2 templating support.

prompt_path str | None

Path to a template file (e.g. "checks::my_template.j2").

generator BaseGenerator Default: get_default_generator()

LLM generator for evaluation.

Exactly one of prompt or prompt_path must be provided.

Template variables available in prompts:

| Variable | Description |
| --- | --- |
| `trace` | Full trace object with all interactions |
| `trace.interactions` | List of all interactions in order |
| `trace.last` | Most recent interaction |
| `trace.last.inputs` | Inputs from the most recent interaction |
| `trace.last.outputs` | Outputs from the most recent interaction |
| `trace.last.metadata` | Metadata from the most recent interaction |

```python
from giskard.checks import LLMJudge

# Inline prompt
check = LLMJudge(
    prompt="""
    Evaluate if the response is helpful and accurate.

    User Input: {{ trace.last.inputs }}
    AI Response: {{ trace.last.outputs }}

    Return passed=true if helpful and accurate, passed=false otherwise.
    """,
)

# Multi-turn evaluation
check = LLMJudge(
    prompt="""
    Evaluate the multi-turn conversation quality.

    {% for interaction in trace.interactions %}
    User: {{ interaction.inputs }}
    Assistant: {{ interaction.outputs }}
    {% endfor %}

    Criteria: consistency, relevance, professional tone.
    Return passed=true if all criteria are met.
    """,
)
```

`SemanticSimilarity`

Validate semantic similarity between outputs and expected content using embeddings.

Module: giskard.checks.builtin.semantic_similarity

reference_text str | None

Reference text to compare against (static).

reference_text_key str Default: "trace.last.metadata.reference_text"

JSONPath to extract reference text from trace.

actual_answer_key str Default: "trace.last.outputs"

JSONPath to extract actual value from trace.

threshold float Default: 0.95

Minimum similarity score (0.0 to 1.0) the output must reach for the check to pass.

embedding_model BaseEmbeddingModel Default: get_default_embedding_model()

Embedding model used to compute similarity scores.

```python
from giskard.checks import SemanticSimilarity

check = SemanticSimilarity(
    reference_text="The capital of France is Paris.",
    actual_answer_key="trace.last.outputs",
    threshold=0.8,
)
```
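The description above says only that embeddings are used. A common way to score two embeddings is cosine similarity; the sketch below assumes that metric and a pass rule of `score >= threshold`, neither of which is confirmed here, and it uses toy vectors in place of real embeddings from `embedding_model`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def passes(reference_vec, actual_vec, threshold: float = 0.95) -> bool:
    # Assumption: the check passes when the score reaches the threshold.
    return cosine_similarity(reference_vec, actual_vec) >= threshold


# Toy 3-dimensional "embeddings"; real embedding models return much longer vectors.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(passes([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], threshold=0.8))  # False
```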

Common patterns

Combining multiple checks

Chain checks on a scenario, or wrap them with AllOf, AnyOf, and Not (see Check Composition).

```python
from giskard.checks import Groundedness, Conformity, LLMJudge, Scenario

scenario = (
    Scenario()
    .interact(
        inputs="What is the capital of France?",
        outputs=lambda inputs: "Paris is the capital of France.",
    )
    .check(
        Groundedness(
            context=["France is a country in Europe.", "Paris is the capital."]
        )
    )
    .check(Conformity(rule="The response must be a complete sentence"))
    .check(
        LLMJudge(
            prompt="Is the response educational? Return passed=true/false."
        )
    )
)
```

Reusing generators

```python
from giskard.agents.generators import Generator
from giskard.checks import (
    Conformity,
    Groundedness,
    LLMJudge,
    set_default_generator,
)

# Set once, use everywhere
set_default_generator(Generator(model="openai/gpt-5", temperature=0.1))

# No need to pass generator anymore
check1 = Groundedness(answer="...", context=["..."])
check2 = Conformity(rule="...")
check3 = LLMJudge(prompt="...")
```

Creating custom checks

```python
from giskard.checks import Check, CheckResult, Trace


@Check.register("custom_business_logic")
class CustomBusinessCheck(Check):
    threshold: float = 0.9
    allowed_categories: list[str] = []

    async def run(self, trace: Trace) -> CheckResult:
        output = trace.last.outputs
        category = output.get("category")
        confidence = output.get("confidence", 0)

        if category not in self.allowed_categories:
            return CheckResult.failure(
                message=f"Invalid category: {category}",
                details={
                    "category": category,
                    "allowed": self.allowed_categories,
                },
            )

        if confidence < self.threshold:
            return CheckResult.failure(
                message=f"Confidence {confidence} below threshold {self.threshold}",
            )

        return CheckResult.success(message="Validation passed")


check = CustomBusinessCheck(
    threshold=0.85, allowed_categories=["sports", "news"]
)
```
