
Custom Checks


Build domain-specific checks that go beyond the built-in library — from simple predicate functions to stateful LLM judges.

FnCheck wraps any function that takes a trace and returns a boolean into a named check. Use it when the logic fits in one expression.

```python
from giskard.checks import FnCheck, Scenario

is_short = FnCheck(
    fn=lambda trace: len(trace.last.outputs) < 200,
    name="response_is_concise",
    success_message="Response is concise",
    failure_message="Response is too long",
)

scenario = (
    Scenario("concise_reply")
    .interact(inputs="Summarize in one sentence.", outputs=lambda inputs: my_llm(inputs))
    .check(is_short)
)
```

For anything more complex, define a named function:

```python
def no_placeholder_text(trace) -> bool:
    output = trace.last.outputs
    return "[INSERT" not in output and "TODO" not in output

scenario = scenario.check(
    FnCheck(
        fn=no_placeholder_text,
        name="no_placeholders",
        success_message="No placeholder text",
        failure_message="Response contains placeholder text",
    )
)
```

Subclass Check when you need configurable parameters, reuse across scenarios, or a clean import path.

```python
from giskard.checks import Check, CheckResult, Trace
from pydantic import Field

@Check.register("contains_keyword")
class ContainsKeyword(Check):
    keyword: str = Field(
        ..., description="Keyword that must appear in the output"
    )
    case_sensitive: bool = Field(default=False)

    async def run(self, trace: Trace) -> CheckResult:
        output = trace.last.outputs
        target = output if self.case_sensitive else output.lower()
        needle = self.keyword if self.case_sensitive else self.keyword.lower()
        passed = needle in target
        if passed:
            return CheckResult.success(message=f"Found '{self.keyword}'")
        return CheckResult.failure(message=f"Missing '{self.keyword}'")
```

Instantiate it like any built-in check:

```python
scenario = scenario.check(ContainsKeyword(name="mentions_price", keyword="price"))
```

@Check.register("contains_keyword") is optional but recommended. It registers the class under a stable string key that is used when serializing and deserializing scenarios and test suites. Without it, serialization falls back to the fully-qualified class name, which breaks if you rename or move the class.

Reading values from the trace with resolve


Use resolve(trace, key) to extract values from the trace using dot-notation paths — the same paths used by Equals, Groundedness, and other built-ins.

```python
from giskard.checks import Check, CheckResult, Trace
from giskard.checks.core.extraction import resolve
from pydantic import Field

class MaxTokens(Check):
    key: str = Field(default="trace.last.outputs")
    limit: int = Field(default=500)

    async def run(self, trace: Trace) -> CheckResult:
        value = resolve(trace, self.key)
        # Rough count: whitespace-delimited words, not model tokenizer tokens.
        token_count = len(str(value).split())
        passed = token_count <= self.limit
        msg = f"{token_count} tokens ({'ok' if passed else f'exceeds limit of {self.limit}'})"
        if passed:
            return CheckResult.success(message=msg)
        return CheckResult.failure(message=msg)
```
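The idea behind dot-notation paths is plain attribute traversal. A minimal sketch of how such a lookup could work — illustrative only; use the real `resolve` inside checks:

```python
# Minimal sketch of dot-notation path resolution, illustrating the idea behind
# resolve(trace, "trace.last.outputs"). Not Giskard's implementation.
from types import SimpleNamespace

def resolve_path(root, path: str):
    """Walk attribute names in a dotted path, skipping the leading root name."""
    obj = root
    for part in path.split(".")[1:]:  # drop the leading "trace"
        obj = getattr(obj, part)
    return obj

trace = SimpleNamespace(last=SimpleNamespace(outputs="The price is $99."))
print(resolve_path(trace, "trace.last.outputs"))  # The price is $99.
```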

BaseLLMCheck handles generator setup and prompt rendering. Override get_prompt and let the base class call the LLM and parse the passed: true/false response.

```python
from giskard.checks import BaseLLMCheck
from pydantic import Field

class ToneCheck(BaseLLMCheck):
    tone: str = Field(
        ..., description="Expected tone, e.g. 'professional', 'empathetic'"
    )

    def get_prompt(self) -> str:
        return f"""
Evaluate whether the following response has a {self.tone} tone.
Response: {{{{ trace.last.outputs }}}}
Return 'passed: true' if the tone is {self.tone}, 'passed: false' otherwise.
Include a brief explanation.
"""
```

Use it like any other check:

```python
scenario = scenario.check(ToneCheck(name="professional_tone", tone="professional"))
```

By default BaseLLMCheck expects the LLM to return a JSON object with the shape {"reason": str | None, "passed": bool}. You can change this by overriding output_type (a Pydantic model) and _handle_output. See the BaseLLMCheck API reference for details.

All Check.run() methods are async, so you can call external services without blocking the event loop.

```python
import httpx
from giskard.checks import Check, CheckResult, Trace

class ToxicityAPICheck(Check):
    api_url: str

    async def run(self, trace: Trace) -> CheckResult:
        async with httpx.AsyncClient() as client:
            response = await client.post(
                self.api_url,
                json={"text": trace.last.outputs},
            )
            response.raise_for_status()  # surface HTTP errors instead of a KeyError
        score = response.json()["toxicity_score"]
        passed = score < 0.5
        if passed:
            return CheckResult.success(message=f"Toxicity score: {score:.2f}")
        return CheckResult.failure(message=f"Toxicity score: {score:.2f}")
```

Group related checks into a helper function that returns a list, then pass them to .check() with the variadic form. Checks run sequentially — the scenario stops at the first failure, so order matters. Put cheap, fast checks before expensive LLM-based judges.

```python
from giskard.checks import FnCheck

def safety_checks():
    return [
        FnCheck(
            fn=lambda trace: len(trace.last.outputs) > 0,
            name="non_empty",
            success_message="Response is non-empty",
            failure_message="Empty response",
        ),
        FnCheck(
            fn=lambda trace: "error" not in trace.last.outputs.lower(),
            name="no_error_string",
            success_message="No error string",
            failure_message="Response contains 'error'",
        ),
        ContainsKeyword(name="has_disclaimer", keyword="disclaimer"),
    ]

scenario = (
    Scenario("safe_reply")
    .interact(
        inputs="Tell me about investing.",
        outputs=lambda inputs: my_llm(inputs),
    )
    .check(*safety_checks())
)
```

Test the check logic in isolation before wiring it into a scenario.

```python
import asyncio
from giskard.checks import Trace, Interaction

async def test_contains_keyword():
    trace = Trace(
        interactions=[
            Interaction(
                inputs="What is the price?", outputs="The price is $99."
            )
        ]
    )
    check = ContainsKeyword(name="mentions_price", keyword="price")
    result = await check.run(trace)
    print(f"Check result: {result.message}")
    assert result.passed
    assert "price" in result.message.lower()

asyncio.run(test_contains_keyword())
```

Output

```
Check result: Found 'price'
```