
Custom Checks


Build domain-specific checks that go beyond the built-in library — from simple predicate functions to stateful LLM judges.

FnCheck wraps any function that takes a trace and returns a boolean into a named check. Use it when the logic fits in one expression.

```python
from giskard.checks import FnCheck, Scenario

is_short = FnCheck(
    fn=lambda trace: len(trace.last.outputs) < 200,
    name="response_is_concise",
    success_message="Response is concise",
    failure_message="Response is too long",
)

scenario = (
    Scenario("concise_reply")
    .interact(inputs="Summarize in one sentence.", outputs=lambda inputs: my_llm(inputs))
    .check(is_short)
)
```

For anything more complex, define a named function:

```python
def no_placeholder_text(trace) -> bool:
    output = trace.last.outputs
    return "[INSERT" not in output and "TODO" not in output

scenario = scenario.check(
    FnCheck(
        fn=no_placeholder_text,
        name="no_placeholders",
        success_message="No placeholder text",
        failure_message="Response contains placeholder text",
    )
)
```

Subclass Check when you need configurable parameters, reuse across scenarios, or a clean import path.

```python
from giskard.checks import Check, CheckResult, Trace
from pydantic import Field

@Check.register("contains_keyword")
class ContainsKeyword(Check):
    keyword: str = Field(
        ..., description="Keyword that must appear in the output"
    )
    case_sensitive: bool = Field(default=False)

    async def run(self, trace: Trace) -> CheckResult:
        output = trace.last.outputs
        target = output if self.case_sensitive else output.lower()
        needle = self.keyword if self.case_sensitive else self.keyword.lower()
        passed = needle in target
        if passed:
            return CheckResult.success(message=f"Found '{self.keyword}'")
        return CheckResult.failure(message=f"Missing '{self.keyword}'")
```

Instantiate it like any built-in check:

```python
scenario = scenario.check(ContainsKeyword(name="mentions_price", keyword="price"))
```

@Check.register("contains_keyword") is optional but recommended. It registers the class under a stable string key that is used when serializing and deserializing scenarios and test suites. Without it, serialization falls back to the fully-qualified class name, which breaks if you rename or move the class.

Reading values from the trace with resolve


Use resolve(trace, key) to extract values from the trace using dot-notation paths — the same paths used by Equals, Groundedness, and other built-ins.

```python
from giskard.checks import Check, CheckResult, Trace
from giskard.checks.core.extraction import resolve
from pydantic import Field

class MaxTokens(Check):
    key: str = Field(default="trace.last.outputs")
    limit: int = Field(default=500)

    async def run(self, trace: Trace) -> CheckResult:
        value = resolve(trace, self.key)
        # Rough count: whitespace-delimited words, not model tokenizer tokens.
        token_count = len(str(value).split())
        passed = token_count <= self.limit
        msg = f"{token_count} tokens ({'ok' if passed else f'exceeds limit of {self.limit}'})"
        if passed:
            return CheckResult.success(message=msg)
        return CheckResult.failure(message=msg)
```
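The idea behind dot-notation paths is plain attribute traversal. A minimal sketch of how such a lookup could work — illustrative only; use the real `resolve` inside checks:

```python
# Minimal sketch of dot-notation path resolution, illustrating the idea behind
# resolve(trace, "trace.last.outputs"). Not Giskard's implementation.
from types import SimpleNamespace

def resolve_path(root, path: str):
    """Walk attribute names in a dotted path, skipping the leading root name."""
    obj = root
    for part in path.split(".")[1:]:  # drop the leading "trace"
        obj = getattr(obj, part)
    return obj

trace = SimpleNamespace(last=SimpleNamespace(outputs="The price is $99."))
print(resolve_path(trace, "trace.last.outputs"))  # The price is $99.
```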

BaseLLMCheck handles generator setup and prompt rendering. Override get_prompt and let the base class call the LLM and parse the passed: true/false response.

```python
from giskard.checks import BaseLLMCheck
from pydantic import Field

class ToneCheck(BaseLLMCheck):
    tone: str = Field(
        ..., description="Expected tone, e.g. 'professional', 'empathetic'"
    )

    def get_prompt(self) -> str:
        return f"""
Evaluate whether the following response has a {self.tone} tone.
Response: {{{{ trace.last.outputs }}}}
Return 'passed: true' if the tone is {self.tone}, 'passed: false' otherwise.
Include a brief explanation.
"""
```

Use it like any other check:

```python
scenario = scenario.check(ToneCheck(name="professional_tone", tone="professional"))
```

By default BaseLLMCheck expects the LLM to return a JSON object with the shape {"reason": str | None, "passed": bool}. You can change this by overriding output_type (a Pydantic model) and _handle_output. See the BaseLLMCheck API reference for details.

All Check.run() methods are async, so you can call external services without blocking the event loop.

```python
import httpx
from giskard.checks import Check, CheckResult, Trace

class ToxicityAPICheck(Check):
    api_url: str

    async def run(self, trace: Trace) -> CheckResult:
        async with httpx.AsyncClient() as client:
            response = await client.post(
                self.api_url,
                json={"text": trace.last.outputs},
            )
            response.raise_for_status()  # surface HTTP errors instead of a KeyError
        score = response.json()["toxicity_score"]
        passed = score < 0.5
        if passed:
            return CheckResult.success(message=f"Toxicity score: {score:.2f}")
        return CheckResult.failure(message=f"Toxicity score: {score:.2f}")
```

Group related checks into a helper function that returns a list, then pass them to .check() with the variadic form. Checks run sequentially — the scenario stops at the first failure, so order matters. Put cheap, fast checks before expensive LLM-based judges.

```python
from giskard.checks import FnCheck

def safety_checks():
    return [
        FnCheck(
            fn=lambda trace: len(trace.last.outputs) > 0,
            name="non_empty",
            success_message="Response is non-empty",
            failure_message="Empty response",
        ),
        FnCheck(
            fn=lambda trace: "error" not in trace.last.outputs.lower(),
            name="no_error_string",
            success_message="No error string",
            failure_message="Response contains 'error'",
        ),
        ContainsKeyword(name="has_disclaimer", keyword="disclaimer"),
    ]

scenario = (
    Scenario("safe_reply")
    .interact(
        inputs="Tell me about investing.",
        outputs=lambda inputs: my_llm(inputs),
    )
    .check(*safety_checks())
)
```

Test the check logic in isolation before wiring it into a scenario.

```python
import asyncio
from giskard.checks import Trace, Interaction

async def test_contains_keyword():
    trace = Trace(
        interactions=[
            Interaction(
                inputs="What is the price?", outputs="The price is $99."
            )
        ]
    )
    check = ContainsKeyword(name="mentions_price", keyword="price")
    result = await check.run(trace)
    print(f"Check result: {result.message}")
    assert result.passed
    assert "price" in result.message.lower()

asyncio.run(test_contains_keyword())
```

Output

```
Check result: Found 'price'
```