Build domain-specific checks that go beyond the built-in library — from simple predicate functions to stateful LLM judges.
FnCheck
FnCheck wraps any boolean function into a named check. Use it when the logic
fits in one expression.

from giskard.checks import FnCheck, Scenario

is_short = FnCheck(
    fn=lambda trace: len(trace.last.outputs) < 200,
    name="response_is_concise",
    success_message="Response is concise",
    failure_message="Response is too long",
)

# my_llm is a placeholder for your own model call.
scenario = (
    Scenario("concise_reply")
    .interact(inputs="Summarize in one sentence.", outputs=lambda inputs: my_llm(inputs))
    .check(is_short)
)

For anything more complex, define a named function:

def no_placeholder_text(trace) -> bool:
    output = trace.last.outputs
    return "[INSERT" not in output and "TODO" not in output

scenario = scenario.check(
    FnCheck(
        fn=no_placeholder_text,
        name="no_placeholders",
        success_message="No placeholder text",
        failure_message="Response contains placeholder text",
    )
)

Subclass Check when you need configurable parameters, reuse across scenarios,
or a clean import path.

from giskard.checks import Check, CheckResult, Trace
from pydantic import Field

@Check.register("contains_keyword")
class ContainsKeyword(Check):
    keyword: str = Field(
        ..., description="Keyword that must appear in the output"
    )
    case_sensitive: bool = Field(default=False)

    async def run(self, trace: Trace) -> CheckResult:
        output = trace.last.outputs
        target = output if self.case_sensitive else output.lower()
        needle = self.keyword if self.case_sensitive else self.keyword.lower()
        passed = needle in target
        if passed:
            return CheckResult.success(message=f"Found '{self.keyword}'")
        return CheckResult.failure(message=f"Missing '{self.keyword}'")

Instantiate it like any built-in check:

scenario = scenario.check(ContainsKeyword(name="mentions_price", keyword="price"))

@Check.register("contains_keyword") is optional but recommended. It registers the class under a stable string key that is used when serializing and deserializing scenarios and test suites. Without it, serialization falls back to the fully-qualified class name, which breaks if you rename or move the class.
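
To see why a stable key survives a rename, here is a minimal, library-independent sketch of the registry idea. It is not how giskard.checks implements registration: REGISTRY, register, dump, load, and KeywordRule are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, asdict

# Hypothetical registry mapping stable string keys to classes.
# Illustrates the idea only; this is not giskard's implementation.
REGISTRY: dict[str, type] = {}


def register(kind: str):
    """Class decorator that records the class under a stable key."""
    def decorator(cls: type) -> type:
        REGISTRY[kind] = cls
        cls.kind = kind
        return cls
    return decorator


@register("contains_keyword")
@dataclass
class KeywordRule:  # the class name may change; the key above does not
    keyword: str
    case_sensitive: bool = False


def dump(obj) -> dict:
    """Serialize using the registered key rather than the class name."""
    return {"kind": obj.kind, **asdict(obj)}


def load(payload: dict):
    """Look the class up by its key and rebuild the instance."""
    data = dict(payload)
    cls = REGISTRY[data.pop("kind")]
    return cls(**data)


saved = dump(KeywordRule(keyword="price"))
restored = load(saved)
```

Because the saved payload stores "contains_keyword" rather than a module path, renaming or moving KeywordRule in this sketch would not invalidate previously saved data.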
resolve
Use resolve(trace, key) to extract values from the trace using dot-notation
paths, the same paths used by Equals, Groundedness, and other built-ins.
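
As a rough mental model of dot-notation lookup, the sketch below walks a path through attributes and dict keys on a stand-in object. It does not reproduce the library's resolve: resolve_path and fake_trace are hypothetical, and treating a leading "trace" segment as the root is an assumption.

```python
from types import SimpleNamespace


def resolve_path(root, path: str, root_name: str = "trace"):
    """Walk a dot-separated path through attributes or dict keys."""
    parts = path.split(".")
    # Assumption: a leading segment equal to root_name refers to the root itself.
    if parts and parts[0] == root_name:
        parts = parts[1:]
    value = root
    for part in parts:
        if isinstance(value, dict):
            value = value[part]
        else:
            value = getattr(value, part)
    return value


# Stand-in object shaped loosely like the trace used in this guide.
fake_trace = SimpleNamespace(
    last=SimpleNamespace(
        inputs="What is the price?",
        outputs={"text": "The price is $99."},
    )
)

text = resolve_path(fake_trace, "trace.last.outputs.text")
```

The example that follows uses the library's own resolve inside a check.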

from giskard.checks import Check, CheckResult, Trace
from giskard.checks.core.extraction import resolve
from pydantic import Field

class MaxTokens(Check):
    key: str = Field(default="trace.last.outputs")
    limit: int = Field(default=500)

    async def run(self, trace: Trace) -> CheckResult:
        value = resolve(trace, self.key)
        # Approximates the token count by whitespace-separated words.
        token_count = len(str(value).split())
        passed = token_count <= self.limit
        msg = f"{token_count} tokens ({'ok' if passed else f'exceeds limit of {self.limit}'})"
        if passed:
            return CheckResult.success(message=msg)
        return CheckResult.failure(message=msg)

BaseLLMCheck
BaseLLMCheck handles generator setup and prompt rendering. Override
get_prompt and let the base class call the LLM and parse the
passed: true/false response.

from giskard.checks import BaseLLMCheck
from pydantic import Field

class ToneCheck(BaseLLMCheck):
    tone: str = Field(
        ..., description="Expected tone, e.g. 'professional', 'empathetic'"
    )

    def get_prompt(self) -> str:
        # Quadruple braces become {{ ... }} after f-string formatting,
        # leaving a template placeholder for prompt rendering.
        return f"""
        Evaluate whether the following response has a {self.tone} tone.

        Response: {{{{ trace.last.outputs }}}}

        Return 'passed: true' if the tone is {self.tone}, 'passed: false' otherwise.
        Include a brief explanation.
        """

Use it like any other check:

scenario = scenario.check(ToneCheck(name="professional_tone", tone="professional"))

By default BaseLLMCheck expects the LLM to return a JSON object with the shape {"reason": str | None, "passed": bool}. You can change this by overriding output_type (a Pydantic model) and _handle_output. See the BaseLLMCheck API reference for details.
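
To make the default output shape concrete, the following standard-library sketch parses a reply of the form {"reason": ..., "passed": ...}. It only illustrates the data shape and is not the base class's parsing code: Verdict and parse_verdict are hypothetical names, and a dataclass stands in for the Pydantic model the library uses.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    passed: bool
    reason: Optional[str] = None


def parse_verdict(raw: str) -> Verdict:
    """Parse a judge reply shaped like {"reason": ..., "passed": ...}."""
    data = json.loads(raw)
    passed = data["passed"]
    if not isinstance(passed, bool):
        # Reject values such as the string "true" instead of guessing.
        raise ValueError(f"'passed' must be a boolean, got {passed!r}")
    return Verdict(passed=passed, reason=data.get("reason"))


ok = parse_verdict('{"reason": "Polite and formal.", "passed": true}')
bad = parse_verdict('{"reason": null, "passed": false}')
```

Validating the type of passed explicitly keeps a malformed reply from being silently treated as a pass.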
All Check.run() methods are async, so you can call external services without
blocking the event loop.

import httpx
from giskard.checks import Check, CheckResult, Trace

class ToxicityAPICheck(Check):
    api_url: str

    async def run(self, trace: Trace) -> CheckResult:
        async with httpx.AsyncClient() as client:
            response = await client.post(
                self.api_url,
                json={"text": trace.last.outputs},
            )
        score = response.json()["toxicity_score"]
        passed = score < 0.5
        if passed:
            return CheckResult.success(message=f"Toxicity score: {score:.2f}")
        return CheckResult.failure(message=f"Toxicity score: {score:.2f}")

Group related checks into a helper function that returns a list, then pass
them to .check() with the variadic form. Checks run sequentially, and the
scenario stops at the first failure, so order matters. Put cheap, fast checks
before expensive LLM-based judges.
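
The ordering advice can be illustrated without the library. The sketch below is a plain-Python stand-in for stop-at-first-failure execution, not the scenario runner itself: run_until_failure, non_empty, and expensive_judge are hypothetical names, and the "expensive" check only records that it was called.

```python
from typing import Callable, List, Tuple

calls: List[str] = []  # records which checks actually ran


def non_empty(output: str) -> bool:
    calls.append("non_empty")
    return len(output) > 0


def expensive_judge(output: str) -> bool:
    # Stand-in for a slow LLM call; here it only records that it ran.
    calls.append("expensive_judge")
    return True


def run_until_failure(
    checks: List[Tuple[str, Callable[[str], bool]]], output: str
) -> List[Tuple[str, bool]]:
    """Run checks in order and stop after the first one that fails."""
    results = []
    for name, fn in checks:
        passed = fn(output)
        results.append((name, passed))
        if not passed:
            break
    return results


ordered = [("non_empty", non_empty), ("expensive_judge", expensive_judge)]
results = run_until_failure(ordered, output="")
```

With an empty output, only the cheap check runs and the costly judge is never invoked. The helper below shows the grouping pattern with the library's own checks.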

from giskard.checks import FnCheck, Scenario

def safety_checks():
    return [
        FnCheck(
            fn=lambda trace: len(trace.last.outputs) > 0,
            name="non_empty",
            success_message="Response is non-empty",
            failure_message="Empty response",
        ),
        FnCheck(
            fn=lambda trace: "error" not in trace.last.outputs.lower(),
            name="no_error_string",
            success_message="No error string",
            failure_message="Response contains 'error'",
        ),
        # ContainsKeyword is the custom check class defined earlier.
        ContainsKeyword(name="has_disclaimer", keyword="disclaimer"),
    ]

scenario = Scenario("safe_reply").interact(
    inputs="Tell me about investing.",
    outputs=lambda inputs: my_llm(inputs),
)
for chk in safety_checks():
    scenario = scenario.check(chk)

Test the check logic in isolation before wiring it into a scenario.

import asyncio
from giskard.checks import Trace, Interaction

async def test_contains_keyword():
    trace = Trace(
        interactions=[
            Interaction(
                inputs="What is the price?", outputs="The price is $99."
            )
        ]
    )
    check = ContainsKeyword(name="mentions_price", keyword="price")
    result = await check.run(trace)
    print(f"Check result: {result.message}")
    assert result.passed
    assert "price" in result.message.lower()

asyncio.run(test_contains_keyword())

Output
Check result: Found 'price'