When to use which check
Three check families cover most use cases. Pick the simplest one that can express your requirement.
Tradeoffs at a Glance
| | Rule-based | Semantic similarity | LLM-as-judge |
|---|---|---|---|
| Examples | Equals, StringMatching, FnCheck | SemanticSimilarity | Groundedness, Conformity, LLMJudge |
| Cost | Free | Low (embedding call) | Medium–High (LLM call) |
| Latency | <1 ms | ~50–200 ms | ~1–10 s |
| Deterministic | Yes | Near-deterministic | No |
| Best for | Exact values, keywords, formats | Meaning-equivalent answers | Tone, reasoning, policy compliance |
Choosing the Right Check
Rule-based: when you can express the pass condition as a predicate, such as required keywords, value ranges, or exact labels. Use these first; they’re free, instant, and never flaky.

Equals(expected_value="potential_fraud", key="trace.last.outputs.label")

StringMatching(
    keyword="Pre-authorization", text_key="trace.last.outputs.answer"
)

LesserThan(expected_value=500, key="trace.last.outputs.token_count")

Semantic similarity: when phrasing varies but meaning should be consistent. Cheaper and faster than an LLM judge.

SemanticSimilarity(
    reference_text="The capital of France is Paris.",
    actual_answer_key="trace.last.outputs",
    threshold=0.85,
)

LLM-as-judge: when the criterion is qualitative and hard to express as a rule, such as tone, groundedness, policy compliance, or reasoning quality.

Groundedness(
    answer_key="trace.last.outputs.answer",
    context_key="trace.last.outputs.context",
)

Conformity(rule="Response must not give medical advice")

Combining Check Types
You can layer all three families in a single scenario: run the cheap deterministic checks first, and only reach for LLM judges when you genuinely need them. The example below combines two rule-based checks with one LLM judge.
from giskard.checks import Scenario, StringMatching, GreaterThan, Groundedness

question = "What is the refund policy?"


def rag_system(query: str) -> dict:
    # Your RAG system
    return {
        "answer": "Refunds are processed within 5 business days.",
        "context": "Policy §3.2",
        "confidence": 0.9,
    }


tc = (
    Scenario("rag_test")
    .interact(inputs=question, outputs=lambda q: rag_system(q))
    # Fast, free
    .check(
        GreaterThan(
            name="has_confidence",
            key="trace.last.outputs.confidence",
            expected_value=0.5,
        )
    )
    .check(
        StringMatching(
            name="cites_policy",
            keyword="policy",
            text_key="trace.last.outputs.answer",
        )
    )
    # Slower, costs a few cents
    .check(
        Groundedness(
            name="grounded",
            answer_key="trace.last.outputs.answer",
            context_key="trace.last.outputs.context",
        )
    )
)
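
The "cheap checks first" idea can also be sketched in plain Python, independent of the library. The snippet below is only an illustration: `SketchCheck`, `run_cheapest_first`, `cost_rank`, and `fake_llm_judge` are hypothetical names that do not come from `giskard.checks`, and whether `Scenario` itself stops at the first failing check is not something this page states. The sketch sorts checks by a cost rank and stops at the first failure, so the expensive judge never runs when a free check has already failed.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-in for illustration only; not part of giskard.checks.
@dataclass
class SketchCheck:
    name: str
    cost_rank: int  # 0 = rule-based, 1 = semantic similarity, 2 = LLM judge
    predicate: Callable[[dict], bool]


def run_cheapest_first(checks: list[SketchCheck], outputs: dict) -> list[tuple[str, bool]]:
    """Run checks in ascending cost order and stop at the first failure."""
    results = []
    for check in sorted(checks, key=lambda c: c.cost_rank):
        passed = check.predicate(outputs)
        results.append((check.name, passed))
        if not passed:
            break  # skip the remaining, more expensive checks
    return results


calls = []  # records whether the expensive judge actually ran


def fake_llm_judge(outputs: dict) -> bool:
    # Placeholder for a paid LLM call; always passes in this sketch.
    calls.append("grounded")
    return True


checks = [
    SketchCheck("grounded", 2, fake_llm_judge),
    SketchCheck("has_confidence", 0, lambda o: o["confidence"] > 0.5),
    SketchCheck("cites_policy", 0, lambda o: "policy" in o["answer"].lower()),
]

low_confidence = {"answer": "Per policy, refunds take 5 days.", "confidence": 0.2}
results = run_cheapest_first(checks, low_confidence)
print(results)  # [('has_confidence', False)]
```

Because `sorted` is stable, checks with the same cost rank keep the order in which they were declared. Here the confidence check fails, so the placeholder judge is never called and `calls` stays empty.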