
When to use which check

Three check families cover most use cases. Pick the simplest one that can express your requirement.

| | Rule-based | Semantic similarity | LLM-as-judge |
|---|---|---|---|
| Examples | Equals, StringMatching, FnCheck | SemanticSimilarity | Groundedness, Conformity, LLMJudge |
| Cost | Free | Low (embedding call) | Medium–High (LLM call) |
| Latency | <1 ms | ~50–200 ms | ~1–10 s |
| Deterministic | Yes | Near-deterministic | No |
| Best for | Exact values, keywords, formats | Meaning-equivalent answers | Tone, reasoning, policy compliance |

Rule-based — when you can express the pass condition as a predicate: required keywords, value ranges, exact labels. Use these first; they’re free, instant, and never flaky.

Equals(expected_value="potential_fraud", key="trace.last.outputs.label")
StringMatching(
    keyword="Pre-authorization", text_key="trace.last.outputs.answer"
)
LesserThan(expected_value=500, key="trace.last.outputs.token_count")
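
To make "pass condition as a predicate" concrete, here is a minimal plain-Python sketch of what the three checks above boil down to. It deliberately does not use the library: `passes_rules` and the shape of the `outputs` dictionary are hypothetical, and the case-sensitive substring match is an assumption rather than a statement about how StringMatching behaves.

```python
def passes_rules(outputs: dict) -> bool:
    """Return True only if every rule holds for one output record."""
    # Exact label match (the idea behind an equality check).
    label_ok = outputs.get("label") == "potential_fraud"
    # Required keyword present (assumed: case-sensitive substring match).
    keyword_ok = "Pre-authorization" in outputs.get("answer", "")
    # Value below an upper bound; a missing count fails the rule.
    tokens_ok = outputs.get("token_count", float("inf")) < 500
    return label_ok and keyword_ok and tokens_ok


# Hypothetical output record used only for illustration.
sample = {
    "label": "potential_fraud",
    "answer": "Pre-authorization is required for this transfer.",
    "token_count": 42,
}
print(passes_rules(sample))
```

Because each rule is a pure function of the output, the result is the same on every run, which is why this family costs nothing and never flakes.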

Semantic similarity — when phrasing varies but meaning should be consistent. Cheaper and faster than an LLM judge.

SemanticSimilarity(
    reference_text="The capital of France is Paris.",
    actual_answer_key="trace.last.outputs",
    threshold=0.85,
)
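
The role of `threshold` can be illustrated with a small standalone sketch. It assumes the check embeds both texts and compares them by cosine similarity, which is a common choice but is not confirmed by this page. The three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` and `is_similar` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_similar(reference_vec, answer_vec, threshold=0.85) -> bool:
    """Pass when the similarity score reaches the threshold."""
    return cosine_similarity(reference_vec, answer_vec) >= threshold


# Toy vectors standing in for embeddings (not real model output).
reference_vec = [0.9, 0.1, 0.3]
paraphrase_vec = [0.8, 0.2, 0.35]   # points in nearly the same direction
unrelated_vec = [0.1, 0.9, 0.0]     # points somewhere else entirely

print(is_similar(reference_vec, paraphrase_vec))  # True
print(is_similar(reference_vec, unrelated_vec))   # False
```

Raising the threshold makes the check stricter about paraphrases; lowering it tolerates looser wording at the risk of accepting off-topic answers.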

LLM-as-judge — when the criterion is qualitative and hard to express as a rule: tone, groundedness, policy compliance, reasoning quality.

Groundedness(
    answer_key="trace.last.outputs.answer",
    context_key="trace.last.outputs.context",
)
Conformity(rule="Response must not give medical advice")

Layer the families in a single scenario: run the cheap deterministic checks first, and only reach for LLM judges when you genuinely need them. The example below combines two rule-based checks with one LLM judge.

from giskard.checks import Scenario, StringMatching, GreaterThan, Groundedness

question = "What is the refund policy?"


def rag_system(query: str) -> dict:
    # Your RAG system
    return {
        "answer": "Refunds are processed within 5 business days.",
        "context": "Policy §3.2",
        "confidence": 0.9,
    }


tc = (
    Scenario("rag_test")
    .interact(inputs=question, outputs=lambda q: rag_system(q))
    # Fast, free
    .check(
        GreaterThan(
            name="has_confidence",
            key="trace.last.outputs.confidence",
            expected_value=0.5,
        )
    )
    .check(
        StringMatching(
            name="cites_policy",
            keyword="policy",
            text_key="trace.last.outputs.answer",
        )
    )
    # Slower, costs a few cents
    .check(
        Groundedness(
            name="grounded",
            answer_key="trace.last.outputs.answer",
            context_key="trace.last.outputs.context",
        )
    )
)
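
The "cheap checks first" ordering can also be sketched without the library. The snippet below is only an illustration of the idea: `run_layered` and `stub_judge` are hypothetical names, the judge is a stand-in for a real LLM call, and stopping at the first failure is an assumption of this sketch, not documented behaviour of Scenario.

```python
from typing import Callable

Check = tuple[str, Callable[[dict], bool]]


def run_layered(outputs: dict, checks: list[Check]) -> list[tuple[str, bool]]:
    """Run checks in order and stop at the first failure."""
    results = []
    for name, check in checks:
        passed = check(outputs)
        results.append((name, passed))
        if not passed:
            break  # skip the remaining, more expensive checks
    return results


judge_calls = []  # records every time the expensive judge is invoked


def stub_judge(outputs: dict) -> bool:
    # Stand-in for a slow, paid LLM call; always passes in this sketch.
    judge_calls.append(outputs["answer"])
    return True


# Ordered from cheapest to most expensive.
checks = [
    ("has_confidence", lambda o: o["confidence"] > 0.5),
    ("mentions_refund", lambda o: "refund" in o["answer"].lower()),
    ("grounded", stub_judge),
]

good = {"answer": "Refunds are processed within 5 business days.", "confidence": 0.9}
bad = {"answer": "Refunds are processed within 5 business days.", "confidence": 0.2}

print(run_layered(good, checks))  # all three checks run
print(run_layered(bad, checks))   # stops after has_confidence fails
print(len(judge_calls))           # 1: the judge ran only for the good output
```

Ordering this way means a failing output is rejected by a free check before any LLM call is paid for.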