Testing Utilities
Testing utilities, test runners, and debugging helpers.
TestCase
Module: giskard.checks.core.testcase
Bundle a trace with a set of checks to execute. Useful for testing against fixed interaction sequences or replaying recorded conversations.
- `name` (`str | None`): Optional label for the test case.
- `trace` (`Trace`, required): The trace containing interactions to test against.
- `checks` (`Sequence[Check]`, required): Sequence of checks to run against the trace.
- `.run(return_exception: bool = False) → TestCaseResult`: Execute all checks against the trace.
- `.assert_passed() → None`: Run the test case and assert that all checks passed. Raises `AssertionError` with formatted failure messages if any check fails.
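Conceptually, `assert_passed()` is a thin wrapper over running the checks: collect the failure messages, then raise. The following is a hypothetical stand-in sketching that behavior, not giskard's implementation; the `FakeResult`/`FakeCheckResult` classes are invented here, with field names assumed from the `TestCaseResult` fields documented below:

```python
# Hypothetical sketch of assert_passed() semantics; not giskard's code.
class FakeCheckResult:
    def __init__(self, failed, message=""):
        self.failed = failed
        self.message = message

class FakeResult:
    def __init__(self, results):
        self.results = results
        # The test case passes only when no individual check failed
        self.passed = not any(r.failed for r in results)

def assert_passed(result):
    # Raise one AssertionError listing every failed check's message
    if not result.passed:
        failures = "\n".join(r.message for r in result.results if r.failed)
        raise AssertionError(f"Test case failed:\n{failures}")
```
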
```python
from giskard.checks import TestCase, Trace, Interaction, Equals

trace = Trace(
    interactions=[
        Interaction(inputs="Hello", outputs="Hi there!"),
        Interaction(inputs="How are you?", outputs="I'm doing well!"),
    ]
)

test_case = TestCase(
    name="greeting_test",
    trace=trace,
    checks=[
        Equals(expected_value="Hi there!", key="trace.interactions[0].outputs"),
        Equals(
            expected_value="I'm doing well!",
            key="trace.interactions[1].outputs",
        ),
    ],
)

result = await test_case.run()
```

Integration with pytest
```python
import pytest

from giskard.checks import TestCase, Trace, Interaction, FnCheck


@pytest.mark.asyncio
async def test_greeting_response():
    trace = Trace(
        interactions=[
            Interaction(inputs="Hello", outputs="Hi there! How can I help?")
        ]
    )
    test_case = TestCase(
        name="greeting_politeness",
        trace=trace,
        checks=[
            FnCheck(
                fn=lambda t: "help" in t.last.outputs.lower(),
                name="offers_help",
            ),
        ],
    )
    await test_case.assert_passed()
```

TestCaseResult
Module: giskard.checks.core.result
Result of test case execution with check outcomes.
- `status` (`TestCaseStatus`): Overall test case status (PASS/FAIL/ERROR).
- `results` (`list[CheckResult]`): Results from all checks.
- `passed` (`bool`): True if all checks passed.
- `failed` (`bool`): True if any check failed.
- `duration_ms` (`int`): Execution time in milliseconds.
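The relationship between these fields can be pictured with a minimal stand-in (hypothetical classes, not giskard's real ones): `passed` is the conjunction over check results and `failed` the disjunction, so exactly one of them is true whenever any checks ran.

```python
from dataclasses import dataclass
from enum import Enum

class TestCaseStatus(Enum):  # mirrors the documented PASS/FAIL/ERROR values
    PASS = "pass"
    FAIL = "fail"
    ERROR = "error"

@dataclass
class CheckResult:  # simplified stand-in for illustration
    passed: bool
    message: str = ""

    @property
    def failed(self) -> bool:
        return not self.passed

@dataclass
class TestCaseResult:  # simplified stand-in, not giskard's class
    results: list
    duration_ms: int = 0

    @property
    def passed(self) -> bool:
        return all(r.passed for r in self.results)

    @property
    def failed(self) -> bool:
        return any(r.failed for r in self.results)

    @property
    def status(self) -> TestCaseStatus:
        # The ERROR case (a check raised an exception) is omitted from this sketch
        return TestCaseStatus.PASS if self.passed else TestCaseStatus.FAIL
```
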
```python
result = await test_case.run()

if result.passed:
    print("All checks passed!")
else:
    for check_result in result.results:
        if check_result.failed:
            print(f"Failed: {check_result.message}")

# Assert (raises if failed)
result.assert_passed()
```

TestCaseRunner
Module: giskard.checks.testing.runner
Low-level runner for executing test cases. Most users should use test_case.run() instead.
- `.run(test_case: TestCase, return_exception: bool = False) → TestCaseResult`: Execute a test case's checks against its trace.
- `get_runner() → TestCaseRunner`: Get the default process-wide singleton runner instance.
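The process-wide singleton that `get_runner()` returns can be sketched with a cached zero-argument factory. This is a hypothetical stand-in under that assumption, not giskard's actual implementation:

```python
from functools import lru_cache

class TestCaseRunner:  # stand-in; the real class lives in giskard.checks.testing.runner
    async def run(self, test_case, return_exception=False):
        # The real runner executes each check against the test case's trace;
        # here we simply delegate to the test case itself.
        return await test_case.run()

@lru_cache(maxsize=1)
def get_runner() -> TestCaseRunner:
    # lru_cache on a no-argument factory yields one shared instance per process
    return TestCaseRunner()
```
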
WithSpy
Module: giskard.checks.testing.spy
Spy on function calls during interaction generation for debugging. Wraps an interaction spec and records all function calls made during generation.
- `interaction_generator` (`InteractionSpec`, required): The interaction spec to spy on.
- `target` (`str`, required): JSONPath to the value to spy on.
```python
from giskard.checks import Scenario, Interact, WithSpy

interaction_spec = Interact(
    inputs=lambda trace: f"Context: {trace.last.outputs if trace.last else 'None'}",
    outputs=lambda inputs: my_model(inputs),  # my_model is your own model-call function
)

spied_spec = WithSpy(
    interaction_generator=interaction_spec,
    target="trace.last.outputs",
)

result = await Scenario("debug_test").add_interaction(spied_spec).run()

# Access spy data from metadata
spy_data = result.final_trace.last.metadata.get("trace.last.outputs")
print(spy_data)
```

Usage patterns
Replaying recorded conversations
```python
from giskard.checks import TestCase, Trace, Interaction, FnCheck

recorded = [
    Interaction(inputs="What's the weather?", outputs="It's sunny and 75F."),
    Interaction(
        inputs="Should I bring an umbrella?",
        outputs="No, you won't need one.",
    ),
]

trace = Trace(interactions=recorded)

test_case = TestCase(
    name="weather_replay",
    trace=trace,
    checks=[
        FnCheck(
            fn=lambda t: "sunny" in t.interactions[0].outputs.lower(),
            name="weather",
        ),
        FnCheck(
            fn=lambda t: "no" in t.interactions[1].outputs.lower(),
            name="umbrella",
        ),
    ],
)

await test_case.assert_passed()
```

Batch testing
```python
import asyncio

test_cases = [
    TestCase(name="test1", trace=trace1, checks=checks1),
    TestCase(name="test2", trace=trace2, checks=checks2),
    TestCase(name="test3", trace=trace3, checks=checks3),
]

results = await asyncio.gather(*[tc.run() for tc in test_cases])

passed = sum(1 for r in results if r.passed)
print(f"Passed: {passed}/{len(results)}")
```

Parameterized tests
```python
import pytest

from giskard.checks import TestCase, Trace, Interaction, Equals

test_data = [("Hello", "Hi there!"), ("Good morning", "Good morning!")]


@pytest.mark.asyncio
@pytest.mark.parametrize("greeting_input,expected_output", test_data)
async def test_greeting_variations(greeting_input, expected_output):
    trace = Trace(
        interactions=[
            Interaction(inputs=greeting_input, outputs=expected_output)
        ]
    )
    test_case = TestCase(
        name=f"greeting_{greeting_input}",
        trace=trace,
        checks=[
            Equals(expected_value=expected_output, key="trace.last.outputs")
        ],
    )
    await test_case.assert_passed()
```

See also
- Core API — Trace, Interaction, and Check details
- Scenarios — Building multi-step test workflows
- Built-in Checks — Ready-to-use validation checks