Testing Utilities
Testing utilities, test runners, and debugging helpers.
TestCase
Module: giskard.checks.core.testcase
Bundle a trace with a set of checks to execute. Useful for testing against fixed interaction sequences or replaying recorded conversations.
- .run() → TestCaseResult: Execute all checks against the trace. Parameters: return_exception (bool, default: False).
- .assert_passed() → None: Run the test case and assert that all checks passed. Raises AssertionError with formatted failure messages if any check fails.
```python
from giskard.checks import TestCase, Trace, Interaction, Equals

trace = Trace(
    interactions=[
        Interaction(inputs="Hello", outputs="Hi there!"),
        Interaction(inputs="How are you?", outputs="I'm doing well!"),
    ]
)

test_case = TestCase(
    name="greeting_test",
    trace=trace,
    checks=[
        Equals(expected_value="Hi there!", key="trace.interactions[0].outputs"),
        Equals(
            expected_value="I'm doing well!",
            key="trace.interactions[1].outputs",
        ),
    ],
)

result = await test_case.run()
```

Integration with pytest
```python
import pytest

from giskard.checks import FnCheck, Interaction, TestCase, Trace


@pytest.mark.asyncio
async def test_greeting_response():
    trace = Trace(
        interactions=[
            Interaction(inputs="Hello", outputs="Hi there! How can I help?")
        ]
    )
    test_case = TestCase(
        name="greeting_politeness",
        trace=trace,
        checks=[
            FnCheck(
                fn=lambda t: "help" in t.last.outputs.lower(),
                name="offers_help",
            ),
        ],
    )
    await test_case.assert_passed()
```

TestCaseResult
Module: giskard.checks.core.result
Result of test case execution with check outcomes.
- status (TestCaseStatus): Overall test case status (PASS/FAIL/ERROR/SKIP).
- results (list[CheckResult]): Results from all checks.
- passed (bool): True if all checks passed.
- failed (bool): True if any check failed.
- errored (bool): True if any check errored.
- skipped (bool): True if all checks were skipped.
- duration_ms (int): Execution time in milliseconds.
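One plausible way the overall status could be derived from the per-check outcomes (a minimal self-contained sketch; the precedence shown here is an illustrative assumption, not necessarily the library's exact logic):

```python
from enum import Enum


class TestCaseStatus(Enum):
    PASS = "pass"
    FAIL = "fail"
    ERROR = "error"
    SKIP = "skip"


def derive_status(check_statuses: list[TestCaseStatus]) -> TestCaseStatus:
    """Aggregate per-check outcomes into one overall status.

    Assumed precedence for illustration: any error wins, then any
    failure, then all-skipped, otherwise pass.
    """
    if any(s is TestCaseStatus.ERROR for s in check_statuses):
        return TestCaseStatus.ERROR
    if any(s is TestCaseStatus.FAIL for s in check_statuses):
        return TestCaseStatus.FAIL
    if check_statuses and all(s is TestCaseStatus.SKIP for s in check_statuses):
        return TestCaseStatus.SKIP
    return TestCaseStatus.PASS
```

This mirrors the attribute semantics above: failed is true if any check failed, while skipped requires every check to have been skipped.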
- .assert_passed() → None: Raise AssertionError with formatted failure messages if any check did not pass. Useful in pytest.
```python
result = await test_case.run()

if result.passed:
    print("All checks passed!")
else:
    for check_result in result.results:
        if check_result.failed:
            print(f"Failed: {check_result.message}")

# Assert (raises if failed)
result.assert_passed()
```

TestCaseRunner
Module: giskard.checks.testing.runner
Low-level runner for executing test cases. Most users should use test_case.run() instead.
- .run() → TestCaseResult: Execute a test case’s checks against its trace. Parameters: return_exception (bool, default: False).
- get_runner() → TestCaseRunner: Get the default process-wide singleton runner instance.
WithSpy
Module: giskard.checks.testing.spy
Spy on function calls during interaction generation for debugging. Wraps an interaction spec and records all function calls made during generation.
- interaction_generator (InteractionSpec, required): The interaction spec to spy on.
- target (str, required): JSONPath to the value to spy on.
```python
from giskard.checks import Scenario, Interact, WithSpy

interaction_spec = Interact(
    inputs=lambda trace: f"Context: {trace.last.outputs if trace.last else 'None'}",
    outputs=lambda inputs: my_model(inputs),
)

spied_spec = WithSpy(
    interaction_generator=interaction_spec,
    target="trace.last.outputs",
)

result = await Scenario("debug_test").add_interaction(spied_spec).run()

# Access spy data from metadata
spy_data = result.final_trace.last.metadata.get("trace.last.outputs")
print(spy_data)
```

Usage patterns
Replaying recorded conversations
```python
from giskard.checks import TestCase, Trace, Interaction, FnCheck

recorded = [
    Interaction(inputs="What's the weather?", outputs="It's sunny and 75F."),
    Interaction(
        inputs="Should I bring an umbrella?",
        outputs="No, you won't need one.",
    ),
]

trace = Trace(interactions=recorded)

test_case = TestCase(
    name="weather_replay",
    trace=trace,
    checks=[
        FnCheck(
            fn=lambda t: "sunny" in t.interactions[0].outputs.lower(),
            name="weather",
        ),
        FnCheck(
            fn=lambda t: "no" in t.interactions[1].outputs.lower(),
            name="umbrella",
        ),
    ],
)

await test_case.assert_passed()
```

Batch testing
```python
import asyncio

test_cases = [
    TestCase(name="test1", trace=trace1, checks=checks1),
    TestCase(name="test2", trace=trace2, checks=checks2),
    TestCase(name="test3", trace=trace3, checks=checks3),
]

results = await asyncio.gather(*[tc.run() for tc in test_cases])

passed = sum(1 for r in results if r.passed)
print(f"Passed: {passed}/{len(results)}")
```

Parameterized tests
```python
import pytest

from giskard.checks import Equals, Interaction, TestCase, Trace

test_data = [("Hello", "Hi there!"), ("Good morning", "Good morning!")]


@pytest.mark.asyncio
@pytest.mark.parametrize("greeting_input,expected_output", test_data)
async def test_greeting_variations(greeting_input, expected_output):
    trace = Trace(
        interactions=[
            Interaction(inputs=greeting_input, outputs=expected_output)
        ]
    )
    test_case = TestCase(
        name=f"greeting_{greeting_input}",
        trace=trace,
        checks=[
            Equals(expected_value=expected_output, key="trace.last.outputs")
        ],
    )
    await test_case.assert_passed()
```

See also
- Core API — Trace, Interaction, and Check details
- Scenarios — Building multi-step test workflows
- Built-in Checks — Ready-to-use validation checks