Skip to content
GitHubDiscord

Testing Utilities

Testing utilities, test runners, and debugging helpers.


Module: giskard.checks.core.testcase

Bundle a trace with a set of checks to execute. Useful for testing against fixed interaction sequences or replaying recorded conversations.

name str | None

Optional label for the test case.

trace Trace Required

The trace containing interactions to test against.

checks Sequence[Check] Required

Sequence of checks to run against the trace.

.run() TestCaseResult

Execute all checks against the trace.

return_exception bool Default: False
If True, return results even when exceptions occur instead of raising.
.assert_passed() None

Run the test case and assert that all checks passed. Raises AssertionError with formatted failure messages if any check fails.

from giskard.checks import TestCase, Trace, Interaction, Equals
trace = Trace(
interactions=[
Interaction(inputs="Hello", outputs="Hi there!"),
Interaction(inputs="How are you?", outputs="I'm doing well!"),
]
)
test_case = TestCase(
name="greeting_test",
trace=trace,
checks=[
Equals(expected_value="Hi there!", key="trace.interactions[0].outputs"),
Equals(
expected_value="I'm doing well!",
key="trace.interactions[1].outputs",
),
],
)
result = await test_case.run()
import pytest
@pytest.mark.asyncio
async def test_greeting_response():
trace = Trace(
interactions=[
Interaction(inputs="Hello", outputs="Hi there! How can I help?")
]
)
test_case = TestCase(
name="greeting_politeness",
trace=trace,
checks=[
FnCheck(
fn=lambda t: "help" in t.last.outputs.lower(),
name="offers_help",
),
],
)
await test_case.assert_passed()

Module: giskard.checks.core.result

Result of test case execution with check outcomes.

status TestCaseStatus

Overall test case status (PASS/FAIL/ERROR).

results list[CheckResult]

Results from all checks.

passed bool

True if all checks passed.

failed bool

True if any check failed.

duration_ms int

Execution time in milliseconds.

result = await test_case.run()
if result.passed:
print("All checks passed!")
else:
for check_result in result.results:
if check_result.failed:
print(f"Failed: {check_result.message}")
# Assert (raises if failed)
result.assert_passed()

Module: giskard.checks.testing.runner

Low-level runner for executing test cases. Most users should use test_case.run() instead.

.run() TestCaseResult

Execute a test case’s checks against its trace.

test_case TestCase Required
The test case to execute.
return_exception bool Default: False
Return results on exceptions.
get_runner() TestCaseRunner

Get the default process-wide singleton runner instance.


Module: giskard.checks.testing.spy

Spy on function calls during interaction generation for debugging. Wraps an interaction spec and records all function calls made during generation.

interaction_generator InteractionSpec Required

The interaction spec to spy on.

target str Required

JSONPath to the value to spy on.

from giskard.checks import Scenario, Interact, WithSpy
interaction_spec = Interact(
inputs=lambda trace: f"Context: {trace.last.outputs if trace.last else 'None'}",
outputs=lambda inputs: my_model(inputs),
)
spied_spec = WithSpy(
interaction_generator=interaction_spec, target="trace.last.outputs"
)
result = await Scenario("debug_test").add_interaction(spied_spec).run()
# Access spy data from metadata
spy_data = result.final_trace.last.metadata.get("trace.last.outputs")
print(spy_data)

from giskard.checks import TestCase, Trace, Interaction, FnCheck
recorded = [
Interaction(inputs="What's the weather?", outputs="It's sunny and 75F."),
Interaction(
inputs="Should I bring an umbrella?", outputs="No, you won't need one."
),
]
trace = Trace(interactions=recorded)
test_case = TestCase(
name="weather_replay",
trace=trace,
checks=[
FnCheck(
fn=lambda t: "sunny" in t.interactions[0].outputs.lower(),
name="weather",
),
FnCheck(
fn=lambda t: "no" in t.interactions[1].outputs.lower(),
name="umbrella",
),
],
)
await test_case.assert_passed()
test_cases = [
TestCase(name="test1", trace=trace1, checks=checks1),
TestCase(name="test2", trace=trace2, checks=checks2),
TestCase(name="test3", trace=trace3, checks=checks3),
]
results = await asyncio.gather(*[tc.run() for tc in test_cases])
passed = sum(1 for r in results if r.passed)
print(f"Passed: {passed}/{len(results)}")
import pytest
test_data = [("Hello", "Hi there!"), ("Good morning", "Good morning!")]
@pytest.mark.asyncio
@pytest.mark.parametrize("greeting_input,expected_output", test_data)
async def test_greeting_variations(greeting_input, expected_output):
trace = Trace(
interactions=[
Interaction(inputs=greeting_input, outputs=expected_output)
]
)
test_case = TestCase(
name=f"greeting_{greeting_input}",
trace=trace,
checks=[
Equals(expected_value=expected_output, key="trace.last.outputs")
],
)
await test_case.assert_passed()