Testing Utilities

Test cases, test runners, and debugging helpers for executing checks against traces.

`TestCase`

Module: giskard.checks.core.testcase

Bundle a trace with a set of checks to execute. Useful for testing against fixed interaction sequences or replaying recorded conversations.

TestCase

name str | None

Optional label for the test case.

trace Trace Required

The trace containing interactions to test against.

checks Sequence[Check] Required

Sequence of checks to run against the trace.

.run() → TestCaseResult

Execute all checks against the trace.

return_exception bool Default: False

If True, return results even when exceptions occur instead of raising.

.assert_passed() → None

Run the test case and assert that all checks passed. Raises AssertionError with formatted failure messages if any check fails.

from giskard.checks import TestCase, Trace, Interaction, Equals

trace = Trace(
    interactions=[
        Interaction(inputs="Hello", outputs="Hi there!"),
        Interaction(inputs="How are you?", outputs="I'm doing well!"),
    ]
)

test_case = TestCase(
    name="greeting_test",
    trace=trace,
    checks=[
        Equals(expected_value="Hi there!", key="trace.interactions[0].outputs"),
        Equals(
            expected_value="I'm doing well!",
            key="trace.interactions[1].outputs",
        ),
    ],
)

result = await test_case.run()
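
The snippet above uses top-level `await`, which works in a notebook or an async REPL but not in a plain Python script. A minimal sketch of the script form follows. `StubTestCase` is a hypothetical stand-in, used only so the example runs without giskard installed; in real code the `TestCase` built above would take its place.

```python
import asyncio


class StubTestCase:
    """Hypothetical stand-in for a TestCase; not part of giskard."""

    async def run(self) -> str:
        # A real test case would execute its checks against the trace here.
        await asyncio.sleep(0)
        return "PASS"


async def main() -> str:
    test_case = StubTestCase()
    # Inside a coroutine, `await` is valid in any Python file.
    return await test_case.run()


# asyncio.run() creates an event loop, runs main() to completion, then closes the loop.
outcome = asyncio.run(main())
print(outcome)
```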

Integration with pytest

import pytest

from giskard.checks import FnCheck, Interaction, TestCase, Trace


@pytest.mark.asyncio
async def test_greeting_response():
    trace = Trace(
        interactions=[
            Interaction(inputs="Hello", outputs="Hi there! How can I help?")
        ]
    )
    test_case = TestCase(
        name="greeting_politeness",
        trace=trace,
        checks=[
            FnCheck(
                fn=lambda t: "help" in t.last.outputs.lower(),
                name="offers_help",
            ),
        ],
    )
    await test_case.assert_passed()

`TestCaseResult`

Module: giskard.checks.core.result

Result of test case execution with check outcomes.

TestCaseResult

status TestCaseStatus

Overall test case status (PASS/FAIL/ERROR/SKIP).

results list[CheckResult]

Results from all checks.

passed bool

True if all checks passed.

failed bool

True if any check failed.

errored bool

True if any check errored.

skipped bool

True if all checks were skipped.

duration_ms int

Execution time in milliseconds.

.assert_passed() → None

Raise AssertionError with formatted failure messages if any check did not pass. Useful in pytest.

result = await test_case.run()

if result.passed:
    print("All checks passed!")
else:
    for check_result in result.results:
        if check_result.failed:
            print(f"Failed: {check_result.message}")

# Raises AssertionError if any check did not pass
result.assert_passed()
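
The aggregate flags described above can be read as simple reductions over the per-check outcomes. The sketch below follows the documented wording literally (`passed` when all checks passed, `failed` or `errored` when any did, `skipped` when all were skipped). `Outcome` and `summarize` are hypothetical names, and this is not giskard's implementation, which may treat edge cases such as mixed skipped checks or an empty check list differently.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical per-check outcome, mirroring PASS/FAIL/ERROR/SKIP."""

    PASS = "pass"
    FAIL = "fail"
    ERROR = "error"
    SKIP = "skip"


def summarize(outcomes: list[Outcome]) -> dict[str, bool]:
    """Derive aggregate flags from per-check outcomes."""
    return {
        # all() over an empty list is True; a real library may decide otherwise.
        "passed": all(o is Outcome.PASS for o in outcomes),
        "failed": any(o is Outcome.FAIL for o in outcomes),
        "errored": any(o is Outcome.ERROR for o in outcomes),
        # Require at least one check so an empty list does not count as skipped.
        "skipped": bool(outcomes) and all(o is Outcome.SKIP for o in outcomes),
    }


flags = summarize([Outcome.PASS, Outcome.FAIL, Outcome.PASS])
print(flags)
```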

`TestCaseRunner`

Module: giskard.checks.testing.runner

Low-level runner for executing test cases. Most users should use test_case.run() instead.

.run() → TestCaseResult

Execute a test case’s checks against its trace.

test_case TestCase Required

The test case to execute.

return_exception bool Default: False

If True, return results even when exceptions occur instead of raising.

get_runner() → TestCaseRunner

Get the default process-wide singleton runner instance.
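
A process-wide singleton accessor of this kind is commonly built on a cached factory function. The following is a generic sketch of that pattern, not giskard's source; `Runner` and `get_default_runner` are hypothetical names.

```python
from functools import lru_cache


class Runner:
    """Hypothetical stand-in for a runner type."""


@lru_cache(maxsize=1)
def get_default_runner() -> Runner:
    # The body runs on the first call; later calls return the cached instance.
    return Runner()


first = get_default_runner()
second = get_default_runner()
print(first is second)
```

Sharing one instance keeps any runner-level state in a single place, which is why callers normally go through the accessor rather than constructing a runner themselves.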

`WithSpy`

Module: giskard.checks.testing.spy

Debugging helper that wraps an interaction spec and records the function calls made while the interaction is generated.

WithSpy

interaction_generator InteractionSpec Required

The interaction spec to spy on.

target str Required

JSONPath to the value to spy on.

from giskard.checks import Scenario, Interact, WithSpy

# `my_model` is a placeholder for your own callable that maps inputs to outputs.
interaction_spec = Interact(
    inputs=lambda trace: f"Context: {trace.last.outputs if trace.last else 'None'}",
    outputs=lambda inputs: my_model(inputs),
)

spied_spec = WithSpy(
    interaction_generator=interaction_spec, target="trace.last.outputs"
)

result = await Scenario("debug_test").add_interaction(spied_spec).run()

# Access spy data from metadata
spy_data = result.final_trace.last.metadata.get("trace.last.outputs")
print(spy_data)
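
To make the idea of a spy concrete, the sketch below wraps a callable and records each call's arguments and return value. It illustrates the general technique only: `spy` and `fake_model` are hypothetical, and `WithSpy` itself operates on interaction specs and exposes what it records through the interaction metadata, as in the example above.

```python
from functools import wraps
from typing import Any, Callable


def spy(fn: Callable[..., Any]) -> tuple[Callable[..., Any], list[dict[str, Any]]]:
    """Return a wrapped version of fn plus the list its calls are recorded into."""
    calls: list[dict[str, Any]] = []

    @wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = fn(*args, **kwargs)
        # Record the call only after fn returns, so the result is available.
        calls.append({"args": args, "kwargs": kwargs, "result": result})
        return result

    return wrapper, calls


def fake_model(prompt: str) -> str:
    # Hypothetical model stub that echoes its input in upper case.
    return prompt.upper()


spied_model, calls = spy(fake_model)
spied_model("hello")
print(calls)
```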

Usage patterns

Replaying recorded conversations

from giskard.checks import TestCase, Trace, Interaction, FnCheck

recorded = [
    Interaction(inputs="What's the weather?", outputs="It's sunny and 75F."),
    Interaction(
        inputs="Should I bring an umbrella?", outputs="No, you won't need one."
    ),
]

trace = Trace(interactions=recorded)

test_case = TestCase(
    name="weather_replay",
    trace=trace,
    checks=[
        FnCheck(
            fn=lambda t: "sunny" in t.interactions[0].outputs.lower(),
            name="weather",
        ),
        FnCheck(
            fn=lambda t: "no" in t.interactions[1].outputs.lower(),
            name="umbrella",
        ),
    ],
)

await test_case.assert_passed()
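
Recorded conversations usually live in a file rather than in source code. A sketch of turning a JSON log into input/output pairs follows. The log format (a list of objects with `user` and `assistant` keys) and the `load_pairs` helper are assumptions made for illustration; each pair would then be passed to `Interaction(inputs=..., outputs=...)` as in the example above.

```python
import json

# Assumed log format: a JSON array of {"user": ..., "assistant": ...} objects.
raw_log = """
[
  {"user": "What's the weather?", "assistant": "It's sunny and 75F."},
  {"user": "Should I bring an umbrella?", "assistant": "No, you won't need one."}
]
"""


def load_pairs(text: str) -> list[tuple[str, str]]:
    """Parse the assumed log format into (inputs, outputs) pairs."""
    return [(turn["user"], turn["assistant"]) for turn in json.loads(text)]


pairs = load_pairs(raw_log)
for inputs, outputs in pairs:
    print(f"{inputs!r} -> {outputs!r}")
```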

Batch testing

import asyncio

# trace1..trace3 and checks1..checks3 are assumed to be defined elsewhere.
test_cases = [
    TestCase(name="test1", trace=trace1, checks=checks1),
    TestCase(name="test2", trace=trace2, checks=checks2),
    TestCase(name="test3", trace=trace3, checks=checks3),
]

results = await asyncio.gather(*[tc.run() for tc in test_cases])

passed = sum(1 for r in results if r.passed)
print(f"Passed: {passed}/{len(results)}")
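
One caveat with `asyncio.gather`: by default, the first exception raised by any coroutine propagates to the caller, so a single crashing run hides the other results. A sketch of collecting everything with `return_exceptions=True` follows; `fake_run` is a hypothetical stand-in for `tc.run()` so that the example runs without giskard.

```python
import asyncio


async def fake_run(name: str, crash: bool = False) -> str:
    # Hypothetical stand-in for `tc.run()`.
    await asyncio.sleep(0)
    if crash:
        raise RuntimeError(f"{name} crashed")
    return f"{name}: ok"


async def run_batch() -> list:
    # return_exceptions=True places exceptions in the result list instead of raising.
    return await asyncio.gather(
        fake_run("test1"),
        fake_run("test2", crash=True),
        fake_run("test3"),
        return_exceptions=True,
    )


outcomes = asyncio.run(run_batch())
# Results keep the order of the awaitables passed to gather().
completed = [o for o in outcomes if not isinstance(o, BaseException)]
print(f"Completed: {len(completed)}/{len(outcomes)}")
```

Note that this is asyncio's `return_exceptions` flag, which is distinct from the `return_exception` parameter of `.run()` documented above.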

Parameterized tests

import pytest

from giskard.checks import Equals, Interaction, TestCase, Trace

test_data = [("Hello", "Hi there!"), ("Good morning", "Good morning!")]


@pytest.mark.asyncio
@pytest.mark.parametrize("greeting_input,expected_output", test_data)
async def test_greeting_variations(greeting_input, expected_output):
    trace = Trace(
        interactions=[
            Interaction(inputs=greeting_input, outputs=expected_output)
        ]
    )
    test_case = TestCase(
        name=f"greeting_{greeting_input}",
        trace=trace,
        checks=[
            Equals(expected_value=expected_output, key="trace.last.outputs")
        ],
    )
    await test_case.assert_passed()