Scenarios

Multi-step workflow testing with scenario builders and runners.


Module: giskard.checks.core.scenario

The recommended entry point for creating test scenarios. Chain .interact(), .check(), and related methods: each call updates and returns the same instance. Internally, the runner executes steps: each step runs one or more interactions against the shared trace, then runs that step’s checks. Call .run() to execute against the SUT. You can also use Scenario.extend(...) to assemble steps from existing specs and checks, or pass the scenario to Suite.append().
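The chaining and step grouping described above can be pictured with a small stand-in builder. This is a sketch only, not the giskard implementation: `ScenarioSketch` and `Step` are hypothetical names, and the grouping rule (a new step starts when an interaction follows a check) is an assumption inferred from the description of steps.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step: interactions first, then the checks that validate them."""
    interactions: list = field(default_factory=list)
    checks: list = field(default_factory=list)


class ScenarioSketch:
    """Hypothetical fluent builder; NOT the giskard Scenario class."""

    def __init__(self, name="Unnamed Scenario"):
        self.name = name
        self.steps = []

    def interact(self, inputs, outputs=None):
        # Assumption: an interaction added after a check opens a new step.
        if not self.steps or self.steps[-1].checks:
            self.steps.append(Step())
        self.steps[-1].interactions.append((inputs, outputs))
        return self  # same instance, so calls can be chained

    def check(self, check):
        if not self.steps:
            self.steps.append(Step())
        self.steps[-1].checks.append(check)
        return self


builder = ScenarioSketch("demo")
chained = (
    builder.interact("a", "b")
    .interact("c", "d")
    .check("first_check")
    .interact("e", "f")
    .check("second_check")
)
print(chained is builder, len(builder.steps))  # True 2
```

Returning `self` from every method is what lets the calls be chained while still building a single object.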

Scenario() Scenario

Create a new scenario.

name str Default: Unnamed Scenario
Scenario name for identification. Pass the first argument positionally as in Scenario("my_name").
trace_type type[TraceType] | None Default: None
Optional custom trace type for advanced use cases.
multiple_runs int Default: 1
Default cap on full scenario executions when run() is called without multiple_runs=…. Must be ≥ 1.
.interact() self

Add an interaction to the scenario. Returns self for chaining.

inputs value | Callable Required
Static value or callable (trace) -> value.
outputs value | Callable
Static value, callable (inputs) -> value, or (trace, inputs) -> value. Optional when the scenario / suite has a target.
metadata dict | None
Optional metadata dictionary.
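The accepted shapes for inputs and outputs can be illustrated with a minimal resolver. This is a sketch under assumptions: the helper names `resolve_inputs` and `resolve_outputs` are hypothetical, and telling the two callable forms apart by parameter count is one plausible approach, not necessarily how the library does it.

```python
import inspect


def resolve_inputs(inputs, trace):
    """Return a static value as-is, or call a (trace) -> value callable."""
    return inputs(trace) if callable(inputs) else inputs


def resolve_outputs(outputs, trace, inputs):
    """Return a static value, or call (inputs) / (trace, inputs) callables."""
    if not callable(outputs):
        return outputs
    # Assumption: the number of parameters selects the calling convention.
    n_params = len(inspect.signature(outputs).parameters)
    if n_params == 1:
        return outputs(inputs)
    return outputs(trace, inputs)


trace = ["first turn", "second turn"]  # stand-in for a real trace object
print(resolve_inputs(lambda t: f"turn {len(t) + 1}", trace))       # turn 3
print(resolve_outputs(lambda i: i.upper(), trace, "hi"))           # HI
print(resolve_outputs(lambda t, i: f"{len(t)}:{i}", trace, "hi"))  # 2:hi
```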
.check() self

Add a validation check to the scenario. Returns self for chaining.

check Check Required
A Check instance to validate the trace.
.add_interaction() self

Add a pre-constructed InteractionSpec object.

interaction InteractionSpec Required
The interaction spec to add.
.extend() self

Append one or more interaction specs and/or checks. Returns self for chaining.

*components InteractionSpec | Check Required
Components to append in order.
.run() ScenarioResult

Execute the scenario against the SUT and return results.

target Callable | None Default: None
Override the scenario’s default target system-under-test for this run.
return_exception bool Default: False
If True, return results even when exceptions occur instead of raising.
multiple_runs int | None Default: None
When set, overrides the scenario’s multiple_runs field: maximum full scenario executions (fresh trace each time). Each run must pass for the next to run; stops on the first FAIL, ERROR, or SKIP. Not a retry-until-success loop.
from giskard.checks import Scenario, FnCheck, Equals

# Two-step scenario: each interaction is followed by the check that validates it
result = await (
    Scenario("customer_support")
    .interact(
        inputs="I need help with my account",
        outputs="I'd be happy to help! What's your account number?",
    )
    .check(
        FnCheck(
            fn=lambda trace: "help" in trace.last.outputs.lower(),
            name="helpful",
        )
    )
    .interact(
        inputs="12345",
        outputs="Thank you! I've found your account.",
        # Set the metadata that the Equals check below reads
        metadata={"account_found": True},
    )
    .check(Equals(expected_value=True, key="trace.last.metadata.account_found"))
    .run()
)

# Use callables for dynamic generation
def generate_response(inputs):
    if "weather" in inputs:
        return "It's sunny today!"
    return "I don't understand."

scenario = Scenario("dynamic_test").interact(
    inputs="What's the weather?", outputs=generate_response
)

# Derive the next input from the previous output via the trace
scenario = (
    Scenario("context_test")
    .interact(inputs="Hello", outputs="Hi! I'm Alice.")
    .interact(
        inputs=lambda trace: f"Nice to meet you, {trace.last.outputs.split()[-1][:-1]}!",
        outputs="Nice to meet you too!",
    )
)
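The multiple_runs semantics of .run() (a cap on executions that stops at the first non-passing run, rather than a retry loop) can be sketched in plain Python. The function `run_repeatedly` and the string statuses are hypothetical stand-ins for illustration, not library code.

```python
def run_repeatedly(run_once, multiple_runs=1):
    """Call run_once up to multiple_runs times, stopping at the first non-PASS.

    run_once is a hypothetical zero-argument callable that performs one full
    scenario execution on a fresh trace and returns a status string.
    """
    if multiple_runs < 1:
        raise ValueError("multiple_runs must be >= 1")
    status = "PASS"
    runs_executed = 0
    for _ in range(multiple_runs):
        status = run_once()
        runs_executed += 1
        if status != "PASS":
            break  # FAIL, ERROR, or SKIP ends the sequence; no retry
    return status, runs_executed


outcomes = iter(["PASS", "FAIL", "PASS"])
status, runs = run_repeatedly(lambda: next(outcomes), multiple_runs=3)
print(status, runs)  # FAIL 2
```

The third run never happens here: a failure on the second run ends the sequence, so the count of executed runs is 2 while the cap stays 3.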

Module: giskard.checks.core.result

Result of scenario execution with trace and check results.

ScenarioResult
scenario_name str

Name of the scenario that produced this result.

status ScenarioStatus

Overall status (PASS/FAIL/ERROR/SKIP).

steps list[TestCaseResult]

Results for each step (interactions in that step, then checks).

final_trace Trace

Complete trace of all interactions.

passed bool

True when the aggregated status is PASS (no failures or errors; not the all-skipped case).

failed bool

True when at least one step failed and none errored.

errored bool

True when at least one step errored.

skipped bool

True when all steps were skipped.

duration_ms int

Total execution time in milliseconds.

multiple_runs int

Configured cap on full scenario executions for this invocation (from the scenario or the run(multiple_runs=...) override).

runs_executed int

How many full scenario executions ran before stopping (at most multiple_runs).

result = await test_scenario.run()

if result.passed:
    print("All checks passed!")

print(f"Total interactions: {len(result.final_trace.interactions)}")

# Flatten the per-step check results into one sequence
for i, check_result in enumerate(
    r for step in result.steps for r in step.results
):
    print(f"Check {i}: {check_result.status}")
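The rules behind passed, failed, errored, and skipped suggest a precedence order for the overall status. A minimal sketch, assuming ERROR outranks FAIL and that a scenario with no steps counts as PASS (the text above does not say how the empty case is handled); `Status` and `aggregate` are hypothetical names, not the library's.

```python
from enum import Enum


class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    ERROR = "ERROR"
    SKIP = "SKIP"


def aggregate(step_statuses):
    """Combine per-step statuses into one overall status."""
    statuses = list(step_statuses)
    if Status.ERROR in statuses:
        return Status.ERROR  # any error wins
    if Status.FAIL in statuses:
        return Status.FAIL   # at least one failure and no errors
    if statuses and all(s is Status.SKIP for s in statuses):
        return Status.SKIP   # every step was skipped
    return Status.PASS       # assumption: an empty list also lands here


print(aggregate([Status.PASS, Status.FAIL, Status.ERROR]).value)  # ERROR
print(aggregate([Status.PASS, Status.SKIP]).value)                # PASS
print(aggregate([Status.SKIP, Status.SKIP]).value)                # SKIP
```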

Module: giskard.checks.scenarios.suite

Group multiple scenarios and run them together.

Suite() Suite
name str Required

Suite identifier.

target Callable

Optional suite-level target SUT.

.append() Suite

Add a scenario to the suite.

scenario Scenario Required
The scenario to add.
.run() SuiteResult

Run all scenarios serially.

target Callable
Override target for this run.
return_exception bool Default: False
If True, return results even when exceptions occur instead of raising.
from giskard.checks import Suite, Scenario
suite = Suite(name="my_suite", target=my_sut)
suite.append(scenario1)
suite.append(scenario2)
result = await suite.run()
print(result.pass_rate)

Module: giskard.checks.core.result

Aggregate result from suite execution.

SuiteResult
results list[ScenarioResult]

Scenario results in order.

pass_rate float

Fraction of non-skipped scenarios that passed. If every scenario was skipped, this is 1.0.

duration_ms int

Total execution time in milliseconds.

passed_count int

Number of passed scenarios.

failed_count int

Number of failed scenarios.

errored_count int

Number of scenarios that errored.

skipped_count int

Number of scenarios that were skipped.

.to_junit_xml() str

Export the suite result as a JUnit XML string. Optionally write to a file.

path str | Path | None Default: None
File path to write the XML to. Returns the XML string regardless.
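The pass_rate definition above (skipped scenarios excluded from the denominator, 1.0 when everything was skipped) can be worked through with a small function. This is an illustrative sketch using plain status strings, not the library's code; treating an empty suite as 1.0 is an assumption.

```python
def pass_rate(statuses):
    """Fraction of non-skipped statuses that are PASS; 1.0 if none remain."""
    considered = [s for s in statuses if s != "SKIP"]
    if not considered:
        return 1.0  # every scenario skipped (or, by assumption, none at all)
    return sum(s == "PASS" for s in considered) / len(considered)


print(pass_rate(["PASS", "PASS", "FAIL", "SKIP"]))  # 2 of 3 non-skipped passed
print(pass_rate(["SKIP", "SKIP"]))                  # 1.0
```

Under this definition an errored scenario lowers the rate just as a failed one does, since both are non-skipped and not passed.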

Module: giskard.checks.scenarios.runner

Low-level runner for executing scenarios. Most users should use Scenario(...).run() instead.

.run() ScenarioResult
scenario Scenario Required

The scenario to execute.

target Callable | None Default: None

Override the scenario’s target SUT.

return_exception bool Default: False

If True, return results even when exceptions occur instead of raising.

multiple_runs int | None Default: None

Optional override of the scenario’s multiple_runs (same semantics as Scenario.run(multiple_runs=...)).

get_runner() ScenarioRunner

Get the default process-wide singleton runner instance.