Scenarios
Multi-step workflow testing with scenario builders and runners.
Scenario
Module: giskard.checks.core.scenario
The recommended entry point for creating test scenarios. Chain .interact(), .check(), and related methods; each call updates and returns the same instance. Internally, the runner executes the scenario as a sequence of steps: each step runs one or more interactions against the shared trace, then runs that step's checks. Call .run() to execute the scenario against the system under test (SUT). You can also assemble steps from existing interaction specs and checks with .extend(...), and add the scenario to a suite with Suite.append().
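To make the step model concrete, here is a minimal, self-contained sketch in plain Python. It does not import giskard: ToyScenario, Step, and the rule that an .interact() call following a .check() opens a new step are hypothetical stand-ins chosen for illustration, not the library's implementation. It only mirrors two behaviors stated above: every builder call returns the same instance, and each step runs its interactions against one shared trace before running its checks.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical toy model of the step semantics; NOT giskard's implementation.
Check = Callable[[list], bool]


@dataclass
class Step:
    interactions: list = field(default_factory=list)  # (inputs, outputs) pairs
    checks: list = field(default_factory=list)        # callables over the trace


class ToyScenario:
    def __init__(self, name: str = "Unnamed Scenario") -> None:
        self.name = name
        self.steps: list[Step] = []

    def interact(self, inputs: Any, outputs: Any) -> "ToyScenario":
        # Assumption: an interaction added after a check opens a new step.
        if not self.steps or self.steps[-1].checks:
            self.steps.append(Step())
        self.steps[-1].interactions.append((inputs, outputs))
        return self  # same instance, so calls can be chained

    def check(self, fn: Check) -> "ToyScenario":
        if not self.steps:
            self.steps.append(Step())
        self.steps[-1].checks.append(fn)
        return self

    def run(self) -> list[list[bool]]:
        trace: list = []  # one trace shared by every step of this run
        results = []
        for step in self.steps:
            trace.extend(step.interactions)                     # interactions first...
            results.append([fn(trace) for fn in step.checks])   # ...then the step's checks
        return results


toy = (
    ToyScenario("demo")
    .interact("hi", "hello")
    .check(lambda trace: trace[-1][1] == "hello")
    .interact("bye", "goodbye")
    .check(lambda trace: len(trace) == 2)
)
print(toy.run())  # [[True], [True]]
```

Because run() extends a single trace list across steps, the second step's check sees both interactions.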
Scenario() → Scenario
Create a new scenario.

- name (str, default "Unnamed Scenario"): Scenario name; can also be passed positionally, as in Scenario("my_name").
- trace_type (type[TraceType] | None, default None)
- multiple_runs (int, default 1): Default number of full scenario executions, used when run() is called without a multiple_runs=... argument. Must be ≥ 1.

.interact() → self
Add an interaction to the scenario. Returns self for chaining.

- inputs (value | Callable, required): A static value, or a callable (trace) -> value.
- outputs (value | Callable): A static value, or a callable (inputs) -> value or (trace, inputs) -> value. Optional when the scenario or suite has a target.
- metadata (dict | None)

.check() → self
Add a validation check to the scenario. Returns self for chaining.
.add_interaction() → self
Add a pre-constructed InteractionSpec object.

- interaction (InteractionSpec, required)

.extend() → self
Append one or more interaction specs and/or checks. Returns self for chaining.
.run() → ScenarioResult
Execute the scenario against the SUT and return the results.

- target (Callable | None, default None): Target SUT for this run.
- return_exception (bool, default False): Return results on exceptions.
- multiple_runs (int | None, default None): Overrides the scenario's multiple_runs field, the maximum number of full scenario executions (each with a fresh trace). Each run must pass for the next one to start; execution stops on the first FAIL, ERROR, or SKIP. This is not a retry-until-success loop.

Multi-step example
```python
from giskard.checks import Scenario, FnCheck, Equals

# Run inside an async context (for example, an async function or a notebook cell)
result = await (
    Scenario("customer_support")
    .interact(
        inputs="I need help with my account",
        outputs="I'd be happy to help! What's your account number?",
    )
    .check(
        FnCheck(
            fn=lambda trace: "help" in trace.last.outputs.lower(),
            name="helpful",
        )
    )
    .interact(
        inputs="12345",
        outputs="Thank you! I've found your account.",
        # The Equals check below reads this value from the interaction's metadata
        metadata={"account_found": True},
    )
    .check(Equals(expected_value=True, key="trace.last.metadata.account_found"))
    .run()
)
```

Dynamic interactions
```python
from giskard.checks import Scenario


# Use a callable to generate outputs dynamically from the inputs
def generate_response(inputs):
    if "weather" in inputs:
        return "It's sunny today!"
    return "I don't understand."


scenario = Scenario("dynamic_test").interact(
    inputs="What's the weather?",
    outputs=generate_response,
)
```

Context-aware interactions
```python
from giskard.checks import Scenario

scenario = (
    Scenario("context_test")
    .interact(inputs="Hello", outputs="Hi! I'm Alice.")
    .interact(
        # Build the next input from the previous output: "Alice." -> "Alice"
        inputs=lambda trace: f"Nice to meet you, {trace.last.outputs.split()[-1][:-1]}!",
        outputs="Nice to meet you too!",
    )
)
```

ScenarioResult
Module: giskard.checks.core.result
Result of scenario execution with trace and check results.
- scenario_name (str): Name of the scenario that produced this result.
- status (ScenarioStatus): Overall status (PASS, FAIL, ERROR, or SKIP).
- steps (list[TestCaseResult]): Results for each step (the step's interactions, then its checks).
- final_trace (Trace): Complete trace of all interactions.
- passed (bool): True when the aggregated status is PASS (no failures or errors, and not the all-skipped case).
- failed (bool): True when at least one step failed and none errored.
- errored (bool): True when at least one step errored.
- skipped (bool): True when all steps were skipped.
- duration_ms (int): Total execution time in milliseconds.
- multiple_runs (int): Configured cap on full scenario executions for this invocation (from the scenario or the run(multiple_runs=...) override).
- runs_executed (int): Number of full scenario executions that ran before stopping (at most multiple_runs).
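The passed, failed, errored, and skipped fields imply an aggregation rule over step statuses. The sketch below encodes one reading of those descriptions in plain Python. Status is a hypothetical stand-in for ScenarioStatus, and treating a mix of PASS and SKIP steps as PASS is an inference from the passed field (no failures or errors, not all skipped), not confirmed library behavior.

```python
from enum import Enum


class Status(str, Enum):
    # Hypothetical stand-in for giskard's ScenarioStatus; values are illustrative.
    PASS = "pass"
    FAIL = "fail"
    ERROR = "error"
    SKIP = "skip"


def aggregate(step_statuses: list[Status]) -> Status:
    """Combine per-step statuses following the field descriptions above."""
    if Status.ERROR in step_statuses:
        return Status.ERROR  # errored: at least one step errored
    if Status.FAIL in step_statuses:
        return Status.FAIL   # failed: a step failed and none errored
    if all(s is Status.SKIP for s in step_statuses):
        return Status.SKIP   # skipped: every step was skipped
    return Status.PASS       # passed: no failures or errors, not all skipped


print(aggregate([Status.PASS, Status.SKIP]).name)   # PASS
print(aggregate([Status.FAIL, Status.ERROR]).name)  # ERROR
print(aggregate([Status.SKIP, Status.SKIP]).name)   # SKIP
```

Checking ERROR before FAIL matches the failed field's "none errored" condition.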
```python
# test_scenario is a Scenario built as in the examples above
result = await test_scenario.run()

if result.passed:
    print("All checks passed!")

print(f"Total interactions: {len(result.final_trace.interactions)}")

# Flatten the per-step results and print each one's status
for i, check_result in enumerate(
    r for step in result.steps for r in step.results
):
    print(f"Check {i}: {check_result.status}")
```

Suite

Module: giskard.checks.scenarios.suite
Group multiple scenarios and run them together.
Suite() → Suite

- name (str, required): Suite identifier.
- target (Callable, optional): Suite-level target SUT.

.append() → Suite
Add a scenario to the suite.

- scenario (Scenario, required)

.run() → SuiteResult
Run all scenarios serially.

- target (Callable)
- return_exception (bool, default False)

```python
from giskard.checks import Suite, Scenario

# my_sut, scenario1, and scenario2 are defined elsewhere
suite = Suite(name="my_suite", target=my_sut)
suite.append(scenario1)
suite.append(scenario2)

result = await suite.run()
print(result.pass_rate)
```

SuiteResult
Module: giskard.checks.core.result
Aggregate result from suite execution.
- results (list[ScenarioResult]): Scenario results, in order.
- pass_rate (float): Fraction of non-skipped scenarios that passed. If every scenario was skipped, this is 1.0.
- duration_ms (int): Total execution time in milliseconds.
- passed_count (int): Number of passed scenarios.
- failed_count (int): Number of failed scenarios.
- errored_count (int): Number of scenarios that errored.
- skipped_count (int): Number of scenarios that were skipped.
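The pass_rate definition excludes skipped scenarios from the denominator. Here is a minimal sketch of that arithmetic and the accompanying counts, assuming scenario statuses are available as plain strings; summarize is a hypothetical helper, not part of SuiteResult.

```python
from collections import Counter


def summarize(statuses: list[str]) -> dict:
    """Compute SuiteResult-style counts and pass_rate from scenario statuses.

    Statuses are plain strings ("PASS", "FAIL", "ERROR", "SKIP") in this sketch.
    """
    counts = Counter(statuses)
    non_skipped = len(statuses) - counts["SKIP"]
    # Skipped scenarios are left out of the denominator; all-skipped gives 1.0
    pass_rate = counts["PASS"] / non_skipped if non_skipped else 1.0
    return {
        "passed_count": counts["PASS"],
        "failed_count": counts["FAIL"],
        "errored_count": counts["ERROR"],
        "skipped_count": counts["SKIP"],
        "pass_rate": pass_rate,
    }


print(summarize(["PASS", "FAIL", "SKIP", "PASS"])["pass_rate"])  # 0.6666666666666666
print(summarize(["SKIP", "SKIP"])["pass_rate"])  # 1.0
```

With two passes, one failure, and one skip, the rate is 2 / 3 rather than 2 / 4.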
.to_junit_xml() → str
Export the suite result as a JUnit XML string; pass path to also write it to a file.

- path (str | Path | None, default None)

ScenarioRunner
Module: giskard.checks.scenarios.runner
Low-level runner for executing scenarios. Most users should use Scenario(...).run() instead.
.run() → ScenarioResult

- scenario (Scenario, required): The scenario to execute.
- target (Callable | None, default None): Override the scenario's target SUT.
- return_exception (bool, default False): Return results on exceptions.
- multiple_runs (int | None, default None): Optional override of the scenario's multiple_runs (same semantics as Scenario.run(multiple_runs=...)).

get_runner() → ScenarioRunner
Get the default process-wide singleton runner instance.
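The multiple_runs semantics (a cap on full executions, stopping at the first non-PASS, not a retry loop) can be illustrated with a small loop. run_repeatedly and the run_once callable are hypothetical; the sketch returns the last status together with a count that plays the role of runs_executed, and it does not model how giskard builds the final ScenarioResult.

```python
from typing import Callable


def run_repeatedly(run_once: Callable[[], str], multiple_runs: int = 1) -> tuple[str, int]:
    """Run a scenario up to multiple_runs times, stopping at the first non-PASS.

    Returns (last_status, runs_executed). This is not a retry-until-success loop.
    """
    if multiple_runs < 1:
        raise ValueError("multiple_runs must be >= 1")
    status = "PASS"
    runs_executed = 0
    for _ in range(multiple_runs):
        status = run_once()  # each call is assumed to start from a fresh trace
        runs_executed += 1
        if status != "PASS":  # FAIL, ERROR, or SKIP ends the loop
            break
    return status, runs_executed


outcomes = iter(["PASS", "PASS", "FAIL", "PASS"])
print(run_repeatedly(lambda: next(outcomes), multiple_runs=4))  # ('FAIL', 3)
```

The fourth outcome is never consumed: once the third run fails, no further runs start.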
See also
- Core API: Scenario, Trace, and Interaction details
- Built-in Checks: checks to use in scenarios
- Testing Utilities: test runners and utilities