Multi-turn scenarios test conversational flows, stateful interactions, and
complex workflows that span multiple exchanges. Use them to verify that your
system stays compliant, consistent, and safe across an entire conversation.
Many AI applications involve multiple interactions:
Agents that use tools across multiple steps
Chatbots that maintain conversation context
Conversational RAG where follow-up questions reference earlier context
The Scenario class executes multiple interaction specs and checks in sequence
with a shared trace. Because every interaction appends to the same trace, a
check at step 3 can inspect what was said at step 1, making it possible to
assert on behaviour that spans the whole conversation.
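To make the shared-trace idea concrete, here is a minimal sketch in plain Python. `MiniScenario`, `Trace`, and `Interaction` are hypothetical stand-ins written for illustration only; they are not Giskard's classes and make no claim about how the library is implemented. The sketch only shows why a check placed after the third turn can still read the first turn.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Interaction:
    inputs: str
    outputs: str


@dataclass
class Trace:
    interactions: List[Interaction] = field(default_factory=list)

    @property
    def last(self) -> Interaction:
        return self.interactions[-1]


class MiniScenario:
    """Hypothetical illustration of a shared trace; not Giskard's Scenario."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._steps: List[Tuple[str, tuple]] = []

    def interact(self, inputs: str, outputs: Callable[[str], str]) -> "MiniScenario":
        self._steps.append(("interact", (inputs, outputs)))
        return self

    def check(self, name: str, fn: Callable[[Trace], bool]) -> "MiniScenario":
        self._steps.append(("check", (name, fn)))
        return self

    def run(self) -> Dict[str, bool]:
        trace = Trace()  # one trace shared by every step
        results: Dict[str, bool] = {}
        for kind, payload in self._steps:
            if kind == "interact":
                inputs, outputs = payload
                trace.interactions.append(Interaction(inputs, outputs(inputs)))
            else:
                name, fn = payload
                results[name] = bool(fn(trace))
        return results


results = (
    MiniScenario("order_lookup")
    .interact("My order number is A-17.", lambda text: "Noted order A-17.")
    .interact("I ordered it last week.", lambda text: "Thanks for the detail.")
    .interact("Which order is this about?", lambda text: "This is about order A-17.")
    # Runs after turn 3 but reads turn 1 through the shared trace
    .check(
        "recalls_order_from_turn_1",
        lambda trace: "A-17" in trace.interactions[0].inputs
        and "A-17" in trace.last.outputs,
    )
    .run()
)
print(results)  # {'recalls_order_from_turn_1': True}
```

Because the trace is created once per run and only ever appended to, every check sees the full history up to the point where it executes.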
The example below models a two-step incident intake: the first turn verifies
that a case ID is issued, and the second verifies that escalation is confirmed.
Each check fires immediately after its own turn so you know exactly which step
produced an unexpected result.
from giskard.checks import Scenario, StringMatching

test_scenario = (
    Scenario("incident_intake")
    # First interaction: the reply must contain a case ID
    .interact(
        inputs="I think my account was compromised.",
        outputs=lambda inputs: (
            "Thanks. I have opened case ID SEC-1042. "
            "Can you confirm the last transaction?"
        ),
    )
    .check(
        StringMatching(
            name="case_id_provided",
            keyword="SEC-",
            text_key="trace.last.outputs",
        )
    )
    # Second interaction: the reply must confirm escalation
    .interact(
        inputs="The last transfer was $9,000 to ACME Ltd.",
        outputs=lambda inputs: (
            "Understood. I escalated this as potential fraud "
            "and locked the account."
        ),
    )
    .check(
        StringMatching(
            name="escalation_confirmed",
            keyword="escalated",
            text_key="trace.last.outputs",
        )
    )
)

# Top-level await works in a notebook or async REPL; in a plain script,
# call it from inside an async function started with asyncio.run().
result = await test_scenario.run()
result.print_report()
Output
──────────────────── PASSED ────────────────────
case_id_provided        PASS
escalation_confirmed    PASS
──────────────────── Trace ────────────────────
──────────────────── Interaction 1 ────────────────────
Inputs: 'I think my account was compromised.'
Outputs: 'Thanks. I have opened case ID SEC-1042. Can you confirm the last transaction?'
──────────────────── Interaction 2 ────────────────────
Inputs: 'The last transfer was $9,000 to ACME Ltd.'
Outputs: 'Understood. I escalated this as potential fraud and locked the account.'
──────────────────── 2 steps in 9ms ────────────────────
Add a check after every .interact() call, not just at the end. This pinpoints
exactly which turn broke the expected behavior.
Key Points:
Components execute in sequence
Checks can reference any interaction via the trace
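The advice to check after every turn can be illustrated without the library. The sketch below is an assumption-laden toy: `first_failing_turn` and the `Turn` tuple are hypothetical helpers, and the second reply is deliberately wrong so that the per-turn check has something to catch.

```python
from typing import Callable, List, Optional, Tuple

# (user input, function producing the reply, check over all replies so far)
Turn = Tuple[str, Callable[[str], str], Callable[[List[str]], bool]]


def first_failing_turn(turns: List[Turn]) -> Optional[int]:
    """Run turns in order and return the 1-based index of the first
    turn whose check fails, or None if every check passes."""
    outputs: List[str] = []
    for index, (inputs, respond, check) in enumerate(turns, start=1):
        outputs.append(respond(inputs))
        if not check(outputs):
            return index
    return None


turns: List[Turn] = [
    (
        "I think my account was compromised.",
        lambda text: "Thanks. I have opened case ID SEC-1042.",
        lambda outs: "SEC-" in outs[-1],
    ),
    (
        "The last transfer was $9,000 to ACME Ltd.",
        lambda text: "Understood. Anything else?",  # deliberately omits the escalation
        lambda outs: "escalated" in outs[-1],
    ),
]

print(first_failing_turn(turns))  # 2
```

With a single check at the end of the conversation, the same failure would only say that something went wrong; the per-turn version names the turn.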
The basic flow above uses hard-coded outputs. Now we'll test a real stateful
system where the chatbot maintains its own conversation history and must recall
information from an earlier turn.
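from giskard.checks import Scenario, FnCheck, StringMatching


# Toy chatbot that stores every message in conversation_history.
# The class setup below is a minimal assumed form.
class Chatbot:
    def __init__(self):
        self.conversation_history = []

    def chat(self, message):
        self.conversation_history.append({"role": "user", "content": message})
        if "case id is" in message.lower():
            # Note: split() is case-sensitive; if the casing differs
            # (e.g. "case ID is"), the whole message is kept.
            case_id = message.split("case id is")[-1].strip()
            response = f"Got it. I am tracking case {case_id}."
        elif "what case are we" in message.lower():
            # Reference earlier context
            for msg in reversed(self.conversation_history):
                if "case id is" in msg.get("content", "").lower():
                    case_id = msg["content"].split("case id is")[-1].strip()
                    response = f"We are discussing case {case_id}."
                    break
            else:
                # for-else: runs only if no earlier message mentioned a case ID
                response = "I don't see a case ID yet."
        else:
            response = "I understand."
        self.conversation_history.append(
            {"role": "assistant", "content": response}
        )
        return response


bot = Chatbot()

test_scenario = (
    Scenario("case_id_memory")
    .interact(
        inputs="My case ID is SEC-1042.",
        outputs=lambda inputs: bot.chat(inputs),
    )
    .check(
        FnCheck(
            fn=lambda trace: "SEC-1042" in trace.last.outputs,
            name="acknowledges_case_id",
        )
    )
    .interact(
        inputs="What case are we discussing?",
        outputs=lambda inputs: bot.chat(inputs),
    )
    .check(
        StringMatching(
            name="remembers_case_id",
            keyword="SEC-1042",
            text_key="trace.last.outputs",
        )
    )
)

result = await test_scenario.run()
result.print_report()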
Output
──────────────────── PASSED ────────────────────
acknowledges_case_id    PASS
remembers_case_id       PASS
──────────────────── Trace ────────────────────
──────────────────── Interaction 1 ────────────────────
Inputs: 'My case ID is SEC-1042.'
Outputs: 'Got it. I am tracking case My case ID is SEC-1042..'
──────────────────── Interaction 2 ────────────────────
Inputs: 'What case are we discussing?'
Outputs: 'We are discussing case Got it. I am tracking case My case ID is SEC-1042...'
──────────────────── 2 steps in 5ms ────────────────────
This tells Giskard to call bot.chat() for each turn and assert that the case
ID surfaces in both responses. If the second check fails, you immediately know
the chatbot lost context between turns rather than having to trace through a
generic failure.
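In the output above, the toy bot echoes whole messages back because its split on "case id is" is case-sensitive and never matches "case ID is"; both checks still pass since they only look for the substring SEC-1042. If cleaner replies are wanted, one possible variant extracts just the ID with a case-insensitive regular expression. `RegexChatbot` and the ID pattern (letters, a hyphen, digits) are assumptions made for this sketch and are not part of the original example.

```python
import re
from typing import Dict, List, Optional

# Assumed ID shape: letters, a hyphen, then digits (e.g. SEC-1042)
CASE_ID = re.compile(r"case id is\s+([A-Z]+-\d+)", re.IGNORECASE)


class RegexChatbot:
    """Hypothetical variant of the toy bot that stores history and
    extracts only the case ID from user messages."""

    def __init__(self) -> None:
        self.conversation_history: List[Dict[str, str]] = []

    def _find_case_id(self) -> Optional[str]:
        # Walk user messages from newest to oldest
        for msg in reversed(self.conversation_history):
            if msg["role"] != "user":
                continue
            match = CASE_ID.search(msg["content"])
            if match:
                return match.group(1)
        return None

    def chat(self, message: str) -> str:
        self.conversation_history.append({"role": "user", "content": message})
        if CASE_ID.search(message):
            response = f"Got it. I am tracking case {self._find_case_id()}."
        elif "what case are we" in message.lower():
            case_id = self._find_case_id()
            if case_id:
                response = f"We are discussing case {case_id}."
            else:
                response = "I don't see a case ID yet."
        else:
            response = "I understand."
        self.conversation_history.append({"role": "assistant", "content": response})
        return response


bot = RegexChatbot()
first_reply = bot.chat("My case ID is SEC-1042.")
second_reply = bot.chat("What case are we discussing?")
print(first_reply)   # Got it. I am tracking case SEC-1042.
print(second_reply)  # We are discussing case SEC-1042.
```

Creating a new bot instance for each scenario run also keeps history from one run from leaking into the next.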
Name each scenario after the user flow it covers, for example case_id_memory
or booking_invalid_date. This makes failure reports immediately readable.