Let's consider a simple question-answering bot. We want to test that the answers
of our bot are correct according to some context information.
In the checks framework, you test a Trace. A Trace is an immutable record
of everything exchanged with the system under test (SUT). It contains one or
more Interactions, where each Interaction corresponds to a single turn
(inputs + outputs).
For detailed explanations of the core concepts (Trace, Interaction, Check,
Scenario), see Core Concepts.
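To make the data model concrete, here is a minimal sketch of a trace with a single interaction, written with plain dataclasses. The Interaction and Trace classes below are illustrative stand-ins defined for this example only; they are not the classes shipped in giskard.checks, whose fields and behavior may differ.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any


@dataclass(frozen=True)
class Interaction:
    """One turn: what was sent to the SUT and what it returned."""

    inputs: Any
    outputs: Any


@dataclass(frozen=True)
class Trace:
    """Immutable, ordered record of interactions (illustrative stand-in)."""

    interactions: tuple[Interaction, ...]

    @property
    def last(self) -> Interaction:
        # Shortcut for interactions[-1]
        return self.interactions[-1]


trace = Trace(
    interactions=(
        Interaction(
            inputs="What is the capital of France?",
            outputs="The capital of France is Paris.",
        ),
    )
)
print(trace.last.outputs)

# Frozen dataclasses reject attribute assignment, which models immutability.
try:
    trace.interactions = ()
except FrozenInstanceError:
    print("trace is immutable")
```

Using a tuple rather than a list for the interactions keeps the record itself from being mutated in place.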
For our simple Q&A bot, we can represent a single turn as a trace with just one
interaction. The inputs and outputs can be anything the bot supports, as long as
they are serializable to JSON. For now, we'll assume our bot takes an input
string (question) and returns a string (the answer).
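As a rough way to see what "serializable to JSON" allows, the sketch below tries to encode a few candidate payloads with the standard json module. The helper is_json_serializable is a hypothetical name introduced for this example, and the framework may apply its own serialization rules.

```python
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Return True if the standard json encoder accepts the value."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


# A plain string, as assumed for our bot.
plain_answer = "The capital of France is Paris."

# A structured payload made of dicts, lists and strings.
structured_answer = {"answer": "Paris", "sources": ["doc-1", "doc-2"]}

# A set has no JSON representation, so the encoder rejects it.
unsupported_answer = {"answer": "Paris", "sources": {"doc-1", "doc-2"}}

for payload in (plain_answer, structured_answer, unsupported_answer):
    print(type(payload).__name__, is_json_serializable(payload))
```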
from giskard.checks import Scenario, Groundedness

# Use the fluent builder to create a scenario with an interaction and checks
test_scenario = (
    Scenario("test_france_capital")
    .interact(
        inputs="What is the capital of France?",
        outputs="The capital of France is Paris.",  # generated by the bot
    )
    .check(
        Groundedness(
            name="answer is grounded",
            answer_key="trace.last.outputs",
            context="""France is a country in Western Europe. Its capital
            and largest city is Paris, known for the Eiffel Tower
            and the Louvre Museum.""",
        )
    )
)
In practice, we'll get the outputs directly from the bot, or maybe from a
dataset of previously recorded interactions.
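For the dataset case, a minimal sketch might read recorded turns from a JSON Lines source and yield (inputs, outputs) pairs. The "question" and "answer" field names and the iter_turns helper are assumptions made for this example; a real dataset will have its own layout.

```python
import io
import json
from typing import Iterable, Iterator

# In-memory stand-in for a JSON Lines file of recorded interactions.
RECORDED = io.StringIO(
    '{"question": "What is the capital of France?", '
    '"answer": "The capital of France is Paris."}\n'
    '{"question": "What is the capital of Italy?", '
    '"answer": "The capital of Italy is Rome."}\n'
)


def iter_turns(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (inputs, outputs) pairs, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield record["question"], record["answer"]


turns = list(iter_turns(RECORDED))
for inputs, outputs in turns:
    # Each pair could be passed as inputs=... and outputs=... to .interact()
    print(inputs, "->", outputs)
```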
Note how we created the groundedness check:
- name: an optional name for the check, which makes the results easier to
  interpret.
- answer_key: the JSONPath key that locates the answer in the trace. All
  JSONPath keys must start with "trace". The "last" property is a shortcut for
  interactions[-1] and can be used in both JSONPath keys and Python code. Here
  we check the outputs attribute of the last interaction in the trace, which
  is the default.
- context: the context information used to check whether the answer is
  grounded. A context_key is also available if we want to load the context
  dynamically from the trace itself.
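To illustrate how a key such as trace.last.outputs points at a value, the sketch below resolves dotted keys against a stand-in trace. It is a deliberately simplified resolver, not full JSONPath and not the library's implementation; resolve_key, Trace and Interaction are hypothetical names defined here, and the dictionary-shaped outputs in the second trace are an assumed payload used only to show nested lookup.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Interaction:
    inputs: Any
    outputs: Any


@dataclass(frozen=True)
class Trace:
    interactions: tuple[Interaction, ...]

    @property
    def last(self) -> Interaction:
        # Shortcut for interactions[-1]
        return self.interactions[-1]


def resolve_key(trace: Trace, key: str) -> Any:
    """Resolve a dotted key such as 'trace.last.outputs' (simplified)."""
    root, *parts = key.split(".")
    if root != "trace":
        raise ValueError(f"key must start with 'trace': {key!r}")
    value: Any = trace
    for part in parts:
        # Dict lookup for JSON-like payloads, attribute lookup otherwise.
        value = value[part] if isinstance(value, dict) else getattr(value, part)
    return value


trace = Trace(
    interactions=(
        Interaction(
            inputs="What is the capital of France?",
            outputs="The capital of France is Paris.",
        ),
    )
)
print(resolve_key(trace, "trace.last.outputs"))

# Hypothetical structured outputs, to show a nested key.
structured = Trace(
    interactions=(
        Interaction(
            inputs="What is the capital of France?",
            outputs={"answer": "Paris", "context": "Its capital is Paris."},
        ),
    )
)
print(resolve_key(structured, "trace.last.outputs.context"))
```

A lookup of this kind is what would let a key-based parameter read its value from the trace at run time instead of from a literal passed to the check.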
We can now run the scenario and inspect the results. In a notebook, the
ScenarioResult renders with a rich display:
result = await test_scenario.run()
result.print_report()
Output
──────────────────── PASSED ────────────────────
answer is grounded    PASS
──────────────────── Trace ─────────────────────
──────────────── Interaction 1 ─────────────────
Inputs: 'What is the capital of France?'
Outputs: 'The capital of France is Paris.'
────────────── 1 step in 1361ms ────────────────
The run() method is asynchronous. In a script, wrap it with asyncio.run():
import asyncio

from giskard.checks import Scenario, Groundedness


async def main():
    test_scenario = (
        Scenario("test_france_capital")
        .interact(
            inputs="What is the capital of France?",
            outputs="The capital of France is Paris.",
        )
        .check(
            Groundedness(
                name="answer is grounded",
                answer_key="trace.last.outputs",
                context="""France is a country in Western Europe. Its capital
                and largest city is Paris, known for the Eiffel Tower
                and the Louvre Museum.""",
            )
        )
    )
    result = await test_scenario.run()
    result.print_report()


asyncio.run(main())
Output
──────────────────── PASSED ────────────────────
answer is grounded    PASS
──────────────────── Trace ─────────────────────
──────────────── Interaction 1 ─────────────────
Inputs: 'What is the capital of France?'
Outputs: 'The capital of France is Paris.'
─────────────── 1 step in 661ms ────────────────
If you're already inside an async function (like in pytest with
@pytest.mark.asyncio), you can call await test_scenario.run() directly.
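The reason for awaiting directly is that asyncio.run() refuses to start when an event loop is already running. The sketch below shows both paths using only the standard library; run_scenario is a stand-in coroutine invented for this example, not the real test_scenario.run().

```python
import asyncio


async def run_scenario() -> str:
    """Stand-in for an async call such as test_scenario.run()."""
    await asyncio.sleep(0)
    return "PASS"


async def already_async() -> tuple[str, str]:
    # Inside a coroutine, await the call directly.
    direct = await run_scenario()

    # Calling asyncio.run() while a loop is running raises RuntimeError.
    coro = run_scenario()
    try:
        asyncio.run(coro)
        nested = "unexpectedly succeeded"
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it explicitly
        nested = f"RuntimeError: {exc}"
    return direct, nested


# At the top level of a script there is no running loop, so asyncio.run() works.
direct, nested = asyncio.run(already_async())
print(direct)
print(nested)
```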