Evaluate tests and assign validation rules
Each test case is composed of a conversation and its associated evaluation parameters (for example, an expected answer or rules that the agent must respect).
A conversation is a list of messages. In the simplest case, a conversation consists of a single user message. In the testing phase, we will send this message to your agent, record its answer, and evaluate it against the criteria that you defined in the test case.
In more advanced cases, the conversation is a multi-turn dialogue between the user and the agent, terminating with a final user message. When testing, we will pass the conversation history to your agent to generate the response that will be evaluated.
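To make the structure concrete, here is a minimal Python sketch of a test case and the testing flow described above: the conversation (single-message or multi-turn, always ending with a user message) is passed to the agent, and the answer is then evaluated. Everything here is an assumption for illustration: the `Message` and `EvalCase` shapes, the `run_test_case` and `matches_expected` helpers, and the toy `echo_agent` are hypothetical and do not reflect the Hub's actual schema or API. The substring comparison is only a stand-in for the checks you configure in the Hub.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


# Hypothetical shapes for illustration only; not the Hub's actual schema.
@dataclass
class Message:
    role: str  # "user" or "assistant"
    content: str


@dataclass
class EvalCase:
    messages: list[Message]
    # Hypothetical evaluation parameters: an expected answer and free-text rules.
    expected_answer: Optional[str] = None
    rules: list[str] = field(default_factory=list)


Agent = Callable[[list[Message]], str]


def run_test_case(case: EvalCase, agent: Agent) -> str:
    """Pass the conversation history to the agent and return its answer."""
    if not case.messages:
        raise ValueError("a conversation needs at least one message")
    if case.messages[-1].role != "user":
        raise ValueError("a conversation must end with a user message")
    # The agent sees the full history, including earlier assistant turns.
    return agent(case.messages)


def matches_expected(answer: str, case: EvalCase) -> bool:
    """Toy stand-in for a check: case-insensitive substring match."""
    if case.expected_answer is None:
        return True
    return case.expected_answer.lower() in answer.lower()


def echo_agent(history: list[Message]) -> str:
    # Toy agent (hypothetical): echoes the last user message.
    return "You said: " + history[-1].content


# Single-message test case.
single = EvalCase(messages=[Message("user", "Hello")], expected_answer="hello")

# Multi-turn test case ending with a final user message.
multi = EvalCase(
    messages=[
        Message("user", "Hi"),
        Message("assistant", "Hello! How can I help?"),
        Message("user", "What is 2 + 2?"),
    ],
    expected_answer="2 + 2",
)

for case in (single, multi):
    answer = run_test_case(case, echo_agent)
    print(answer, matches_expected(answer, case))
```

The final-user-message check mirrors the rule that a multi-turn conversation terminates with a user message, so the agent always has a user turn to respond to.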
In this section, we will walk you through how to annotate tests with checks and tags using the Hub interface.
Learn about evaluation parameters in test cases and how to manage and create them.
Create and assign checks to conversations to evaluate correctness, conformity, and other metrics.
Learn about tags and how to use them to organize your conversations.