Modify the test cases
This section guides you through the product owner workflow for modifying test cases. This workflow is designed for product owners and technical team members who need to refine test cases, adjust validation rules, and structure datasets based on review feedback.
Modify test cases
Draft/Undraft your test case
Drafting and undrafting test cases allows you to control which test cases are included in evaluation runs.
Setting a test case to draft status:
- Excludes it from evaluation runs - Draft test cases are not used in evaluations until they are undrafted
- Indicates work in progress - Shows that the test case is being reviewed or modified
- Prevents biased metrics - Ensures that incomplete or problematic test cases don't affect your evaluation results
To draft a test case:
- Open the test case (conversation) you want to draft
- Set it to draft status using the draft toggle or option
- The test case will be excluded from future evaluation runs until it is undrafted
You can also set a test case to draft when creating a task from an evaluation run. This ensures that failed test cases are automatically excluded from subsequent evaluations until they are reviewed and fixed.
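The effect of draft status on an evaluation run can be pictured with a minimal sketch. This is not the product's API: `CaseRecord`, its `draft` field, and `select_for_evaluation` are hypothetical names used only to illustrate that drafted test cases are skipped until they are undrafted.

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    """Hypothetical stand-in for a test case (conversation)."""
    name: str
    draft: bool = False


def select_for_evaluation(test_cases):
    """Return only the test cases that are not in draft status."""
    return [tc for tc in test_cases if not tc.draft]


cases = [
    CaseRecord("refund policy question"),
    CaseRecord("ambiguous greeting", draft=True),  # under review, excluded
    CaseRecord("out-of-scope request"),
]

runnable = select_for_evaluation(cases)
print([tc.name for tc in runnable])

# Undrafting the case makes it eligible for the next run again.
cases[1].draft = False
print(len(select_for_evaluation(cases)))
```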
Hide/Unhide
In addition to drafting, you can hide false positive results to organize your evaluation overview:
- Hide - Makes the false positive result less visible in the evaluation overview and in the dashboard's metrics computations
- Unhide - Makes the false positive result visible again in the evaluation overview
Rerun the test case
After modifying a test case or its checks, you should rerun the test to validate your changes.
When to rerun:
- After modifying the conversation structure
- After updating the answer example
- After enabling or disabling checks
- After modifying check requirements
- After making any changes that could affect the test result
How to rerun:
- Make your modifications to the test case
- Use the "Test" option to run the test case in isolation
- Review the results to see if your changes had the intended effect
- Continue iterating if needed
Rerunning helps you:
- Validate that your modifications work as expected
- Catch issues before including the test case in a full evaluation run
- Iterate quickly on test case improvements
- Ensure that your changes don't introduce new problems
Remove test case
If a test case is not relevant to your use case or doesn't test meaningful behavior, you can remove it.
When to remove a test case:
- The test case is not relevant to your use case
- The scenario is too ambiguous or difficult to evaluate consistently
- You have duplicate or redundant test cases
- The test case concept is fundamentally flawed and cannot be fixed
How to remove:
- Open the test case you want to remove
- Use the delete or remove option
- Confirm the removal
Modify checks
Checks are evaluation criteria that measure the quality of your agent's responses. You can enable or disable checks on individual test cases to control what is being evaluated.
It is important to understand any changes you make to the checks and how they will affect the evaluation results.
- Enable/Disable checks - Enable or disable checks on a test case to control what is being evaluated
- Modify check requirement - Modify the requirements of a check to better match your evaluation criteria
- Validate the check - Validate the check to ensure it works correctly
Enable/Disable checks
You can enable multiple checks on a single test case to evaluate different aspects of the agent's response.
Disabling a check removes it from the evaluation for that specific test case, but the check definition remains available for use on other test cases.
Modify check requirements
You can adjust the parameters of most built-in checks (like context or reference answer) specifically for the current test case by editing them directly within the test case view. These changes only impact the selected test case.
If you want to change the requirements of a custom check (such as its overall rules or similarity threshold), you must edit the custom check itself from the Checks page. Modifying a custom check will affect all test cases using that check. For major or experimental changes, it's recommended to create a new custom check instead, then enable it only on the test cases where you want the new behavior.
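The difference between editing a shared custom check and creating a new one can be sketched as follows. The `CustomCheck` class, the `similarity_threshold` field, and the `enabled_checks` mapping are assumptions made for illustration; they do not describe how the product stores checks internally.

```python
from dataclasses import dataclass, replace


@dataclass
class CustomCheck:
    """Hypothetical shared check definition, as edited from the Checks page."""
    name: str
    similarity_threshold: float


# Two test cases reference the same custom check definition.
shared = CustomCheck(name="tone", similarity_threshold=0.8)
enabled_checks = {"case_a": [shared], "case_b": [shared]}

# Editing the shared definition changes it for every test case that uses it.
shared.similarity_threshold = 0.9
thresholds_after_edit = {
    case: checks[0].similarity_threshold
    for case, checks in enabled_checks.items()
}
print(thresholds_after_edit)

# For an experimental change, create a new check and enable it on one case only.
experimental = replace(shared, name="tone_strict", similarity_threshold=0.95)
enabled_checks["case_b"] = [experimental]
print(enabled_checks["case_a"][0].similarity_threshold)
print(enabled_checks["case_b"][0].similarity_threshold)
```

In this sketch, only `case_b` sees the stricter threshold, while `case_a` keeps the shared definition.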
Validate the check
After modifying a check, you should validate it to ensure it works correctly.
Rerunning the agent answer
To validate that your check modifications work correctly:
- Rerun the test case - Execute the test case with the modified check
- Review the result - Check if the test passes or fails as expected
- Review the explanation - Understand why the check passed or failed
- Compare with expectations - Verify that the result matches what you intended
Rerunning the agent answer helps you:
- Verify that the check correctly evaluates the agent's response in different scenarios
- Ensure that your modifications don't break the check
- Catch issues before using the check in full evaluation runs
Rerunning the check evaluation
You may also need to validate the check evaluation itself by rerunning it multiple times for each of the regenerated answers.
- Review check explanations - Understand how the check evaluated the response
- Check for consistency - Ensure the check provides consistent evaluations
- Validate against examples - Test the check against known good and bad examples
- Adjust if needed - Modify the check prompt or configuration if results are inconsistent
For more information about iterating on checks, see Overview.
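One way to reason about the consistency of repeated check evaluations is to measure how often the verdicts agree. The sketch below assumes you have collected pass/fail verdicts from several reruns of the same check on the same answer; the `consistency` function is a hypothetical helper, not a product feature.

```python
from collections import Counter


def consistency(verdicts):
    """Fraction of repeated evaluations that agree with the most common verdict."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    most_common_count = Counter(verdicts).most_common(1)[0][1]
    return most_common_count / len(verdicts)


# Five reruns of the same check on the same agent answer (True = passed).
reruns = [True, True, False, True, True]
print(consistency(reruns))  # 0.8
```

A value well below 1.0 suggests the check prompt or configuration is ambiguous and may need adjusting.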
Structure your test cases with tags
Tags are optional but highly recommended labels that help you organize and filter your test cases. Tags help you analyze evaluation results by allowing you to:
- Filter results - Focus on specific test cases or scenarios
- Compare performance - See how your agent performs across different test categories
- Identify weak areas - Discover which types of tests have higher failure rates
- Organize reviews - Review test cases by category or domain
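As an illustration of how tags help identify weak areas, the following sketch computes a failure rate per tag from a list of results. The result format (a list of tags paired with a pass/fail flag) and the tag names are assumptions for the example, not an export format of the product.

```python
from collections import defaultdict


def failure_rate_by_tag(results):
    """Map each tag to the share of its test cases that failed."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for tags, passed in results:
        for tag in tags:
            totals[tag] += 1
            if not passed:
                failures[tag] += 1
    return {tag: failures[tag] / totals[tag] for tag in totals}


# Each entry: (tags on the test case, whether it passed). Hypothetical data.
results = [
    (["billing"], True),
    (["billing", "refunds"], False),
    (["refunds"], False),
    (["greeting"], True),
]

rates = failure_rate_by_tag(results)
for tag, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{tag}: {rate:.0%} failed")
```

A test case with several tags counts toward each of them, so the per-tag rates are not expected to sum to the overall failure rate.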
Next steps
Now that you understand how to modify test cases, you can:
- Review test results - Understand how test results are reviewed (see Review test results)
- Distribute tasks - Learn how tasks are created and managed (see Task management)
- Learn about checks - Get detailed information about check types (see Overview)
- Learn about tags - Understand how to organize with tags (see Overview)