Compare evaluations
Comparing evaluations is a crucial part of maintaining and improving your LLM agents over time. By comparing results across different versions, datasets, or time periods, you can:
- Detect regressions: Identify when agent performance has degraded (see the sketch after this list)
- Track improvements: Measure the impact of changes and optimizations
- Maintain quality standards: Ensure consistent performance across deployments
- Make data-driven decisions: Use metrics to guide development priorities
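For instance, regression detection can be reduced to a simple threshold check in CI. Below is a minimal sketch, assuming you have already extracted each run's overall success rate; the field names and the 2% tolerance are illustrative, not part of the Hub API:

```python
# Minimal sketch: flag a regression when the candidate run's success
# rate drops below the baseline by more than a tolerance.
# `baseline` and `candidate` are hypothetical result dicts, not Hub objects.

def detect_regression(baseline: dict, candidate: dict, tolerance: float = 0.02) -> bool:
    """Return True if the candidate success rate dropped beyond `tolerance`."""
    drop = baseline["success_rate"] - candidate["success_rate"]
    return drop > tolerance

baseline = {"name": "agent-v1", "success_rate": 0.91}
candidate = {"name": "agent-v2", "success_rate": 0.84}

if detect_regression(baseline, candidate):
    print(f"Regression: {candidate['name']} dropped "
          f"{baseline['success_rate'] - candidate['success_rate']:.0%} "
          f"vs {baseline['name']}")
```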
In this section, we will walk you through how to compare evaluations in Giskard Hub.
How to compare evaluations
On the Evaluations page, select at least two evaluations to compare, then click the "Compare" button in the top right corner of the table. The page will display a comparison of the selected evaluations.
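You can also retrieve evaluation runs programmatically with the Giskard Hub Python SDK and compare them outside the UI. A minimal sketch, assuming a `HubClient` configured with your Hub URL and API key; the `evaluations.list` call and its arguments are illustrative, so check the SDK reference for the exact names in your version:

```python
# Hedged sketch: fetch two evaluation runs to compare side by side.
# The `evaluations.list` method and its arguments below are assumptions;
# verify them against the SDK reference for your installed version.
from giskard_hub import HubClient

client = HubClient(
    hub_url="https://your-instance.giskard.ai",  # hypothetical instance URL
    api_key="GSK_API_KEY",
)

runs = client.evaluations.list(project_id="my-project-id")  # hypothetical call
baseline, candidate = runs[0], runs[1]

print(baseline.name, getattr(baseline, "metrics", None))
print(candidate.name, getattr(candidate, "metrics", None))
```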

Understanding the comparison view
First, the view shows the success rate: the percentage of conversations that passed all checks in each evaluation. It also displays the pass rate for each individual check. Below that, a table lists the conversations, which can be filtered by result, for example to show only conversations that passed or failed the checks in both evaluations.
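To make these figures concrete, here is a minimal sketch of the arithmetic behind the comparison view, using hypothetical per-conversation records in which each record maps a check name to a pass/fail result:

```python
# Sketch of the comparison-view arithmetic with hypothetical records.
# Each conversation result maps check names to a pass/fail boolean.
from collections import defaultdict

results_v1 = [
    {"correctness": True,  "no_toxicity": True},
    {"correctness": False, "no_toxicity": True},
]
results_v2 = [
    {"correctness": True, "no_toxicity": True},
    {"correctness": True, "no_toxicity": False},
]

def success_rate(results):
    """Share of conversations where every check passed."""
    return sum(all(r.values()) for r in results) / len(results)

def per_check_rates(results):
    """Pass rate of each individual check across conversations."""
    totals = defaultdict(int)
    for r in results:
        for check, passed in r.items():
            totals[check] += passed
    return {check: count / len(results) for check, count in totals.items()}

for name, res in [("v1", results_v1), ("v2", results_v2)]:
    print(name, f"{success_rate(res):.0%}", per_check_rates(res))
```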
Conversation-level analysis
Clicking on a conversation will show a detailed comparison.

Within this comparison, you can explore the agent's performance on a specific conversation, together with the associated check results and metrics.
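The conversation-level view is particularly useful for spotting flips, i.e. conversations that pass in one evaluation but fail in the other. A small sketch of that filter, again over hypothetical data keyed by conversation ID:

```python
# Sketch: find conversations whose overall result flipped between runs.
# `outcomes_*` are hypothetical mappings of conversation ID -> passed?
outcomes_v1 = {"conv-1": True, "conv-2": False, "conv-3": True}
outcomes_v2 = {"conv-1": True, "conv-2": True, "conv-3": False}

flips = {
    conv_id: (outcomes_v1[conv_id], outcomes_v2[conv_id])
    for conv_id in outcomes_v1.keys() & outcomes_v2.keys()
    if outcomes_v1[conv_id] != outcomes_v2[conv_id]
}
for conv_id, (before, after) in sorted(flips.items()):
    label = "fixed" if after else "regressed"
    print(f"{conv_id}: {label} (v1={before}, v2={after})")
```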
Next steps
Now that you have compared evaluations, you can take action on the results.
- Schedule evaluations: run evaluations automatically on a recurring schedule so regressions are caught early