Overview
Evaluations are the core of the testing process in Giskard Hub. They allow you to run your test datasets against your agents and evaluate their performance using the checks that you have defined.
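Conceptually, an evaluation sends each test case in a dataset to an agent and applies that case's checks to the agent's answer. The sketch below models that loop in plain Python. `TestCase`, `CaseResult`, `run_evaluation`, and `success_rate` are hypothetical names for illustration only. They are not classes or functions of the Giskard Hub or its SDK, and the "all checks must pass" rule is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins for Hub concepts; NOT giskard_hub SDK types.
Agent = Callable[[str], str]   # takes a prompt, returns an answer
Check = Callable[[str], bool]  # takes an answer, returns pass/fail


@dataclass
class TestCase:
    prompt: str
    checks: list[Check]


@dataclass
class CaseResult:
    prompt: str
    answer: str
    passed: bool


def run_evaluation(agent: Agent, dataset: list[TestCase]) -> list[CaseResult]:
    """Send every test case to the agent and apply that case's checks to the answer."""
    results = []
    for case in dataset:
        answer = agent(case.prompt)
        # Assumption for this sketch: a case passes only if all of its checks pass.
        passed = all(check(answer) for check in case.checks)
        results.append(CaseResult(case.prompt, answer, passed))
    return results


def success_rate(results: list[CaseResult]) -> float:
    """Fraction of test cases that passed (0.0 for an empty run)."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


def echo_agent(prompt: str) -> str:
    # Toy agent used only to exercise the sketch.
    return f"You asked: {prompt}"


dataset = [
    TestCase("What is Giskard?", [lambda a: "Giskard" in a]),
    TestCase("Say hello", [lambda a: "goodbye" in a]),
]
results = run_evaluation(echo_agent, dataset)
print(success_rate(results))  # 0.5
```

In the Hub, the agent, dataset, and checks are configured in the interface, and the evaluation run replaces this loop.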
The Giskard Hub supports three kinds of evaluations:
- Local evaluations: Run evaluations locally using development agents
- Remote evaluations: Run evaluations in the Hub using deployed agents
- Scheduled evaluations: Automatically run evaluations at specified intervals
In this section, we will walk you through how to run and manage evaluations using the Hub interface.
- Run evaluations: Create and run evaluations.
- Schedule evaluations: Schedule evaluations to run automatically.
- Compare evaluations: Compare evaluations to detect regressions between runs.
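To make "regression" concrete, here is a minimal sketch, assuming each evaluation run can be reduced to a mapping from a test-case ID to a pass/fail result. The dict format and the `find_regressions` and `find_fixes` helpers are hypothetical illustrations, not the Hub's comparison feature or data format.

```python
# Hypothetical model: each run maps a test-case ID to pass (True) / fail (False).
# This is NOT the Hub's data format or comparison API.


def find_regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """IDs of test cases that passed in the baseline run but fail in the candidate run.

    Cases missing from the candidate run are ignored (get() returns None, not False).
    """
    return sorted(
        case_id
        for case_id, passed_before in baseline.items()
        if passed_before and candidate.get(case_id) is False
    )


def find_fixes(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """IDs of test cases that failed in the baseline run but pass in the candidate run."""
    return sorted(
        case_id
        for case_id, passed_before in baseline.items()
        if not passed_before and candidate.get(case_id) is True
    )


baseline = {"tc-1": True, "tc-2": True, "tc-3": False}
candidate = {"tc-1": True, "tc-2": False, "tc-3": True}
print(find_regressions(baseline, candidate))  # ['tc-2']
print(find_fixes(baseline, candidate))        # ['tc-3']
```

Looking at both directions matters: a new agent version can fix some failing cases while breaking others, so an unchanged overall success rate does not rule out regressions.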
High-level workflow
```mermaid
graph LR
A([<a href="/hub/ui/evaluations/create" target="_self">Run Evaluation</a>]) --> B([<a href="/hub/ui/evaluations" target="_self">Review Results</a>])
B --> C{Analysis}
C -->|Compare Versions| D([<a href="/hub/ui/evaluations/compare" target="_self">Compare Evaluations</a>])
C -->|Schedule Automation| E([<a href="/hub/ui/evaluations/schedule" target="_self">Schedule Evaluation</a>])
D --> F{Next Steps}
E --> F
F -->|Iterate| A
F -->|Fix Issues| G[<a href="/hub/ui/annotate" target="_self">Update Test Cases</a>]
G --> A
```