AI Agent Evaluation
Evaluations are the core of the testing process in Giskard Hub. They allow you to run your test datasets against your AI agents and systematically assess their performance, safety, and security using the checks that you have defined.
The Giskard Hub provides a comprehensive AI agent evaluation system that supports:
- Local evaluations: Run evaluations locally using development agents
- Remote evaluations: Run evaluations in the Hub using deployed agents
- Scheduled evaluations: Automatically run evaluations at specified intervals
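Whichever mode you use, an evaluation boils down to running each test case through an agent and applying your checks to the output. Here is a minimal, self-contained Python sketch of that idea; all names (`run_evaluation`, `Check`, the toy agent) are illustrative and are not the Giskard Hub SDK:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Check:
    """A named assertion applied to an agent's answer (illustrative)."""
    name: str
    passes: Callable[[str], bool]

def run_evaluation(agent: Callable[[str], str],
                   dataset: list[str],
                   checks: list[Check]) -> dict:
    """Run every test prompt through the agent and tally each check."""
    results = {c.name: {"passed": 0, "failed": 0} for c in checks}
    for prompt in dataset:
        answer = agent(prompt)
        for check in checks:
            key = "passed" if check.passes(answer) else "failed"
            results[check.name][key] += 1
    return results

# Toy agent: echoes a canned answer; a real run would call a deployed model.
agent = lambda prompt: f"Sure, here is help with: {prompt}"
checks = [Check("non_empty", lambda a: len(a) > 0),
          Check("no_refusal", lambda a: "cannot" not in a.lower())]
report = run_evaluation(agent, ["reset my password", "cancel my order"], checks)
print(report["non_empty"]["passed"])  # 2
```

The returned tallies are what the Hub aggregates into pass rates for each check across the dataset.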
In this section, we will walk you through how to run and manage evaluations using the Hub interface.
- Run evaluations: Create and run evaluations against your agents.
- Schedule evaluations: Schedule evaluations to run automatically.
- Compare evaluations: Compare evaluations to see if there are any regressions.
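Under the hood, a scheduled evaluation is just a recurring run: given the last run time and an interval, compute the next due time. A minimal sketch using the standard-library `datetime` (this is not the Hub's scheduler API):

```python
from datetime import datetime, timedelta

def next_run(last_run: datetime, interval_hours: int, now: datetime) -> datetime:
    """Return the first run time strictly after `now`, keeping runs
    aligned to the original schedule even if some runs were missed."""
    interval = timedelta(hours=interval_hours)
    run = last_run + interval
    while run <= now:
        run += interval
    return run

last = datetime(2024, 1, 1)      # last run: Jan 1, midnight
now = datetime(2024, 1, 2, 6)    # current time: Jan 2, 06:00
print(next_run(last, 24, now))   # 2024-01-03 00:00:00
```

Aligning to the original schedule (rather than `now + interval`) keeps daily runs firing at the same time of day, which makes results easier to compare across runs.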
Evaluation workflow
```mermaid
graph LR
    A([<a href="/hub/ui/evaluations/create" target="_self">Run Evaluation</a>]) --> B([<a href="/hub/ui/evaluations" target="_self">Review Results</a>])
    B --> C{Analysis}
    C -->|Compare Versions| D([<a href="/hub/ui/evaluations/compare" target="_self">Compare Evaluations</a>])
    C -->|Schedule Automation| E([<a href="/hub/ui/evaluations/schedule" target="_self">Schedule Evaluation</a>])
    D --> F{Next Steps}
    E --> F
    F -->|Iterate| A
    F -->|Fix Issues| G[<a href="/hub/ui/annotate" target="_self">Update Test Cases</a>]
    G --> A
```
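The "Compare Versions" branch of the workflow can be sketched as a diff of per-check pass rates between two evaluation runs: any check whose pass rate dropped is a candidate regression. The data shapes below are hypothetical, not the Hub's export format:

```python
def find_regressions(baseline: dict[str, float],
                     candidate: dict[str, float],
                     tolerance: float = 0.0) -> list[str]:
    """Return the checks whose pass rate dropped by more than `tolerance`
    between the baseline run and the candidate run."""
    return [name for name, rate in baseline.items()
            if name in candidate and candidate[name] < rate - tolerance]

# Pass rates per check from two hypothetical evaluation runs.
baseline = {"groundedness": 0.95, "safety": 1.00, "tone": 0.80}
candidate = {"groundedness": 0.90, "safety": 1.00, "tone": 0.85}
print(find_regressions(baseline, candidate))  # ['groundedness']
```

A nonzero `tolerance` is useful when pass rates are noisy (e.g. LLM-as-judge checks), so that small fluctuations do not flag a regression.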