Run, schedule and compare evaluations
Evaluations are the core of the testing process in Giskard Hub. They allow you to run your test datasets against your agents and evaluate their performance using the checks that you have defined.
The Giskard Hub provides a comprehensive evaluation system that supports:
Local evaluations: Run evaluations locally using development agents
Remote evaluations: Run evaluations in the Hub using deployed agents
Scheduled evaluations: Automatically run evaluations at specified intervals
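Conceptually, an evaluation runs each test case in a dataset through an agent and applies its checks to the answers. The sketch below is a minimal, self-contained illustration of that loop; the `agent`, `dataset`, and `contains_check` names are hypothetical stand-ins for illustration, not the Giskard Hub SDK API.

```python
# Minimal sketch of what an evaluation does: run each test case through
# an agent, apply its checks, and report a pass rate.
# All names here are illustrative stand-ins, not the Giskard Hub API.

def agent(question: str) -> str:
    """Hypothetical agent under test."""
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return answers.get(question, "I don't know")

def contains_check(expected: str):
    """Hypothetical check: the answer must contain the expected string."""
    return lambda answer: expected in answer

dataset = [
    {"question": "What is 2 + 2?", "checks": [contains_check("4")]},
    {"question": "Capital of France?", "checks": [contains_check("Paris")]},
]

def run_evaluation(agent, dataset):
    results = []
    for case in dataset:
        answer = agent(case["question"])
        passed = all(check(answer) for check in case["checks"])
        results.append({"question": case["question"], "passed": passed})
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return results, pass_rate

results, pass_rate = run_evaluation(agent, dataset)
print(f"Pass rate: {pass_rate:.0%}")  # → Pass rate: 100%
```

In the Hub, the dataset, checks, and agent connection are all managed for you; this sketch only shows the underlying idea.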
In this section, we will walk you through how to run and manage evaluations using the Hub interface.
Tip
When to execute your tests?
Depending on your AI lifecycle, you may have different reasons to execute your tests:
Development time: Compare agent versions during development and identify the right correction strategies for developers.
Deployment time: Perform non-regression testing in the CI/CD pipeline for DevOps.
Production time: Provide high-level reporting for business executives to stay informed about key vulnerabilities in a running agent.
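For the deployment-time case, a non-regression gate in a CI/CD pipeline can be as simple as failing the build when the evaluation pass rate drops below a baseline. The sketch below uses made-up numbers and a hypothetical helper, not a Giskard Hub CLI:

```python
import sys

def non_regression_gate(pass_rate: float, baseline: float,
                        tolerance: float = 0.02) -> int:
    """Return a process exit code: 0 if pass_rate is within
    tolerance of the baseline, 1 if it regressed below it."""
    if pass_rate + tolerance < baseline:
        print(f"Regression: pass rate {pass_rate:.0%} "
              f"fell below baseline {baseline:.0%}")
        return 1
    print(f"OK: pass rate {pass_rate:.0%} (baseline {baseline:.0%})")
    return 0

# Example values; in a real pipeline these would come from the
# evaluation run. A CI step would call sys.exit(exit_code) so a
# regression fails the build.
exit_code = non_regression_gate(pass_rate=0.91, baseline=0.95)
```

The tolerance parameter is an assumption here; pick a threshold that matches how much run-to-run variance your evaluations show.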
Specifically, you will learn how to:
Create evaluations
Schedule evaluations to run automatically.
Compare evaluations to see if there are any regressions.
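Comparing two evaluation runs to spot regressions amounts to diffing per-test results: a test that passed in the earlier run but fails in the later one is a regression. A minimal sketch, assuming each run is a simple mapping from test name to pass/fail (a hypothetical shape, not the Hub's internal format):

```python
def find_regressions(previous: dict, current: dict) -> list:
    """Tests that passed in the previous run but fail in the
    current one (missing tests count as failures)."""
    return sorted(
        name for name, passed in previous.items()
        if passed and not current.get(name, False)
    )

# Hypothetical per-test results from two evaluation runs.
previous_run = {"greeting_tone": True, "refund_policy": True,
                "jailbreak_probe": True}
current_run = {"greeting_tone": True, "refund_policy": False,
               "jailbreak_probe": True}

print(find_regressions(previous_run, current_run))  # → ['refund_policy']
```

The Hub's comparison view surfaces this per-test diff for you; the sketch only shows the logic behind it.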
Note
Local evaluations are supported via the SDK. To run evaluations against local development agents, see Run local evaluations.