Run, schedule and compare evaluations
This section walks you through running and managing evaluations with the SDK.
Evaluations are the core of the testing process in Giskard Hub. They let you run your test datasets against your agents and evaluate their performance using the checks you have defined. We recommend launching an evaluation run every time you deploy an updated agent to a pre-production or staging environment. That way, you and your team can verify together that the agent performs as expected.
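To make the idea concrete, here is a minimal, library-free sketch of what an evaluation run does conceptually: each dataset entry is sent to the agent, and the answer is scored by the configured checks. It does not use the Giskard Hub SDK; `Sample`, `keyword_check`, `run_evaluation`, and `toy_agent` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One dataset entry (hypothetical structure, not the Hub's schema)."""
    question: str
    expected_keyword: str


# An agent maps a question to an answer; a check scores one answer.
Agent = Callable[[str], str]
Check = Callable[[Sample, str], bool]


def keyword_check(sample: Sample, answer: str) -> bool:
    """Pass if the expected keyword appears in the answer (case-insensitive)."""
    return sample.expected_keyword.lower() in answer.lower()


def run_evaluation(agent: Agent, dataset: List[Sample], checks: List[Check]) -> List[bool]:
    """Run every sample through the agent; a sample passes only if all checks pass."""
    results = []
    for sample in dataset:
        answer = agent(sample.question)
        results.append(all(check(sample, answer) for check in checks))
    return results


def toy_agent(question: str) -> str:
    """Stand-in for a real agent, used only to keep the sketch runnable."""
    if "refund" in question.lower():
        return "Refunds are processed within 14 days."
    return "I am not sure."


dataset = [
    Sample("How long do refunds take?", "14 days"),
    Sample("What are your opening hours?", "9am"),
]

results = run_evaluation(toy_agent, dataset, [keyword_check])
pass_rate = sum(results) / len(results)
print(f"Passed {sum(results)} of {len(results)} samples ({pass_rate:.0%})")
```

In the Hub, the dataset, the agent, and the checks are managed for you; the sketch only illustrates how the three pieces relate in a single run.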
Run local evaluations
Run evaluations against a local agent.
Run remote evaluations
Run evaluations against a remote agent.
Schedule evaluations
Schedule evaluations to run automatically.
Compare evaluations
Compare evaluations to see if there are any regressions.
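As a rough illustration of what comparing two evaluation runs involves, the sketch below flags regressions: test cases that passed in a baseline run but fail in a candidate run. It is plain Python, not the Giskard Hub API; `find_regressions` and the per-case result dictionaries are assumptions made for this example.

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> List[str]:
    """Return IDs of cases that passed in `baseline` but fail in `candidate`.

    Cases missing from the candidate run are ignored rather than counted
    as regressions, since they were not evaluated.
    """
    return sorted(
        case_id
        for case_id, passed in baseline.items()
        if passed and case_id in candidate and not candidate[case_id]
    )


# Hypothetical per-case pass/fail results from two runs of the same dataset.
baseline_run = {"refund-policy": True, "opening-hours": False, "shipping-cost": True}
candidate_run = {"refund-policy": True, "opening-hours": True, "shipping-cost": False}

regressions = find_regressions(baseline_run, candidate_run)
print("Regressions:", regressions)
```

Comparing per case rather than by overall pass rate matters here: both runs above pass two of three cases, yet one case has regressed.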