Run, schedule and compare evaluations

This section walks you through running and managing evaluations with the SDK.

Evaluations are the core of the testing process in Giskard Hub. They let you run your test datasets against your agents and evaluate their performance using the checks that you have defined. We recommend launching an evaluation run every time you deploy an updated agent to a pre-production or staging environment. This way, you can collaborate with your team to ensure that the agent performs as expected.

Run local evaluations

Run evaluations against a local agent.

Run remote evaluations

Run evaluations against a remote agent.

Schedule evaluations

Schedule evaluations to run automatically.

Compare evaluations

Compare evaluations to see if there are any regressions.

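
To make the idea of a regression concrete, here is a minimal sketch in plain Python. It does not use the Giskard Hub SDK: it assumes you have already extracted a pass rate per check from two evaluation runs, and the `find_regressions` helper, the `Regression` type, the check names, and the numbers are all hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regression:
    """A check whose pass rate dropped between two runs (hypothetical type)."""
    check: str
    baseline: float
    candidate: float


def find_regressions(baseline, candidate, tolerance=0.0):
    """Return checks whose pass rate dropped by more than `tolerance`.

    `baseline` and `candidate` map a check name to its pass rate (0.0 to 1.0).
    Checks that are missing from `candidate` are skipped.
    """
    regressions = []
    for check, base_rate in baseline.items():
        new_rate = candidate.get(check)
        if new_rate is None:
            continue  # check absent from the newer run; nothing to compare
        if base_rate - new_rate > tolerance:
            regressions.append(Regression(check, base_rate, new_rate))
    # Largest drop first
    return sorted(regressions, key=lambda r: r.candidate - r.baseline)


# Illustrative pass rates per check for two evaluation runs (made-up values)
baseline_run = {"correctness": 0.92, "groundedness": 0.88, "conformity": 0.95}
candidate_run = {"correctness": 0.85, "groundedness": 0.90, "conformity": 0.95}

regressions = find_regressions(baseline_run, candidate_run, tolerance=0.02)
for r in regressions:
    print(f"{r.check}: {r.baseline:.2f} -> {r.candidate:.2f}")
```

The `tolerance` parameter is a design choice in this sketch: a small non-zero value avoids flagging run-to-run noise as a regression, while `0.0` flags any drop at all.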