
Overview

Evaluations are the core of the testing process in Giskard Hub. They let you run your test datasets against your agents and assess each agent's performance using the checks you have defined.

The Giskard Hub provides a comprehensive evaluation system that supports:

  • Local evaluations: Run evaluations locally using development agents
  • Remote evaluations: Run evaluations in the Hub using deployed agents
  • Scheduled evaluations: Automatically run evaluations at specified intervals

In this section, we will walk you through how to run and manage evaluations using the Hub interface.

graph LR
    A([<a href="/hub/ui/evaluations/create" target="_self">Run Evaluation</a>]) --> B([<a href="/hub/ui/evaluations" target="_self">Review Results</a>])
    B --> C{Analysis}
    C -->|Compare Versions| D([<a href="/hub/ui/evaluations/compare" target="_self">Compare Evaluations</a>])
    C -->|Schedule Automation| E([<a href="/hub/ui/evaluations/schedule" target="_self">Schedule Evaluation</a>])
    D --> F{Next Steps}
    E --> F
    F -->|Iterate| A
    F -->|Fix Issues| G[<a href="/hub/ui/annotate" target="_self">Update Test Cases</a>]
    G --> A