
AI Agent Evaluation

Evaluations are the core of the testing process in Giskard Hub. They allow you to run your test datasets against your AI agents and systematically assess their performance, safety, and security using the checks that you have defined.

The Giskard Hub provides a comprehensive AI agent evaluation system that supports:

  • Local evaluations: Run evaluations locally using development agents
  • Remote evaluations: Run evaluations in the Hub using deployed agents
  • Scheduled evaluations: Automatically run evaluations at specified intervals
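To make the "local evaluation" mode concrete, here is a minimal, self-contained sketch of what such a run does conceptually: send each test prompt to a development agent and apply a check to its answer. All names here (`TestCase`, `evaluate_locally`, the agent function) are illustrative stand-ins, not the Giskard Hub SDK.

```python
# Conceptual sketch of a local evaluation loop.
# Everything here is hypothetical; it is NOT the Giskard Hub API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    prompt: str
    check: Callable[[str], bool]  # passes if the agent's answer satisfies it

def my_dev_agent(prompt: str) -> str:
    # stand-in for a locally running development agent
    return "Our refund policy allows returns within 30 days."

def evaluate_locally(agent: Callable[[str], str], dataset: list[TestCase]) -> float:
    # run every test case through the agent and return the pass rate
    results = [tc.check(agent(tc.prompt)) for tc in dataset]
    return sum(results) / len(results)

dataset = [
    TestCase("What is the refund policy?", lambda a: "30 days" in a),
    TestCase("Do you store my password?", lambda a: "password" not in a.lower()),
]

print(f"pass rate: {evaluate_locally(my_dev_agent, dataset):.0%}")  # → pass rate: 100%
```

Remote and scheduled evaluations follow the same logic, except the loop runs in the Hub against a deployed agent, either on demand or on a recurring schedule.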

In this section, we will walk you through how to run and manage evaluations using the Hub interface.

```mermaid
graph LR
    A([<a href="/hub/ui/evaluations/create" target="_self">Run Evaluation</a>]) --> B([<a href="/hub/ui/evaluations" target="_self">Review Results</a>])
    B --> C{Analysis}
    C -->|Compare Versions| D([<a href="/hub/ui/evaluations/compare" target="_self">Compare Evaluations</a>])
    C -->|Schedule Automation| E([<a href="/hub/ui/evaluations/schedule" target="_self">Schedule Evaluation</a>])
    D --> F{Next Steps}
    E --> F
    F -->|Iterate| A
    F -->|Fix Issues| G[<a href="/hub/ui/annotate" target="_self">Update Test Cases</a>]
    G --> A
```
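The "Compare Evaluations" step in the workflow above can be sketched as a simple diff of per-check pass rates between two runs: a regression on any check points you toward the "Fix Issues" branch, while a clean comparison lets you iterate. The data shapes and numbers below are hypothetical, not the Hub's result format.

```python
# Illustrative comparison of two evaluation runs by per-check pass rate.
# The run dictionaries and check names are invented for this sketch.
def regressions(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Return the checks whose pass rate dropped, with the (negative) delta."""
    return {
        check: candidate[check] - baseline[check]
        for check in baseline
        if check in candidate and candidate[check] < baseline[check]
    }

run_v1 = {"correctness": 0.90, "safety": 0.98, "conformity": 0.85}
run_v2 = {"correctness": 0.95, "safety": 0.92, "conformity": 0.85}

print(regressions(run_v1, run_v2))  # only "safety" regressed between the runs
```

In practice the Hub's comparison view surfaces this kind of per-check delta for you; the point of the sketch is just the decision rule: no regressions means iterate, any regression means update the affected test cases first.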