LLM Agent Testing Platform

Giskard Hub is our enterprise platform for LLM agent testing with team collaboration and continuous red teaming. The Hub provides a comprehensive user interface for performing LLM evaluations in production environments with enterprise-grade security and collaboration features.

The Hub implements the following 4-step workflow:

Giskard Hub 4-step workflow: setup, scan, create test cases, evaluate

Set up projects, agents and knowledge bases.

Launch vulnerability scans: Automatically scan your agent for safety and security failures.

Create test cases and datasets: Create them manually or using synthetic data generation.

Review and refine test cases and metrics: Refine them with humans in the loop.

Run, review, schedule and compare evaluation runs: Run evaluations and schedule them to run automatically.

Release notes: View the latest features and changes.

Agent testing workflow

graph LR
    B[<a href="/hub/ui/scan" target="_self">Red Team Scan</a>] -->  D[<a href="/hub/ui/datasets" target="_self">Create Test Cases</a>]
    D --> F[<a href="/hub/ui/annotate" target="_self">Annotate & Assign Checks</a>]
    F --> G[<a href="/hub/ui/evaluations" target="_self">Run Evaluations</a>]
    G --> H[<a href="/hub/ui/evaluations" target="_self">Review Results</a>]
    H --> F
    H --> B

The dashboard

The Dashboard is the first page you’ll see upon logging in. It provides an overview of your project, displaying the number of agents, datasets, evaluations, and knowledge bases.

It also features a graph showing the agent’s performance over time, measured by the average success rate of the evaluations. The success rate is calculated from evaluation metrics such as Conformity, Correctness, Groundedness, String Matching, Metadata, and Semantic Similarity. These metrics are covered in more detail in the Evaluations section.

Additionally, the dashboard lists your most recent evaluations and datasets for quick access.

Giskard Hub project dashboard showing agent performance graph

Create a project

In this section, you will learn how to create a project.

First, click the “Settings” icon in the left panel. This page lets you manage your projects and users (if you have the proper access rights).

In the Projects tab, click the “Create project” button. A modal will appear where you can enter your project’s name and description.

Create project dialog with name and description fields

Once the project is created, you can access its dashboard by clicking on it in the list. Alternatively, use the dropdown menu in the upper left corner of the screen to select the project you want to work on.

Set up an agent

This section guides you through creating a new agent.

On the Agents page, click on the “New agent” button.

Agent list page with new agent button

The interface below displays the agent details that need to be filled out.

Agent configuration form with API endpoint and header settings

Name: The name of the agent.
Description: A description of the agent, used to refine automatic evaluation and data generation for your specific use case.
Supported Languages: Add the languages your agent can handle. Note that this affects data generation.
Connection Settings:
- Agent API Endpoint: The URL of your agent’s API endpoint. This is where requests are sent to interact with your agent.
- Headers: HTTP headers sent with each request, useful for authentication and other custom needs.
- Stateful mode: Controls how the Hub handles conversation history when calling your agent.

In the default (stateless) mode, the endpoint should expect an object with the following structure:

{
    "messages": [
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hello! How can I help you?"},
        {"role": "user", "content": "What color is an orange?"}
    ]
}

And the response should have the following structure:

{
    "response": {"role": "assistant", "content": "An orange is orange."},
    "metadata": {"some_key": "whatever value"}
}
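As a rough illustration of the stateless contract above, the following Python sketch turns a request body into a response body. The function names `generate_reply` and `handle_stateless` and the `num_messages` metadata key are hypothetical, chosen only for this example; how you expose the handler over HTTP depends on your own web framework.

```python
import json
from typing import Any


def generate_reply(messages: list[dict[str, str]]) -> str:
    """Hypothetical stand-in for your agent's logic; replace with a real model call."""
    last_user = next(
        (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
    )
    return f"You said: {last_user}"


def handle_stateless(payload: dict[str, Any]) -> dict[str, Any]:
    """Map a request body in the documented shape to a response body."""
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("payload must contain a non-empty 'messages' list")
    for message in messages:
        if not isinstance(message, dict) or not {"role", "content"} <= message.keys():
            raise ValueError("each message needs 'role' and 'content'")
    return {
        "response": {"role": "assistant", "content": generate_reply(messages)},
        # Metadata is free-form; this key is illustrative only.
        "metadata": {"num_messages": len(messages)},
    }


request_body = json.loads("""
{
    "messages": [
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hello! How can I help you?"},
        {"role": "user", "content": "What color is an orange?"}
    ]
}
""")
response_body = handle_stateless(request_body)
print(json.dumps(response_body, indent=2))
```

Because the full conversation arrives with every request, the handler keeps no state between calls.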

In stateful mode, the Hub sends only the latest user message along with a thread_id in metadata, and your agent is responsible for storing the conversation history server-side. After the first turn, requests and responses look like this (request first, then response):

{
    "messages": [
        {"role": "user", "content": "What color is an orange?"}
    ],
    "metadata": {"thread_id": "abc-123"}
}

{
    "response": {"role": "assistant", "content": "An orange is orange."},
    "metadata": {"thread_id": "abc-123"}
}

The Hub also supports authenticated endpoints and chatbots whose native API doesn’t match this canonical shape, and the Giskard team will help you configure these during onboarding. For details on authentication, connecting custom chatbot formats, and stateful conversations, see Setup agents.
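Before registering an endpoint, it can be useful to send it a request in the canonical shape yourself. The sketch below only builds such a request with Python’s standard library. It assumes the endpoint accepts a JSON POST, which the text above does not state explicitly; the URL, the bearer token, and the helper name `build_agent_request` are placeholders.

```python
import json
import urllib.request
from typing import Optional


def build_agent_request(
    endpoint: str,
    messages: list[dict[str, str]],
    headers: Optional[dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON request carrying the message list."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    # Assumption: the agent endpoint expects an HTTP POST.
    return urllib.request.Request(
        endpoint, data=body, headers=all_headers, method="POST"
    )


request = build_agent_request(
    "https://example.com/agent",  # placeholder URL
    [{"role": "user", "content": "Hello!"}],
    headers={"Authorization": "Bearer <your-token>"},  # placeholder credential
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=30)
```

The custom headers passed here play the same role as the Headers field in the agent form.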

Import a knowledge base

This section guides you through importing your custom knowledge base.

On the Knowledge Bases page, click the “Add Knowledge Base” button.

Knowledge base list with add knowledge base button

The interface below displays the knowledge base details that need to be filled out.

Knowledge base import form with name and file upload fields

Name: The name of the knowledge base.
File: The document to upload, containing the knowledge base content. Supported formats are:
- JSON: A JSON file containing an array of objects
- JSONL: A JSON Lines file with one object per line

JSON/JSONL format requirements:

Each object in your JSON or JSONL file should have the following structure:

{
  "text": "Your document content here",
  "topic": "Optional topic classification"
}

text (required): The document content
topic (optional): The topic classification for the document
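As an illustration, the following Python sketch writes the same documents in both supported formats. The sample documents, file names, and helper names (`write_jsonl`, `write_json`) are made up for this example.

```python
import json
import tempfile
from pathlib import Path

# Made-up sample documents; the second one omits the optional topic.
documents = [
    {"text": "Oranges are citrus fruits.", "topic": "Fruit"},
    {"text": "Refunds are processed within 14 days."},
]


def write_jsonl(docs: list[dict[str, str]], path: Path) -> None:
    """JSONL: one JSON object per line."""
    with path.open("w", encoding="utf-8") as handle:
        for doc in docs:
            handle.write(json.dumps(doc, ensure_ascii=False) + "\n")


def write_json(docs: list[dict[str, str]], path: Path) -> None:
    """JSON: a single array of objects."""
    path.write_text(json.dumps(docs, ensure_ascii=False, indent=2), encoding="utf-8")


out_dir = Path(tempfile.mkdtemp())
write_jsonl(documents, out_dir / "knowledge_base.jsonl")
write_json(documents, out_dir / "knowledge_base.json")
print((out_dir / "knowledge_base.jsonl").read_text(encoding="utf-8"))
```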

General rules for all formats:

If the text has a value but the topic is blank, the topic will be set to ‘Others’. However, if all topics are blank, topics will be automatically generated.
If both the text and topic are blank, or if the text is blank but the topic has a value, the entry will not be imported.
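To preview how these rules would treat a file before uploading it, a local check along the following lines may help. It is a sketch of the rules as stated here, not the Hub’s actual import code. It assumes that whitespace-only values count as blank and that “all topics are blank” is judged over the entries that survive the text check; `preview_import` is a hypothetical name, and a topic of `None` stands for “to be generated by the Hub”.

```python
from typing import Any, Optional


def preview_import(entries: list[dict[str, Any]]) -> list[dict[str, Optional[str]]]:
    """Apply the documented import rules to a list of knowledge base entries."""

    def clean(value: Any) -> str:
        # Assumption: whitespace-only strings and non-strings count as blank.
        return value.strip() if isinstance(value, str) else ""

    pairs = [(clean(e.get("text")), clean(e.get("topic"))) for e in entries]
    # Entries with blank text are not imported, whatever their topic.
    kept = [(text, topic) for text, topic in pairs if text]

    if all(not topic for _, topic in kept):
        # No topics at all: the Hub generates them, shown here as None.
        return [{"text": text, "topic": None} for text, _ in kept]
    # Otherwise a blank topic falls back to "Others".
    return [{"text": text, "topic": topic or "Others"} for text, topic in kept]


mixed = preview_import(
    [
        {"text": "A", "topic": "T"},
        {"text": "B", "topic": ""},
        {"text": "", "topic": "T"},
        {"text": "", "topic": ""},
    ]
)
no_topics = preview_import([{"text": "A"}, {"text": "B", "topic": " "}])
print(mixed)
print(no_topics)
```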

The interface below displays information about the knowledge base and its content with corresponding topics. As mentioned above, if no topics were uploaded with the knowledge base, Giskard Hub will identify and generate them for you. In the example below, the knowledge base is ready to be used with over 1200 documents and 7 topics.

Imported knowledge base showing document count and topics

Need help?

Documentation: Browse the Hub UI guides for step-by-step walkthroughs
Community: Join our Discord for support and discussions
Enterprise: Need on-premise deployment or dedicated support? Contact us
Free trial: Not using Giskard Hub yet? Request your enterprise trial