
Quickstart

This tutorial walks you through installing the SDK, connecting to the Hub, and running a complete evaluation against an LLM agent — from dataset creation to reading results.

The fastest way to set up the Giskard Hub SDK is to paste a single URL into your coding agent, which then handles everything: dependency installation, authentication, and environment setup.

  1. Paste the URL into any coding agent (Claude Code, Cursor, Windsurf, Copilot, etc.)
  2. The agent reads the quickstart instructions from this page
  3. The agent installs giskard-hub and configures authentication
  4. You review the changes and start running evaluations

Before you begin, you need:

  • Python 3.10 or later
  • A running Giskard Hub instance (cloud or self-hosted)
  • An API key from the Hub UI

Click the user badge in the bottom-left corner of the Hub UI, then copy the API Key value:

Finding your API key in the Hub UI

Install the SDK:

```shell
pip install giskard-hub
```

The SDK reads your Hub URL and API key from environment variables. Set them before running any code:

```shell
export GISKARD_HUB_BASE_URL="https://your-hub-instance.example.com"
export GISKARD_HUB_API_KEY="gsk_..."
```
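If you go the environment-variable route, a common failure mode is running your script from a shell where the variables were never exported. A small guard like this (illustrative, not part of the SDK) fails fast with a clear message instead of a confusing connection error:

```python
import os

# Variable names from the export commands above
REQUIRED_VARS = ("GISKARD_HUB_BASE_URL", "GISKARD_HUB_API_KEY")


def check_hub_env() -> dict:
    """Return the Hub settings from the environment, or raise a clear error."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Call `check_hub_env()` at the top of your evaluation script, before constructing the client.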

Alternatively, pass them directly to the client constructor:

```python
from giskard_hub import HubClient

hub = HubClient(
    base_url="https://your-hub-instance.example.com",
    api_key="gsk_...",
)
```

Projects are the top-level container for all your resources. Create one or retrieve an existing one:

```python
# Create a new project
project = hub.projects.create(
    name="Customer Support Bot",
    description="Evaluation project for our support chatbot",
)

# Or list existing projects and pick one
projects = hub.projects.list()
project = projects[0]

print(f"Using project: {project.name} ({project.id})")
```
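Note that re-running the creation snippet produces a duplicate project each time. A small get-or-create helper keeps the quickstart idempotent; this sketch only assumes the `hub.projects.list()` and `hub.projects.create()` calls shown above and that projects expose a `name` attribute:

```python
def get_or_create_project(hub, name: str, description: str = ""):
    """Return the existing project with this name, or create it."""
    for project in hub.projects.list():
        if project.name == name:
            return project
    return hub.projects.create(name=name, description=description)
```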

An agent points to your LLM application. The Hub calls this endpoint during evaluations.

```python
agent = hub.agents.create(
    project_id=project.id,
    name="Support Bot v1",
    description="GPT-4o-based customer support chatbot",
    url="https://your-app.example.com/api/chat",
    supported_languages=["en"],
    headers={"Authorization": "Bearer <your-app-token>"},
)
print(f"Agent registered: {agent.id}")
```

Before building a dataset, run a quick scan to surface security weaknesses in your agent:

```python
scan = hub.scans.create(
    project_id=project.id,
    agent_id=agent.id,
    tags=["gsk:threat-type='prompt-injection'"],
)
print(f"Scan started: {scan.id}")

scan = hub.helpers.wait_for_completion(scan)
print(f"Scan complete. Grade: {scan.grade}")

# Print detailed probe results
hub.helpers.print_metrics(scan)
```

The grade ranges from A (no issues found) to D (critical vulnerabilities detected). See Vulnerability Scanning for the full tag catalogue, KB-grounded scans, and how to review probe results and turn successful attacks into test cases.
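In CI you usually want a hard threshold rather than a letter to eyeball. A tiny helper (illustrative; it only assumes the A-to-D grade scale described above) turns the grade into a pass/fail decision:

```python
# A = no issues found ... D = critical vulnerabilities detected
GRADE_ORDER = "ABCD"


def grade_passes(grade: str, minimum: str = "B") -> bool:
    """True if `grade` is at least as good as `minimum` on the A-D scale."""
    return GRADE_ORDER.index(grade.upper()) <= GRADE_ORDER.index(minimum.upper())
```

For example, `grade_passes(scan.grade, minimum="B")` could gate a deployment pipeline.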

A dataset is a collection of test cases — conversations with expected outcomes and quality checks.

```python
dataset = hub.datasets.create(
    project_id=project.id,
    name="Core Q&A Suite",
    description="Basic correctness and tone checks",
)

# Add a test case
hub.test_cases.create(
    dataset_id=dataset.id,
    messages=[
        {"role": "user", "content": "What is your return policy?"},
    ],
    demo_output="We offer a 30-day return policy for all items.",
    checks=[
        {
            "identifier": "correctness",
            "params": {
                "reference": "We offer a 30-day return policy for all items."
            },
        },
    ],
)
```
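Real suites contain more than one case. Assuming only the `hub.test_cases.create()` signature shown above, a loop over (question, reference answer) pairs builds out the dataset in one pass; this is an illustrative sketch, not an SDK method:

```python
def add_correctness_cases(hub, dataset_id: str, qa_pairs):
    """Create one correctness-checked test case per (question, reference) pair."""
    created = []
    for question, reference in qa_pairs:
        case = hub.test_cases.create(
            dataset_id=dataset_id,
            messages=[{"role": "user", "content": question}],
            checks=[
                {"identifier": "correctness", "params": {"reference": reference}},
            ],
        )
        created.append(case)
    return created
```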

The checks field controls which criteria are applied to each agent response — these can be LLM-judge, embedding similarity, or rule-based checks. See Datasets & Checks for the full list of built-in checks and how to define custom ones.

Now trigger an evaluation that sends every test case to your agent and scores the responses:

```python
evaluation = hub.evaluations.create(
    project_id=project.id,
    agent_id=agent.id,
    dataset_id=dataset.id,
    name="v1 baseline",
)
print(f"Evaluation started: {evaluation.id}")

evaluation = hub.helpers.wait_for_completion(evaluation)
print("Evaluation complete!")
```

Once complete, print the metrics summary and inspect individual results:

```python
# Print a formatted metrics table
hub.helpers.print_metrics(evaluation)
```

Evaluation metrics output

You can also iterate over individual results programmatically:

```python
results = hub.evaluations.results.list(evaluation.id)
for result in results:
    print(f"Test case {result.test_case.id}: {result.state}")
    for check in result.results:
        print(f"  {check.name}: {'passed' if check.passed else 'failed'}")
```
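The same result objects can feed an aggregate number, for example an overall pass rate across every check. This sketch assumes only the `result.results` list and per-check `passed` attribute used in the loop above:

```python
def pass_rate(results) -> float:
    """Fraction of individual checks that passed, across all test cases."""
    checks = [check for result in results for check in result.results]
    if not checks:
        return 0.0
    return sum(1 for check in checks if check.passed) / len(checks)
```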

You can also view the full evaluation with aggregated metrics in the Hub UI.

  • Local agents: evaluate a Python function directly without an HTTP endpoint — see Evaluations
  • Generate test cases automatically: use scenarios or knowledge bases — see Datasets & Checks
  • Vulnerability scanning: find security weaknesses with Scans
  • Schedule recurring runs: see Scheduled Evaluations
  • Full API details: see the API Reference