
Reference


Client classes for interacting with the Giskard Hub API. Two flavours are available with an identical API surface — pick the one that matches your runtime.

from giskard_hub import HubClient
hub = HubClient()
projects = hub.projects.list()

Synchronous client. All resource operations are available as attributes.

Constructor
from giskard_hub import HubClient
hub = HubClient(
api_key="gsk_...", # or set GISKARD_HUB_API_KEY env var
base_url="https://hub.example.com", # or set GISKARD_HUB_BASE_URL env var
)
api_key str | None Default: env GISKARD_HUB_API_KEY

Your Hub API key.

base_url str | httpx.URL | None Default: env GISKARD_HUB_BASE_URL

Base URL of your Hub instance.

auto_add_api_suffix bool Default: True

Automatically append /_api to base_url.

timeout float | httpx.Timeout | None Default: 60.0

Default request timeout in seconds. Pass an httpx.Timeout for fine-grained control over connect, read, and write timeouts.

max_retries int Default: 2

Number of automatic retries on transient errors (connection errors, 5xx responses).

default_headers dict[str, str] | None Default: None

Headers added to every request.

default_query dict[str, object] | None Default: None

Query parameters added to every request.

http_client httpx.Client | None Default: None

Custom httpx.Client instance for proxies, custom transports, or mutual TLS.

Async counterpart with an identical API surface — every method is a coroutine. Accepts the same constructor arguments as HubClient, except http_client takes an httpx.AsyncClient instead of an httpx.Client.

Constructor
from giskard_hub import AsyncHubClient
hub = AsyncHubClient(
api_key="gsk_...", # or set GISKARD_HUB_API_KEY env var
base_url="https://hub.example.com", # or set GISKARD_HUB_BASE_URL env var
)
api_key str | None Default: env GISKARD_HUB_API_KEY

Your Hub API key.

base_url str | httpx.URL | None Default: env GISKARD_HUB_BASE_URL

Base URL of your Hub instance.

auto_add_api_suffix bool Default: True

Automatically append /_api to base_url.

timeout float | httpx.Timeout | None Default: 60.0

Default request timeout in seconds. Pass an httpx.Timeout for fine-grained control over connect, read, and write timeouts.

max_retries int Default: 2

Number of automatic retries on transient errors (connection errors, 5xx responses).

default_headers dict[str, str] | None Default: None

Headers added to every request.

default_query dict[str, object] | None Default: None

Query parameters added to every request.

http_client httpx.AsyncClient | None Default: None

Custom httpx.AsyncClient instance for proxies, custom transports, or mutual TLS.


Resource groups exposed by the client for managing Hub entities.

Register, test, and invoke LLM agents. An agent represents your LLM application: either a remote HTTP endpoint or a local Python callable.

from giskard_hub.types import (
Agent,
AgentDetectStatefulness,
AgentOutput,
ChatMessage,
)

Create a new agent with configuration for external API communication.

name str Required

Display name of the agent.

url str Required

HTTP endpoint the Hub calls during evaluations and scans.

project_id str Required

Project this agent belongs to.

supported_languages list[str] Required

Language codes the agent supports (e.g. ["en", "fr"]).

headers dict[str, str]

HTTP headers sent with every request to the agent (e.g. auth tokens). Each header is a {"name": str, "value": str} dict.

description str | None

Human-readable description.

stateful bool | None

Whether the agent is stateful.

Example
agent = hub.agents.create(
project_id=project.id,
name="Support Bot v2",
url="https://my-app.example.com/api/chat",
supported_languages=["en"],
headers={"Authorization": "Bearer <token>"},
description="GPT-4o chatbot with RAG",
)

Retrieve an agent by its ID.

agent_id str Required

ID of the agent to retrieve.

Update an existing agent’s configuration. Only the provided fields are modified.

agent_id str Required

ID of the agent to update.

name str | None
Updated display name.
url str | None
Updated endpoint URL.
description str | None
Updated description.
headers dict[str, str] | None
Updated HTTP headers.
supported_languages list[str] | None
Updated language codes.

List all agents, optionally filtered by project.

project_id str | None
Project ID to filter by.

Delete an agent by its ID.

agent_id str Required
ID of the agent to delete.

Delete multiple agents at once.

agent_ids list[str] Required
IDs of agents to delete.

Call a registered agent with a list of messages and get the response.

agent_id str Required

ID of the agent to call.

messages Iterable[ChatMessageParam] Required

Conversation messages as [{"role": "user", "content": "..."}].

Example
output = hub.agents.generate_completion(
agent.id,
messages=[{"role": "user", "content": "What is your return policy?"}],
)
print(output.response.content)
print(output.metadata)
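Because generate_completion receives the whole conversation, a multi-turn test needs the earlier turns in order. The following is a minimal sketch of assembling that list in the documented {"role": ..., "content": ...} shape. build_messages is a hypothetical helper and is not part of giskard_hub.

```python
def build_messages(turns, system_prompt=None):
    """Build ChatMessageParam-style dicts from alternating user/assistant turns.

    `turns` must alternate user, assistant, user, ... and end on a user message,
    since the agent is expected to produce the next assistant reply.
    """
    if len(turns) % 2 == 0:
        raise ValueError("turns must alternate user/assistant and end with a user message")
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    for i, text in enumerate(turns):
        # Even positions are user turns, odd positions are assistant turns.
        role = "user" if i % 2 == 0 else "assistant"
        messages.append({"role": role, "content": text})
    return messages


messages = build_messages([
    "Do you ship to Canada?",
    "Yes, orders to Canada usually arrive within a week.",
    "What is the return policy for those orders?",
])
# `messages` can then be passed to hub.agents.generate_completion(agent.id, messages=messages).
```

Since output.response is typed ChatMessage | None, check it (and output.error) before reading .content.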

Test connectivity to an agent endpoint without persisting the agent.

url str Required
HTTP endpoint URL to test.
headers dict[str, str]
HTTP headers to include in the test request.

Auto-generate a description for an agent by observing its behaviour. Returns the generated description.

agent_id str Required
ID of the agent.

Detect whether the agent is stateful by analyzing its behavior.

agent_id str Required
ID of the agent to detect statefulness for.

Define and manage reusable check criteria for evaluating agent responses. Checks are project-scoped and can be referenced by identifier in any test case.

from giskard_hub.types import Check, CheckResult

Create a custom check in the specified project.

identifier str Required

Unique identifier to reference this check in test cases.

name str Required

Display name.

project_id str Required

Project this check belongs to.

params CheckTypeParam Required

Check configuration (see check type params below).

description str | None

Human-readable description.

Example
check = hub.checks.create(
project_id=project.id,
identifier="tone_professional",
name="Professional tone",
params={"type": "conformity", "rules": ["Use formal language."]},
)

Retrieve a check by its ID.

check_id str Required

ID of the check to retrieve.

Update an existing check. Only the provided fields are modified.

check_id str Required
ID of the check to update.
identifier str | None
Updated identifier.
name str | None
Updated name.
params CheckTypeParam | None
Updated check params.
description str | None
Updated description.

List all checks in a project.

project_id str Required

Project ID to list checks for.

filter_builtin bool

Whether to filter out built-in checks from the results. Default True.

Delete a check by its ID.

check_id str Required

ID of the check to delete.

Delete multiple checks at once.

check_ids list[str] Required

IDs of checks to delete.

The params field accepts one of these shapes:

| Type | params shape | Evaluation method |
| --- | --- | --- |
| Correctness | {"type": "correctness", "reference": str} | LLM judge |
| Conformity | {"type": "conformity", "rules": list[str]} | LLM judge |
| Groundedness | {"type": "groundedness", "context": str} | LLM judge |
| Semantic similarity | {"type": "semantic_similarity", "reference": str, "threshold": float} | Embedding |
| String match | {"type": "string_match", "keyword": str} | Rule-based |
| Metadata | {"type": "metadata", "json_path_rules": list[JsonPathRule]} | Rule-based |

Each JsonPathRule: {"json_path": str, "expected_value": str, "expected_value_type": "string" | "number" | "boolean"}
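A typo in a params dict may only surface as an API error. The following is a minimal sketch of building params locally from the table above. check_params and json_path_rule are hypothetical helpers. They are deliberately strict and allow only the keys listed in the table, which may be narrower than what the server actually accepts.

```python
# Required keys for each check type, transcribed from the params table above.
REQUIRED_KEYS = {
    "correctness": {"reference"},
    "conformity": {"rules"},
    "groundedness": {"context"},
    "semantic_similarity": {"reference", "threshold"},
    "string_match": {"keyword"},
    "metadata": {"json_path_rules"},
}
VALUE_TYPES = ("string", "number", "boolean")


def check_params(check_type, **fields):
    """Build a params dict, failing fast on unknown types and missing/extra keys."""
    if check_type not in REQUIRED_KEYS:
        raise ValueError(f"unknown check type: {check_type!r}")
    required = REQUIRED_KEYS[check_type]
    provided = set(fields)
    missing, extra = required - provided, provided - required
    if missing or extra:
        raise ValueError(
            f"{check_type}: missing {sorted(missing)}, unexpected {sorted(extra)}"
        )
    return {"type": check_type, **fields}


def json_path_rule(json_path, expected_value, expected_value_type="string"):
    """Build one JsonPathRule for a metadata check."""
    if expected_value_type not in VALUE_TYPES:
        raise ValueError(f"expected_value_type must be one of {VALUE_TYPES}")
    return {
        "json_path": json_path,
        "expected_value": expected_value,
        "expected_value_type": expected_value_type,
    }


similarity = check_params(
    "semantic_similarity", reference="Refunds are issued within 5 days.", threshold=0.8
)
# Illustrative JSON path and value; use whatever your agent puts in its metadata.
tool_used = check_params(
    "metadata",
    json_path_rules=[json_path_rule("$.tool_name", "lookup_order")],
)
```

Either dict can then be passed as params= to hub.checks.create, as in the example above.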


Create datasets, import test cases, and auto-generate test suites from scenarios or knowledge bases.

from giskard_hub.types import Dataset, TestCase, TaskProgress

Create a new empty dataset in the specified project.

name str Required
Display name.
project_id str Required
Project this dataset belongs to.
description str | None
Human-readable description.

Import test cases from a file or list of dicts into a dataset.

project_id str Required
Project ID.
data FileTypes | list[dict[str, Any]] | str Required

File path (str or Path), file-like object, or list of dicts. Each record should have a messages list and optional checks list.

dataset_id str | None
Append to an existing dataset instead of creating a new one.
name str | None
Name for the new dataset.
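When test cases are kept in version control, a JSON Lines file is a convenient source for the data argument. The following is a minimal sketch under two assumptions. First, it assumes the importer accepts one JSON object per line; this page confirms that file paths are accepted, but names JSON/JSONL explicitly only for knowledge bases. Second, it assumes a custom check can be referenced by its identifier alone. write_test_cases_jsonl is hypothetical. The check entries follow the {"identifier": ..., "params": ...} shape used for test cases.

```python
import json
from pathlib import Path


def write_test_cases_jsonl(path, records):
    """Write test-case records as JSON Lines and return the path.

    Each record needs a non-empty `messages` list (without the final assistant
    reply); `checks` is optional.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for i, record in enumerate(records):
            if not record.get("messages"):
                raise ValueError(f"record {i} has no messages")
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


records = [
    {
        "messages": [{"role": "user", "content": "Can I return a used item?"}],
        "checks": [
            {"identifier": "correctness",
             "params": {"reference": "Used items can be returned within 30 days."}},
        ],
    },
    {
        "messages": [{"role": "user", "content": "What do you think of your competitors?"}],
        # Assumption: a project-level custom check referenced by identifier only.
        "checks": [{"identifier": "tone_professional"}],
    },
]
```

Pass the returned path as data, together with project_id and name. Because a list of dicts is also accepted, records itself could be passed as data without writing a file.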

Generate a dataset of test cases from scenario definitions. The dataset’s status will be "running" until generation completes — use hub.helpers.wait_for_completion() to wait.

project_id str Required
Project ID.
agent_id str Required
Agent to generate test cases for.
scenario_id str Required
Scenario template to use.
n_examples int
Number of test cases to generate.
dataset_id str | None
Append to an existing dataset.
dataset_name str | None
Name for the new dataset.

Generate test cases grounded in knowledge base documents. Generation runs asynchronously; use hub.helpers.wait_for_completion() to wait for it to finish.

agent_id str Required
Agent to generate test cases for.
knowledge_base_id str Required
Knowledge base to source documents from.
project_id str Required
Project ID.
dataset_name str
Name for the new dataset.
description str | None
Dataset description.
n_examples int
Number of test cases to generate.
topic_ids list[str]
Filter to specific KB topics.

Retrieve a dataset by its ID.

dataset_id str Required

ID of the dataset to retrieve.

Update an existing dataset.

dataset_id str Required

ID of the dataset to update.

name str | None

Updated name.

description str | None

Updated description.

status TaskProgress | None

Async operation status.

List all datasets, optionally filtered by project.

project_id str | None

Project ID to filter by.

Delete a dataset by its ID.

dataset_id str Required
Dataset ID.

Delete multiple datasets at once.

dataset_ids list[str] Required
IDs of datasets to delete.

List all tags used across test cases in a dataset.

dataset_id str Required
Dataset ID.

List all test cases in a dataset.

dataset_id str Required
Dataset ID.

Search test cases with filters, sorting, and pagination. Pass include_metadata=True to receive tuple[list[TestCase], APIPaginatedMetadata].

dataset_id str Required
Dataset ID.
query str | None
Free-text search query.
order_by list[TestCaseOrderByParam] | None
Sorting criteria.
filters TestCaseFiltersParam | None
Filter criteria.
limit int | None
Maximum results per page.
offset int | None
Results offset for pagination.
include_metadata bool Default: False
Include pagination metadata in the return value.
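With include_metadata=True, the total field of APIPaginatedMetadata tells you when to stop paging. The following is a minimal sketch of walking every page using limit and offset. iter_all and the in-memory fake_search stand-in are hypothetical. In real code, fetch_page would be a small lambda that forwards limit and offset to the search method, along with dataset_id and include_metadata=True.

```python
from types import SimpleNamespace


def iter_all(fetch_page, page_size=100):
    """Yield every item from a limit/offset search.

    `fetch_page(limit, offset)` must return (items, metadata), where metadata has
    a `total` attribute, as the search methods do with include_metadata=True.
    """
    offset = 0
    while True:
        items, meta = fetch_page(page_size, offset)
        yield from items
        offset += len(items)
        # Also stop on an empty page, so a shrinking result set cannot loop forever.
        if not items or offset >= meta.total:
            return


FAKE_ROWS = [f"test-case-{i}" for i in range(250)]


def fake_search(limit, offset):
    """In-memory stand-in returning (page, metadata) like include_metadata=True."""
    page = FAKE_ROWS[offset:offset + limit]
    meta = SimpleNamespace(count=len(page), offset=offset, limit=limit, total=len(FAKE_ROWS))
    return page, meta


all_rows = list(iter_all(fake_search, page_size=100))
```

The same loop works for any search method on this page that accepts limit, offset, and include_metadata.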

Run agents against datasets, inspect per-test-case results, and manage the evaluation lifecycle. Sub-resource: hub.evaluations.results.

from giskard_hub.types import Evaluation, Metric, CheckResult

Create and launch a new evaluation of an agent on a dataset.

project_id str Required

Project ID.

agent_id str Required

Agent to evaluate.

dataset_id str | None

Dataset to evaluate against. Provide this or old_evaluation_id, not both.

old_evaluation_id str | None

Reuse a previous evaluation’s dataset.

name str

Evaluation run name.

tags list[str] | None

Filter test cases by tags.

run_count int

Run each test case N times (for consistency testing).

scheduled_evaluation_id str | None

Link to a scheduled evaluation.

Example
evaluation = hub.evaluations.create(
project_id=project.id,
agent_id=agent.id,
dataset_id=dataset.id,
name="v2.1 regression run",
)
evaluation = hub.helpers.wait_for_completion(evaluation)
hub.helpers.print_metrics(evaluation)

Create a local evaluation for running agent inference in your own process.

agent_info MinimalAgentParam Required

Agent info as {"name": str, "description": str}.

dataset_id str | None
Dataset to evaluate against.
name str | None
Evaluation name.
tags list[str] | None
Filter test cases by tags.
old_evaluation_id str | None
Reuse a previous evaluation’s dataset.

Evaluate a single (input, output) pair against checks without creating a full evaluation.

messages Iterable[ChatMessageParam] Required
Conversation messages.
agent_output AgentOutputParam Required
Agent’s output to evaluate.
checks Iterable[CheckConfigParam] Required
Checks to apply.
project_id str | None
Project ID.
agent_description str
Description of the agent for context.
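The agent_output argument follows the AgentOutputParam shape listed under the shared types: a response message plus optional metadata and error. The following is a minimal sketch of assembling the three required arguments for a one-off check. single_eval_kwargs is hypothetical. The check entry reuses the {"identifier": ..., "params": ...} form shown for test cases.

```python
def single_eval_kwargs(question, answer, checks, metadata=None, agent_description=None):
    """Assemble messages, agent_output, and checks for a single-pair evaluation."""
    agent_output = {"response": {"role": "assistant", "content": answer}}
    if metadata is not None:
        agent_output["metadata"] = metadata  # optional in AgentOutputParam
    kwargs = {
        "messages": [{"role": "user", "content": question}],
        "agent_output": agent_output,
        "checks": list(checks),
    }
    if agent_description is not None:
        kwargs["agent_description"] = agent_description
    return kwargs


kwargs = single_eval_kwargs(
    question="What is your return policy?",
    answer="You can return items within 30 days.",
    checks=[{"identifier": "correctness",
             "params": {"reference": "Items can be returned within 30 days of delivery."}}],
    metadata={"model": "gpt-4o"},
)
```

The resulting dict would be unpacked with ** into the single-pair evaluation call, presumably adding project_id when the checks are project-scoped custom checks.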

Rerun all errored results without triggering a full re-evaluation.

evaluation_id str Required
Evaluation ID.

Retrieve an evaluation by its ID, with optional related resource inclusion.

evaluation_id str Required
Evaluation ID.
include list[Literal["agent", "dataset"]] | None
Embed the full agent and/or dataset objects instead of references.

Update an evaluation’s name.

evaluation_id str Required
Evaluation ID.
name str Required
New name for the evaluation.

List all evaluations for a project.

project_id str Required
Project ID.
include list[Literal["agent", "dataset"]] | None
Embed related objects.

Delete an evaluation by its ID.

evaluation_id str Required
Evaluation ID.

Delete multiple evaluations at once.

evaluation_ids list[str] Required
IDs of evaluations to delete.

Inspect, filter, update, and rerun individual evaluation results.

from giskard_hub.types import TestCaseEvaluation, FailureCategory

Retrieve a single evaluation result.

result_id str Required

Result ID.

evaluation_id str Required

Evaluation ID.

include list[Literal["test_case"]] | None

Embed related resources.

Update the failure category of an evaluation result.

result_id str Required
Result ID.
evaluation_id str Required
Evaluation ID.
failure_category FailureCategoryParam | None
Failure classification to assign.

List all results of an evaluation.

evaluation_id str Required

Evaluation ID.

include list[Literal["test_case"]] | None

Embed related resources.

Search and filter results. Pass include_metadata=True for pagination metadata.

evaluation_id str Required
Evaluation ID.
query str | None
Free-text search query.
filters ResultFiltersParam | None
Filter criteria.
order_by list[ResultOrderByParam] | None
Sorting criteria.
limit int | None
Maximum results.
offset int | None
Results offset.
include list[Literal["test_case"]] | None
Embed related resources.
include_metadata bool Default: False
Include pagination metadata.
result_id str Required

Result ID.

evaluation_id str Required

Evaluation ID.

Submit locally-generated agent output for evaluation and scoring.

result_id str Required
Result ID.
evaluation_id str Required
Evaluation ID.
agent_output AgentOutputParam | None
Agent output to submit.
error str | None
Error message if the agent call failed.

Show or hide a result from the default view.

result_id str Required
Result ID.
evaluation_id str Required
Evaluation ID.
hidden bool Required
Whether the result should be hidden.
set_test_case_draft bool | None
Also set the linked test case to draft status.

High-level convenience methods for the most common SDK workflows: waiting for async operations, running evaluations, and printing metrics.

from giskard_hub.types import Evaluation, Scan, ChatMessage, AgentOutput

Poll an entity until it leaves its running state. Returns the refreshed entity.

entity TStateful Required

Any stateful entity: Evaluation, Scan, Dataset, KnowledgeBase, ScanProbe, TestCaseEvaluation.

poll_interval float Default: 5.0

Seconds between polling requests.

max_retries int Default: 360

Maximum polling attempts. Default: 30 minutes at 5-second intervals.

running_states Collection[str] Default: {"running"}

States treated as “still processing”.

error_states Collection[str] Default: {"error"}

Terminal error states.

raise_on_error bool Default: True

Raise ValueError if the entity enters an error state.
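The parameters above map onto a simple polling loop. The sketch below re-implements that loop for illustration only; it is not the SDK's code. It makes three assumptions that may differ from the real helper: a refresh callable re-fetches the entity, the state is read from entity.status.state (the TaskProgress field), and TimeoutError is raised when the polls run out. sleep is injectable so the loop can be tested without waiting.

```python
import time


def wait_until_done(refresh, entity, poll_interval=5.0, max_retries=360,
                    running_states=frozenset({"running"}),
                    error_states=frozenset({"error"}),
                    raise_on_error=True,
                    get_state=lambda e: e.status.state,
                    sleep=time.sleep):
    """Poll until the entity's state leaves `running_states` (illustrative sketch).

    `refresh(entity)` returns a freshly fetched copy; `get_state` reads the
    TaskProgress state. Both are assumptions, as is raising TimeoutError when the
    retries run out.
    """
    for _ in range(max_retries):
        entity = refresh(entity)
        state = get_state(entity)
        if state in error_states:
            if raise_on_error:
                raise ValueError(f"entity entered error state {state!r}")
            return entity
        if state not in running_states:
            return entity  # e.g. "finished", "canceled", "skipped"
        sleep(poll_interval)
    raise TimeoutError(f"still running after {max_retries} polls "
                       f"(about {max_retries * poll_interval:.0f} s)")
```

With the defaults, 360 polls at 5 s give the documented 30-minute ceiling. For longer jobs, pass a larger max_retries or poll_interval to hub.helpers.wait_for_completion; for example, poll_interval=10.0 keeps 360 polls but stretches the ceiling to an hour.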

Run an evaluation for a given agent over a dataset. Handles both remote and local agents.

agent str | Agent | Callable Required

Agent ID, Agent object, or a Python callable for local evaluation. Callable signature: (messages: list[ChatMessage]) -> str | ChatMessage | AgentOutput.

dataset str | Dataset Required

Dataset ID or Dataset object.

project str | Project | None

Required when agent is remote (str or Agent). Not required for local callables.

name str | None
Evaluation run name.
tags list[str] | None
Filter test cases by tags.

Example
evaluation = hub.helpers.evaluate(
agent=my_agent, dataset=my_dataset,
project=my_project, name="Remote eval",
)
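For local evaluation, the agent can be any callable that matches the documented signature. The following is a minimal rule-based sketch that returns a plain str. faq_agent and its canned answers are hypothetical. The function relies only on each ChatMessage exposing .role and .content.

```python
def faq_agent(messages):
    """Local agent: takes the conversation so far and returns the reply as a str.

    Only `.role` and `.content` are read from each message, as on ChatMessage.
    """
    last_user = next((m.content for m in reversed(messages) if m.role == "user"), "")
    text = last_user.lower()
    if "return" in text or "refund" in text:
        return "You can return items within 30 days of delivery."
    if "shipping" in text:
        return "Shipping is free on orders over $50."
    return "I'm not sure about that; let me connect you with a human agent."
```

It would be passed as hub.helpers.evaluate(agent=faq_agent, dataset=dataset, name="Local FAQ bot"); a local callable does not need a project. A real agent would typically call your LLM stack at this point. To attach metadata, it could return an AgentOutput instead of a str.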

Print a formatted metrics table to the console for an evaluation or scan.

entity Evaluation | Scan Required

The evaluation or scan to print metrics for.


Create, search, and manage indexed document collections for grounded evaluations, document-based test generation, and knowledge-grounded vulnerability scans.

from giskard_hub.types import (
KnowledgeBase,
KnowledgeBaseDocumentRow,
KnowledgeBaseDocumentDetail,
)

Create a knowledge base and upload documents. Indexing happens asynchronously after creation — use hub.helpers.wait_for_completion().

name str Required

Display name.

project_id str Required

Project this KB belongs to.

data FileTypes | list[dict[str, Any]] | str Required

Documents as a list of dicts, a file path string, or a pathlib.Path (JSON/JSONL format).

description str | None

Human-readable description.

document_column str

Column name for document text. Server defaults to "text" if omitted.

topic_column str

Column name for topic label. Server defaults to "topic" if omitted.

Example
kb = hub.knowledge_bases.create(
project_id=project.id,
name="Product Docs",
data=[
{"text": "30-day return policy.", "topic": "Returns"},
{"text": "Free shipping over $50.", "topic": "Shipping"},
],
)
kb = hub.helpers.wait_for_completion(kb)
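Source documents are often grouped by topic already. The following is a minimal sketch that flattens such a mapping into rows using the default "text" and "topic" columns. kb_records is a hypothetical helper, and the documents are illustrative.

```python
def kb_records(docs_by_topic, text_key="text", topic_key="topic"):
    """Flatten {topic: [document, ...]} into knowledge base rows.

    The default keys match the server-side defaults for document_column and
    topic_column; blank documents are dropped.
    """
    rows = []
    for topic, documents in docs_by_topic.items():
        for doc in documents:
            doc = doc.strip()
            if doc:
                rows.append({text_key: doc, topic_key: topic})
    return rows


rows = kb_records({
    "Returns": ["30-day return policy.", "Refunds go back to the original payment method."],
    "Shipping": ["Free shipping over $50.", "   "],
})
```

Pass rows as data to hub.knowledge_bases.create. If you use other keys, also set document_column and topic_column to match them.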

Semantic search over documents in a knowledge base.

knowledge_base_id str Required
Knowledge base ID.
query str | None
Search query.
Filter criteria.
order_by list[KnowledgeBaseDocumentOrderByParam] | None
Sorting criteria.
limit int | None
Maximum results.
offset int | None
Results offset.
include_metadata bool Default: False
Include pagination metadata. If true, returns a tuple of (results, metadata).

Retrieve a specific document with its full content.

knowledge_base_id str Required
Knowledge base ID.
document_id str Required
Document ID.

Retrieve a knowledge base by its ID, including its topics.

knowledge_base_id str Required
Knowledge base ID.

Update a knowledge base’s metadata.

knowledge_base_id str Required
Knowledge base ID.
name str | None
Updated name.
description str | None
Updated description.
project_id str | None
Project ID to move the knowledge base to.
status TaskProgress | None
Async operation status.

List all knowledge bases, optionally filtered by project.

project_id str | None
Project ID to filter by.

Delete a knowledge base by its ID.

knowledge_base_id str Required
Knowledge base ID.

Delete multiple knowledge bases at once.

knowledge_base_ids list[str] Required
IDs of knowledge bases to delete.

Top-level workspace that groups all related resources: agents, datasets, evaluations, scans, and more. Sub-resource: hub.projects.scenarios.

from giskard_hub.types import Project

Create a new project.

name str Required

Project name.

description str | None

Project description.

Update an existing project.

project_id str Required

Project ID.

name str | None

Updated name.

description str | None

Updated description.

failure_categories Iterable[FailureCategoryParam] | None

Project-level failure classifications.

Retrieve a project by its ID.

project_id str Required
Project ID.

List all projects accessible to the current user.

Delete a project by its ID.

project_id str Required
Project ID.

Delete multiple projects at once.

project_ids list[str] Required
IDs of projects to delete.

Reusable persona and behaviour templates for scenario-based dataset generation.

from giskard_hub.types import Scenario, ScenarioPreview

Create a new scenario in a project.

project_id str Required

Project ID.

name str Required

Scenario name.

description str Required

Scenario description.

rules list[str]

Rules the generated conversations should follow.

Generate a preview conversation for a scenario without persisting it.

project_id str Required
Project ID.
description str Required
Scenario description.
rules list[str]
Scenario rules.
agent_id str | None
Agent ID for preview.

Retrieve a scenario by its ID within a project.

scenario_id str Required
Scenario ID.
project_id str Required
Project ID.

Update an existing scenario’s definition.

scenario_id str Required
Scenario ID.
project_id str Required
Project ID.
name str | None
Updated name.
description str | None
Updated description.
rules list[str] | None
Updated rules.

List all scenarios for a project.

project_id str Required
Project ID.

Delete a scenario from a project.

scenario_id str Required
Scenario ID.
project_id str Required
Project ID.

Launch automated vulnerability scans covering the OWASP LLM Top 10 and additional threat categories. Sub-resources: hub.scans.probes, hub.scans.attempts.

from giskard_hub.types import (
Scan,
ScanCategory,
ScanProbe,
ScanProbeAttempt,
Severity,
ReviewStatus,
)

Launch a new vulnerability scan of an agent.

project_id str Required

Project ID.

agent_id str Required

Agent to scan.

knowledge_base_id str | None

Anchor probes to KB documents for domain-specific attacks.

probe_ids list[str] | None

List of specific LIDAR probe IDs to run in the scan.

tags list[str] | None

Limit scan to specific threat categories (e.g. ["gsk:threat-type='prompt-injection'"]).

Example
scan = hub.scans.create(
project_id=project.id,
agent_id=agent.id,
tags=["gsk:threat-type='prompt-injection'"],
)
scan = hub.helpers.wait_for_completion(scan)
print(f"Grade: {scan.grade}")
hub.helpers.print_metrics(scan)

List all available scan categories and their OWASP mappings.

List all probe results for a completed scan.

scan_id str Required
Scan ID.

Retrieve a scan result by its ID, with optional related resource inclusion.

scan_id str Required
Scan ID.
include list[Literal["agent", "knowledge_base"]] | None
Embed related objects.

List all scan results, optionally filtered by project.

project_id str | None
Project ID to filter by.
include list[Literal["agent", "knowledge_base"]] | None
Embed related objects.

Delete a scan result by its ID.

scan_id str Required
Scan ID.

Delete multiple scan results at once.

scan_ids list[str] Required
IDs of scans to delete.

List all probe definitions available for scanning.

probe_id str Required

Probe ID.

List all adversarial attempts for a specific probe.

probe_id str Required
Probe ID.

Update a probe attempt’s review status, severity, or success flag.

probe_attempt_id str Required
Probe attempt ID.
review_status ReviewStatus | None
Review status: "pending", "ignored", "acknowledged", "corrected".
severity Severity | None
Severity: SAFE (0), MINOR (10), MAJOR (20), CRITICAL (30).
successful bool | None
Whether the attack was successful.
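When triaging a large scan, it helps to sort attempts by the numeric severity levels listed above. The following is a minimal sketch. SeverityLevel is a local IntEnum mirror of those levels and is not giskard_hub.types.Severity. Plain dicts stand in for ScanProbeAttempt objects, and both needs_review and the dict shape are hypothetical.

```python
from enum import IntEnum


class SeverityLevel(IntEnum):
    """Local mirror of the documented severity scale (not giskard_hub.types.Severity)."""
    SAFE = 0
    MINOR = 10
    MAJOR = 20
    CRITICAL = 30


def needs_review(attempts, threshold=SeverityLevel.MAJOR):
    """Return successful attacks at or above `threshold`, most severe first.

    Attempts are plain dicts with `successful` and `severity` keys here; real
    ScanProbeAttempt objects would presumably be read through attributes instead.
    """
    flagged = [
        a for a in attempts
        if a.get("successful") and SeverityLevel(a["severity"]) >= threshold
    ]
    return sorted(flagged, key=lambda a: a["severity"], reverse=True)


attempts = [
    {"id": "a1", "successful": True, "severity": 10},
    {"id": "a2", "successful": True, "severity": 30},
    {"id": "a3", "successful": False, "severity": 30},
    {"id": "a4", "successful": True, "severity": 20},
]
flagged_ids = [a["id"] for a in needs_review(attempts)]
```

You could then use the update method above to mark each flagged attempt with review_status="acknowledged", or with successful=False if it turns out to be a false positive.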

Set up recurring evaluation runs on a daily, weekly, or monthly cadence for continuous quality monitoring.

from giskard_hub.types import ScheduledEvaluation, FrequencyOption

Create a new scheduled evaluation.

project_id str Required

Project ID.

agent_id str Required

Agent to evaluate.

dataset_id str Required

Dataset to evaluate against.

frequency FrequencyOption Required

"daily", "weekly", or "monthly".

name str Required

Name of the scheduled evaluation.

time str Required

Time of day in HH:MM format (UTC).

day_of_week int | None

Weekly only: 1 (Monday) through 7 (Sunday).

day_of_month int | None

Monthly only: 1 through 28.

tags list[str] | None

Filter test cases by tags.

run_count int | None

Run each test case N times.
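The time field is interpreted in UTC, and each day field applies to only one frequency. Validating these fields before calling the API can therefore catch mistakes early. The following is a minimal sketch. schedule_kwargs and to_utc_hhmm are hypothetical helpers. The sketch assumes day_of_week is required for weekly schedules and day_of_month for monthly ones. The fixed UTC offset ignores daylight saving time.

```python
import re
from datetime import datetime, timedelta, timezone

HHMM = re.compile(r"([01]\d|2[0-3]):[0-5]\d")


def schedule_kwargs(frequency, time, day_of_week=None, day_of_month=None):
    """Validate the timing fields of a scheduled evaluation and return them as kwargs."""
    if frequency not in ("daily", "weekly", "monthly"):
        raise ValueError(f"unknown frequency: {frequency!r}")
    if not HHMM.fullmatch(time):
        raise ValueError(f"time must be zero-padded HH:MM (UTC), got {time!r}")
    # Assumption: each day field is required for its own frequency and rejected otherwise.
    if (frequency == "weekly") != (day_of_week is not None):
        raise ValueError("day_of_week must be set for weekly schedules only")
    if (frequency == "monthly") != (day_of_month is not None):
        raise ValueError("day_of_month must be set for monthly schedules only")
    if day_of_week is not None and not 1 <= day_of_week <= 7:
        raise ValueError("day_of_week runs from 1 (Monday) to 7 (Sunday)")
    if day_of_month is not None and not 1 <= day_of_month <= 28:
        raise ValueError("day_of_month runs from 1 to 28")
    kwargs = {"frequency": frequency, "time": time}
    if day_of_week is not None:
        kwargs["day_of_week"] = day_of_week
    if day_of_month is not None:
        kwargs["day_of_month"] = day_of_month
    return kwargs


def to_utc_hhmm(hour, minute, utc_offset_hours):
    """Convert a local wall-clock time at a fixed UTC offset to HH:MM in UTC."""
    local = datetime(2000, 1, 1, hour, minute,
                     tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc).strftime("%H:%M")


# Monday 09:30 at UTC+2 becomes Monday 07:30 UTC.
weekly = schedule_kwargs("weekly", to_utc_hhmm(9, 30, 2), day_of_week=1)
```

Converting a local time can move the schedule to the previous or next UTC day: 01:00 at UTC+2 becomes 23:00 UTC the day before. In that case, adjust day_of_week or day_of_month accordingly.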

List all past evaluation runs generated by this scheduled evaluation.

scheduled_evaluation_id str Required
Scheduled evaluation ID.
include list[Literal["agent", "dataset"]] | None
Embed related resources.

Retrieve a scheduled evaluation by its ID.

scheduled_evaluation_id str Required
Scheduled evaluation ID.
include list[Literal["evaluations"]] | None
Embed recent evaluation runs.

Update a scheduled evaluation’s configuration.

scheduled_evaluation_id str Required
Scheduled evaluation ID.
name str | None
Updated name.
frequency FrequencyOption | None
Updated frequency.
time str | None
Updated time (HH:MM, UTC).
day_of_week int | None
Updated day of week (1 through 7).
day_of_month int | None
Updated day of month (1 through 28).
run_count int | None
Updated run count.
last_execution_at str | datetime | None
Updated last execution time.
last_execution_status LastExecutionStatusParam | None
Updated last execution status.
paused bool | None
Updated paused status.

List all scheduled evaluations for a project.

project_id str Required
Project ID.
include list[Literal["evaluations"]] | None
Embed recent runs.
last_days int | None
Filter to schedules active within the last N days.

Delete a scheduled evaluation by its ID.

scheduled_evaluation_id str Required
Scheduled evaluation ID.

Delete multiple scheduled evaluations at once.

scheduled_evaluation_ids list[str] Required
IDs to delete.

Lightweight issue tracker for managing findings from evaluations and scans. Link tasks to specific evaluation results, test cases, or probe attempts.

from giskard_hub.types import Task, TaskStatus, TaskPriority

Create a new task in a project.

project_id str Required

Project ID.

description str Required

What needs to be done.

priority TaskPriority | None

"low", "medium", or "high".

status TaskStatus | None

"open", "in_progress", or "resolved".

assignee_ids list[str]

User IDs to assign.

evaluation_result_id str | None

Link to a specific evaluation result.

dataset_test_case_id str | None

Link to a specific test case.

probe_attempt_id str | None

Link to a specific scan probe attempt.

disable_test bool

Disable the linked test case.

hide_result bool

Hide the linked evaluation result.

Retrieve a task by its ID.

task_id str Required
Task ID.

Update an existing task’s metadata and assignees.

task_id str Required
Task ID.
status TaskStatus | None
Updated status: "open", "in_progress", or "resolved".
priority TaskPriority | None
Updated priority: "low", "medium", or "high".
description str | None
Updated description.
assignee_ids list[str] | None
Updated user IDs to assign.
set_test_case_status str | None
Also set the linked test case’s status.

List all tasks for a project, ordered by creation date descending.

project_id str | None
Project ID to filter by.

Delete a task by its ID.

task_id str Required
Task ID.

Delete multiple tasks at once.

task_ids list[str] Required
IDs of tasks to delete.

Create, update, and manage individual test cases within datasets. Sub-resource: hub.test_cases.comments.

from giskard_hub.types import TestCase, TestCaseComment, ChatMessageWithMetadata

Create a new test case with conversation messages and optional checks.

dataset_id str Required
Dataset this test case belongs to.
messages Iterable[ChatMessageParam] Required

Conversation messages as [{"role": "user", "content": "..."}]. Should not include the final assistant response.

checks Iterable[CheckConfigParam]

Checks to apply: [{"identifier": "correctness", "params": {"reference": "..."}}].

demo_output str | ChatMessageWithMetadataParam | None

Expected output for display only — not used during evaluation.

status "active" | "draft" | None
Test case status.
tags list[str]
Tags for filtering.

Retrieve a test case by its ID.

test_case_id str Required
Test case ID.

Update an existing test case’s messages, checks, tags, or status.

test_case_id str Required
Test case ID.
messages Iterable[ChatMessageParam] | None
Updated conversation messages.
checks Iterable[CheckConfigParam] | None
Updated checks.
demo_output str | ChatMessageWithMetadataParam | None
Updated expected output.
status "active" | "draft" | None
Updated status.
tags list[str] | None
Updated tags.
dataset_id str | None
Move the test case to a different dataset.

Delete a test case by its ID.

test_case_id str Required
Test case ID.

Delete multiple test cases at once.

test_case_ids list[str] Required

IDs of test cases to delete.

Update multiple test cases at once. Returns the updated test cases.

test_case_ids list[str] Required
Test case IDs.
status Literal["active", "draft"] | None
Updated status.
disabled_checks list[str] | None
Checks to disable.
enabled_checks list[str] | None
Checks to enable.
added_tags list[str] | None
Tags to add.
removed_tags list[str] | None
Tags to remove.

Move or copy test cases to another dataset.

test_case_ids list[str] Required
Test case IDs to move.
target_dataset_id str Required
Target dataset ID.
duplicate bool
Copy instead of move.

Add a comment to a test case.

test_case_id str Required

Test case ID.

content str Required

Comment text.

Edit an existing comment on a test case.

comment_id str Required

Comment ID.

test_case_id str Required

Test case ID.

content str Required

Updated text.

Delete a comment from a test case.

comment_id str Required

Comment ID.

test_case_id str Required

Test case ID.


Access conversations captured from the Hub's interactive playground UI.

from giskard_hub.types import PlaygroundChat

List playground chats for a project.

project_id str Required

Project ID.

include list[Literal["agent"]] | None

Embed related resources (["agent"]).

limit int | None

Maximum results.

offset int | None

Results offset.

Retrieve a playground chat by its ID.

chat_id str Required

Chat ID.

include list[Literal["agent"]] | None

Embed related resources (["agent"]).

Delete a playground chat by its ID.

chat_id str Required
Chat ID.

Delete multiple playground chats at once.

chat_ids list[str] Required
IDs of chats to delete.

Query the audit trail for compliance reporting, change history, and debugging. Every create, update, and delete action is recorded.

from giskard_hub.types import Audit, AuditDisplay

Search audit events with free-text queries, filters, and pagination. Pass include_metadata=True to receive tuple[list[Audit], APIPaginatedMetadata].

query str | None

Free-text search query.

filters AuditFiltersParam | None

Filter criteria (see filter keys below).

order_by list[AuditOrderByParam] | None

Sorting criteria.

limit int

Maximum results.

offset int

Results offset.

include_metadata bool Default: False

Include pagination metadata. If true, returns a tuple of (results, metadata).

Filter keys:

| Key | Type | Example |
| --- | --- | --- |
| project_id | list filter | {"selected_options": ["project-id"]} |
| entity_type | list filter | {"selected_options": ["agent", "evaluation"]} |
| action | list filter | {"selected_options": ["create", "delete"]} |
| user_id | list filter | {"selected_options": ["user-id"]} |
| created_at | date range | {"from_": "2025-01-01T00:00:00Z", "to_": "2025-12-31T23:59:59Z"} |
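The following is a minimal sketch that assembles the filters argument from the keys in the table above. audit_filters is a hypothetical helper. Sending only one bound of the created_at range is an assumption.

```python
from datetime import datetime, timezone


def _iso_utc(dt):
    """Format a timezone-aware datetime as ISO 8601 in UTC with a trailing Z."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def audit_filters(project_ids=None, entity_types=None, actions=None,
                  user_ids=None, since=None, until=None):
    """Build an audit `filters` dict from the documented filter keys."""
    filters = {}
    list_filters = {
        "project_id": project_ids,
        "entity_type": entity_types,
        "action": actions,
        "user_id": user_ids,
    }
    for key, values in list_filters.items():
        if values:
            filters[key] = {"selected_options": list(values)}
    date_range = {}
    if since is not None:
        date_range["from_"] = _iso_utc(since)
    if until is not None:
        date_range["to_"] = _iso_utc(until)
    if date_range:
        filters["created_at"] = date_range
    return filters


deletions_2025 = audit_filters(
    entity_types=["agent", "evaluation"],
    actions=["delete"],
    since=datetime(2025, 1, 1, tzinfo=timezone.utc),
    until=datetime(2025, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
)
```

Pass the result as filters= to the audit search, together with limit and include_metadata=True if you need to page through the events.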

List audit history for a specific resource, including diffs of each change. Pass include_metadata=True for pagination metadata.

entity_id str Required
UUID of the entity.
entity_type str Required
Type of entity (e.g. "project", "agent", "evaluation").
limit int
Maximum results.
offset int
Results offset.
include_metadata bool Default: False
Include pagination metadata.

All Python types referenced by the methods above.

Shared building blocks used by every resource. The *Param variants are TypedDicts used in request bodies.

role str

Sender role: typically "user", "assistant", or "system".

content str

Message text.

role str Required

Sender role: typically "user", "assistant", or "system".

content str Required

Message text.

role str

Sender role.

content str

Message text.

metadata dict[str, object] | None

Arbitrary metadata attached to the message.

role str Required

Sender role.

content str Required

Message text.

metadata dict[str, object] | None

Arbitrary metadata attached to the message.

name str

Header name.

value str

Header value.

name str Required

Header name.

value str Required

Header value.

message str

Error message returned by the agent or runtime.

details dict[str, object] | None

Optional structured error context.

message str Required

Error message returned by the agent or runtime.

details dict[str, object]

Optional structured error context.

id str

Unique identifier.

email str

User email address.

name str | None

Display name, if set.

id str

Unique identifier.

name str

Display name.

state TaskState

Current state.

current int

Items processed so far.

total int

Total items to process.

error str | None

Error message if the task failed.

"running" Task is in progress.
"finished" Task completed successfully.
"error" Task failed.
"canceled" Task was canceled.
"skipped" Task was skipped.

count int

Number of items returned in this page.

offset int

Offset of the first item in this page.

limit int

Maximum page size requested.

total int

Total number of items across all pages.

id str

Unique identifier.

name str

Display name.

description str | None

Human-readable description.

url str

HTTP endpoint URL.

project_id str

Parent project ID.

supported_languages list[str]

Language codes the agent supports.

headers dict[str, str]

HTTP headers sent with every request.

stateful bool

Whether the agent is stateful.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id str

Unique identifier.

name str

Display name.

response ChatMessage | None

The agent’s response message.

error ExecutionError | None

Error details if the agent call failed.

metadata dict[str, object] | None

Arbitrary metadata returned by the agent.

response ChatMessageParam | None Required

The agent’s response message.

error ExecutionErrorParam | None

Error details if the agent call failed.

metadata dict[str, object]

Arbitrary metadata returned by the agent.
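For local evaluations, an agent's reply has to fit the AgentOutputParam shape: a required response message, plus optional error and metadata. Here is a minimal sketch of a hypothetical agent function that produces that shape. The name echo_agent is illustrative, and how the SDK wires such a function into a local evaluation is not covered here.

```python
from typing import Any, Dict, List


def echo_agent(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """Hypothetical agent returning an AgentOutputParam-shaped dict."""
    if not messages:
        # No input: report the problem through the optional error field.
        return {
            "response": None,
            "error": {"message": "empty conversation", "details": {}},
        }
    last = messages[-1]["content"]
    return {
        # "response" is the only required key (ChatMessageParam | None).
        "response": {"role": "assistant", "content": f"You said: {last}"},
        # Free-form metadata; metadata-based checks evaluate JSONPath against it.
        "metadata": {"turns": len(messages)},
    }
```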

stateful bool

Whether the agent was detected as stateful.

name str

Agent name (used for local evaluations).

description str | None

Optional description.

name str Required

Agent name.

description str | None

Optional description.

id str

Unique identifier.

built_in bool

Whether this is a built-in check.

identifier str

Reusable identifier string.

name str

Display name.

description str | None

Human-readable description.

project_id str

Parent project ID.

params dict[str, Any]

Check-specific configuration. Shape depends on the check type — see Check type params.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

name str

Check identifier.

display_name str | None

Human-readable name.

status TaskState

Execution status.

passed bool | None

Whether the check passed.

error str | None

Error message if execution failed.

reason str | None

LLM judge’s reasoning (for LLM-based checks).

annotations list[OutputAnnotation] | None

Annotated spans in the agent’s response.

identifier str

Check identifier.

enabled bool | None

Whether the check is enabled.

params dict[str, Any]

Check-specific parameters (without the type discriminator).

identifier str Required

Check identifier to apply.

enabled bool

Whether the check is enabled.

params dict[str, Any]

Check-specific parameters.

text str

The annotated substring.

label str

Label assigned to the span.

start_char_index int

Start position in the response (character offset).

end_char_index int

End position in the response (character offset).

type "output" | "context"

Whether the annotation references the agent’s output or its retrieved context.
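OutputAnnotation offsets index into the response text. The sketch below cuts each annotated span back out of the response and cross-checks it against the stored text. It assumes end_char_index is exclusive, as in Python slicing, which the field descriptions above do not specify. extract_spans is a hypothetical helper working on plain dicts.

```python
from typing import Dict, List


def extract_spans(response: str, annotations: List[Dict]) -> List[str]:
    """Return the substring each annotation points at.

    Assumes end_char_index is exclusive, like a Python slice bound.
    """
    spans = []
    for ann in annotations:
        span = response[ann["start_char_index"]:ann["end_char_index"]]
        # Compare with the stored text to catch off-by-one offsets.
        if span != ann["text"]:
            raise ValueError(f"offsets do not match text {ann['text']!r}")
        spans.append(span)
    return spans


response = "Paris is the capital of Germany."
annotations = [{
    "text": "Germany", "label": "hallucination",
    "start_char_index": 24, "end_char_index": 31, "type": "output",
}]
print(extract_spans(response, annotations))
```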

json_path str

JSONPath expression to evaluate against the agent’s output metadata.

expected_value bool | float | str

The value the JSONPath should resolve to.

expected_value_type "string" | "number" | "boolean"

Expected primitive type of the resolved value.
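To make the three rule fields concrete, here is a deliberately simplified local evaluator. It only understands dotted paths such as $.retrieval.used, with no wildcards, filters, or array indices. It is not how the Hub itself resolves JSONPath, and check_metadata_rule is a hypothetical name.

```python
from typing import Any, Dict

# Python types accepted for each documented expected_value_type.
_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def check_metadata_rule(metadata: Dict[str, Any], json_path: str,
                        expected_value: Any, expected_value_type: str) -> bool:
    """Resolve a dotted '$.a.b' path in metadata and compare it to expected_value."""
    if not json_path.startswith("$"):
        raise ValueError("path must start with '$'")
    value: Any = metadata
    for key in json_path.lstrip("$").strip(".").split("."):
        if not key:
            continue  # bare "$" refers to the whole metadata object
        if not isinstance(value, dict) or key not in value:
            return False  # path does not resolve
        value = value[key]
    # bool is a subclass of int, so reject it when a number is expected.
    if expected_value_type == "number" and isinstance(value, bool):
        return False
    return isinstance(value, _TYPES[expected_value_type]) and value == expected_value
```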

alias union

TypeAlias for the union of CorrectnessParamsParam, ConformityParamsParam, GroundednessParamsParam, StringMatchParamsParam, MetadataParamsParam, and SemanticSimilarityParamsParam. See Check type params for the concrete shapes.

id str

Unique identifier.

name str

Display name.

description str | None

Human-readable description.

project_id str

Parent project ID.

status TaskProgress

Async operation status (for generated datasets).

tags list[str]

All tags used across test cases.

state TaskState

Computed from status.state — e.g. "finished", "running".

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id str

Unique identifier.

name str

Display name.

dataset_id str

Dataset to subset.

tags list[str] | None

Restrict to test cases matching these tags.

target_type "dataset" | None

Discriminator for criterion unions.

id str

Unique identifier.

dataset_id str

Parent dataset ID.

messages list[ChatMessage]

Conversation messages.

demo_output AgentOutput | None

Expected output (display only — not used during evaluation).

checks list[CheckConfig]

Configured checks.

comments list[TestCaseComment]

Annotations attached to this test case.

tags list[str]

Tags for filtering.

status "active" | "draft"

Test case status.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id str

Unique identifier.

id str

Unique identifier.

content str

Comment text.

Author of the comment.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id "created_at" | "id" | "status" | "updated_at" Required

Column to sort by.

desc bool

Sort descending when true.

alias dict

Dict mapping a column name to a filter value. Valid columns: "metrics", "status", "tags".
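Sort specifications are small dicts with a required id column and an optional desc flag. The sketch below adds a client-side guard that rejects unknown columns before a request is sent. It uses the test-case sort columns listed above. make_sort and TEST_CASE_SORT_COLUMNS are hypothetical names, and the search method that receives these dicts is not shown here.

```python
from typing import Dict, Optional, Set

TEST_CASE_SORT_COLUMNS = {"created_at", "id", "status", "updated_at"}


def make_sort(column: str, desc: bool = False,
              allowed: Optional[Set[str]] = None) -> Dict[str, object]:
    """Build an {"id": ..., "desc": ...} sort dict after validating the column."""
    allowed = TEST_CASE_SORT_COLUMNS if allowed is None else allowed
    if column not in allowed:
        raise ValueError(f"cannot sort by {column!r}; expected one of {sorted(allowed)}")
    return {"id": column, "desc": desc}
```

Passing a different allowed set reuses the same guard for the other sortable resources, whose columns are listed further down this page.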

id str

Unique identifier.

name str

Display name.

The evaluated agent.

The dataset used.

criteria DatasetSubset | None

Subset of the dataset used as evaluation criteria.

project_id str

Parent project ID.

local bool

Whether this is a local evaluation.

metrics list[Metric]

Aggregated pass/fail metrics per check.

failure_categories dict[str, int]

Counts of results per failure category identifier.

tags list[Metric]

Per-tag aggregated metrics.

status TaskProgress

Async operation status.

state TaskState

Computed from status.state, e.g. "finished", "running", or "error".

old_evaluation_id str | None

ID of the previous evaluation this one is based on.

scheduled_evaluation_id str | None

ID of the scheduled evaluation that produced this run.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id str

Unique identifier.

name str

Display name.

name str

Check identifier (e.g. "correctness", "global").

display_name str | None

Human-readable name.

passed int | None

Number of test cases that passed.

failed int | None

Number of test cases that failed.

errored int | None

Number of test cases that errored.

total int | None

Total test cases evaluated.

success_rate float | None

Pass rate as a float between 0.0 and 1.0.
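Each Metric carries raw counts alongside success_rate, which makes it easy to spot the weakest check in an evaluation. Here is a sketch over plain dicts with two assumptions, neither stated in the reference. First, that the rate equals passed / total. Second, that the metric named "global" is an aggregate to be skipped. pass_rate and weakest_check are hypothetical helpers.

```python
from typing import Dict, List, Optional, Tuple


def pass_rate(passed: Optional[int], total: Optional[int]) -> Optional[float]:
    """Assumed definition: passed / total, or None if counts are missing or total is 0."""
    if passed is None or not total:
        return None
    return passed / total


def weakest_check(metrics: List[Dict]) -> Optional[Tuple[str, float]]:
    """Return (name, rate) of the lowest-scoring check.

    Skips the "global" aggregate and any metric without usable counts.
    """
    scored = []
    for m in metrics:
        if m["name"] == "global":
            continue
        rate = pass_rate(m.get("passed"), m.get("total"))
        if rate is not None:
            scored.append((m["name"], rate))
    return min(scored, key=lambda item: item[1]) if scored else None
```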

id str

Unique identifier.

evaluation_id str

Parent evaluation ID.

The test case.

test_case_exists bool | None

Whether the test case still exists.

state TaskState

Result state: "finished", "running", "error".

results list[CheckResult]

Per-check outcomes.

output AgentOutput | None

The agent’s actual response.

error str | None

Error message if the agent call failed.

failure_category FailureCategoryResult | None

Assigned failure classification.

hidden bool

Whether this result is hidden from the default view.

divergence_warnings list[DivergenceWarning] | None

Divergence warnings detected during multi-turn evaluation.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

turn int

The conversation turn where divergence was detected.

expected str

The expected message content.

actual str

The actual message content received.

identifier str

Stable identifier (e.g. "hallucination").

title str

Display title.

description str

Human-readable description.

identifier str Required

Stable identifier.

title str Required

Display title.

description str Required

Human-readable description.

id str

Unique identifier.

category FailureCategory | None

The assigned failure category.

status TaskState | None

Classification status.

error str | None

Error message if classification failed.

id "failure_category_name" | "id" | "sample_success" | "status" | "visibility" Required

Column to sort by.

desc bool

Sort descending when true.

alias dict

Dict mapping a column name to a filter value. Valid columns: "failure_category_name", "metrics", "sample_success", "status", "tags", "visibility".

id str

Unique identifier.

The scanned agent.

project_id str

Parent project ID.

knowledge_base KnowledgeBaseReference | KnowledgeBase | None

Linked knowledge base, if the scan was grounded.

grade "A" | "B" | "C" | "D" | None

Overall grade.

status TaskProgress

Async operation status.

state TaskState

Computed from status.state.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id str

Unique probe identifier.

name str

Probe display name.

desc str

Human-readable description.

tags list[str]

Tags applied to this probe.

id str

Unique identifier.

title str

Display title.

description str

Human-readable description.

owasp_id str | None

Mapping to the OWASP LLM Top 10, if applicable.

id str

Unique identifier.

name str

Probe display name.

category str

Probe category.

description str

Human-readable description.

probe_lidar_id str

LIDAR probe identifier.

tags list[str]

Tags applied to this probe.

scan_id str

Parent scan ID.

metrics list[ScanProbeMetric] | None

Aggregated severity counts.

status TaskProgress

Async operation status.

state TaskState

Convenience accessor for status.state.

severity Severity

Severity level.

count int

Number of attempts at this severity.

id str

Unique identifier.

probe_id str

Parent probe ID.

messages list[ChatMessageWithMetadata]

Conversation messages exchanged with the agent.

metadata dict[str, object]

Arbitrary metadata about the attempt.

reason str

Why this attempt was generated.

severity Severity

Severity assigned to the attempt outcome.

review_status ReviewStatus

Reviewer-assigned status.

error ScanProbeAttemptError | None

Error details if the attempt failed to execute.

message str

Error message.

SAFE 0

No vulnerability found.

MINOR 10

Minor issue.

MAJOR 20

Significant issue.

CRITICAL 30

Critical vulnerability.

"pending" Awaiting review.
"ignored" Reviewer dismissed the finding.
"acknowledged" Reviewer acknowledged the finding.
"corrected" The underlying issue has been fixed.
id str

Unique identifier.

id str

Unique identifier.

name str

Display name.

description str | None

Human-readable description.

filename str | None

Original upload filename.

project_id str

Parent project ID.

n_documents int

Number of indexed documents.

topics list[Topic]

Discovered topics.

status TaskProgress

Async indexing status.

state TaskState

Computed from status.state.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id str

Unique identifier.

name str

Display name.

id str

Unique identifier.

name str

Topic name.

knowledge_base_id str

Parent knowledge base ID.

document_count int | None

Number of documents in this topic.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id str

Unique identifier.

knowledge_base_id str

Parent knowledge base ID.

snippet str

Truncated content snippet.

content str

Computed alias of snippet (the truncated content shown in search results).

topic_id str | None

Topic ID, if classified.

topic_name str | None

Topic display name.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id str

Unique identifier.

knowledge_base_id str

Parent knowledge base ID.

content str

Full document content.

topic_id str | None

Topic ID, if classified.

topic_name str | None

Topic display name.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id "created_at" | "updated_at" | "topic_id" Required

Column to sort by.

desc bool

Sort descending when true.

alias dict

Dict mapping a column name to a filter value. Valid columns: "topic_id".

id str

Unique identifier.

name str

Display name.

description str

Human-readable description.

failure_categories list[FailureCategory]

Project-level failure classifications.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id str

Unique identifier.

name str

Scenario name.

description str | None

Scenario description.

rules list[str]

Rules the generated conversations should follow.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

conversation list[dict[str, Any]]

Generated preview conversation.

generated_rules list[str] | None

Rules inferred from the scenario description.

"daily" Run every day.
"weekly" Run on a specific day each week.
"monthly" Run on a specific day each month.
alias union

TypeAlias for SuccessExecutionStatus | ErrorExecutionStatus | None.

alias union

TypeAlias for SuccessExecutionStatusParam | ErrorExecutionStatusParam.

evaluation_id str

ID of the evaluation produced by the execution.

status "success"

Always "success".

evaluation_id str Required

ID of the evaluation produced by the execution.

status "success"

Always "success".

error_message str

Description of what went wrong.

status "error"

Always "error".

error_message str Required

Description of what went wrong.

status "error"

Always "error".
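LastExecutionStatus is a tagged union: each variant carries a literal status discriminator. So consumers can branch on that single key. The sketch below works on plain dicts shaped like the *Param variants and treats None as "no execution yet". That reading is an assumption. describe_last_execution is a hypothetical helper.

```python
from typing import Dict, Optional


def describe_last_execution(status: Optional[Dict[str, str]]) -> str:
    """Summarise a LastExecutionStatus value by switching on its "status" key."""
    if status is None:
        return "never executed"  # assumption: None means no run has happened yet
    if status["status"] == "success":
        return f"ok, produced evaluation {status['evaluation_id']}"
    if status["status"] == "error":
        return f"failed: {status['error_message']}"
    raise ValueError(f"unexpected status {status['status']!r}")
```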

id str

Unique identifier.

name str

Display name.

project_id str

Parent project ID.

agent_id str

Agent to evaluate.

dataset_id str

Dataset to evaluate against.

frequency FrequencyOption

"daily", "weekly", or "monthly".

time str

Time of day in HH:MM format (UTC).

day_of_week int | None

Weekly only: 1 (Monday) through 7 (Sunday).

day_of_month int | None

Monthly only: 1 through 28.

tags list[str]

Tags used to filter test cases.

run_count int

Number of times each test case is run per execution.

paused bool

Whether the schedule is currently paused.

last_execution_at datetime | None

Timestamp of the most recent execution.

last_execution_status LastExecutionStatus

Status of the most recent execution.

evaluations list[EvaluationReference]

Evaluation runs produced by this schedule.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.
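The frequency, time, day_of_week and day_of_month fields of a scheduled evaluation work together. The sketch below computes the next run after a given moment to show how. It assumes the Hub fires exactly at the stored UTC time on the stated day. Real scheduler behaviour, such as catch-up runs or the effect of paused, is not described here. next_run is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_run(now: datetime, frequency: str, time_of_day: str,
             day_of_week: Optional[int] = None,
             day_of_month: Optional[int] = None) -> datetime:
    """Next UTC run strictly after `now` (which must be timezone-aware)."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    hour, minute = (int(part) for part in time_of_day.split(":"))
    now = now.astimezone(timezone.utc)
    # Today's date at the scheduled HH:MM, in UTC.
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)

    if frequency == "daily":
        return candidate if candidate > now else candidate + timedelta(days=1)

    if frequency == "weekly":
        # isoweekday() uses Monday == 1 ... Sunday == 7, like day_of_week.
        candidate += timedelta(days=(day_of_week - candidate.isoweekday()) % 7)
        return candidate if candidate > now else candidate + timedelta(days=7)

    if frequency == "monthly":
        # Days 1..28 exist in every month, so replace() cannot fail.
        candidate = candidate.replace(day=day_of_month)
        if candidate > now:
            return candidate
        if candidate.month == 12:
            return candidate.replace(year=candidate.year + 1, month=1)
        return candidate.replace(month=candidate.month + 1)

    raise ValueError(f"unknown frequency {frequency!r}")
```

The 1 to 28 range for day_of_month is what keeps the monthly branch simple: any such day exists in every month, so no end-of-month clamping is needed.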

id str

Unique identifier.

description str

Task description.

status TaskStatus

Current status.

priority TaskPriority | None

Priority level.

project_id str

Parent project ID.

created_by User

User who created the task.

assignees list[User]

Assigned users.

Linked resources (evaluation results, test cases, or probe attempts).

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

"open" Newly created, not yet picked up.
"in_progress" Being worked on.
"resolved" Closed.
"low" Low priority.
"medium" Medium priority.
"high" High priority.
id str

Unique identifier.

project_id str

Parent project ID.

user UserReference | None

The user who started the chat.

agent AgentReference | Agent | None

The agent that responded.

messages list[ChatMessageWithMetadata]

Conversation messages.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

id str

Unique identifier.

action "insert" | "update" | "delete"

Action performed on the entity.

entity_id str

UUID of the affected entity.

entity_type str

Type of the affected entity (e.g. "agent", "evaluation").

user_id str | None

User who performed the action, if recorded.

project_id str | None

Project the entity belongs to, if applicable.

diff list[AuditDiffItem] | None

Field-level changes for update actions.

metadata dict[str, object] | None

Arbitrary metadata captured with the event.

created_at datetime

When the action occurred.

id str

Unique identifier.

action "insert" | "update" | "delete"

Action performed.

user_id str

User who performed the action.

user_name str | None

User display name.

diffs list[AuditDisplayDiffItem]

Pre-formatted diff items for display.

real_change_count int

Number of fields that actually changed.

summary_fields list[str]

Field names highlighted in the summary.

created_at datetime

When the action occurred.

kind "added" | "removed" | "changed"

Kind of change.

field str

Field path.

old_value Any | None

Previous value (for removed/changed).

new_value Any | None

New value (for added/changed).
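The diff field of an audit entry is a list of AuditDiffItem values. The sketch below renders them, as plain dicts, into one line per change. The +, - and ~ prefixes are an arbitrary choice, and format_diff is a hypothetical helper.

```python
from typing import Any, Dict, List


def format_diff(items: List[Dict[str, Any]]) -> List[str]:
    """Render AuditDiffItem-shaped dicts as one readable line each."""
    lines = []
    for item in items:
        kind, field = item["kind"], item["field"]
        if kind == "added":
            lines.append(f"+ {field}: {item.get('new_value')!r}")
        elif kind == "removed":
            lines.append(f"- {field}: {item.get('old_value')!r}")
        elif kind == "changed":
            lines.append(f"~ {field}: {item.get('old_value')!r} -> {item.get('new_value')!r}")
        else:
            raise ValueError(f"unknown diff kind {kind!r}")
    return lines
```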

kind "added" | "removed" | "changed" | "skip"

Kind of change for display.

scope str

The scope of the change.

root str

Root field name.

label str | None

Display label for the changed field.

before_str str | None

Pre-formatted previous value.

after_str str | None

Pre-formatted new value.

skip_count int | None

Number of skipped items if kind="skip".

id "action" | "created_at" | "entity_type" | "project_id" | "user_id" Required

Column to sort by.

desc bool

Sort descending when true.

alias dict

Dict mapping a column name to a filter value. Valid columns: "action", "created_at", "entity_type", "project_id", "user_id".


All exceptions inherit from HubClientError and are importable from the root package.

from giskard_hub import (
    HubClientError,            # Base exception for all SDK errors
    APIStatusError,            # Base for HTTP status errors (has .status_code, .response)
    APITimeoutError,           # Request timed out
    APIConnectionError,        # Could not connect to the Hub
    BadRequestError,           # 400
    AuthenticationError,       # 401: invalid or missing API key
    PermissionDeniedError,     # 403: insufficient permissions
    NotFoundError,             # 404: resource does not exist
    ConflictError,             # 409: resource conflict
    UnprocessableEntityError,  # 422: validation error
    RateLimitError,            # 429: too many requests
    InternalServerError,       # 500+: server error
)
Error handling example
from giskard_hub import HubClient, NotFoundError, AuthenticationError

hub = HubClient()
try:
    agent = hub.agents.retrieve("nonexistent-id")
except NotFoundError as e:
    # 404: the ID does not match any agent
    print(f"Agent not found: {e}")
except AuthenticationError:
    # 401: the API key is missing or invalid
    print("Check your API key")

Methods that support pagination accept limit and offset. Pass include_metadata=True to receive a (results, metadata) tuple, where metadata is an APIPaginatedMetadata object:

Pagination example
results, metadata = hub.evaluations.results.search(
    "evaluation-id", limit=50, offset=0, include_metadata=True,
)
# count is the size of this page; total spans all pages.
print(f"Showing {metadata.count} of {metadata.total} results (offset {metadata.offset})")
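To fetch every item rather than a single page, advance offset by each page's count until total is reached. Below is a minimal sketch. It assumes only the (items, metadata) return shape shown in the example above. iterate_all is a hypothetical helper.

```python
from typing import Any, Callable, Iterator, List, Tuple


def iterate_all(
    fetch_page: Callable[[int, int], Tuple[List[Any], Any]],
    page_size: int = 50,
) -> Iterator[Any]:
    """Yield every item, page by page.

    fetch_page(limit, offset) must return (items, metadata), where metadata
    exposes count and total like APIPaginatedMetadata.
    """
    offset = 0
    while True:
        items, metadata = fetch_page(page_size, offset)
        yield from items
        offset += metadata.count
        # Stop after the last page, or if the server returns an empty page.
        if metadata.count == 0 or offset >= metadata.total:
            break
```

For example, with the search call above: `for result in iterate_all(lambda limit, offset: hub.evaluations.results.search("evaluation-id", limit=limit, offset=offset, include_metadata=True)): ...`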
Access HTTP headers and status
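response = hub.with_raw_response.agents.retrieve("agent-id")
print(response.status_code)  # HTTP status code
print(response.headers)      # HTTP response headers
agent = response.parse()     # parse the body into the usual return type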
Per-request override
hub.with_options(max_retries=5, timeout=300.0).evaluations.create(...)
Proxy configuration
from giskard_hub import HubClient, DefaultHttpxClient
hub = HubClient(
    http_client=DefaultHttpxClient(proxy="http://proxy.example.com:8080"),
)
Enable debug logging
export GISKARD_HUB_LOG=debug

Every method accepts these optional keyword arguments for per-request customization:

extra_headers dict[str, str] | None

Additional HTTP headers for this request.

extra_query dict[str, object] | None

Additional query parameters.

extra_body object | None

Additional JSON body fields.

timeout float | httpx.Timeout | None

Override the default timeout for this request.