
Reference


HubClient

Synchronous client. All resource operations are available as attributes.

Constructor
from giskard_hub import HubClient

hub = HubClient(
    api_key="gsk_...",  # or set GISKARD_HUB_API_KEY env var
    base_url="https://hub.example.com",  # or set GISKARD_HUB_BASE_URL env var
)
api_key str | None Default: env GISKARD_HUB_API_KEY

Your Hub API key.

base_url str | httpx.URL | None Default: env GISKARD_HUB_BASE_URL

Base URL of your Hub instance.

auto_add_api_suffix bool Default: True

Automatically append /_api to base_url.

timeout float | httpx.Timeout | None Default: 60.0

Default request timeout in seconds. Pass an httpx.Timeout for fine-grained control over connect, read, and write timeouts.

max_retries int Default: 2

Number of automatic retries on transient errors (connection errors, 5xx responses).

default_headers dict[str, str] | None Default: None

Headers added to every request.

default_query dict[str, object] | None Default: None

Query parameters added to every request.

http_client httpx.Client | None Default: None

Custom httpx.Client instance for proxies, custom transports, or mutual TLS.

Async counterpart with an identical API surface; every method is a coroutine and must be awaited.

Basic synchronous usage:

from giskard_hub import HubClient

hub = HubClient()
projects = hub.projects.list()

Agents

Register, test, and invoke LLM agents. An agent represents your LLM application: either a remote HTTP endpoint or a local Python callable.

from giskard_hub.types import Agent, AgentDetectStatefulness, AgentOutput, ChatMessage
.create() Agent

Create a new agent with configuration for external API communication.

name str Required

Display name of the agent.

url str Required

HTTP endpoint the Hub calls during evaluations and scans.

project_id str Required

Project this agent belongs to.

supported_languages list[str] Required

Language codes the agent supports (e.g. ["en", "fr"]).

headers dict[str, str]

HTTP headers sent with every request to the agent (e.g. auth tokens). Pass a plain mapping of header names to values; the returned Agent exposes them as a list of Header objects.

description str | None

Human-readable description.

stateful bool | None

Whether the agent is stateful.

Example
agent = hub.agents.create(
    project_id=project.id,
    name="Support Bot v2",
    url="https://my-app.example.com/api/chat",
    supported_languages=["en"],
    headers={"Authorization": "Bearer <token>"},
    description="GPT-4o chatbot with RAG",
)
.retrieve() Agent

Retrieve an agent by its ID.

agent_id str Required

ID of the agent to retrieve.

.update() Agent

Update an existing agent’s configuration. Only the provided fields are modified.

agent_id str Required

ID of the agent to update.

name str | None
Updated display name.
url str | None
Updated endpoint URL.
description str | None
Updated description.
headers dict[str, str] | None
Updated HTTP headers.
supported_languages list[str] | None
Updated language codes.
.list() list[Agent]

List all agents, optionally filtered by project.

project_id str | None
Project ID to filter by.
.delete() None

Delete an agent by its ID.

agent_id str Required
ID of the agent to delete.
.bulk_delete() None

Delete multiple agents at once.

agent_ids list[str] Required
IDs of agents to delete.
.generate_completion() AgentOutput

Call a registered agent with a list of messages and get the response.

agent_id str Required

ID of the agent to call.

messages Iterable[ChatMessageParam] Required

Conversation messages as [{"role": "user", "content": "..."}].

Example
output = hub.agents.generate_completion(
    agent.id,
    messages=[{"role": "user", "content": "What is your return policy?"}],
)
print(output.response.content)
print(output.metadata)
.test_connection() AgentOutput

Test connectivity to an agent endpoint without persisting the agent.

url str Required
HTTP endpoint URL to test.
headers dict[str, str]
HTTP headers to include in the test request.
.generate_description() str

Auto-generate a description for an agent by observing its behaviour. Returns the generated description.

agent_id str Required
ID of the agent.
.detect_statefulness() AgentDetectStatefulness

Detect whether the agent is stateful by analyzing its behavior.

agent_id str Required
ID of the agent to detect statefulness for.
Agent fields

id str

Unique identifier.

name str

Display name.

description str | None

Human-readable description.

url str

HTTP endpoint URL.

project_id str

Parent project ID.

supported_languages list[str]

Language codes.

headers list[Header]

HTTP headers.

stateful bool

Whether the agent is stateful.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

AgentDetectStatefulness fields

stateful bool

Whether the agent is stateful.

AgentOutput fields

response ChatMessage

The agent’s response message.

error ExecutionError | None

Error details if the agent call failed.

metadata dict | None

Arbitrary metadata returned by the agent.


Checks

Define and manage reusable check criteria for evaluating agent responses. Checks are project-scoped and can be referenced by identifier in any test case.

from giskard_hub.types import Check, CheckResult
.create() Check

Create a custom check in the specified project.

identifier str Required

Unique identifier to reference this check in test cases.

name str Required

Display name.

project_id str Required

Project this check belongs to.

params CheckTypeParam Required

Check configuration (see check type params below).

description str | None

Human-readable description.

Example
check = hub.checks.create(
    project_id=project.id,
    identifier="tone_professional",
    name="Professional tone",
    params={"type": "conformity", "rules": ["Use formal language."]},
)
.retrieve() Check
check_id str Required

ID of the check to retrieve.

.update() Check

Update an existing check. Only the provided fields are modified.

check_id str Required
ID of the check to update.
identifier str | None
Updated identifier.
name str | None
Updated name.
params CheckTypeParam | None
Updated check params.
description str | None
Updated description.
.list() list[Check]
project_id str Required

Project ID to list checks for.

filter_builtin bool

When True, include built-in checks in the results.

.delete() None
check_id str Required

ID of the check to delete.

.bulk_delete() None
check_ids list[str] Required

IDs of checks to delete.

The params field accepts one of these shapes:

Type | params shape | Evaluation method
Correctness | {"type": "correctness", "reference": str} | LLM judge
Conformity | {"type": "conformity", "rules": list[str]} | LLM judge
Groundedness | {"type": "groundedness", "context": str} | LLM judge
Semantic similarity | {"type": "semantic_similarity", "reference": str, "threshold": float} | Embedding
String match | {"type": "string_match", "keyword": str} | Rule-based
Metadata | {"type": "metadata", "json_path_rules": list[JsonPathRule]} | Rule-based

Each JsonPathRule: {"json_path": str, "expected_value": str, "expected_value_type": "string" | "number" | "boolean"}
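As a sketch, the params payloads from the table above are plain dicts. The example below builds two of them; the JSON path used is purely illustrative, not a real field of any agent.

```python
# Sketch: params payloads for two check types from the table above.
conformity_params = {
    "type": "conformity",
    "rules": ["Always answer in formal English.", "Never promise refunds."],
}
metadata_params = {
    "type": "metadata",
    "json_path_rules": [
        {
            "json_path": "$.usage.model",  # hypothetical metadata field
            "expected_value": "gpt-4o",
            "expected_value_type": "string",
        }
    ],
}
# Pass either dict as `params` to hub.checks.create(...).
```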

Check fields

id str

Unique identifier.

built_in bool

Whether this is a built-in check.

identifier str

Reusable identifier string.

name str

Display name.

description str | None

Human-readable description.

project_id str

Parent project ID.

params dict

Check-specific configuration.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

CheckResult fields

name str

Check identifier.

display_name str

Human-readable name.

status str

Execution status.

passed bool

Whether the check passed.

error str | None

Error message if execution failed.

reason str | None

LLM judge’s reasoning (for LLM-based checks).

annotations list[OutputAnnotation]

Annotated spans in the agent’s response.


Datasets

Create datasets, import test cases, and auto-generate test suites from scenarios or knowledge bases.

from giskard_hub.types import Dataset, TestCase, TaskProgress
.create() Dataset

Create a new empty dataset in the specified project.

name str Required
Display name.
project_id str Required
Project this dataset belongs to.
description str | None
Human-readable description.
.upload() Dataset

Import test cases from a file or list of dicts into a dataset.

project_id str Required
Project ID.
data FileTypes | list[dict[str, Any]] | str Required

File path (str or Path), file-like object, or list of dicts. Each record should have a messages list and optional checks list.

dataset_id str | None
Append to an existing dataset instead of creating a new one.
name str | None
Name for the new dataset.
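A record list for data can be sketched as plain dicts following the description above; the upload call itself needs a live Hub, so it is left commented.

```python
# Each upload record carries a `messages` list and an optional `checks` list.
records = [
    {
        "messages": [{"role": "user", "content": "Where is my order?"}],
        "checks": [
            {
                "identifier": "correctness",
                "params": {"reference": "Orders can be tracked from the account page."},
            }
        ],
    }
]
# On a live Hub:
# dataset = hub.datasets.upload(project_id=project.id, data=records, name="Imported cases")
```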
.generate_scenario_based() Dataset

Generate a dataset of test cases from scenario definitions. The dataset’s status will be "running" until generation completes — use hub.helpers.wait_for_completion() to wait.

project_id str Required
Project ID.
agent_id str Required
Agent to generate test cases for.
scenario_id str Required
Scenario template to use.
n_examples int
Number of test cases to generate.
dataset_id str | None
Append to an existing dataset.
dataset_name str | None
Name for the new dataset.
.generate_document_based() Dataset

Generate test cases grounded in knowledge base documents. Generation runs asynchronously; use hub.helpers.wait_for_completion() to wait for it.

agent_id str Required
Agent to generate test cases for.
knowledge_base_id str Required
Knowledge base to source documents from.
project_id str Required
Project ID.
dataset_name str | None
Name for the new dataset.
description str | None
Dataset description.
n_examples int
Number of test cases to generate.
topic_ids list[str]
Filter to specific KB topics.
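As a sketch, the arguments for a document-based run can be gathered in one dict (all ids below are placeholders) and passed to the method on a live Hub.

```python
# Placeholder ids; on a live Hub use real project/agent/knowledge-base ids.
kwargs = {
    "project_id": "proj_123",
    "agent_id": "agent_456",
    "knowledge_base_id": "kb_789",
    "dataset_name": "KB-grounded tests",
    "n_examples": 20,
}
# dataset = hub.datasets.generate_document_based(**kwargs)
# dataset = hub.helpers.wait_for_completion(dataset)
```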
.retrieve() Dataset
dataset_id str Required

ID of the dataset to retrieve.

.update() Dataset
dataset_id str Required

ID of the dataset to update.

name str | None

Updated name.

description str | None

Updated description.

status TaskProgress | None

Async operation status.

.list() list[Dataset]
project_id str | None

Project ID to filter by.

.delete() None

Delete a dataset by its ID.

dataset_id str Required
Dataset ID.
.bulk_delete() None

Delete multiple datasets at once.

dataset_ids list[str] Required
IDs of datasets to delete.
.list_tags() list[str]

List all tags used across test cases in a dataset.

dataset_id str Required
Dataset ID.
.list_test_cases() list[TestCase]

List all test cases in a dataset.

dataset_id str Required
Dataset ID.
.search_test_cases() list[TestCase]

Search test cases with filters, sorting, and pagination. Pass include_metadata=True to receive tuple[list[TestCase], APIPaginatedMetadata].

dataset_id str Required
Dataset ID.
query str | None
Free-text search query.
order_by list[TestCaseOrderByParam] | None
Sorting criteria.
filters TestCaseFiltersParam | None
Filter criteria.
limit int | None
Maximum results per page.
offset int | None
Results offset for pagination.
include_metadata bool Default: False
Include pagination metadata in the return value.
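The filter and sort arguments are plain payloads. The field names below are assumptions; check the TestCaseFiltersParam and TestCaseOrderByParam type definitions for the real shapes. The call itself needs a live Hub and is shown commented.

```python
# Assumed payload shapes for filtering and sorting test cases.
filters = {"tags": ["billing"], "status": "active"}
order_by = [{"field": "created_at", "direction": "desc"}]

# On a live Hub:
# cases, meta = hub.datasets.search_test_cases(
#     dataset_id=dataset.id,
#     query="refund",
#     filters=filters,
#     order_by=order_by,
#     limit=20,
#     include_metadata=True,
# )
```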
Dataset fields

id str

Unique identifier.

name str

Display name.

description str | None

Human-readable description.

project_id str

Parent project ID.

status TaskProgress

Async operation status (for generated datasets).

tags list[str]

All tags used across test cases.

state str

Computed from status.state — e.g. "finished", "running".

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.


Evaluations

Run agents against datasets, inspect per-test-case results, and manage the evaluation lifecycle. Sub-resource: hub.evaluations.results.

from giskard_hub.types import Evaluation, Metric, CheckResult
.create() Evaluation

Create and launch a new evaluation of an agent on a dataset.

project_id str Required

Project ID.

agent_id str Required

Agent to evaluate.

dataset_id str | None

Dataset to evaluate against. Provide this or old_evaluation_id, not both.

old_evaluation_id str | None

Reuse a previous evaluation’s dataset.

name str

Evaluation run name.

tags list[str] | None

Filter test cases by tags.

run_count int

Run each test case N times (for consistency testing).

scheduled_evaluation_id str | None

Link to a scheduled evaluation.

Example
evaluation = hub.evaluations.create(
    project_id=project.id,
    agent_id=agent.id,
    dataset_id=dataset.id,
    name="v2.1 regression run",
)
evaluation = hub.helpers.wait_for_completion(evaluation)
hub.helpers.print_metrics(evaluation)
.create_local() Evaluation

Create a local evaluation for running agent inference in your own process.

agent_info MinimalAgentParam Required

Agent info as {"name": str, "description": str}.

dataset_id str | None
Dataset to evaluate against.
name str | None
Evaluation name.
tags list[str] | None
Filter test cases by tags.
old_evaluation_id str | None
Reuse a previous evaluation’s dataset.
.run_single() list[CheckResult]

Evaluate a single (input, output) pair against checks without creating a full evaluation.

messages Iterable[ChatMessageParam] Required
Conversation messages.
agent_output AgentOutputParam Required
Agent’s output to evaluate.
checks Iterable[CheckConfigParam] Required
Checks to apply.
project_id str
Project ID.
agent_description str
Description of the agent for context.
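A sketch of the payloads for a single-pair evaluation. The agent_output shape as a role/content dict is an assumption based on AgentOutputParam; the call itself needs a live Hub and is shown commented.

```python
# Sketch: evaluate one (input, output) pair against a correctness check.
messages = [{"role": "user", "content": "What is your return policy?"}]
agent_output = {"role": "assistant", "content": "You can return items within 30 days."}
checks = [
    {
        "identifier": "correctness",
        "params": {"reference": "Items can be returned within 30 days."},
    }
]
# On a live Hub:
# results = hub.evaluations.run_single(
#     messages=messages,
#     agent_output=agent_output,
#     checks=checks,
#     agent_description="E-commerce support bot",
# )
# for r in results:
#     print(r.name, r.passed)
```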
.rerun_errored_results() Evaluation

Rerun all errored results without triggering a full re-evaluation.

evaluation_id str Required
Evaluation ID.
.retrieve() Evaluation

Retrieve an evaluation by its ID, with optional related resource inclusion.

evaluation_id str Required
Evaluation ID.
include list[Literal["agent", "dataset"]] | None
Embed the full agent and/or dataset objects instead of references.
.update() Evaluation

Update an evaluation’s name.

evaluation_id str Required
Evaluation ID.
name str Required
New name for the evaluation.
.list() list[Evaluation]

List all evaluations for a project.

project_id str Required
Project ID.
include list[Literal["agent", "dataset"]] | None
Embed related objects.
.delete() None

Delete an evaluation by its ID.

evaluation_id str Required
Evaluation ID.
.bulk_delete() None

Delete multiple evaluations at once.

evaluation_ids list[str] Required
IDs of evaluations to delete.
Evaluation fields

id str

Unique identifier.

name str | None

Display name.

agent AgentReference | Agent

The evaluated agent.

dataset DatasetReference | Dataset

The dataset used.

criteria list

Check criteria applied.

project_id str

Parent project ID.

local bool

Whether this is a local evaluation.

metrics list[Metric]

Aggregated pass/fail metrics per check.

tags list[str]

Tags used to filter test cases.

failure_categories list[FailureCategory]

Available failure classifications.

status TaskProgress

Async operation status.

state str

Computed: "finished", "running", "error".

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

old_evaluation_id str | None

ID of the previous evaluation that this evaluation is based on.

scheduled_evaluation_id str | None

ID of the scheduled evaluation that this evaluation is based on.

Metric fields

name str

Check identifier (e.g. "correctness", "global").

display_name str

Human-readable name.

passed int

Number of test cases that passed.

failed int

Number of test cases that failed.

errored int

Number of test cases that errored.

total int

Total test cases.

success_rate float

Pass rate as a float between 0.0 and 1.0.

Evaluation results

Inspect, filter, update, and rerun individual evaluation results.

from giskard_hub.types import TestCaseEvaluation, FailureCategory
.retrieve() TestCaseEvaluation
result_id str Required

Result ID.

evaluation_id str Required

Evaluation ID.

include list[str] | None

Embed related resources (["test_case"]).

.update() TestCaseEvaluation

Update the failure category of an evaluation result.

result_id str Required
Result ID.
evaluation_id str Required
Evaluation ID.
failure_category FailureCategoryParam | None
Failure classification to assign.
.list() list[TestCaseEvaluation]
evaluation_id str Required

Evaluation ID.

include list[str] | None

Embed related resources (["test_case"]).

.search() list[TestCaseEvaluation]

Search and filter results. Pass include_metadata=True for pagination metadata.

evaluation_id str Required
Evaluation ID.
query str | None
Free-text search query.
filters ResultFiltersParam | None
Filter criteria.
order_by list[ResultOrderByParam] | None
Sorting criteria.
limit int | None
Maximum results.
offset int | None
Results offset.
include list[str] | None
Embed related resources.
include_metadata bool Default: False
Include pagination metadata.
.rerun_test_case() TestCaseEvaluation
result_id str Required

Result ID.

evaluation_id str Required

Evaluation ID.

.submit_local_output() TestCaseEvaluation

Submit locally-generated agent output for evaluation and scoring.

result_id str Required
Result ID.
evaluation_id str Required
Evaluation ID.
agent_output AgentOutputParam | None
Agent output to submit.
error str | None
Error message if the agent call failed.
.update_visibility() TestCaseEvaluation

Show or hide a result from the default view.

result_id str Required
Result ID.
evaluation_id str Required
Evaluation ID.
hidden bool Required
Whether the result should be hidden.
set_test_case_draft bool | None
Also set the linked test case to draft status.
TestCaseEvaluation fields

id str

Unique identifier.

evaluation_id str

Parent evaluation ID.

test_case TestCase | TestCaseReference

The test case.

test_case_exists bool

Whether the test case still exists.

state str

Result state: "finished", "running", "error".

results list[CheckResult]

Per-check outcomes.

output AgentOutput | None

The agent’s actual response.

error ExecutionError | None

Error details if the agent call failed.

failure_category FailureCategory | None

Assigned failure classification.

hidden bool

Whether this result is hidden.

divergence_warnings list[DivergenceWarning] | None

List of divergence warnings detected during multi-turn evaluation.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

DivergenceWarning fields

turn int

The conversation turn where divergence was detected.

expected str

The expected message content.

actual str

The actual message content received.


Helpers

High-level convenience methods for the most common SDK workflows: waiting for async operations, running evaluations, and printing metrics.

from giskard_hub.types import Evaluation, Scan, ChatMessage, AgentOutput
.wait_for_completion() TStateful

Poll an entity until it leaves its running state. Returns the refreshed entity.

entity TStateful Required

Any stateful entity: Evaluation, Scan, Dataset, KnowledgeBase, ScanProbe, TestCaseEvaluation.

poll_interval float Default: 5.0

Seconds between polling requests.

max_retries int Default: 360

Maximum polling attempts. Default: 30 minutes at 5-second intervals.

running_states Collection[str] Default: {"running"}

States considered as “still processing”.

error_states Collection[str] Default: {"error"}

Terminal error states.

raise_on_error bool Default: True

Raise ValueError if entity enters an error state.

.evaluate() Evaluation

Run an evaluation for a given agent over a dataset. Handles both remote and local agents.

agent str | Agent | Callable Required

Agent ID, Agent object, or a Python callable for local evaluation. Callable signature: (messages: list[ChatMessage]) -> str | ChatMessage | AgentOutput.

dataset str | Dataset Required

Dataset ID or Dataset object.

project str | Project | None

Required when agent is remote (str or Agent). Not required for local callables.

name str | None
Evaluation run name.
tags list[str] | None
Filter test cases by tags.
Example
evaluation = hub.helpers.evaluate(
    agent=my_agent,
    dataset=my_dataset,
    project=my_project,
    name="Remote eval",
)
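For local evaluation, the agent can be a plain callable. Per the signature above it receives the conversation messages and may return a simple string; accessing message content both as an attribute and as a dict key, as done below, is a defensive assumption.

```python
# A minimal local agent: a callable that returns a string answer.
def my_local_agent(messages):
    last = messages[-1]
    # ChatMessage objects expose `content`; fall back to dict access.
    text = getattr(last, "content", None) or last["content"]
    return f"You asked: {text!r}. Please check our help center."

# On a live Hub (no project needed for a local callable):
# evaluation = hub.helpers.evaluate(
#     agent=my_local_agent, dataset=my_dataset, name="Local eval",
# )

answer = my_local_agent([{"role": "user", "content": "Do you ship to Canada?"}])
```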
.print_metrics() None

Print a formatted metrics table to the console for an evaluation or scan.

entity Evaluation | Scan Required

The evaluation or scan to print metrics for.


Knowledge bases

Create, search, and manage indexed document collections for grounded evaluations, document-based test generation, and knowledge-grounded vulnerability scans.

from giskard_hub.types import (
    KnowledgeBase,
    KnowledgeBaseDocumentRow,
    KnowledgeBaseDocumentDetail,
)
.create() KnowledgeBase

Create a knowledge base and upload documents. Indexing happens asynchronously after creation — use hub.helpers.wait_for_completion().

name str Required

Display name.

project_id str Required

Project this KB belongs to.

data FileTypes | list[dict[str, Any]] | str Required

Documents as a list of dicts, a file path string, or a pathlib.Path (JSON/JSONL format).

description str | None

Human-readable description.

document_column str Default: "text"

Column name for document text.

topic_column str Default: "topic"

Column name for topic label.

Example
kb = hub.knowledge_bases.create(
    project_id=project.id,
    name="Product Docs",
    data=[
        {"text": "30-day return policy.", "topic": "Returns"},
        {"text": "Free shipping over $50.", "topic": "Shipping"},
    ],
)
kb = hub.helpers.wait_for_completion(kb)
.search_documents() list[KnowledgeBaseDocumentRow] | tuple[list[KnowledgeBaseDocumentRow], APIPaginatedMetadata]

Semantic search over documents in a knowledge base.

knowledge_base_id str Required
Knowledge base ID.
query str | None
Search query.
filters KnowledgeBaseDocumentFiltersParam | None
Filter criteria.
order_by list[KnowledgeBaseDocumentOrderByParam] | None
Sorting criteria.
limit int | None
Maximum results.
offset int | None
Results offset.
include_metadata bool Default: False
Include pagination metadata. If true, returns a tuple of (results, metadata).
.retrieve_document() KnowledgeBaseDocumentDetail

Retrieve a specific document with its full content.

knowledge_base_id str Required
Knowledge base ID.
document_id str Required
Document ID.
.retrieve() KnowledgeBase

Retrieve a knowledge base by its ID, including its topics.

knowledge_base_id str Required
Knowledge base ID.
.update() KnowledgeBase

Update a knowledge base’s metadata.

knowledge_base_id str Required
Knowledge base ID.
name str | None
Updated name.
description str | None
Updated description.
project_id str | None
Project ID to move the knowledge base to.
status TaskProgress | None
Async operation status.
.list() list[KnowledgeBase]

List all knowledge bases, optionally filtered by project.

project_id str | None
Project ID to filter by.
.delete() None

Delete a knowledge base by its ID.

knowledge_base_id str Required
Knowledge base ID.
.bulk_delete() None

Delete multiple knowledge bases at once.

knowledge_base_ids list[str] Required
IDs of knowledge bases to delete.
KnowledgeBase fields

id str

Unique identifier.

name str

Display name.

description str | None

Human-readable description.

filename str | None

Original upload filename.

project_id str

Parent project ID.

n_documents int

Number of indexed documents.

status TaskProgress

Async indexing status.

topics list

Discovered topics.

state str

Computed from status.state.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.


Projects

Top-level workspace that groups all related resources: agents, datasets, evaluations, scans, and more. Sub-resource: hub.projects.scenarios.

from giskard_hub.types import Project
.create() Project
name str Required

Project name.

description str | None

Project description.

.update() Project
project_id str Required

Project ID.

name str | None

Updated name.

description str | None

Updated description.

failure_categories Iterable[FailureCategoryParam] | None

Project-level failure classifications.

.retrieve() Project

Retrieve a project by its ID.

project_id str Required
Project ID.
.list() list[Project]

List all projects accessible to the current user.

.delete() None

Delete a project by its ID.

project_id str Required
Project ID.
.bulk_delete() None

Delete multiple projects at once.

project_ids list[str] Required
IDs of projects to delete.
Project fields

id str

Unique identifier.

name str

Display name.

description str | None

Human-readable description.

failure_categories list[FailureCategory]

Project-level failure classifications.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

Scenarios

Reusable persona and behaviour templates for scenario-based dataset generation.

from giskard_hub.types import Scenario, ScenarioPreview
.create() Scenario
project_id str Required

Project ID (positional).

name str Required

Scenario name.

description str Required

Scenario description.

rules list[str]

Rules the generated conversations should follow.

.preview() ScenarioPreview

Generate a preview conversation for a scenario without persisting it.

project_id str Required
Project ID (positional).
description str Required
Scenario description.
rules list[str]
Scenario rules.
agent_id str | None
Agent ID for preview.
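A small sketch of the preview-then-create flow: preview a scenario definition first, then persist it. Both calls need a live Hub, so they are shown commented; only the payload values below are executable.

```python
# Sketch: a scenario definition for preview and creation.
description = "An impatient customer demanding an immediate refund."
rules = ["Stay polite.", "Escalate after two refusals."]

# On a live Hub:
# preview = hub.projects.scenarios.preview(
#     project.id, description=description, rules=rules, agent_id=agent.id,
# )
# scenario = hub.projects.scenarios.create(
#     project.id, name="Impatient refund seeker",
#     description=description, rules=rules,
# )
```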
.retrieve() Scenario

Retrieve a scenario by its ID within a project.

scenario_id str Required
Scenario ID.
project_id str Required
Project ID.
.update() Scenario

Update an existing scenario’s definition.

scenario_id str Required
Scenario ID.
project_id str Required
Project ID.
name str | None
Updated name.
description str | None
Updated description.
rules list[str] | None
Updated rules.
.list() list[Scenario]

List all scenarios for a project.

project_id str Required
Project ID (positional).
.delete() None

Delete a scenario from a project.

scenario_id str Required
Scenario ID.
project_id str Required
Project ID.

Scans

Launch automated vulnerability scans covering the OWASP LLM Top 10 and additional threat categories. Sub-resources: hub.scans.probes, hub.scans.attempts.

from giskard_hub.types import Scan, ScanCategory, ScanProbe, ScanProbeAttempt, Severity, ReviewStatus
.create() Scan

Launch a new vulnerability scan of an agent.

project_id str Required

Project ID.

agent_id str Required

Agent to scan.

knowledge_base_id str | None

Anchor probes to KB documents for domain-specific attacks.

probe_ids list[str] | None

List of specific LIDAR probe IDs to run in the scan.

tags list[str] | None

Limit scan to specific threat categories (e.g. ["gsk:threat-type='prompt-injection'"]).

Example
scan = hub.scans.create(
    project_id=project.id,
    agent_id=agent.id,
    tags=["gsk:threat-type='prompt-injection'"],
)
scan = hub.helpers.wait_for_completion(scan)
print(f"Grade: {scan.grade}")
hub.helpers.print_metrics(scan)
.list_categories() list[ScanCategory]

List all available scan categories and their OWASP mappings.

.list_probes() list[ScanProbe]

List all probe results for a completed scan.

scan_id str Required
Scan ID.
.retrieve() Scan

Retrieve a scan result by its ID, with optional related resource inclusion.

scan_id str Required
Scan ID.
include list[Literal["agent", "knowledge_base"]] | None
Embed related objects.
.list() list[Scan]

List all scan results, optionally filtered by project.

project_id str | None
Project ID to filter by.
include list[Literal["agent", "knowledge_base"]] | None
Embed related objects.
.delete() None

Delete a scan result by its ID.

scan_id str Required
Scan ID.
.bulk_delete() None

Delete multiple scan results at once.

scan_ids list[str] Required
IDs of scans to delete.
Scan fields

id str

Unique identifier.

agent AgentReference | Agent

The scanned agent.

project_id str

Parent project ID.

knowledge_base KnowledgeBase | None

Linked knowledge base.

grade str | None

Overall grade: "A", "B", "C", "D", or None.

status TaskProgress

Async operation status.

state str

Computed from status.state.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

Scan probes and attempts

.retrieve() ScanProbe
probe_id str Required

Probe ID.

.list_attempts() list[ScanProbeAttempt]

List all adversarial attempts for a specific probe.

probe_id str Required
Probe ID.
.update() ScanProbeAttempt

Update a probe attempt’s review status, severity, or success flag.

probe_attempt_id str Required
Probe attempt ID.
review_status ReviewStatus | None
Review status: "pending", "ignored", "acknowledged", "corrected".
severity Severity | None
Severity: SAFE (0), MINOR (10), MAJOR (20), CRITICAL (30).
successful bool | None
Whether the attack was successful.
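Since the severity scale above maps to integers, attempts can be filtered locally before updating their review status. A hypothetical triage loop, with the update call commented because it needs a live Hub:

```python
# Hypothetical triage: flag every successful attempt at MAJOR severity or above.
MAJOR = 20  # numeric severity value per the scale above

attempts = [  # stand-ins for hub.scans.probes.list_attempts(probe.id)
    {"id": "att_1", "severity": 30, "successful": True},
    {"id": "att_2", "severity": 10, "successful": True},
    {"id": "att_3", "severity": 30, "successful": False},
]
to_review = [a for a in attempts if a["successful"] and a["severity"] >= MAJOR]

# On a live Hub:
# for a in to_review:
#     hub.scans.attempts.update(a["id"], review_status="acknowledged")
```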

Scheduled evaluations

Set up recurring evaluation runs on a daily, weekly, or monthly cadence for continuous quality monitoring.

from giskard_hub.types import ScheduledEvaluation, FrequencyOption
.create() ScheduledEvaluation
project_id str Required

Project ID.

agent_id str Required

Agent to evaluate.

dataset_id str Required

Dataset to evaluate against.

frequency FrequencyOption Required

"daily", "weekly", or "monthly".

name str Required

Name of the scheduled evaluation.

time str Required

Time of day in HH:MM format (UTC).

day_of_week int | None

Weekly only: 1 (Monday) through 7 (Sunday).

day_of_month int | None

Monthly only: 1 through 28.

tags list[str] | None

Filter test cases by tags.

run_count int

Run each test case N times.
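As a sketch, a weekly schedule payload and a quick sanity check of the HH:MM time format; pass these values to .create() along with project_id, agent_id, and dataset_id on a live Hub.

```python
# Sketch: a weekly schedule payload. `time` is HH:MM in UTC and
# `day_of_week` runs from 1 (Monday) to 7 (Sunday).
schedule = {
    "name": "Monday regression run",
    "frequency": "weekly",
    "time": "06:30",
    "day_of_week": 1,
}

# Sanity-check the HH:MM format before sending it to the Hub.
hours, minutes = map(int, schedule["time"].split(":"))
assert 0 <= hours < 24 and 0 <= minutes < 60
```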

.list_evaluations() list[Evaluation]

List all past evaluation runs generated by this scheduled evaluation.

scheduled_evaluation_id str Required
Scheduled evaluation ID.
include list[Literal["agent", "dataset"]] | None
Embed related resources.
.retrieve() ScheduledEvaluation

Retrieve a scheduled evaluation by its ID.

scheduled_evaluation_id str Required
Scheduled evaluation ID.
include list[Literal["evaluations"]] | None
Embed recent evaluation runs.
.update() ScheduledEvaluation

Update a scheduled evaluation’s configuration.

scheduled_evaluation_id str Required
Scheduled evaluation ID.
name str | None
Updated name.
frequency FrequencyOption | None
Updated frequency.
time str | None
Updated time (HH:MM, UTC).
day_of_week int | None
Updated day of week (1–7).
day_of_month int | None
Updated day of month (1–28).
run_count int | None
Updated run count.
last_execution_at str | datetime | None
Updated last execution time.
last_execution_status LastExecutionStatusParam | None
Updated last execution status.
paused bool | None
Updated paused status.
.list() list[ScheduledEvaluation]

List all scheduled evaluations for a project.

project_id str Required
Project ID.
include list[Literal["evaluations"]] | None
Embed recent runs.
last_days int | None
Filter to schedules active within the last N days.
.delete() None

Delete a scheduled evaluation by its ID.

scheduled_evaluation_id str Required
Scheduled evaluation ID.
.bulk_delete() None

Delete multiple scheduled evaluations at once.

scheduled_evaluation_ids list[str] Required
IDs to delete.

Tasks

Lightweight issue tracker for managing findings from evaluations and scans. Link tasks to specific evaluation results, test cases, or probe attempts.

from giskard_hub.types import Task, TaskStatus, TaskPriority
.create() Task
project_id str Required

Project ID.

description str Required

What needs to be done.

priority TaskPriority | None

"low", "medium", or "high".

status TaskStatus | None

"open", "in_progress", or "resolved".

assignee_ids list[str]

User IDs to assign.

evaluation_result_id str | None

Link to a specific evaluation result.

dataset_test_case_id str | None

Link to a specific test case.

probe_attempt_id str | None

Link to a specific scan probe attempt.

disable_test bool

Disable the linked test case.

hide_result bool

Hide the linked evaluation result.
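A sketch of a task payload filed against a failing evaluation result. The result id is a placeholder, and exposing this resource as hub.tasks is an assumption; the create call is commented since it needs a live Hub.

```python
# Sketch: file a high-priority task against a failing evaluation result.
task_payload = {
    "description": "Agent leaks internal prompt on adversarial input",
    "priority": "high",
    "status": "open",
    "evaluation_result_id": "res_123",  # placeholder id
    "hide_result": True,
}
# On a live Hub (resource attribute assumed):
# task = hub.tasks.create(project_id=project.id, **task_payload)
```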

.retrieve() Task

Retrieve a task by its ID.

task_id str Required
Task ID.
.update() Task

Update an existing task’s metadata and assignees.

task_id str Required
Task ID.
status TaskStatus | None
Updated status: "open", "in_progress", or "resolved".
priority TaskPriority | None
Updated priority: "low", "medium", or "high".
description str | None
Updated description.
assignee_ids list[str] | None
Updated user IDs to assign.
set_test_case_status str | None
Also set the linked test case’s status.
.list() list[Task]

List all tasks for a project, ordered by creation date descending.

project_id str | None
Project ID to filter by.
.delete() None

Delete a task by its ID.

task_id str Required
Task ID.
.bulk_delete() None

Delete multiple tasks at once.

task_ids list[str] Required
IDs of tasks to delete.
Task fields

id str

Unique identifier.

description str

Task description.

status TaskStatus

"open", "in_progress", or "resolved".

priority TaskPriority

"low", "medium", or "high".

project_id str

Parent project ID.

created_by UserReference

User who created the task.

assignees list[UserReference]

Assigned users.

references dict

Linked resources.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.


Create, update, and manage individual test cases within datasets. Sub-resource: hub.test_cases.comments.

from giskard_hub.types import TestCase, TestCaseComment, ChatMessageWithMetadata
.create() TestCase

Create a new test case with conversation messages and optional checks.

dataset_id str Required
Dataset this test case belongs to.
messages Iterable[ChatMessageParam] Required

Conversation messages as [{"role": "user", "content": "..."}]. Should not include the final assistant response.

checks Iterable[CheckConfigParam]

Checks to apply: [{"identifier": "correctness", "params": {"reference": "..."}}].

demo_output str | ChatMessageWithMetadataParam | None

Expected output for display only — not used during evaluation.

status "active" | "draft" | None
Test case status.
tags list[str]
Tags for filtering.
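A minimal sketch of the create arguments, assuming a hypothetical dataset ID and a correctness check:

```python
# Sketch of hub.test_cases.create() arguments; the dataset ID is made up.
messages = [
    {"role": "user", "content": "What is your refund policy?"},
]  # no final assistant response -- the agent produces it during evaluation
checks = [
    {"identifier": "correctness",
     "params": {"reference": "Refunds are accepted within 30 days."}},
]
# test_case = hub.test_cases.create(
#     dataset_id="dataset-123",
#     messages=messages,
#     checks=checks,
#     status="draft",
#     tags=["billing"],
# )
```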
.retrieve() TestCase

Retrieve a test case by its ID.

test_case_id str Required
Test case ID.
.update() TestCase

Update an existing test case’s messages, checks, tags, or status.

test_case_id str Required
Test case ID.
messages Iterable[ChatMessageParam] | None
Updated conversation messages.
checks Iterable[CheckConfigParam] | None
Updated checks.
demo_output str | ChatMessageWithMetadataParam | None
Updated expected output.
status "active" | "draft" | None
Updated status.
tags list[str] | None
Updated tags.
dataset_id str | None
Move the test case to a different dataset.
.delete() None

Delete a test case by its ID.

test_case_id str Required
Test case ID.
.bulk_delete() None

Delete multiple test cases at once.
test_case_ids list[str] Required

IDs of test cases to delete.

.bulk_update() list[TestCase]

Update multiple test cases at once. Returns the updated test cases.

test_case_ids list[str] Required
Test case IDs.
status Literal["active", "draft"] | None
Updated status.
disabled_checks list[str] | None
Checks to disable.
enabled_checks list[str] | None
Checks to enable.
added_tags list[str] | None
Tags to add.
removed_tags list[str] | None
Tags to remove.
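As an illustration, promoting a batch of drafts while tagging them (the IDs and check identifier below are hypothetical):

```python
# Illustrative kwargs for hub.test_cases.bulk_update(); IDs are made up.
bulk_kwargs = {
    "test_case_ids": ["tc-1", "tc-2"],
    "status": "active",                 # promote drafts to active
    "added_tags": ["regression"],
    "disabled_checks": ["conformity"],  # hypothetical check identifier
}
# updated = hub.test_cases.bulk_update(**bulk_kwargs)
```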
.bulk_move() None

Move or copy test cases to another dataset.

test_case_ids list[str] Required
Test case IDs to move.
target_dataset_id str Required
Target dataset ID.
duplicate bool
Copy instead of move.
.add() TestCaseComment

Add a comment to a test case.
test_case_id str Required

Test case ID.

content str Required

Comment text.

.edit() TestCaseComment

Edit an existing comment on a test case.
comment_id str Required

Comment ID.

test_case_id str Required

Test case ID.

content str Required

Updated text.

.delete() None

Delete a comment from a test case.
comment_id str Required

Comment ID.

test_case_id str Required

Test case ID.
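The comment lifecycle can be sketched as follows; note that edit and delete both require the parent test case ID as well as the comment ID. All IDs are hypothetical:

```python
# Hypothetical comment flow on a single test case.
test_case_id = "tc-1"
comment_text = "Reference answer looks outdated."
# comment = hub.test_cases.comments.add(test_case_id=test_case_id, content=comment_text)
# hub.test_cases.comments.edit(
#     comment_id=comment.id, test_case_id=test_case_id, content="Fixed in dataset v2."
# )
# hub.test_cases.comments.delete(comment_id=comment.id, test_case_id=test_case_id)
```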

id str

Unique identifier.

dataset_id str

Parent dataset ID.

messages list[ChatMessage]

Conversation messages.

demo_output ChatMessageWithMetadata | None

Expected output (display only).

checks list[CheckConfig]

Configured checks.

comments list[TestCaseComment]

Annotations.

tags list[str]

Tags for filtering.

status "active" | "draft"

Test case status.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.


Access conversations captured from the Hub's interactive playground UI.

from giskard_hub.types import PlaygroundChat
.list() list[PlaygroundChat]

List playground chats for a project.
project_id str Required

Project ID.

include list[Literal["agent"]] | None

Embed related resources (["agent"]).

limit int | None

Maximum results.

offset int | None

Results offset.

.retrieve() PlaygroundChat

Retrieve a playground chat by its ID.
chat_id str Required

Chat ID.

include list[Literal["agent"]] | None

Embed related resources (["agent"]).

.delete() None

Delete a playground chat by its ID.

chat_id str Required
Chat ID.
.bulk_delete() None

Delete multiple playground chats at once.

chat_ids list[str] Required
IDs of chats to delete.
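A sketch of a list call with the agent embedded, assuming the resource is exposed as hub.playground_chats (the attribute name is not stated above) and a made-up project ID:

```python
# Illustrative parameters for listing playground chats.
list_params = {
    "project_id": "proj-123",
    "include": ["agent"],  # embed the related agent on each chat
    "limit": 20,
    "offset": 0,
}
# chats = hub.playground_chats.list(**list_params)  # attribute name assumed
```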

Query the audit trail for compliance reporting, change history, and debugging. Every create, update, and delete action is recorded.

from giskard_hub.types import Audit, AuditDisplay
.search() list[Audit] | tuple[list[Audit], APIPaginatedMetadata]

Search audit events with free-text queries, filters, and pagination. Pass include_metadata=True for tuple[list[Audit], APIPaginatedMetadata].

query str | None

Free-text search query.

filters AuditFiltersParam | None

Filter criteria (see filter keys below).

order_by list[AuditOrderByParam] | None

Sorting criteria.

limit int | None

Maximum results.

offset int | None

Results offset.

include_metadata bool Default: False

Include pagination metadata. If true, returns a tuple of (results, metadata).

Filter keys:

Key          Type         Example
project_id   list filter  {"selected_options": ["project-id"]}
entity_type  list filter  {"selected_options": ["agent", "evaluation"]}
action       list filter  {"selected_options": ["create", "delete"]}
user_id      list filter  {"selected_options": ["user-id"]}
created_at   date range   {"from_": "2025-01-01T00:00:00Z", "to_": "2025-12-31T23:59:59Z"}
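Putting the filter keys together, a hedged example of a search payload (the date range and selected options are illustrative):

```python
# Illustrative filters for hub.audit.search(), built from the keys above.
filters = {
    "entity_type": {"selected_options": ["agent", "evaluation"]},
    "action": {"selected_options": ["delete"]},
    "created_at": {"from_": "2025-01-01T00:00:00Z", "to_": "2025-06-30T23:59:59Z"},
}
# events = hub.audit.search(query="production", filters=filters, limit=100)
```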
.list_entities() list[AuditDisplay]

List audit history for a specific resource, including diffs of each change. Pass include_metadata=True for pagination metadata.

entity_id str Required
UUID of the entity.
entity_type str Required
Type of entity (e.g. "project", "agent", "evaluation").
limit int
Maximum results.
offset int
Results offset.
include_metadata bool Default: False
Include pagination metadata.

All exceptions inherit from HubClientError and are importable from the root package.

from giskard_hub import (
    HubClientError,            # Base exception for all SDK errors
    APIStatusError,            # Base for HTTP status errors (has .status_code, .response)
    APITimeoutError,           # Request timed out
    APIConnectionError,        # Could not connect to the Hub
    BadRequestError,           # 400
    AuthenticationError,       # 401 — invalid or missing API key
    PermissionDeniedError,     # 403 — insufficient permissions
    NotFoundError,             # 404 — resource does not exist
    ConflictError,             # 409 — resource conflict
    UnprocessableEntityError,  # 422 — validation error
    RateLimitError,            # 429 — too many requests
    InternalServerError,       # 500+ — server error
)
Error handling example
from giskard_hub import HubClient, NotFoundError, AuthenticationError
hub = HubClient()
try:
    agent = hub.agents.retrieve("nonexistent-id")
except NotFoundError as e:
    print(f"Agent not found: {e}")
except AuthenticationError:
    print("Check your API key")

Methods that support pagination accept limit and offset. Pass include_metadata=True to get an APIPaginatedMetadata object:

Pagination example
results, metadata = hub.evaluations.results.search(
    "evaluation-id", limit=50, offset=0, include_metadata=True,
)
print(f"Page: {metadata.count} of {metadata.total} (offset {metadata.offset})")
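To walk every page, advance offset by the number of results returned until the total is reached. The sketch below uses fetch_page, a local stand-in for any SDK search method, so only the loop itself reflects the API contract:

```python
_SAMPLE_ITEMS = list(range(95))  # stand-in data for the fake API

def fetch_page(limit, offset):
    """Local stand-in for an SDK search method returning (results, metadata)."""
    page = _SAMPLE_ITEMS[offset:offset + limit]
    return page, {"count": len(page), "total": len(_SAMPLE_ITEMS), "offset": offset}

all_results, offset, limit = [], 0, 50
while True:
    page, meta = fetch_page(limit=limit, offset=offset)
    all_results.extend(page)
    offset += len(page)
    if offset >= meta["total"] or not page:
        break
# all_results now holds every item across pages
```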
Access HTTP headers and status
response = hub.with_raw_response.agents.retrieve("agent-id")
print(response.status_code)
agent = response.parse()
Per-request override
hub.with_options(max_retries=5, timeout=300.0).evaluations.create(...)
Proxy configuration
from giskard_hub import HubClient, DefaultHttpxClient
hub = HubClient(
    http_client=DefaultHttpxClient(proxy="http://proxy.example.com:8080"),
)
Enable debug logging
export GISKARD_HUB_LOG=debug

Every method accepts these optional keyword arguments for per-request customization:

extra_headers dict[str, str] | None

Additional HTTP headers for this request.

extra_query dict[str, object] | None

Additional query parameters.

extra_body object | None

Additional JSON body fields.

timeout float | httpx.Timeout | None

Override the default timeout for this request.
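For instance, a single call could add a tracing header and a longer timeout (the header name and query parameter below are hypothetical):

```python
# Illustrative per-request options accepted by any method.
request_options = {
    "extra_headers": {"X-Request-Id": "trace-123"},  # hypothetical header
    "extra_query": {"dry_run": "true"},              # hypothetical parameter
    "timeout": 120.0,  # override the client default for this call only
}
# agent = hub.agents.retrieve("agent-id", **request_options)
```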