
Reference


HubClient

Synchronous client. All resource operations are available as attributes.

Constructor
from giskard_hub import HubClient

hub = HubClient(
    api_key="gsk_...",  # or set GISKARD_HUB_API_KEY env var
    base_url="https://hub.example.com",  # or set GISKARD_HUB_BASE_URL env var
)
api_key str | None Default: env GISKARD_HUB_API_KEY

Your Hub API key.

base_url str | httpx.URL | None Default: env GISKARD_HUB_BASE_URL

Base URL of your Hub instance.

auto_add_api_suffix bool Default: True

Automatically append /_api to base_url.

timeout float | httpx.Timeout | None Default: 60.0

Default request timeout in seconds. Pass an httpx.Timeout for fine-grained control over connect, read, and write timeouts.

max_retries int Default: 2

Number of automatic retries on transient errors (connection errors, 5xx responses).

default_headers dict[str, str] | None Default: None

Headers added to every request.

default_query dict[str, object] | None Default: None

Query parameters added to every request.

http_client httpx.Client | None Default: None

Custom httpx.Client instance for proxies, custom transports, or mutual TLS.

Async counterpart with an identical API surface; every method is a coroutine and must be awaited.

Basic synchronous usage:

from giskard_hub import HubClient

hub = HubClient()
projects = hub.projects.list()

Agents

Register, test, and invoke LLM agents. An agent represents your LLM application: either a remote HTTP endpoint or a local Python callable.

from giskard_hub.types import Agent, AgentDetectStatefulness, AgentOutput, ChatMessage
.create() Agent

Create a new agent with configuration for external API communication.

name str Required

Display name of the agent.

url str Required

HTTP endpoint the Hub calls during evaluations and scans.

project_id str Required

Project this agent belongs to.

supported_languages list[str] Required

Language codes the agent supports (e.g. ["en", "fr"]).

headers dict[str, str]

HTTP headers sent with every request to the agent (e.g. auth tokens). Pass a plain mapping of header names to values; the returned Agent exposes them as a list of Header objects.

description str | None

Human-readable description.

stateful bool | None

Whether the agent is stateful.

Example
agent = hub.agents.create(
    project_id=project.id,
    name="Support Bot v2",
    url="https://my-app.example.com/api/chat",
    supported_languages=["en"],
    headers={"Authorization": "Bearer <token>"},
    description="GPT-4o chatbot with RAG",
)
.retrieve() Agent

Retrieve an agent by its ID.

agent_id str Required

ID of the agent to retrieve.

.update() Agent

Update an existing agent’s configuration. Only the provided fields are modified.

agent_id str Required

ID of the agent to update.

name str | None
Updated display name.
url str | None
Updated endpoint URL.
description str | None
Updated description.
headers dict[str, str] | None
Updated HTTP headers.
supported_languages list[str] | None
Updated language codes.
.list() list[Agent]

List all agents, optionally filtered by project.

project_id str | None
Project ID to filter by.
.delete() None

Delete an agent by its ID.

agent_id str Required
ID of the agent to delete.
.bulk_delete() None

Delete multiple agents at once.

agent_ids list[str] Required
IDs of agents to delete.
.generate_completion() AgentOutput

Call a registered agent with a list of messages and get the response.

agent_id str Required

ID of the agent to call.

messages Iterable[ChatMessageParam] Required

Conversation messages as [{"role": "user", "content": "..."}].

Example
output = hub.agents.generate_completion(
    agent.id,
    messages=[{"role": "user", "content": "What is your return policy?"}],
)
print(output.response.content)
print(output.metadata)
.test_connection() AgentOutput

Test connectivity to an agent endpoint without persisting the agent.

url str Required
HTTP endpoint URL to test.
headers dict[str, str]
HTTP headers to include in the test request.
.generate_description() str

Auto-generate a description for an agent by observing its behaviour. Returns the generated description.

agent_id str Required
ID of the agent.
.detect_statefulness() AgentDetectStatefulness

Detect whether the agent is stateful by analyzing its behavior.

agent_id str Required
ID of the agent to detect statefulness for.
Agent fields

id str

Unique identifier.

name str

Display name.

description str | None

Human-readable description.

url str

HTTP endpoint URL.

project_id str

Parent project ID.

supported_languages list[str]

Language codes.

headers list[Header]

HTTP headers.

stateful bool

Whether the agent is stateful.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

AgentDetectStatefulness fields

stateful bool

Whether the agent is stateful.

AgentOutput fields

response ChatMessage

The agent’s response message.

error ExecutionError | None

Error details if the agent call failed.

metadata dict | None

Arbitrary metadata returned by the agent.


Checks

Define and manage reusable check criteria for evaluating agent responses. Checks are project-scoped and can be referenced by identifier in any test case.

from giskard_hub.types import Check, CheckResult
.create() Check

Create a custom check in the specified project.

identifier str Required

Unique identifier to reference this check in test cases.

name str Required

Display name.

project_id str Required

Project this check belongs to.

params CheckTypeParam Required

Check configuration (see check type params below).

description str | None

Human-readable description.

Example
check = hub.checks.create(
    project_id=project.id,
    identifier="tone_professional",
    name="Professional tone",
    params={"type": "conformity", "rules": ["Use formal language."]},
)
.retrieve() Check
check_id str Required

ID of the check to retrieve.

.update() Check

Update an existing check. Only the provided fields are modified.

check_id str Required
ID of the check to update.
identifier str | None
Updated identifier.
name str | None
Updated name.
params CheckTypeParam | None
Updated check params.
description str | None
Updated description.
.list() list[Check]
project_id str Required

Project ID to list checks for.

filter_builtin bool

When True, include built-in checks in the results.

.delete() None
check_id str Required

ID of the check to delete.

.bulk_delete() None
check_ids list[str] Required

IDs of checks to delete.

The params field accepts one of these shapes:

Type | params shape | Evaluation method
Correctness | {"type": "correctness", "reference": str} | LLM judge
Conformity | {"type": "conformity", "rules": list[str]} | LLM judge
Groundedness | {"type": "groundedness", "context": str} | LLM judge
Semantic similarity | {"type": "semantic_similarity", "reference": str, "threshold": float} | Embedding
String match | {"type": "string_match", "keyword": str} | Rule-based
Metadata | {"type": "metadata", "json_path_rules": list[JsonPathRule]} | Rule-based

Each JsonPathRule: {"json_path": str, "expected_value": str, "expected_value_type": "string" | "number" | "boolean"}
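As a sketch, the params payloads from the table above are plain dicts. The example below builds two of them; the JSON path used is purely illustrative, not a real field of any agent.

```python
# Sketch: params payloads for two check types from the table above.
conformity_params = {
    "type": "conformity",
    "rules": ["Always answer in formal English.", "Never promise refunds."],
}
metadata_params = {
    "type": "metadata",
    "json_path_rules": [
        {
            "json_path": "$.usage.model",  # hypothetical metadata field
            "expected_value": "gpt-4o",
            "expected_value_type": "string",
        }
    ],
}
# Pass either dict as `params` to hub.checks.create(...).
```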

Check fields

id str

Unique identifier.

built_in bool

Whether this is a built-in check.

identifier str

Reusable identifier string.

name str

Display name.

description str | None

Human-readable description.

project_id str

Parent project ID.

params dict

Check-specific configuration.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

CheckResult fields

name str

Check identifier.

display_name str

Human-readable name.

status str

Execution status.

passed bool

Whether the check passed.

error str | None

Error message if execution failed.

reason str | None

LLM judge’s reasoning (for LLM-based checks).

annotations list[OutputAnnotation]

Annotated spans in the agent’s response.


Datasets

Create datasets, import test cases, and auto-generate test suites from scenarios or knowledge bases.

from giskard_hub.types import Dataset, TestCase, TaskProgress
.create() Dataset

Create a new empty dataset in the specified project.

name str Required
Display name.
project_id str Required
Project this dataset belongs to.
description str | None
Human-readable description.
.upload() Dataset

Import test cases from a file or list of dicts into a dataset.

project_id str Required
Project ID.
data FileTypes | list[dict[str, Any]] | str Required

File path (str or Path), file-like object, or list of dicts. Each record should have a messages list and optional checks list.

dataset_id str | None
Append to an existing dataset instead of creating a new one.
name str | None
Name for the new dataset.
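A record list for data can be sketched as plain dicts following the description above; the upload call itself needs a live Hub, so it is left commented.

```python
# Each upload record carries a `messages` list and an optional `checks` list.
records = [
    {
        "messages": [{"role": "user", "content": "Where is my order?"}],
        "checks": [
            {
                "identifier": "correctness",
                "params": {"reference": "Orders can be tracked from the account page."},
            }
        ],
    }
]
# On a live Hub:
# dataset = hub.datasets.upload(project_id=project.id, data=records, name="Imported cases")
```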
.generate_scenario_based() Dataset

Generate a dataset of test cases from scenario definitions. The dataset’s status will be "running" until generation completes — use hub.helpers.wait_for_completion() to wait.

project_id str Required
Project ID.
agent_id str Required
Agent to generate test cases for.
scenario_id str Required
Scenario template to use.
n_examples int
Number of test cases to generate.
dataset_id str | None
Append to an existing dataset.
dataset_name str | None
Name for the new dataset.
.generate_document_based() Dataset

Generate test cases grounded in knowledge base documents. Generation runs asynchronously; use hub.helpers.wait_for_completion() to wait for it.

agent_id str Required
Agent to generate test cases for.
knowledge_base_id str Required
Knowledge base to source documents from.
project_id str Required
Project ID.
dataset_name str | None
Name for the new dataset.
description str | None
Dataset description.
n_examples int
Number of test cases to generate.
topic_ids list[str]
Filter to specific KB topics.
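As a sketch, the arguments for a document-based run can be gathered in one dict (all ids below are placeholders) and passed to the method on a live Hub.

```python
# Placeholder ids; on a live Hub use real project/agent/knowledge-base ids.
kwargs = {
    "project_id": "proj_123",
    "agent_id": "agent_456",
    "knowledge_base_id": "kb_789",
    "dataset_name": "KB-grounded tests",
    "n_examples": 20,
}
# dataset = hub.datasets.generate_document_based(**kwargs)
# dataset = hub.helpers.wait_for_completion(dataset)
```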
.retrieve() Dataset
dataset_id str Required

ID of the dataset to retrieve.

.update() Dataset
dataset_id str Required

ID of the dataset to update.

name str | None

Updated name.

description str | None

Updated description.

status TaskProgress | None

Async operation status.

.list() list[Dataset]
project_id str | None

Project ID to filter by.

.delete() None

Delete a dataset by its ID.

dataset_id str Required
Dataset ID.
.bulk_delete() None

Delete multiple datasets at once.

dataset_ids list[str] Required
IDs of datasets to delete.
.list_tags() list[str]

List all tags used across test cases in a dataset.

dataset_id str Required
Dataset ID.
.list_test_cases() list[TestCase]

List all test cases in a dataset.

dataset_id str Required
Dataset ID.
.search_test_cases() list[TestCase]

Search test cases with filters, sorting, and pagination. Pass include_metadata=True to receive tuple[list[TestCase], APIPaginatedMetadata].

dataset_id str Required
Dataset ID.
query str | None
Free-text search query.
order_by list[TestCaseOrderByParam] | None
Sorting criteria.
filters TestCaseFiltersParam | None
Filter criteria.
limit int | None
Maximum results per page.
offset int | None
Results offset for pagination.
include_metadata bool Default: False
Include pagination metadata in the return value.
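The filter and sort arguments are plain payloads. The field names below are assumptions; check the TestCaseFiltersParam and TestCaseOrderByParam type definitions for the real shapes. The call itself needs a live Hub and is shown commented.

```python
# Assumed payload shapes for filtering and sorting test cases.
filters = {"tags": ["billing"], "status": "active"}
order_by = [{"field": "created_at", "direction": "desc"}]

# On a live Hub:
# cases, meta = hub.datasets.search_test_cases(
#     dataset_id=dataset.id,
#     query="refund",
#     filters=filters,
#     order_by=order_by,
#     limit=20,
#     include_metadata=True,
# )
```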
Dataset fields

id str

Unique identifier.

name str

Display name.

description str | None

Human-readable description.

project_id str

Parent project ID.

status TaskProgress

Async operation status (for generated datasets).

tags list[str]

All tags used across test cases.

state str

Computed from status.state — e.g. "finished", "running".

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.


Evaluations

Run agents against datasets, inspect per-test-case results, and manage the evaluation lifecycle. Sub-resource: hub.evaluations.results.

from giskard_hub.types import Evaluation, Metric, CheckResult
.create() Evaluation

Create and launch a new evaluation of an agent on a dataset.

project_id str Required

Project ID.

agent_id str Required

Agent to evaluate.

dataset_id str | None

Dataset to evaluate against. Provide this or old_evaluation_id, not both.

old_evaluation_id str | None

Reuse a previous evaluation’s dataset.

name str

Evaluation run name.

tags list[str] | None

Filter test cases by tags.

run_count int

Run each test case N times (for consistency testing).

scheduled_evaluation_id str | None

Link to a scheduled evaluation.

Example
evaluation = hub.evaluations.create(
    project_id=project.id,
    agent_id=agent.id,
    dataset_id=dataset.id,
    name="v2.1 regression run",
)
evaluation = hub.helpers.wait_for_completion(evaluation)
hub.helpers.print_metrics(evaluation)
.create_local() Evaluation

Create a local evaluation for running agent inference in your own process.

agent_info MinimalAgentParam Required

Agent info as {"name": str, "description": str}.

dataset_id str | None
Dataset to evaluate against.
name str | None
Evaluation name.
tags list[str] | None
Filter test cases by tags.
old_evaluation_id str | None
Reuse a previous evaluation’s dataset.
.run_single() list[CheckResult]

Evaluate a single (input, output) pair against checks without creating a full evaluation.

messages Iterable[ChatMessageParam] Required
Conversation messages.
agent_output AgentOutputParam Required
Agent’s output to evaluate.
checks Iterable[CheckConfigParam] Required
Checks to apply.
project_id str
Project ID.
agent_description str
Description of the agent for context.
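A sketch of the payloads for a single-pair evaluation. The agent_output shape as a role/content dict is an assumption based on AgentOutputParam; the call itself needs a live Hub and is shown commented.

```python
# Sketch: evaluate one (input, output) pair against a correctness check.
messages = [{"role": "user", "content": "What is your return policy?"}]
agent_output = {"role": "assistant", "content": "You can return items within 30 days."}
checks = [
    {
        "identifier": "correctness",
        "params": {"reference": "Items can be returned within 30 days."},
    }
]
# On a live Hub:
# results = hub.evaluations.run_single(
#     messages=messages,
#     agent_output=agent_output,
#     checks=checks,
#     agent_description="E-commerce support bot",
# )
# for r in results:
#     print(r.name, r.passed)
```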
.rerun_errored_results() Evaluation

Rerun all errored results without triggering a full re-evaluation.

evaluation_id str Required
Evaluation ID.
.retrieve() Evaluation

Retrieve an evaluation by its ID, with optional related resource inclusion.

evaluation_id str Required
Evaluation ID.
include list[Literal["agent", "dataset"]] | None
Embed the full agent and/or dataset objects instead of references.
.update() Evaluation

Update an evaluation’s name.

evaluation_id str Required
Evaluation ID.
name str Required
New name for the evaluation.
.list() list[Evaluation]

List all evaluations for a project.

project_id str Required
Project ID.
include list[Literal["agent", "dataset"]] | None
Embed related objects.
.delete() None

Delete an evaluation by its ID.

evaluation_id str Required
Evaluation ID.
.bulk_delete() None

Delete multiple evaluations at once.

evaluation_ids list[str] Required
IDs of evaluations to delete.
Evaluation fields

id str

Unique identifier.

name str | None

Display name.

agent AgentReference | Agent

The evaluated agent.

dataset DatasetReference | Dataset

The dataset used.

criteria list

Check criteria applied.

project_id str

Parent project ID.

local bool

Whether this is a local evaluation.

metrics list[Metric]

Aggregated pass/fail metrics per check.

tags list[str]

Tags used to filter test cases.

failure_categories list[FailureCategory]

Available failure classifications.

status TaskProgress

Async operation status.

state str

Computed: "finished", "running", "error".

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

old_evaluation_id str | None

ID of the previous evaluation that this evaluation is based on.

scheduled_evaluation_id str | None

ID of the scheduled evaluation that this evaluation is based on.

Metric fields

name str

Check identifier (e.g. "correctness", "global").

display_name str

Human-readable name.

passed int

Number of test cases that passed.

failed int

Number of test cases that failed.

errored int

Number of test cases that errored.

total int

Total test cases.

success_rate float

Pass rate as a float between 0.0 and 1.0.

Evaluation results

Inspect, filter, update, and rerun individual evaluation results.

from giskard_hub.types import TestCaseEvaluation, FailureCategory
.retrieve() TestCaseEvaluation
result_id str Required

Result ID.

evaluation_id str Required

Evaluation ID.

include list[str] | None

Embed related resources (["test_case"]).

.update() TestCaseEvaluation

Update the failure category of an evaluation result.

result_id str Required
Result ID.
evaluation_id str Required
Evaluation ID.
failure_category FailureCategoryParam | None
Failure classification to assign.
.list() list[TestCaseEvaluation]
evaluation_id str Required

Evaluation ID.

include list[str] | None

Embed related resources (["test_case"]).

.search() list[TestCaseEvaluation]

Search and filter results. Pass include_metadata=True for pagination metadata.

evaluation_id str Required
Evaluation ID.
query str | None
Free-text search query.
filters ResultFiltersParam | None
Filter criteria.
order_by list[ResultOrderByParam] | None
Sorting criteria.
limit int | None
Maximum results.
offset int | None
Results offset.
include list[str] | None
Embed related resources.
include_metadata bool Default: False
Include pagination metadata.
.rerun_test_case() TestCaseEvaluation
result_id str Required

Result ID.

evaluation_id str Required

Evaluation ID.

.submit_local_output() TestCaseEvaluation

Submit locally-generated agent output for evaluation and scoring.

result_id str Required
Result ID.
evaluation_id str Required
Evaluation ID.
agent_output AgentOutputParam | None
Agent output to submit.
error str | None
Error message if the agent call failed.
.update_visibility() TestCaseEvaluation

Show or hide a result from the default view.

result_id str Required
Result ID.
evaluation_id str Required
Evaluation ID.
hidden bool Required
Whether the result should be hidden.
set_test_case_draft bool | None
Also set the linked test case to draft status.
TestCaseEvaluation fields

id str

Unique identifier.

evaluation_id str

Parent evaluation ID.

test_case TestCase | TestCaseReference

The test case.

test_case_exists bool

Whether the test case still exists.

state str

Result state: "finished", "running", "error".

results list[CheckResult]

Per-check outcomes.

output AgentOutput | None

The agent’s actual response.

error ExecutionError | None

Error details if the agent call failed.

failure_category FailureCategory | None

Assigned failure classification.

hidden bool

Whether this result is hidden.

divergence_warnings list[DivergenceWarning] | None

List of divergence warnings detected during multi-turn evaluation.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

DivergenceWarning fields

turn int

The conversation turn where divergence was detected.

expected str

The expected message content.

actual str

The actual message content received.


Helpers

High-level convenience methods for the most common SDK workflows: waiting for async operations, running evaluations, and printing metrics.

from giskard_hub.types import Evaluation, Scan, ChatMessage, AgentOutput
.wait_for_completion() TStateful

Poll an entity until it leaves its running state. Returns the refreshed entity.

entity TStateful Required

Any stateful entity: Evaluation, Scan, Dataset, KnowledgeBase, ScanProbe, TestCaseEvaluation.

poll_interval float Default: 5.0

Seconds between polling requests.

max_retries int Default: 360

Maximum polling attempts. Default: 30 minutes at 5-second intervals.

running_states Collection[str] Default: {"running"}

States considered as “still processing”.

error_states Collection[str] Default: {"error"}

Terminal error states.

raise_on_error bool Default: True

Raise ValueError if entity enters an error state.

.evaluate() Evaluation

Run an evaluation for a given agent over a dataset. Handles both remote and local agents.

agent str | Agent | Callable Required

Agent ID, Agent object, or a Python callable for local evaluation. Callable signature: (messages: list[ChatMessage]) -> str | ChatMessage | AgentOutput.

dataset str | Dataset Required

Dataset ID or Dataset object.

project str | Project | None

Required when agent is remote (str or Agent). Not required for local callables.

name str | None
Evaluation run name.
tags list[str] | None
Filter test cases by tags.
Example
evaluation = hub.helpers.evaluate(
    agent=my_agent,
    dataset=my_dataset,
    project=my_project,
    name="Remote eval",
)
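For local evaluation, the agent can be a plain callable. Per the signature above it receives the conversation messages and may return a simple string; accessing message content both as an attribute and as a dict key, as done below, is a defensive assumption.

```python
# A minimal local agent: a callable that returns a string answer.
def my_local_agent(messages):
    last = messages[-1]
    # ChatMessage objects expose `content`; fall back to dict access.
    text = getattr(last, "content", None) or last["content"]
    return f"You asked: {text!r}. Please check our help center."

# On a live Hub (no project needed for a local callable):
# evaluation = hub.helpers.evaluate(
#     agent=my_local_agent, dataset=my_dataset, name="Local eval",
# )

answer = my_local_agent([{"role": "user", "content": "Do you ship to Canada?"}])
```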
.print_metrics() None

Print a formatted metrics table to the console for an evaluation or scan.

entity Evaluation | Scan Required

The evaluation or scan to print metrics for.


Knowledge bases

Create, search, and manage indexed document collections for grounded evaluations, document-based test generation, and knowledge-grounded vulnerability scans.

from giskard_hub.types import (
    KnowledgeBase,
    KnowledgeBaseDocumentRow,
    KnowledgeBaseDocumentDetail,
)
.create() KnowledgeBase

Create a knowledge base and upload documents. Indexing happens asynchronously after creation — use hub.helpers.wait_for_completion().

name str Required

Display name.

project_id str Required

Project this KB belongs to.

data FileTypes | list[dict[str, Any]] | str Required

Documents as a list of dicts, a file path string, or a pathlib.Path (JSON/JSONL format).

description str | None

Human-readable description.

document_column str Default: "text"

Column name for document text.

topic_column str Default: "topic"

Column name for topic label.

Example
kb = hub.knowledge_bases.create(
    project_id=project.id,
    name="Product Docs",
    data=[
        {"text": "30-day return policy.", "topic": "Returns"},
        {"text": "Free shipping over $50.", "topic": "Shipping"},
    ],
)
kb = hub.helpers.wait_for_completion(kb)
.search_documents() list[KnowledgeBaseDocumentRow] | tuple[list[KnowledgeBaseDocumentRow], APIPaginatedMetadata]

Semantic search over documents in a knowledge base.

knowledge_base_id str Required
Knowledge base ID.
query str | None
Search query.
filters KnowledgeBaseDocumentFiltersParam | None
Filter criteria.
order_by list[KnowledgeBaseDocumentOrderByParam] | None
Sorting criteria.
limit int | None
Maximum results.
offset int | None
Results offset.
include_metadata bool Default: False
Include pagination metadata. If true, returns a tuple of (results, metadata).
.retrieve_document() KnowledgeBaseDocumentDetail

Retrieve a specific document with its full content.

knowledge_base_id str Required
Knowledge base ID.
document_id str Required
Document ID.
.retrieve() KnowledgeBase

Retrieve a knowledge base by its ID, including its topics.

knowledge_base_id str Required
Knowledge base ID.
.update() KnowledgeBase

Update a knowledge base’s metadata.

knowledge_base_id str Required
Knowledge base ID.
name str | None
Updated name.
description str | None
Updated description.
project_id str | None
Project ID to move the knowledge base to.
status TaskProgress | None
Async operation status.
.list() list[KnowledgeBase]

List all knowledge bases, optionally filtered by project.

project_id str | None
Project ID to filter by.
.delete() None

Delete a knowledge base by its ID.

knowledge_base_id str Required
Knowledge base ID.
.bulk_delete() None

Delete multiple knowledge bases at once.

knowledge_base_ids list[str] Required
IDs of knowledge bases to delete.
KnowledgeBase fields

id str

Unique identifier.

name str

Display name.

description str | None

Human-readable description.

filename str | None

Original upload filename.

project_id str

Parent project ID.

n_documents int

Number of indexed documents.

status TaskProgress

Async indexing status.

topics list

Discovered topics.

state str

Computed from status.state.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.


Projects

Top-level workspace that groups all related resources: agents, datasets, evaluations, scans, and more. Sub-resource: hub.projects.scenarios.

from giskard_hub.types import Project
.create() Project
name str Required

Project name.

description str | None

Project description.

.update() Project
project_id str Required

Project ID.

name str | None

Updated name.

description str | None

Updated description.

failure_categories Iterable[FailureCategoryParam] | None

Project-level failure classifications.

.retrieve() Project

Retrieve a project by its ID.

project_id str Required
Project ID.
.list() list[Project]

List all projects accessible to the current user.

.delete() None

Delete a project by its ID.

project_id str Required
Project ID.
.bulk_delete() None

Delete multiple projects at once.

project_ids list[str] Required
IDs of projects to delete.
Project fields

id str

Unique identifier.

name str

Display name.

description str | None

Human-readable description.

failure_categories list[FailureCategory]

Project-level failure classifications.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

Scenarios

Reusable persona and behaviour templates for scenario-based dataset generation.

from giskard_hub.types import Scenario, ScenarioPreview
.create() Scenario
project_id str Required

Project ID (positional).

name str Required

Scenario name.

description str Required

Scenario description.

rules list[str]

Rules the generated conversations should follow.

.preview() ScenarioPreview

Generate a preview conversation for a scenario without persisting it.

project_id str Required
Project ID (positional).
description str Required
Scenario description.
rules list[str]
Scenario rules.
agent_id str | None
Agent ID for preview.
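A small sketch of the preview-then-create flow: preview a scenario definition first, then persist it. Both calls need a live Hub, so they are shown commented; only the payload values below are executable.

```python
# Sketch: a scenario definition for preview and creation.
description = "An impatient customer demanding an immediate refund."
rules = ["Stay polite.", "Escalate after two refusals."]

# On a live Hub:
# preview = hub.projects.scenarios.preview(
#     project.id, description=description, rules=rules, agent_id=agent.id,
# )
# scenario = hub.projects.scenarios.create(
#     project.id, name="Impatient refund seeker",
#     description=description, rules=rules,
# )
```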
.retrieve() Scenario

Retrieve a scenario by its ID within a project.

scenario_id str Required
Scenario ID.
project_id str Required
Project ID.
.update() Scenario

Update an existing scenario’s definition.

scenario_id str Required
Scenario ID.
project_id str Required
Project ID.
name str | None
Updated name.
description str | None
Updated description.
rules list[str] | None
Updated rules.
.list() list[Scenario]

List all scenarios for a project.

project_id str Required
Project ID (positional).
.delete() None

Delete a scenario from a project.

scenario_id str Required
Scenario ID.
project_id str Required
Project ID.

Scans

Launch automated vulnerability scans covering the OWASP LLM Top 10 and additional threat categories. Sub-resources: hub.scans.probes, hub.scans.attempts.

from giskard_hub.types import Scan, ScanCategory, ScanProbe, ScanProbeAttempt, Severity, ReviewStatus
.create() Scan

Launch a new vulnerability scan of an agent.

project_id str Required

Project ID.

agent_id str Required

Agent to scan.

knowledge_base_id str | None

Anchor probes to KB documents for domain-specific attacks.

probe_ids list[str] | None

List of specific LIDAR probe IDs to run in the scan.

tags list[str] | None

Limit scan to specific threat categories (e.g. ["gsk:threat-type='prompt-injection'"]).

Example
scan = hub.scans.create(
    project_id=project.id,
    agent_id=agent.id,
    tags=["gsk:threat-type='prompt-injection'"],
)
scan = hub.helpers.wait_for_completion(scan)
print(f"Grade: {scan.grade}")
hub.helpers.print_metrics(scan)
.list_categories() list[ScanCategory]

List all available scan categories and their OWASP mappings.

.list_probes() list[ScanProbe]

List all probe results for a completed scan.

scan_id str Required
Scan ID.
.retrieve() Scan

Retrieve a scan result by its ID, with optional related resource inclusion.

scan_id str Required
Scan ID.
include list[Literal["agent", "knowledge_base"]] | None
Embed related objects.
.list() list[Scan]

List all scan results, optionally filtered by project.

project_id str | None
Project ID to filter by.
include list[Literal["agent", "knowledge_base"]] | None
Embed related objects.
.delete() None

Delete a scan result by its ID.

scan_id str Required
Scan ID.
.bulk_delete() None

Delete multiple scan results at once.

scan_ids list[str] Required
IDs of scans to delete.
Scan fields

id str

Unique identifier.

agent AgentReference | Agent

The scanned agent.

project_id str

Parent project ID.

knowledge_base KnowledgeBase | None

Linked knowledge base.

grade str | None

Overall grade: "A", "B", "C", "D", or None.

status TaskProgress

Async operation status.

state str

Computed from status.state.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.

Scan probes and attempts

.retrieve() ScanProbe
probe_id str Required

Probe ID.

.list_attempts() list[ScanProbeAttempt]

List all adversarial attempts for a specific probe.

probe_id str Required
Probe ID.
.update() ScanProbeAttempt

Update a probe attempt’s review status, severity, or success flag.

probe_attempt_id str Required
Probe attempt ID.
review_status ReviewStatus | None
Review status: "pending", "ignored", "acknowledged", "corrected".
severity Severity | None
Severity: SAFE (0), MINOR (10), MAJOR (20), CRITICAL (30).
successful bool | None
Whether the attack was successful.
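Since the severity scale above maps to integers, attempts can be filtered locally before updating their review status. A hypothetical triage loop, with the update call commented because it needs a live Hub:

```python
# Hypothetical triage: flag every successful attempt at MAJOR severity or above.
MAJOR = 20  # numeric severity value per the scale above

attempts = [  # stand-ins for hub.scans.probes.list_attempts(probe.id)
    {"id": "att_1", "severity": 30, "successful": True},
    {"id": "att_2", "severity": 10, "successful": True},
    {"id": "att_3", "severity": 30, "successful": False},
]
to_review = [a for a in attempts if a["successful"] and a["severity"] >= MAJOR]

# On a live Hub:
# for a in to_review:
#     hub.scans.attempts.update(a["id"], review_status="acknowledged")
```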

Scheduled evaluations

Set up recurring evaluation runs on a daily, weekly, or monthly cadence for continuous quality monitoring.

from giskard_hub.types import ScheduledEvaluation, FrequencyOption
.create() ScheduledEvaluation
project_id str Required

Project ID.

agent_id str Required

Agent to evaluate.

dataset_id str Required

Dataset to evaluate against.

frequency FrequencyOption Required

"daily", "weekly", or "monthly".

name str Required

Name of the scheduled evaluation.

time str Required

Time of day in HH:MM format (UTC).

day_of_week int | None

Weekly only: 1 (Monday) through 7 (Sunday).

day_of_month int | None

Monthly only: 1 through 28.

tags list[str] | None

Filter test cases by tags.

run_count int

Run each test case N times.
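As a sketch, a weekly schedule payload and a quick sanity check of the HH:MM time format; pass these values to .create() along with project_id, agent_id, and dataset_id on a live Hub.

```python
# Sketch: a weekly schedule payload. `time` is HH:MM in UTC and
# `day_of_week` runs from 1 (Monday) to 7 (Sunday).
schedule = {
    "name": "Monday regression run",
    "frequency": "weekly",
    "time": "06:30",
    "day_of_week": 1,
}

# Sanity-check the HH:MM format before sending it to the Hub.
hours, minutes = map(int, schedule["time"].split(":"))
assert 0 <= hours < 24 and 0 <= minutes < 60
```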

.list_evaluations() list[Evaluation]

List all past evaluation runs generated by this scheduled evaluation.

scheduled_evaluation_id str Required
Scheduled evaluation ID.
include list[Literal["agent", "dataset"]] | None
Embed related resources.
.retrieve() ScheduledEvaluation

Retrieve a scheduled evaluation by its ID.

scheduled_evaluation_id str Required
Scheduled evaluation ID.
include list[Literal["evaluations"]] | None
Embed recent evaluation runs.
.update() ScheduledEvaluation

Update a scheduled evaluation’s configuration.

scheduled_evaluation_id str Required
Scheduled evaluation ID.
name str | None
Updated name.
frequency FrequencyOption | None
Updated frequency.
time str | None
Updated time (HH:MM, UTC).
day_of_week int | None
Updated day of week (1–7).
day_of_month int | None
Updated day of month (1–28).
run_count int | None
Updated run count.
last_execution_at str | datetime | None
Updated last execution time.
last_execution_status LastExecutionStatusParam | None
Updated last execution status.
paused bool | None
Updated paused status.
.list() list[ScheduledEvaluation]

List all scheduled evaluations for a project.

project_id str Required
Project ID.
include list[Literal["evaluations"]] | None
Embed recent runs.
last_days int | None
Filter to schedules active within the last N days.
.delete() None

Delete a scheduled evaluation by its ID.

scheduled_evaluation_id str Required
Scheduled evaluation ID.
.bulk_delete() None

Delete multiple scheduled evaluations at once.

scheduled_evaluation_ids list[str] Required
IDs to delete.

Tasks

Lightweight issue tracker for managing findings from evaluations and scans. Link tasks to specific evaluation results, test cases, or probe attempts.

from giskard_hub.types import Task, TaskStatus, TaskPriority
.create() Task
project_id str Required

Project ID.

description str Required

What needs to be done.

priority TaskPriority | None

"low", "medium", or "high".

status TaskStatus | None

"open", "in_progress", or "resolved".

assignee_ids list[str]

User IDs to assign.

evaluation_result_id str | None

Link to a specific evaluation result.

dataset_test_case_id str | None

Link to a specific test case.

probe_attempt_id str | None

Link to a specific scan probe attempt.

disable_test bool

Disable the linked test case.

hide_result bool

Hide the linked evaluation result.
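A sketch of a task payload filed against a failing evaluation result. The result id is a placeholder, and exposing this resource as hub.tasks is an assumption; the create call is commented since it needs a live Hub.

```python
# Sketch: file a high-priority task against a failing evaluation result.
task_payload = {
    "description": "Agent leaks internal prompt on adversarial input",
    "priority": "high",
    "status": "open",
    "evaluation_result_id": "res_123",  # placeholder id
    "hide_result": True,
}
# On a live Hub (resource attribute assumed):
# task = hub.tasks.create(project_id=project.id, **task_payload)
```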

.retrieve() Task

Retrieve a task by its ID.

task_id str Required
Task ID.
.update() Task

Update an existing task’s metadata and assignees.

task_id str Required
Task ID.
status TaskStatus | None
Updated status: "open", "in_progress", or "resolved".
priority TaskPriority | None
Updated priority: "low", "medium", or "high".
description str | None
Updated description.
assignee_ids list[str] | None
Updated user IDs to assign.
set_test_case_status str | None
Also set the linked test case’s status.
.list() list[Task]

List all tasks for a project, ordered by creation date descending.

project_id str | None
Project ID to filter by.
.delete() None

Delete a task by its ID.

task_id str Required
Task ID.
.bulk_delete() None

Delete multiple tasks at once.

task_ids list[str] Required
IDs of tasks to delete.
Task fields

id str

Unique identifier.

description str

Task description.

status TaskStatus

"open", "in_progress", or "resolved".

priority TaskPriority

"low", "medium", or "high".

project_id str

Parent project ID.

created_by UserReference

User who created the task.

assignees list[UserReference]

Assigned users.

references dict

Linked resources.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.


Create, update, and manage individual test cases within datasets. Sub-resource: hub.test_cases.comments.

from giskard_hub.types import TestCase, TestCaseComment, ChatMessageWithMetadata
.create() TestCase

Create a new test case with conversation messages and optional checks.

dataset_id str Required
Dataset this test case belongs to.
messages Iterable[ChatMessageParam] Required

Conversation messages as [{"role": "user", "content": "..."}]. Should not include the final assistant response.

checks Iterable[CheckConfigParam]

Checks to apply: [{"identifier": "correctness", "params": {"reference": "..."}}].

demo_output str | ChatMessageWithMetadataParam | None

Expected output for display only — not used during evaluation.

status "active" | "draft" | None
Test case status.
tags list[str]
Tags for filtering.
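A minimal sketch of the create arguments, assuming a hypothetical dataset ID and a correctness check:

```python
# Sketch of hub.test_cases.create() arguments; the dataset ID is made up.
messages = [
    {"role": "user", "content": "What is your refund policy?"},
]  # no final assistant response -- the agent produces it during evaluation
checks = [
    {"identifier": "correctness",
     "params": {"reference": "Refunds are accepted within 30 days."}},
]
# test_case = hub.test_cases.create(
#     dataset_id="dataset-123",
#     messages=messages,
#     checks=checks,
#     status="draft",
#     tags=["billing"],
# )
```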
.retrieve() TestCase

Retrieve a test case by its ID.

test_case_id str Required
Test case ID.
.update() TestCase

Update an existing test case’s messages, checks, tags, or status.

test_case_id str Required
Test case ID.
messages Iterable[ChatMessageParam] | None
Updated conversation messages.
checks Iterable[CheckConfigParam] | None
Updated checks.
demo_output str | ChatMessageWithMetadataParam | None
Updated expected output.
status "active" | "draft" | None
Updated status.
tags list[str] | None
Updated tags.
dataset_id str | None
Move the test case to a different dataset.
.delete() None

Delete a test case by its ID.

test_case_id str Required
Test case ID.
.bulk_delete() None

Delete multiple test cases at once.
test_case_ids list[str] Required

IDs of test cases to delete.

.bulk_update() list[TestCase]

Update multiple test cases at once. Returns the updated test cases.

test_case_ids list[str] Required
Test case IDs.
status Literal["active", "draft"] | None
Updated status.
disabled_checks list[str] | None
Checks to disable.
enabled_checks list[str] | None
Checks to enable.
added_tags list[str] | None
Tags to add.
removed_tags list[str] | None
Tags to remove.
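As an illustration, promoting a batch of drafts while tagging them (the IDs and check identifier below are hypothetical):

```python
# Illustrative kwargs for hub.test_cases.bulk_update(); IDs are made up.
bulk_kwargs = {
    "test_case_ids": ["tc-1", "tc-2"],
    "status": "active",                 # promote drafts to active
    "added_tags": ["regression"],
    "disabled_checks": ["conformity"],  # hypothetical check identifier
}
# updated = hub.test_cases.bulk_update(**bulk_kwargs)
```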
.bulk_move() None

Move or copy test cases to another dataset.

test_case_ids list[str] Required
Test case IDs to move.
target_dataset_id str Required
Target dataset ID.
duplicate bool
Copy instead of move.
.add() TestCaseComment

Add a comment to a test case.
test_case_id str Required

Test case ID.

content str Required

Comment text.

.edit() TestCaseComment

Edit an existing comment on a test case.
comment_id str Required

Comment ID.

test_case_id str Required

Test case ID.

content str Required

Updated text.

.delete() None

Delete a comment from a test case.
comment_id str Required

Comment ID.

test_case_id str Required

Test case ID.
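The comment lifecycle can be sketched as follows; note that edit and delete both require the parent test case ID as well as the comment ID. All IDs are hypothetical:

```python
# Hypothetical comment flow on a single test case.
test_case_id = "tc-1"
comment_text = "Reference answer looks outdated."
# comment = hub.test_cases.comments.add(test_case_id=test_case_id, content=comment_text)
# hub.test_cases.comments.edit(
#     comment_id=comment.id, test_case_id=test_case_id, content="Fixed in dataset v2."
# )
# hub.test_cases.comments.delete(comment_id=comment.id, test_case_id=test_case_id)
```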

id str

Unique identifier.

dataset_id str

Parent dataset ID.

messages list[ChatMessage]

Conversation messages.

demo_output ChatMessageWithMetadata | None

Expected output (display only).

checks list[CheckConfig]

Configured checks.

comments list[TestCaseComment]

Annotations.

tags list[str]

Tags for filtering.

status "active" | "draft"

Test case status.

created_at datetime

Creation timestamp.

updated_at datetime

Last update timestamp.


Access conversations captured from the Hub's interactive playground UI.

from giskard_hub.types import PlaygroundChat
.list() list[PlaygroundChat]

List playground chats for a project.
project_id str Required

Project ID.

include list[Literal["agent"]] | None

Embed related resources (["agent"]).

limit int | None

Maximum results.

offset int | None

Results offset.

.retrieve() PlaygroundChat

Retrieve a playground chat by its ID.
chat_id str Required

Chat ID.

include list[Literal["agent"]] | None

Embed related resources (["agent"]).

.delete() None

Delete a playground chat by its ID.

chat_id str Required
Chat ID.
.bulk_delete() None

Delete multiple playground chats at once.

chat_ids list[str] Required
IDs of chats to delete.
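A sketch of a list call with the agent embedded, assuming the resource is exposed as hub.playground_chats (the attribute name is not stated above) and a made-up project ID:

```python
# Illustrative parameters for listing playground chats.
list_params = {
    "project_id": "proj-123",
    "include": ["agent"],  # embed the related agent on each chat
    "limit": 20,
    "offset": 0,
}
# chats = hub.playground_chats.list(**list_params)  # attribute name assumed
```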

Query the audit trail for compliance reporting, change history, and debugging. Every create, update, and delete action is recorded.

from giskard_hub.types import Audit, AuditDisplay
.search() list[Audit] | tuple[list[Audit], APIPaginatedMetadata]

Search audit events with free-text queries, filters, and pagination. Pass include_metadata=True for tuple[list[Audit], APIPaginatedMetadata].

query str | None

Free-text search query.

filters AuditFiltersParam | None

Filter criteria (see filter keys below).

order_by list[AuditOrderByParam] | None

Sorting criteria.

limit int | None

Maximum results.

offset int | None

Results offset.

include_metadata bool Default: False

Include pagination metadata. If true, returns a tuple of (results, metadata).

Filter keys:

Key          Type         Example
project_id   list filter  {"selected_options": ["project-id"]}
entity_type  list filter  {"selected_options": ["agent", "evaluation"]}
action       list filter  {"selected_options": ["create", "delete"]}
user_id      list filter  {"selected_options": ["user-id"]}
created_at   date range   {"from_": "2025-01-01T00:00:00Z", "to_": "2025-12-31T23:59:59Z"}
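Putting the filter keys together, a hedged example of a search payload (the date range and selected options are illustrative):

```python
# Illustrative filters for hub.audit.search(), built from the keys above.
filters = {
    "entity_type": {"selected_options": ["agent", "evaluation"]},
    "action": {"selected_options": ["delete"]},
    "created_at": {"from_": "2025-01-01T00:00:00Z", "to_": "2025-06-30T23:59:59Z"},
}
# events = hub.audit.search(query="production", filters=filters, limit=100)
```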
.list_entities() list[AuditDisplay]

List audit history for a specific resource, including diffs of each change. Pass include_metadata=True for pagination metadata.

entity_id str Required
UUID of the entity.
entity_type str Required
Type of entity (e.g. "project", "agent", "evaluation").
limit int
Maximum results.
offset int
Results offset.
include_metadata bool Default: False
Include pagination metadata.

All exceptions inherit from HubClientError and are importable from the root package.

from giskard_hub import (
    HubClientError,            # Base exception for all SDK errors
    APIStatusError,            # Base for HTTP status errors (has .status_code, .response)
    APITimeoutError,           # Request timed out
    APIConnectionError,        # Could not connect to the Hub
    BadRequestError,           # 400
    AuthenticationError,       # 401 — invalid or missing API key
    PermissionDeniedError,     # 403 — insufficient permissions
    NotFoundError,             # 404 — resource does not exist
    ConflictError,             # 409 — resource conflict
    UnprocessableEntityError,  # 422 — validation error
    RateLimitError,            # 429 — too many requests
    InternalServerError,       # 500+ — server error
)
Error handling example
from giskard_hub import HubClient, NotFoundError, AuthenticationError
hub = HubClient()
try:
    agent = hub.agents.retrieve("nonexistent-id")
except NotFoundError as e:
    print(f"Agent not found: {e}")
except AuthenticationError:
    print("Check your API key")

Methods that support pagination accept limit and offset. Pass include_metadata=True to get an APIPaginatedMetadata object:

Pagination example
results, metadata = hub.evaluations.results.search(
    "evaluation-id", limit=50, offset=0, include_metadata=True,
)
print(f"Page: {metadata.count} of {metadata.total} (offset {metadata.offset})")
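To walk every page, advance offset by the number of results returned until the total is reached. The sketch below uses fetch_page, a local stand-in for any SDK search method, so only the loop itself reflects the API contract:

```python
_SAMPLE_ITEMS = list(range(95))  # stand-in data for the fake API

def fetch_page(limit, offset):
    """Local stand-in for an SDK search method returning (results, metadata)."""
    page = _SAMPLE_ITEMS[offset:offset + limit]
    return page, {"count": len(page), "total": len(_SAMPLE_ITEMS), "offset": offset}

all_results, offset, limit = [], 0, 50
while True:
    page, meta = fetch_page(limit=limit, offset=offset)
    all_results.extend(page)
    offset += len(page)
    if offset >= meta["total"] or not page:
        break
# all_results now holds every item across pages
```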
Access HTTP headers and status
response = hub.with_raw_response.agents.retrieve("agent-id")
print(response.status_code)
agent = response.parse()
Per-request override
hub.with_options(max_retries=5, timeout=300.0).evaluations.create(...)
Proxy configuration
from giskard_hub import HubClient, DefaultHttpxClient
hub = HubClient(
    http_client=DefaultHttpxClient(proxy="http://proxy.example.com:8080"),
)
Enable debug logging
export GISKARD_HUB_LOG=debug

Every method accepts these optional keyword arguments for per-request customization:

extra_headers dict[str, str] | None

Additional HTTP headers for this request.

extra_query dict[str, object] | None

Additional query parameters.

extra_body object | None

Additional JSON body fields.

timeout float | httpx.Timeout | None

Override the default timeout for this request.
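For instance, a single call could add a tracing header and a longer timeout (the header name and query parameter below are hypothetical):

```python
# Illustrative per-request options accepted by any method.
request_options = {
    "extra_headers": {"X-Request-Id": "trace-123"},  # hypothetical header
    "extra_query": {"dry_run": "true"},              # hypothetical parameter
    "timeout": 120.0,  # override the client default for this call only
}
# agent = hub.agents.retrieve("agent-id", **request_options)
```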