Reference
Client
HubClient
Synchronous client. All resource operations are available as attributes.
```python
from giskard_hub import HubClient

hub = HubClient(
    api_key="gsk_...",  # or set GISKARD_HUB_API_KEY env var
    base_url="https://hub.example.com",  # or set GISKARD_HUB_BASE_URL env var
)
```

api_key str | None Default: env GISKARD_HUB_API_KEY Your Hub API key.
base_url str | httpx.URL | None Default: env GISKARD_HUB_BASE_URL Base URL of your Hub instance.
auto_add_api_suffix bool Default: True Automatically append /_api to base_url.
timeout float | httpx.Timeout | None Default: 60.0 Default request timeout in seconds. Pass an httpx.Timeout for fine-grained control over connect, read, and write timeouts.
max_retries int Default: 2 Number of automatic retries on transient errors (connection errors, 5xx responses).
default_headers dict[str, str] | None Default: None Headers added to every request.
default_query dict[str, object] | None Default: None Query parameters added to every request.
http_client httpx.Client | None Default: None Custom httpx.Client instance for proxies, custom transports, or mutual TLS.
AsyncHubClient
Async counterpart with an identical API surface — every method is a coroutine.
```python
from giskard_hub import HubClient

hub = HubClient()
projects = hub.projects.list()
```

```python
import asyncio

from giskard_hub import AsyncHubClient

async def main():
    async with AsyncHubClient() as hub:
        projects = await hub.projects.list()

asyncio.run(main())
```

hub.agents
Register, test, and invoke LLM agents. An agent represents your LLM application -- either a remote HTTP endpoint or a local Python callable.
```python
from giskard_hub.types import Agent, AgentDetectStatefulness, AgentOutput, ChatMessage
```

.create() → Agent Create a new agent with configuration for external API communication.
name str Required Display name of the agent.
url str Required HTTP endpoint the Hub calls during evaluations and scans.
project_id str Required Project this agent belongs to.
supported_languages list[str] Required Language codes the agent supports (e.g. ["en", "fr"]).
headers dict[str, str] HTTP headers sent with every request to the agent (e.g. auth tokens). Each header is a {"name": str, "value": str} dict.
description str | None Human-readable description.
stateful bool | None Whether the agent is stateful.
```python
agent = hub.agents.create(
    project_id=project.id,
    name="Support Bot v2",
    url="https://my-app.example.com/api/chat",
    supported_languages=["en"],
    headers={"Authorization": "Bearer <token>"},
    description="GPT-4o chatbot with RAG",
)
```

.retrieve() → Agent Retrieve an agent by its ID.
agent_id str Required ID of the agent to retrieve.
.update() → Agent Update an existing agent’s configuration. Only the provided fields are modified.
agent_id str Required ID of the agent to update.
name str | None
url str | None
description str | None
headers dict[str, str] | None
supported_languages list[str] | None
.list() → list[Agent] List all agents, optionally filtered by project.
project_id str | None
.delete() → None Delete an agent by its ID.
agent_id str Required
.bulk_delete() → None Delete multiple agents at once.
agent_ids list[str] Required
.generate_completion() → AgentOutput Call a registered agent with a list of messages and get the response.
agent_id str Required ID of the agent to call.
messages Iterable[ChatMessageParam] Required Conversation messages as [{"role": "user", "content": "..."}].
```python
output = hub.agents.generate_completion(
    agent.id,
    messages=[{"role": "user", "content": "What is your return policy?"}],
)
print(output.response.content)
print(output.metadata)
```

.test_connection() → AgentOutput Test connectivity to an agent endpoint without persisting the agent.
url str Required
headers dict[str, str]
.generate_description() → str Auto-generate a description for an agent by observing its behaviour. Returns the generated description.
agent_id str Required
.detect_statefulness() → AgentDetectStatefulness Detect whether the agent is stateful by analyzing its behavior.
agent_id str Required
id str Unique identifier.
name str Display name.
description str | None Human-readable description.
url str HTTP endpoint URL.
project_id str Parent project ID.
supported_languages list[str] Language codes.
headers list[Header] HTTP headers.
stateful bool Whether the agent is stateful.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
stateful bool Whether the agent is stateful.
response ChatMessage The agent’s response message.
error ExecutionError | None Error details if the agent call failed.
metadata dict | None Arbitrary metadata returned by the agent.
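Note that .create() accepts headers as a plain dict, while the Agent model exposes headers as a list of Header objects with name/value pairs. A minimal sketch of the presumable mapping between the two shapes (the helper name is hypothetical, not part of the SDK):

```python
def headers_dict_to_list(headers: dict[str, str]) -> list[dict[str, str]]:
    # Map the dict form accepted by .create() to the name/value list form
    # exposed on the Agent model (field names as documented above).
    return [{"name": name, "value": value} for name, value in headers.items()]

print(headers_dict_to_list({"Authorization": "Bearer <token>"}))
```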
hub.checks
Define and manage reusable check criteria for evaluating agent responses. Checks are project-scoped and can be referenced by identifier in any test case.
```python
from giskard_hub.types import Check, CheckResult
```

.create() → Check Create a custom check in the specified project.
identifier str Required Unique identifier to reference this check in test cases.
name str Required Display name.
project_id str Required Project this check belongs to.
params CheckTypeParam Required Check configuration (see check type params below).
description str | None Human-readable description.
```python
check = hub.checks.create(
    project_id=project.id,
    identifier="tone_professional",
    name="Professional tone",
    params={"type": "conformity", "rules": ["Use formal language."]},
)
```

.retrieve() → Check
check_id str Required ID of the check to retrieve.
.update() → Check Update an existing check. Only the provided fields are modified.
check_id str Required
identifier str | None
name str | None
params CheckTypeParam | None
description str | None
.list() → list[Check]
project_id str Required Project ID to list checks for.
filter_builtin bool When True, include built-in checks in the results.
.delete() → None
check_id str Required ID of the check to delete.
.bulk_delete() → None
check_ids list[str] Required IDs of checks to delete.
Check type params
The params field accepts one of these shapes:
| Type | params shape | Evaluation method |
|---|---|---|
| Correctness | {"type": "correctness", "reference": str} | LLM judge |
| Conformity | {"type": "conformity", "rules": list[str]} | LLM judge |
| Groundedness | {"type": "groundedness", "context": str} | LLM judge |
| Semantic similarity | {"type": "semantic_similarity", "reference": str, "threshold": float} | Embedding |
| String match | {"type": "string_match", "keyword": str} | Rule-based |
| Metadata | {"type": "metadata", "json_path_rules": list[JsonPathRule]} | Rule-based |
Each JsonPathRule: {"json_path": str, "expected_value": str, "expected_value_type": "string" | "number" | "boolean"}
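A small client-side sanity check can catch malformed params before a create() call round-trips to the Hub. This is a hypothetical helper built only from the shapes in the table above, not SDK code:

```python
# Required keys per check type, per the params table above.
REQUIRED_FIELDS = {
    "correctness": {"reference"},
    "conformity": {"rules"},
    "groundedness": {"context"},
    "semantic_similarity": {"reference", "threshold"},
    "string_match": {"keyword"},
    "metadata": {"json_path_rules"},
}

def validate_check_params(params: dict) -> dict:
    """Raise ValueError if `params` does not match one of the documented shapes."""
    check_type = params.get("type")
    if check_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown check type: {check_type!r}")
    missing = REQUIRED_FIELDS[check_type] - params.keys()
    if missing:
        raise ValueError(f"{check_type} params missing: {sorted(missing)}")
    return params

validate_check_params(
    {"type": "semantic_similarity", "reference": "Returns within 30 days.", "threshold": 0.8}
)
```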
id str Unique identifier.
built_in bool Whether this is a built-in check.
identifier str Reusable identifier string.
name str Display name.
description str | None Human-readable description.
project_id str Parent project ID.
params dict Check-specific configuration.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
name str Check identifier.
display_name str Human-readable name.
status str Execution status.
passed bool Whether the check passed.
error str | None Error message if execution failed.
reason str | None LLM judge’s reasoning (for LLM-based checks).
annotations list[OutputAnnotation] Annotated spans in the agent’s response.
hub.datasets
Create datasets, import test cases, and auto-generate test suites from scenarios or knowledge bases.
```python
from giskard_hub.types import Dataset, TestCase, TaskProgress
```

.create() → Dataset Create a new empty dataset in the specified project.
name str Required
project_id str Required
description str | None
.upload() → Dataset Import test cases from a file or list of dicts into a dataset.
project_id str Required
data FileTypes | list[dict[str, Any]] | str Required File path (str or Path), file-like object, or list of dicts. Each record should have a messages list and optional checks list.
dataset_id str | None
name str | None
.generate_scenario_based() → Dataset Generate a dataset of test cases from scenario definitions. The dataset’s status will be "running" until generation completes — use hub.helpers.wait_for_completion() to wait.
project_id str Required
agent_id str Required
scenario_id str Required
n_examples int
dataset_id str | None
dataset_name str | None
.generate_document_based() → Dataset Generate test cases grounded in knowledge base documents. Async — use hub.helpers.wait_for_completion().
agent_id str Required
knowledge_base_id str Required
project_id str Required
dataset_name str | None
description str | None
n_examples int
topic_ids list[str]
.retrieve() → Dataset
dataset_id str Required ID of the dataset to retrieve.
.update() → Dataset
dataset_id str Required ID of the dataset to update.
name str | None Updated name.
description str | None Updated description.
status TaskProgress | None Async operation status.
.list() → list[Dataset]
project_id str | None Project ID to filter by.
.delete() → None Delete a dataset by its ID.
dataset_id str Required
.bulk_delete() → None Delete multiple datasets at once.
dataset_ids list[str] Required
.list_tags() → list[str] List all tags used across test cases in a dataset.
dataset_id str Required
.list_test_cases() → list[TestCase] List all test cases in a dataset.
dataset_id str Required
.search_test_cases() → list[TestCase] Search test cases with filters, sorting, and pagination. Pass include_metadata=True to receive tuple[list[TestCase], APIPaginatedMetadata].
dataset_id str Required
query str | None
order_by list[TestCaseOrderByParam] | None
filters TestCaseFiltersParam | None
limit int | None
offset int | None
include_metadata bool Default: False
id str Unique identifier.
name str Display name.
description str | None Human-readable description.
project_id str Parent project ID.
status TaskProgress Async operation status (for generated datasets).
tags list[str] All tags used across test cases.
state str Computed from status.state — e.g. "finished", "running".
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
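Records for .upload() can be assembled as plain dicts and, if convenient, written to a JSONL file before being passed as data. A sketch using the record shapes documented above (the reference text and file name are placeholders):

```python
import json
import tempfile
from pathlib import Path

# Each record has a `messages` list and an optional `checks` list,
# matching the .upload() record format documented above.
records = [
    {
        "messages": [{"role": "user", "content": "What is your return policy?"}],
        "checks": [
            {"identifier": "correctness", "params": {"reference": "Returns are accepted within 30 days."}}
        ],
    },
    {
        "messages": [{"role": "user", "content": "Do you ship internationally?"}],
    },
]

# Write one JSON object per line (JSONL).
path = Path(tempfile.gettempdir()) / "test_cases.jsonl"
path.write_text("\n".join(json.dumps(record) for record in records))

# Either the file path or the `records` list itself can be passed as `data`:
# dataset = hub.datasets.upload(project_id=project.id, data=str(path), name="Imported cases")
```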
hub.evaluations
Run agents against datasets, inspect per-test-case results, and manage the evaluation lifecycle. Sub-resource: hub.evaluations.results.
```python
from giskard_hub.types import Evaluation, Metric, CheckResult
```

.create() → Evaluation Create and launch a new evaluation of an agent on a dataset.
project_id str Required Project ID.
agent_id str Required Agent to evaluate.
dataset_id str | None Dataset to evaluate against. Provide this or old_evaluation_id, not both.
old_evaluation_id str | None Reuse a previous evaluation’s dataset.
name str Evaluation run name.
tags list[str] | None Filter test cases by tags.
run_count int Run each test case N times (for consistency testing).
scheduled_evaluation_id str | None Link to a scheduled evaluation.
```python
evaluation = hub.evaluations.create(
    project_id=project.id,
    agent_id=agent.id,
    dataset_id=dataset.id,
    name="v2.1 regression run",
)
evaluation = hub.helpers.wait_for_completion(evaluation)
hub.helpers.print_metrics(evaluation)
```

.create_local() → Evaluation Create a local evaluation for running agent inference in your own process.
agent_info MinimalAgentParam Required Agent info as {"name": str, "description": str}.
dataset_id str | None
name str | None
tags list[str] | None
old_evaluation_id str | None
.run_single() → list[CheckResult] Evaluate a single (input, output) pair against checks without creating a full evaluation.
messages Iterable[ChatMessageParam] Required
agent_output AgentOutputParam Required
checks Iterable[CheckConfigParam] Required
project_id str
agent_description str
.rerun_errored_results() → Evaluation Rerun all errored results without triggering a full re-evaluation.
evaluation_id str Required
.retrieve() → Evaluation Retrieve an evaluation by its ID, with optional related resource inclusion.
evaluation_id str Required
include list[Literal["agent", "dataset"]] | None
.update() → Evaluation Update an evaluation’s name.
evaluation_id str Required
name str Required
.list() → list[Evaluation] List all evaluations for a project.
project_id str Required
include list[Literal["agent", "dataset"]] | None
.delete() → None Delete an evaluation by its ID.
evaluation_id str Required
.bulk_delete() → None Delete multiple evaluations at once.
evaluation_ids list[str] Required
id str Unique identifier.
name str | None Display name.
agent AgentReference | Agent The evaluated agent.
dataset DatasetReference | Dataset The dataset used.
criteria list Check criteria applied.
project_id str Parent project ID.
local bool Whether this is a local evaluation.
metrics list[Metric] Aggregated pass/fail metrics per check.
tags list[str] Tags used to filter test cases.
failure_categories list[FailureCategory] Available failure classifications.
status TaskProgress Async operation status.
state str Computed: "finished", "running", "error".
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
old_evaluation_id str | None ID of the previous evaluation that this evaluation is based on.
scheduled_evaluation_id str | None ID of the scheduled evaluation that this evaluation is based on.
name str Check identifier (e.g. "correctness", "global").
display_name str Human-readable name.
passed int Number of test cases that passed.
failed int Number of test cases that failed.
errored int Number of test cases that errored.
total int Total test cases.
success_rate float Pass rate as a float between 0.0 and 1.0.
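To illustrate how the Metric counters relate, a bit of arithmetic with placeholder numbers. This assumes success_rate is passed divided by total, which matches the documented 0.0 to 1.0 range but should be verified against your Hub:

```python
# Placeholder Metric values; only the arithmetic is the point here.
metrics = [
    {"display_name": "Correctness", "passed": 42, "failed": 6, "errored": 2},
    {"display_name": "Conformity", "passed": 48, "failed": 2, "errored": 0},
]

for m in metrics:
    total = m["passed"] + m["failed"] + m["errored"]
    success_rate = m["passed"] / total if total else 0.0
    print(f"{m['display_name']:<12} {m['passed']:>3}/{total}  {success_rate:.0%}")
```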
hub.evaluations.results
Inspect, filter, update, and rerun individual evaluation results.
```python
from giskard_hub.types import TestCaseEvaluation, FailureCategory
```

.retrieve() → TestCaseEvaluation
result_id str Required Result ID.
evaluation_id str Required Evaluation ID.
include list[str] | None Embed related resources (["test_case"]).
.update() → TestCaseEvaluation Update the failure category of an evaluation result.
result_id str Required
evaluation_id str Required
failure_category FailureCategoryParam | None
.list() → list[TestCaseEvaluation]
evaluation_id str Required Evaluation ID.
include list[str] | None Embed related resources (["test_case"]).
.search() → list[TestCaseEvaluation] Search and filter results. Pass include_metadata=True for pagination metadata.
evaluation_id str Required
query str | None
filters ResultFiltersParam | None
order_by list[ResultOrderByParam] | None
limit int | None
offset int | None
include list[str] | None
include_metadata bool Default: False
.rerun_test_case() → TestCaseEvaluation
result_id str Required Result ID.
evaluation_id str Required Evaluation ID.
.submit_local_output() → TestCaseEvaluation Submit locally-generated agent output for evaluation and scoring.
result_id str Required
evaluation_id str Required
agent_output AgentOutputParam | None
error str | None
.update_visibility() → TestCaseEvaluation Show or hide a result from the default view.
result_id str Required
evaluation_id str Required
hidden bool Required
set_test_case_draft bool | None
id str Unique identifier.
evaluation_id str Parent evaluation ID.
test_case TestCase | TestCaseReference The test case.
test_case_exists bool Whether the test case still exists.
state str Result state: "finished", "running", "error".
results list[CheckResult] Per-check outcomes.
output AgentOutput | None The agent’s actual response.
error ExecutionError | None Error details if the agent call failed.
failure_category FailureCategory | None Assigned failure classification.
hidden bool Whether this result is hidden.
divergence_warnings list[DivergenceWarning] | None List of divergence warnings detected during multi-turn evaluation.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
turn int The conversation turn where divergence was detected.
expected str The expected message content.
actual str The actual message content received.
hub.helpers
High-level convenience methods for the most common SDK workflows: waiting for async operations, running evaluations, and printing metrics.
```python
from giskard_hub.types import Evaluation, Scan, ChatMessage, AgentOutput
```

.wait_for_completion() → TStateful Poll an entity until it leaves its running state. Returns the refreshed entity.
entity TStateful Required Any stateful entity: Evaluation, Scan, Dataset, KnowledgeBase, ScanProbe, TestCaseEvaluation.
poll_interval float Default: 5.0 Seconds between polling requests.
max_retries int Default: 360 Maximum polling attempts. Default: 30 minutes at 5-second intervals.
running_states Collection[str] Default: {"running"} States considered as “still processing”.
error_states Collection[str] Default: {"error"} Terminal error states.
raise_on_error bool Default: True Raise ValueError if entity enters an error state.
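The polling contract above can be sketched as a plain loop. This is an illustrative re-implementation against a fake entity, not the SDK's actual code; real entities expose state and are re-fetched from the Hub on each poll:

```python
import time

class FakeJob:
    """Stand-in for a stateful entity; real entities expose a `state` string."""
    def __init__(self):
        self.state = "running"
        self._polls = 0

    def refresh(self):  # the SDK would re-fetch the entity from the Hub here
        self._polls += 1
        if self._polls >= 2:
            self.state = "finished"
        return self

def wait_for_completion(entity, poll_interval=0.01, max_retries=360,
                        running_states=("running",), error_states=("error",),
                        raise_on_error=True):
    """Poll until the entity leaves its running state, per the parameters above."""
    for _ in range(max_retries):
        if entity.state in error_states:
            if raise_on_error:
                raise ValueError(f"entity entered error state: {entity.state!r}")
            return entity
        if entity.state not in running_states:
            return entity
        time.sleep(poll_interval)
        entity = entity.refresh()
    raise TimeoutError("entity did not leave its running state in time")

job = wait_for_completion(FakeJob())
print(job.state)  # finished
```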
.evaluate() → Evaluation Run an evaluation for a given agent over a dataset. Handles both remote and local agents.
agent str | Agent | Callable Required Agent ID, Agent object, or a Python callable for local evaluation. Callable signature: (messages: list[ChatMessage]) -> str | ChatMessage | AgentOutput.
dataset str | Dataset Required Dataset ID or Dataset object.
project str | Project | None Required when agent is remote (str or Agent). Not required for local callables.
name str | None
tags list[str] | None

```python
evaluation = hub.helpers.evaluate(
    agent=my_agent,
    dataset=my_dataset,
    project=my_project,
    name="Remote eval",
)

def my_fn(messages: list[ChatMessage]) -> str:
    return "Hello from my local agent"

evaluation = hub.helpers.evaluate(
    agent=my_fn,
    dataset="dataset-id",
    name="Local eval",
)
```

.print_metrics() → None Print a formatted metrics table to the console for an evaluation or scan.
entity Evaluation | Scan Required The evaluation or scan to print metrics for.
hub.knowledge_bases
Create, search, and manage indexed document collections for grounded evaluations, document-based test generation, and knowledge-grounded vulnerability scans.
```python
from giskard_hub.types import (
    KnowledgeBase,
    KnowledgeBaseDocumentRow,
    KnowledgeBaseDocumentDetail,
)
```

.create() → KnowledgeBase Create a knowledge base and upload documents. Indexing happens asynchronously after creation — use hub.helpers.wait_for_completion().
name str Required Display name.
project_id str Required Project this KB belongs to.
data FileTypes | list[dict[str, Any]] | str Required Documents as a list of dicts, a file path string, or a pathlib.Path (JSON/JSONL format).
description str | None Human-readable description.
document_column str Default: "text" Column name for document text.
topic_column str Default: "topic" Column name for topic label.
```python
kb = hub.knowledge_bases.create(
    project_id=project.id,
    name="Product Docs",
    data=[
        {"text": "30-day return policy.", "topic": "Returns"},
        {"text": "Free shipping over $50.", "topic": "Shipping"},
    ],
)
kb = hub.helpers.wait_for_completion(kb)
```

.search_documents() → list[KnowledgeBaseDocumentRow] | tuple[list[KnowledgeBaseDocumentRow], APIPaginatedMetadata] Semantic search over documents in a knowledge base.
knowledge_base_id str Required
query str | None
filters KnowledgeBaseDocumentFiltersParam | None
order_by list[KnowledgeBaseDocumentOrderByParam] | None
limit int | None
offset int | None
include_metadata bool Default: False
.retrieve_document() → KnowledgeBaseDocumentDetail Retrieve a specific document with its full content.
knowledge_base_id str Required
document_id str Required
.retrieve() → KnowledgeBase Retrieve a knowledge base by its ID, including its topics.
knowledge_base_id str Required
.update() → KnowledgeBase Update a knowledge base’s metadata.
knowledge_base_id str Required
name str | None
description str | None
project_id str | None
status TaskProgress | None
.list() → list[KnowledgeBase] List all knowledge bases, optionally filtered by project.
project_id str | None
.delete() → None Delete a knowledge base by its ID.
knowledge_base_id str Required
.bulk_delete() → None Delete multiple knowledge bases at once.
knowledge_base_ids list[str] Required
id str Unique identifier.
name str Display name.
description str | None Human-readable description.
filename str | None Original upload filename.
project_id str Parent project ID.
n_documents int Number of indexed documents.
status TaskProgress Async indexing status.
topics list Discovered topics.
state str Computed from status.state.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
hub.projects
Top-level workspace that groups all related resources: agents, datasets, evaluations, scans, and more. Sub-resource: hub.projects.scenarios.
```python
from giskard_hub.types import Project
```

.create() → Project
name str Required Project name.
description str | None Project description.
.update() → Project
project_id str Required Project ID.
name str | None Updated name.
description str | None Updated description.
failure_categories Iterable[FailureCategoryParam] | None Project-level failure classifications.
.retrieve() → Project Retrieve a project by its ID.
project_id str Required
.list() → list[Project] List all projects accessible to the current user.
.delete() → None Delete a project by its ID.
project_id str Required
.bulk_delete() → None Delete multiple projects at once.
project_ids list[str] Required
id str Unique identifier.
name str Display name.
description str | None Human-readable description.
failure_categories list[FailureCategory] Project-level failure classifications.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
hub.projects.scenarios
Reusable persona and behaviour templates for scenario-based dataset generation.
```python
from giskard_hub.types import Scenario, ScenarioPreview
```

.create() → Scenario
project_id str Required Project ID (positional).
name str Required Scenario name.
description str Required Scenario description.
rules list[str] Rules the generated conversations should follow.
.preview() → ScenarioPreview Generate a preview conversation for a scenario without persisting it.
project_id str Required
description str Required
rules list[str]
agent_id str | None
.retrieve() → Scenario Retrieve a scenario by its ID within a project.
scenario_id str Required
project_id str Required
.update() → Scenario Update an existing scenario’s definition.
scenario_id str Required
project_id str Required
name str | None
description str | None
rules list[str] | None
.list() → list[Scenario] List all scenarios for a project.
project_id str Required
.delete() → None Delete a scenario from a project.
scenario_id str Required
project_id str Required
hub.scans
Launch automated vulnerability scans covering the OWASP LLM Top 10 and additional threat categories. Sub-resources: hub.scans.probes, hub.scans.attempts.
```python
from giskard_hub.types import Scan, ScanCategory, ScanProbe, ScanProbeAttempt, Severity, ReviewStatus
```

.create() → Scan Launch a new vulnerability scan of an agent.
project_id str Required Project ID.
agent_id str Required Agent to scan.
knowledge_base_id str | None Anchor probes to KB documents for domain-specific attacks.
probe_ids list[str] | None List of specific LIDAR probe IDs to run in the scan.
tags list[str] | None Limit scan to specific threat categories (e.g. ["gsk:threat-type='prompt-injection'"]).

```python
scan = hub.scans.create(
    project_id=project.id,
    agent_id=agent.id,
    tags=["gsk:threat-type='prompt-injection'"],
)
scan = hub.helpers.wait_for_completion(scan)
print(f"Grade: {scan.grade}")
hub.helpers.print_metrics(scan)
```

.list_categories() → list[ScanCategory] List all available scan categories and their OWASP mappings.
.list_probes() → list[ScanProbe] List all probe results for a completed scan.
scan_id str Required
.retrieve() → Scan Retrieve a scan result by its ID, with optional related resource inclusion.
scan_id str Required
include list[Literal["agent", "knowledge_base"]] | None
.list() → list[Scan] List all scan results, optionally filtered by project.
project_id str | None
include list[Literal["agent", "knowledge_base"]] | None
.delete() → None Delete a scan result by its ID.
scan_id str Required
.bulk_delete() → None Delete multiple scan results at once.
scan_ids list[str] Required
id str Unique identifier.
agent AgentReference | Agent The scanned agent.
project_id str Parent project ID.
knowledge_base KnowledgeBase | None Linked knowledge base.
grade str | None Overall grade: "A", "B", "C", "D", or None.
status TaskProgress Async operation status.
state str Computed from status.state.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
hub.scans.probes
.retrieve() → ScanProbe
probe_id str Required Probe ID.
.list_attempts() → list[ScanProbeAttempt] List all adversarial attempts for a specific probe.
probe_id str Required
hub.scans.attempts
.update() → ScanProbeAttempt Update a probe attempt’s review status, severity, or success flag.
probe_attempt_id str Required
review_status ReviewStatus | None "pending", "ignored", "acknowledged", "corrected".
severity Severity | None SAFE (0), MINOR (10), MAJOR (20), CRITICAL (30).
successful bool | None
hub.scheduled_evaluations
Set up recurring evaluation runs on a daily, weekly, or monthly cadence for continuous quality monitoring.
```python
from giskard_hub.types import ScheduledEvaluation, FrequencyOption
```

.create() → ScheduledEvaluation
project_id str Required Project ID.
agent_id str Required Agent to evaluate.
dataset_id str Required Dataset to evaluate against.
frequency FrequencyOption Required "daily", "weekly", or "monthly".
name str Required Name of the scheduled evaluation.
time str Required Time of day in HH:MM format (UTC).
day_of_week int | None Weekly only: 1 (Monday) through 7 (Sunday).
day_of_month int | None Monthly only: 1 through 28.
tags list[str] | None Filter test cases by tags.
run_count int Run each test case N times.
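The frequency, time, day_of_week, and day_of_month constraints above can be checked client-side before calling create(). A hypothetical helper (the Hub performs its own validation server-side; this merely mirrors the documented ranges):

```python
import re

def validate_schedule(frequency, time, day_of_week=None, day_of_month=None):
    """Sanity-check scheduling fields against the documented constraints."""
    if frequency not in ("daily", "weekly", "monthly"):
        raise ValueError(f"unknown frequency: {frequency!r}")
    if not re.fullmatch(r"(?:[01]\d|2[0-3]):[0-5]\d", time):
        raise ValueError("time must be HH:MM (24-hour, UTC)")
    if frequency == "weekly" and day_of_week not in range(1, 8):
        raise ValueError("weekly schedules need day_of_week 1 (Monday) to 7 (Sunday)")
    if frequency == "monthly" and day_of_month not in range(1, 29):
        raise ValueError("monthly schedules need day_of_month 1 to 28")

validate_schedule("weekly", "09:30", day_of_week=1)
```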
.list_evaluations() → list[Evaluation] List all past evaluation runs generated by this scheduled evaluation.
scheduled_evaluation_id str Required
include list[Literal["agent", "dataset"]] | None
.retrieve() → ScheduledEvaluation Retrieve a scheduled evaluation by its ID.
scheduled_evaluation_id str Required
include list[Literal["evaluations"]] | None
.update() → ScheduledEvaluation Update a scheduled evaluation’s configuration.
scheduled_evaluation_id str Required
name str | None
frequency FrequencyOption | None
time str | None
day_of_week int | None
day_of_month int | None
run_count int | None
last_execution_at str | datetime | None
last_execution_status LastExecutionStatusParam | None
paused bool | None
.list() → list[ScheduledEvaluation] List all scheduled evaluations for a project.
project_id str Required
include list[Literal["evaluations"]] | None
last_days int | None
.delete() → None Delete a scheduled evaluation by its ID.
scheduled_evaluation_id str Required
.bulk_delete() → None Delete multiple scheduled evaluations at once.
scheduled_evaluation_ids list[str] Required
hub.tasks
Lightweight issue tracker for managing findings from evaluations and scans. Link tasks to specific evaluation results, test cases, or probe attempts.
```python
from giskard_hub.types import Task, TaskStatus, TaskPriority
```

.create() → Task
project_id str Required Project ID.
description str Required What needs to be done.
priority TaskPriority | None "low", "medium", or "high".
status TaskStatus | None "open", "in_progress", or "resolved".
assignee_ids list[str] User IDs to assign.
evaluation_result_id str | None Link to a specific evaluation result.
dataset_test_case_id str | None Link to a specific test case.
probe_attempt_id str | None Link to a specific scan probe attempt.
disable_test bool Disable the linked test case.
hide_result bool Hide the linked evaluation result.
.retrieve() → Task Retrieve a task by its ID.
task_id str Required
.update() → Task Update an existing task’s metadata and assignees.
task_id str Required
status TaskStatus | None "open", "in_progress", or "resolved".
priority TaskPriority | None "low", "medium", or "high".
description str | None
assignee_ids list[str] | None
set_test_case_status str | None
.list() → list[Task] List all tasks for a project, ordered by creation date descending.
project_id str | None
.delete() → None Delete a task by its ID.
task_id str Required
.bulk_delete() → None Delete multiple tasks at once.
task_ids list[str] Required
id str Unique identifier.
description str Task description.
status TaskStatus "open", "in_progress", or "resolved".
priority TaskPriority "low", "medium", or "high".
project_id str Parent project ID.
created_by UserReference User who created the task.
assignees list[UserReference] Assigned users.
references dict Linked resources.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
hub.test_cases
Create, update, and manage individual test cases within datasets. Sub-resource: hub.test_cases.comments.
```python
from giskard_hub.types import TestCase, TestCaseComment, ChatMessageWithMetadata
```

.create() → TestCase Create a new test case with conversation messages and optional checks.
dataset_id str Required
messages Iterable[ChatMessageParam] Required Conversation messages as [{"role": "user", "content": "..."}]. Should not include the final assistant response.
checks Iterable[CheckConfigParam] Checks to apply: [{"identifier": "correctness", "params": {"reference": "..."}}].
demo_output str | ChatMessageWithMetadataParam | None Expected output for display only — not used during evaluation.
status "active" | "draft" | None
tags list[str]
.retrieve() → TestCase Retrieve a test case by its ID.
test_case_id str Required
.update() → TestCase Update an existing test case’s messages, checks, tags, or status.
test_case_id str Required
messages Iterable[ChatMessageParam] | None
checks Iterable[CheckConfigParam] | None
demo_output str | ChatMessageWithMetadataParam | None
status "active" | "draft" | None
tags list[str] | None
dataset_id str | None
.delete() → None Delete a test case by its ID.
test_case_id str Required
.bulk_delete() → None
test_case_ids list[str] Required IDs of test cases to delete.
.bulk_update() → list[TestCase] Update multiple test cases at once. Returns the updated test cases.
test_case_ids list[str] Required
status Literal["active", "draft"] | None
disabled_checks list[str] | None
enabled_checks list[str] | None
added_tags list[str] | None
removed_tags list[str] | None
.bulk_move() → None Move or copy test cases to another dataset.
test_case_ids list[str] Required
target_dataset_id str Required
duplicate bool
hub.test_cases.comments
.add() → TestCaseComment
test_case_id str Required Test case ID.
content str Required Comment text.
.edit() → TestCaseComment
comment_id str Required Comment ID.
test_case_id str Required Test case ID.
content str Required Updated text.
.delete() → None
comment_id str Required Comment ID.
test_case_id str Required Test case ID.
id str Unique identifier.
dataset_id str Parent dataset ID.
messages list[ChatMessage] Conversation messages.
demo_output ChatMessageWithMetadata | None Expected output (display only).
checks list[CheckConfig] Configured checks.
comments list[TestCaseComment] Annotations.
tags list[str] Tags for filtering.
status "active" | "draft" Test case status.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
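A full create() payload combining the messages and checks shapes documented above can be sketched as follows. IDs and content are placeholders; note the conversation ends on a user turn, since messages should not include the final assistant response:

```python
# Keyword arguments for hub.test_cases.create(), built from the documented shapes.
test_case_kwargs = {
    "dataset_id": "dataset-id",  # placeholder
    "messages": [
        {"role": "user", "content": "Where is my order?"},
        {"role": "assistant", "content": "Could you share your order number?"},
        {"role": "user", "content": "It's #1234."},  # last turn is the user's
    ],
    "checks": [
        {"identifier": "conformity", "params": {"rules": ["Stay polite."]}},
    ],
    "tags": ["orders"],
    "status": "draft",
}
# test_case = hub.test_cases.create(**test_case_kwargs)
```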
hub.playground_chats
Access conversations captured from the Hub's interactive playground UI.
```python
from giskard_hub.types import PlaygroundChat
```

.list() → list[PlaygroundChat]
project_id str Required Project ID.
include list[Literal["agent"]] | None Embed related resources (["agent"]).
limit int | None Maximum results.
offset int | None Results offset.
.retrieve() → PlaygroundChat chat_id str Required Chat ID.
include list[Literal["agent"]] | None Embed related resources (["agent"]).
.delete() → None Delete a playground chat by its ID.
chat_id str Required .bulk_delete() → None Delete multiple playground chats at once.
chat_ids list[str] Required
hub.audit_logs
Query the audit trail for compliance reporting, change history, and debugging. Every create, update, and delete action is recorded.
from giskard_hub.types import Audit, AuditDisplay
.search() → list[Audit] | tuple[list[Audit], APIPaginatedMetadata] Search audit events with free-text queries, filters, and pagination. Pass include_metadata=True for tuple[list[Audit], APIPaginatedMetadata].
query str | None Free-text search query.
filters AuditFiltersParam | None Filter criteria (see filter keys below).
order_by list[AuditOrderByParam] | None Sorting criteria.
limit int | None Maximum results.
offset int | None Results offset.
include_metadata bool Default: False Include pagination metadata. If true, returns a tuple of (results, metadata).
Filter keys:
| Key | Type | Example |
|---|---|---|
| project_id | list filter | {"selected_options": ["project-id"]} |
| entity_type | list filter | {"selected_options": ["agent", "evaluation"]} |
| action | list filter | {"selected_options": ["create", "delete"]} |
| user_id | list filter | {"selected_options": ["user-id"]} |
| created_at | date range | {"from_": "2025-01-01T00:00:00Z", "to_": "2025-12-31T23:59:59Z"} |
.list_entities() → list[AuditDisplay] List audit history for a specific resource, including diffs of each change. Pass include_metadata=True for pagination metadata.
entity_id str Required Entity ID.
entity_type str Required Entity type (e.g. "project", "agent", "evaluation").
limit int
offset int
include_metadata bool Default: False
Error types
All exceptions inherit from HubClientError and are importable from the root package.
from giskard_hub import (
    HubClientError,           # Base exception for all SDK errors
    APIStatusError,           # Base for HTTP status errors (has .status_code, .response)
    APITimeoutError,          # Request timed out
    APIConnectionError,       # Could not connect to the Hub
    BadRequestError,          # 400
    AuthenticationError,      # 401 — invalid or missing API key
    PermissionDeniedError,    # 403 — insufficient permissions
    NotFoundError,            # 404 — resource does not exist
    ConflictError,            # 409 — resource conflict
    UnprocessableEntityError, # 422 — validation error
    RateLimitError,           # 429 — too many requests
    InternalServerError,      # 500+ — server error
)

from giskard_hub import HubClient, NotFoundError, AuthenticationError

hub = HubClient()

try:
    agent = hub.agents.retrieve("nonexistent-id")
except NotFoundError as e:
    print(f"Agent not found: {e}")
except AuthenticationError:
    print("Check your API key")

Advanced patterns
Pagination
Methods that support pagination accept limit and offset. Pass include_metadata=True to get an APIPaginatedMetadata object:
results, metadata = hub.evaluations.results.search(
    "evaluation-id",
    limit=50,
    offset=0,
    include_metadata=True,
)
print(f"Page: {metadata.count} of {metadata.total} (offset {metadata.offset})")
Raw response access
response = hub.with_raw_response.agents.retrieve("agent-id")
print(response.status_code)
agent = response.parse()
Retries and timeouts
hub.with_options(max_retries=5, timeout=300.0).evaluations.create(...)
Custom HTTP client
from giskard_hub import HubClient, DefaultHttpxClient
hub = HubClient(
    http_client=DefaultHttpxClient(proxy="http://proxy.example.com:8080"),
)
Debug logging
export GISKARD_HUB_LOG=debug
Common extra parameters
Every method accepts these optional keyword arguments for per-request customization:
extra_headers dict[str, str] | None Additional HTTP headers for this request.
extra_query dict[str, object] | None Additional query parameters.
extra_body object | None Additional JSON body fields.
timeout float | httpx.Timeout | None Override the default timeout for this request.