Reference
Client
Client classes for interacting with the Giskard Hub API. Two flavours are available with an identical API surface — pick the one that matches your runtime.

```python
from giskard_hub import HubClient

hub = HubClient()
projects = hub.projects.list()
```

```python
import asyncio

from giskard_hub import AsyncHubClient

async def main():
    async with AsyncHubClient() as hub:
        projects = await hub.projects.list()

asyncio.run(main())
```

HubClient
Synchronous client. All resource operations are available as attributes.
```python
from giskard_hub import HubClient

hub = HubClient(
    api_key="gsk_...",  # or set GISKARD_HUB_API_KEY env var
    base_url="https://hub.example.com",  # or set GISKARD_HUB_BASE_URL env var
)
```

api_key str | None Default: env GISKARD_HUB_API_KEY Your Hub API key.
base_url str | httpx.URL | None Default: env GISKARD_HUB_BASE_URL Base URL of your Hub instance.
auto_add_api_suffix bool Default: True Automatically append /_api to base_url.
timeout float | httpx.Timeout | None Default: 60.0 Default request timeout in seconds. Pass an httpx.Timeout for fine-grained control over connect, read, and write timeouts.
max_retries int Default: 2 Number of automatic retries on transient errors (connection errors, 5xx responses).
default_headers dict[str, str] | None Default: None Headers added to every request.
default_query dict[str, object] | None Default: None Query parameters added to every request.
http_client httpx.Client | None Default: None Custom httpx.Client instance for proxies, custom transports, or mutual TLS.
AsyncHubClient
Async counterpart with an identical API surface — every method is a coroutine. Accepts the same constructor arguments as HubClient, except http_client takes an httpx.AsyncClient instead of an httpx.Client.

```python
from giskard_hub import AsyncHubClient

hub = AsyncHubClient(
    api_key="gsk_...",  # or set GISKARD_HUB_API_KEY env var
    base_url="https://hub.example.com",  # or set GISKARD_HUB_BASE_URL env var
)
```

api_key str | None Default: env GISKARD_HUB_API_KEY Your Hub API key.
base_url str | httpx.URL | None Default: env GISKARD_HUB_BASE_URL Base URL of your Hub instance.
auto_add_api_suffix bool Default: True Automatically append /_api to base_url.
timeout float | httpx.Timeout | None Default: 60.0 Default request timeout in seconds. Pass an httpx.Timeout for fine-grained control over connect, read, and write timeouts.
max_retries int Default: 2 Number of automatic retries on transient errors (connection errors, 5xx responses).
default_headers dict[str, str] | None Default: None Headers added to every request.
default_query dict[str, object] | None Default: None Query parameters added to every request.
http_client httpx.AsyncClient | None Default: None Custom httpx.AsyncClient instance for proxies, custom transports, or mutual TLS.
Resources
Resource groups exposed by the client for managing Hub entities.
hub.agents
Register, test, and invoke LLM agents. An agent represents your LLM application — either a remote HTTP endpoint or a local Python callable.

```python
from giskard_hub.types import (
    Agent,
    AgentDetectStatefulness,
    AgentOutput,
    ChatMessage,
)
```

.create() → Agent Create a new agent with configuration for external API communication.
name str Required Display name of the agent.
url str Required HTTP endpoint the Hub calls during evaluations and scans.
project_id str Required Project this agent belongs to.
supported_languages list[str] Required Language codes the agent supports (e.g. ["en", "fr"]).
headers dict[str, str] HTTP headers sent with every request to the agent (e.g. auth tokens). Each header is a {"name": str, "value": str} dict.
description str | None Human-readable description.
stateful bool | None Whether the agent is stateful.
```python
agent = hub.agents.create(
    project_id=project.id,
    name="Support Bot v2",
    url="https://my-app.example.com/api/chat",
    supported_languages=["en"],
    headers={"Authorization": "Bearer <token>"},
    description="GPT-4o chatbot with RAG",
)
```

.retrieve() → Agent Retrieve an agent by its ID.
agent_id str Required ID of the agent to retrieve.
.update() → Agent Update an existing agent’s configuration. Only the provided fields are modified.
agent_id str Required ID of the agent to update.
name str | None url str | None description str | None headers dict[str, str] | None supported_languages list[str] | None .list() → list[Agent] List all agents, optionally filtered by project.
project_id str | None .delete() → None Delete an agent by its ID.
agent_id str Required .bulk_delete() → None Delete multiple agents at once.
agent_ids list[str] Required .generate_completion() → AgentOutput Call a registered agent with a list of messages and get the response.
agent_id str Required ID of the agent to call.
messages Iterable[ChatMessageParam] Required Conversation messages as [{"role": "user", "content": "..."}].
```python
output = hub.agents.generate_completion(
    agent.id,
    messages=[{"role": "user", "content": "What is your return policy?"}],
)
print(output.response.content)
print(output.metadata)
```

.test_connection() → AgentOutput Test connectivity to an agent endpoint without persisting the agent.
url str Required headers dict[str, str] .generate_description() → str Auto-generate a description for an agent by observing its behaviour. Returns the generated description.
agent_id str Required .detect_statefulness() → AgentDetectStatefulness Detect whether the agent is stateful by analyzing its behavior.
agent_id str Required

hub.checks
Define and manage reusable check criteria for evaluating agent responses. Checks are project-scoped and can be referenced by identifier in any test case.

```python
from giskard_hub.types import Check, CheckResult
```

.create() → Check Create a custom check in the specified project.
identifier str Required Unique identifier to reference this check in test cases.
name str Required Display name.
project_id str Required Project this check belongs to.
params CheckTypeParam Required Check configuration (see check type params below).
description str | None Human-readable description.
```python
check = hub.checks.create(
    project_id=project.id,
    identifier="tone_professional",
    name="Professional tone",
    params={"type": "conformity", "rules": ["Use formal language."]},
)
```

.retrieve() → Check check_id str Required ID of the check to retrieve.
.update() → Check Update an existing check. Only the provided fields are modified.
check_id str Required identifier str | None name str | None params CheckTypeParam | None description str | None .list() → list[Check] project_id str Required Project ID to list checks for.
filter_builtin bool Whether to filter out built-in checks from the results. Default True.
.delete() → None check_id str Required ID of the check to delete.
.bulk_delete() → None check_ids list[str] Required IDs of checks to delete.
Check type params
The params field accepts one of these shapes:
| Type | params shape | Evaluation method |
|---|---|---|
| Correctness | {"type": "correctness", "reference": str} | LLM judge |
| Conformity | {"type": "conformity", "rules": list[str]} | LLM judge |
| Groundedness | {"type": "groundedness", "context": str} | LLM judge |
| Semantic similarity | {"type": "semantic_similarity", "reference": str, "threshold": float} | Embedding |
| String match | {"type": "string_match", "keyword": str} | Rule-based |
| Metadata | {"type": "metadata", "json_path_rules": list[JsonPathRule]} | Rule-based |
Each JsonPathRule: {"json_path": str, "expected_value": str, "expected_value_type": "string" | "number" | "boolean"}
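Written out as plain Python dicts, the shapes in the table look as follows. All concrete values (references, rules, the `$.model` path) are illustrative examples, not required constants:

```python
# One illustrative params payload per documented check type.
correctness = {"type": "correctness", "reference": "Returns are accepted within 30 days."}
conformity = {"type": "conformity", "rules": ["Use formal language."]}
groundedness = {"type": "groundedness", "context": "Our policy allows returns within 30 days."}
semantic_similarity = {
    "type": "semantic_similarity",
    "reference": "Free shipping on orders over $50.",
    "threshold": 0.8,  # embedding-similarity cutoff
}
string_match = {"type": "string_match", "keyword": "refund"}
metadata = {
    "type": "metadata",
    "json_path_rules": [
        {
            "json_path": "$.model",
            "expected_value": "gpt-4o",
            "expected_value_type": "string",
        }
    ],
}
```

Any of these dicts can be passed as the params argument of hub.checks.create().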
hub.datasets
Create datasets, import test cases, and auto-generate test suites from scenarios or knowledge bases.

```python
from giskard_hub.types import Dataset, TestCase, TaskProgress
```

.create() → Dataset Create a new empty dataset in the specified project.
name str Required project_id str Required description str | None .upload() → Dataset Import test cases from a file or list of dicts into a dataset.
project_id str Required data FileTypes | list[dict[str, Any]] | str Required File path (str or Path), file-like object, or list of dicts. Each record should have a messages list and optional checks list.
dataset_id str | None name str | None .generate_scenario_based() → Dataset Generate a dataset of test cases from scenario definitions. The dataset’s status will be "running" until generation completes — use hub.helpers.wait_for_completion() to wait.
project_id str Required agent_id str Required scenario_id str Required n_examples int dataset_id str | None dataset_name str | None .generate_document_based() → Dataset Generate test cases grounded in knowledge base documents. Async — use hub.helpers.wait_for_completion().
agent_id str Required knowledge_base_id str Required project_id str Required dataset_name str description str | None n_examples int topic_ids list[str] .retrieve() → Dataset dataset_id str Required ID of the dataset to retrieve.
.update() → Dataset dataset_id str Required ID of the dataset to update.
name str | None Updated name.
description str | None Updated description.
status TaskProgress | None Async operation status.
.list() → list[Dataset] project_id str | None Project ID to filter by.
.delete() → None Delete a dataset by its ID.
dataset_id str Required .bulk_delete() → None Delete multiple datasets at once.
dataset_ids list[str] Required .list_tags() → list[str] List all tags used across test cases in a dataset.
dataset_id str Required .list_test_cases() → list[TestCase] List all test cases in a dataset.
dataset_id str Required .search_test_cases() → list[TestCase] Search test cases with filters, sorting, and pagination. Pass include_metadata=True to receive tuple[list[TestCase], APIPaginatedMetadata].
dataset_id str Required query str | None order_by list[TestCaseOrderByParam] | None filters TestCaseFiltersParam | None limit int | None offset int | None include_metadata bool Default: False

hub.evaluations
Run agents against datasets, inspect per-test-case results, and manage the evaluation lifecycle. Sub-resource: hub.evaluations.results.

```python
from giskard_hub.types import Evaluation, Metric, CheckResult
```

.create() → Evaluation Create and launch a new evaluation of an agent on a dataset.
project_id str Required Project ID.
agent_id str Required Agent to evaluate.
dataset_id str | None Dataset to evaluate against. Provide this or old_evaluation_id, not both.
old_evaluation_id str | None Reuse a previous evaluation’s dataset.
name str Evaluation run name.
tags list[str] | None Filter test cases by tags.
run_count int Run each test case N times (for consistency testing).
scheduled_evaluation_id str | None Link to a scheduled evaluation.
```python
evaluation = hub.evaluations.create(
    project_id=project.id,
    agent_id=agent.id,
    dataset_id=dataset.id,
    name="v2.1 regression run",
)
evaluation = hub.helpers.wait_for_completion(evaluation)
hub.helpers.print_metrics(evaluation)
```

.create_local() → Evaluation Create a local evaluation for running agent inference in your own process.
Agent info as {"name": str, "description": str}.
dataset_id str | None name str | None tags list[str] | None old_evaluation_id str | None .run_single() → list[CheckResult] Evaluate a single (input, output) pair against checks without creating a full evaluation.
project_id str | None agent_description str .rerun_errored_results() → Evaluation Rerun all errored results without triggering a full re-evaluation.
evaluation_id str Required .retrieve() → Evaluation Retrieve an evaluation by its ID, with optional related resource inclusion.
evaluation_id str Required include list[Literal["agent", "dataset"]] | None .update() → Evaluation Update an evaluation’s name.
evaluation_id str Required name str Required .list() → list[Evaluation] List all evaluations for a project.
project_id str Required include list[Literal["agent", "dataset"]] | None .delete() → None Delete an evaluation by its ID.
evaluation_id str Required .bulk_delete() → None Delete multiple evaluations at once.
evaluation_ids list[str] Required

hub.evaluations.results
Inspect, filter, update, and rerun individual evaluation results.

```python
from giskard_hub.types import TestCaseEvaluation, FailureCategory
```

.retrieve() → TestCaseEvaluation result_id str Required Result ID.
evaluation_id str Required Evaluation ID.
include list[Literal["test_case"]] | None Embed related resources.
.update() → TestCaseEvaluation Update the failure category of an evaluation result.
result_id str Required evaluation_id str Required failure_category FailureCategoryParam | None .list() → list[TestCaseEvaluation] evaluation_id str Required Evaluation ID.
include list[Literal["test_case"]] | None Embed related resources.
.search() → list[TestCaseEvaluation] | tuple[list[TestCaseEvaluation], APIPaginatedMetadata] Search and filter results. Pass include_metadata=True for pagination metadata.
evaluation_id str Required query str | None filters ResultFiltersParam | None order_by list[ResultOrderByParam] | None limit int | None offset int | None include list[Literal["test_case"]] | None include_metadata bool Default: False .rerun_test_case() → TestCaseEvaluation result_id str Required Result ID.
evaluation_id str Required Evaluation ID.
.submit_local_output() → TestCaseEvaluation Submit locally-generated agent output for evaluation and scoring.
result_id str Required evaluation_id str Required agent_output AgentOutputParam | None error str | None .update_visibility() → TestCaseEvaluation Show or hide a result from the default view.
result_id str Required evaluation_id str Required hidden bool Required set_test_case_draft bool | None

hub.helpers
High-level convenience methods for the most common SDK workflows: waiting for async operations, running evaluations, and printing metrics.

```python
from giskard_hub.types import Evaluation, Scan, ChatMessage, AgentOutput
```

.wait_for_completion() → TStateful Poll an entity until it leaves its running state. Returns the refreshed entity.
entity TStateful Required Any stateful entity: Evaluation, Scan, Dataset, KnowledgeBase, ScanProbe, TestCaseEvaluation.
poll_interval float Default: 5.0 Seconds between polling requests.
max_retries int Default: 360 Maximum polling attempts. Default: 30 minutes at 5-second intervals.
running_states Collection[str] Default: {"running"} States considered as “still processing”.
error_states Collection[str] Default: {"error"} Terminal error states.
raise_on_error bool Default: True Raise ValueError if entity enters an error state.
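Conceptually this is a plain poll loop over the parameters above. The sketch below mirrors that logic under stated assumptions: FakeEntity, wait_for_completion_sketch, and the injected refresh/sleep callables are all made up for illustration and are not part of the SDK.

```python
import time
from dataclasses import dataclass

@dataclass
class FakeEntity:
    """Stand-in for any stateful Hub entity with a .status attribute."""
    status: str

def wait_for_completion_sketch(entity, refresh, poll_interval=5.0, max_retries=360,
                               running_states=frozenset({"running"}),
                               error_states=frozenset({"error"}),
                               raise_on_error=True, sleep=time.sleep):
    """Poll refresh(entity) until the entity leaves its running state."""
    for _ in range(max_retries):
        if entity.status not in running_states:
            break
        sleep(poll_interval)
        entity = refresh(entity)
    if raise_on_error and entity.status in error_states:
        raise ValueError(f"entity finished in error state: {entity.status!r}")
    return entity

# Simulate one in-progress poll followed by completion.
statuses = iter(["running", "finished"])
result = wait_for_completion_sketch(
    FakeEntity("running"),
    refresh=lambda e: FakeEntity(next(statuses)),
    sleep=lambda s: None,  # skip real waiting in this demo
)
print(result.status)  # finished
```

In the real helper, refresh is a retrieve call against the Hub API and sleep is a genuine delay of poll_interval seconds.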
.evaluate() → Evaluation Run an evaluation for a given agent over a dataset. Handles both remote and local agents.
agent str | Agent | Callable Required Agent ID, Agent object, or a Python callable for local evaluation. Callable signature: (messages: list[ChatMessage]) -> str | ChatMessage | AgentOutput.
dataset str | Dataset Required Dataset ID or Dataset object.
project str | Project | None Required when agent is remote (str or Agent). Not required for local callables.
name str | None tags list[str] | None

```python
evaluation = hub.helpers.evaluate(
    agent=my_agent,
    dataset=my_dataset,
    project=my_project,
    name="Remote eval",
)

def my_fn(messages: list[ChatMessage]) -> str:
    return "Hello from my local agent"

evaluation = hub.helpers.evaluate(
    agent=my_fn,
    dataset="dataset-id",
    name="Local eval",
)
```

.print_metrics() → None Print a formatted metrics table to the console for an evaluation or scan.
The evaluation or scan to print metrics for.
hub.knowledge_bases
Create, search, and manage indexed document collections for grounded evaluations, document-based test generation, and knowledge-grounded vulnerability scans.

```python
from giskard_hub.types import (
    KnowledgeBase,
    KnowledgeBaseDocumentRow,
    KnowledgeBaseDocumentDetail,
)
```

.create() → KnowledgeBase Create a knowledge base and upload documents. Indexing happens asynchronously after creation — use hub.helpers.wait_for_completion().
name str Required Display name.
project_id str Required Project this KB belongs to.
data FileTypes | list[dict[str, Any]] | str Required Documents as a list of dicts, a file path string, or a pathlib.Path (JSON/JSONL format).
description str | None Human-readable description.
document_column str Column name for document text. Server defaults to "text" if omitted.
topic_column str Column name for topic label. Server defaults to "topic" if omitted.
```python
kb = hub.knowledge_bases.create(
    project_id=project.id,
    name="Product Docs",
    data=[
        {"text": "30-day return policy.", "topic": "Returns"},
        {"text": "Free shipping over $50.", "topic": "Shipping"},
    ],
)
kb = hub.helpers.wait_for_completion(kb)
```

.search_documents() → list[KnowledgeBaseDocumentRow] | tuple[list[KnowledgeBaseDocumentRow], APIPaginatedMetadata] Semantic search over documents in a knowledge base.
knowledge_base_id str Required query str | None filters KnowledgeBaseDocumentFiltersParam | None order_by list[KnowledgeBaseDocumentOrderByParam] | None limit int | None offset int | None include_metadata bool Default: False .retrieve_document() → KnowledgeBaseDocumentDetail Retrieve a specific document with its full content.
knowledge_base_id str Required document_id str Required .retrieve() → KnowledgeBase Retrieve a knowledge base by its ID, including its topics.
knowledge_base_id str Required .update() → KnowledgeBase Update a knowledge base’s metadata.
knowledge_base_id str Required name str | None description str | None project_id str | None status TaskProgress | None .list() → list[KnowledgeBase] List all knowledge bases, optionally filtered by project.
project_id str | None .delete() → None Delete a knowledge base by its ID.
knowledge_base_id str Required .bulk_delete() → None Delete multiple knowledge bases at once.
knowledge_base_ids list[str] Required

hub.projects
Top-level workspace that groups all related resources: agents, datasets, evaluations, scans, and more. Sub-resource: hub.projects.scenarios.

```python
from giskard_hub.types import Project
```

.create() → Project name str Required Project name.
description str | None Project description.
.update() → Project project_id str Required Project ID.
name str | None Updated name.
description str | None Updated description.
failure_categories Iterable[FailureCategoryParam] | None Project-level failure classifications.
.retrieve() → Project Retrieve a project by its ID.
project_id str Required .list() → list[Project] List all projects accessible to the current user.
.delete() → None Delete a project by its ID.
project_id str Required .bulk_delete() → None Delete multiple projects at once.
project_ids list[str] Required

hub.projects.scenarios
Reusable persona and behaviour templates for scenario-based dataset generation.

```python
from giskard_hub.types import Scenario, ScenarioPreview
```

.create() → Scenario project_id str Required Project ID.
name str Required Scenario name.
description str Required Scenario description.
rules list[str] Rules the generated conversations should follow.
.preview() → ScenarioPreview Generate a preview conversation for a scenario without persisting it.
project_id str Required description str Required rules list[str] agent_id str | None .retrieve() → Scenario Retrieve a scenario by its ID within a project.
scenario_id str Required project_id str Required .update() → Scenario Update an existing scenario’s definition.
scenario_id str Required project_id str Required name str | None description str | None rules list[str] | None .list() → list[Scenario] List all scenarios for a project.
project_id str Required .delete() → None Delete a scenario from a project.
scenario_id str Required project_id str Required

hub.scans
Launch automated vulnerability scans covering the OWASP LLM Top 10 and additional threat categories. Sub-resources: hub.scans.probes, hub.scans.attempts.

```python
from giskard_hub.types import (
    Scan,
    ScanCategory,
    ScanProbe,
    ScanProbeAttempt,
    Severity,
    ReviewStatus,
)
```

.create() → Scan Launch a new vulnerability scan of an agent.
project_id str Required Project ID.
agent_id str Required Agent to scan.
knowledge_base_id str | None Anchor probes to KB documents for domain-specific attacks.
probe_ids list[str] | None List of specific LIDAR probe IDs to run in the scan.
tags list[str] | None Limit scan to specific threat categories (e.g. ["gsk:threat-type='prompt-injection'"]).
```python
scan = hub.scans.create(
    project_id=project.id,
    agent_id=agent.id,
    tags=["gsk:threat-type='prompt-injection'"],
)
scan = hub.helpers.wait_for_completion(scan)
print(f"Grade: {scan.grade}")
hub.helpers.print_metrics(scan)
```

.list_categories() → list[ScanCategory] List all available scan categories and their OWASP mappings.
.list_probes() → list[ScanProbe] List all probe results for a completed scan.
scan_id str Required .retrieve() → Scan Retrieve a scan result by its ID, with optional related resource inclusion.
scan_id str Required include list[Literal["agent", "knowledge_base"]] | None .list() → list[Scan] List all scan results, optionally filtered by project.
project_id str | None include list[Literal["agent", "knowledge_base"]] | None .delete() → None Delete a scan result by its ID.
scan_id str Required .bulk_delete() → None Delete multiple scan results at once.
scan_ids list[str] Required .list_available_probes() → list[ScanAvailableProbe] List all probe definitions available for scanning.
hub.scans.probes
.retrieve() → ScanProbe probe_id str Required Probe ID.
.list_attempts() → list[ScanProbeAttempt] List all adversarial attempts for a specific probe.
probe_id str Required

hub.scans.attempts
.update() → ScanProbeAttempt Update a probe attempt’s review status, severity, or success flag.
probe_attempt_id str Required review_status ReviewStatus | None "pending", "ignored", "acknowledged", "corrected". severity Severity | None SAFE (0), MINOR (10), MAJOR (20), CRITICAL (30). successful bool | None

hub.scheduled_evaluations
Set up recurring evaluation runs on a daily, weekly, or monthly cadence for continuous quality monitoring.

```python
from giskard_hub.types import ScheduledEvaluation, FrequencyOption
```

.create() → ScheduledEvaluation project_id str Required Project ID.
agent_id str Required Agent to evaluate.
dataset_id str Required Dataset to evaluate against.
frequency FrequencyOption Required "daily", "weekly", or "monthly".
name str Required Name of the scheduled evaluation.
time str Required Time of day in HH:MM format (UTC).
day_of_week int | None Weekly only: 1 (Monday) through 7 (Sunday).
day_of_month int | None Monthly only: 1 through 28.
tags list[str] | None Filter test cases by tags.
run_count int | None Run each test case N times.
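The frequency-specific constraints above can be sanity-checked locally before calling create(). The validate_schedule helper below is purely illustrative (not an SDK function); it encodes the documented rules: HH:MM UTC time, day_of_week 1 to 7 for weekly, day_of_month 1 to 28 for monthly.

```python
import re

def validate_schedule(frequency, time_str, day_of_week=None, day_of_month=None):
    """Enforce the documented scheduling constraints before hitting the API."""
    if frequency not in {"daily", "weekly", "monthly"}:
        raise ValueError("frequency must be 'daily', 'weekly', or 'monthly'")
    if not re.fullmatch(r"(?:[01]\d|2[0-3]):[0-5]\d", time_str):
        raise ValueError("time must be in HH:MM format (UTC)")
    if frequency == "weekly" and day_of_week not in range(1, 8):
        raise ValueError("weekly schedules need day_of_week from 1 (Monday) to 7 (Sunday)")
    if frequency == "monthly" and day_of_month not in range(1, 29):
        raise ValueError("monthly schedules need day_of_month from 1 to 28")
    return {"frequency": frequency, "time": time_str,
            "day_of_week": day_of_week, "day_of_month": day_of_month}

payload = validate_schedule("weekly", "09:30", day_of_week=1)
print(payload["day_of_week"])  # 1
```

The cap at day 28 sidesteps months of different lengths, which is presumably why the API rejects 29 to 31.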
.list_evaluations() → list[Evaluation] List all past evaluation runs generated by this scheduled evaluation.
scheduled_evaluation_id str Required include list[Literal["agent", "dataset"]] | None .retrieve() → ScheduledEvaluation Retrieve a scheduled evaluation by its ID.
scheduled_evaluation_id str Required include list[Literal["evaluations"]] | None .update() → ScheduledEvaluation Update a scheduled evaluation’s configuration.
scheduled_evaluation_id str Required name str | None frequency FrequencyOption | None time str | None day_of_week int | None day_of_month int | None run_count int | None last_execution_at str | datetime | None last_execution_status LastExecutionStatusParam | None paused bool | None .list() → list[ScheduledEvaluation] List all scheduled evaluations for a project.
project_id str Required include list[Literal["evaluations"]] | None last_days int | None .delete() → None Delete a scheduled evaluation by its ID.
scheduled_evaluation_id str Required .bulk_delete() → None Delete multiple scheduled evaluations at once.
scheduled_evaluation_ids list[str] Required

hub.tasks
Lightweight issue tracker for managing findings from evaluations and scans. Link tasks to specific evaluation results, test cases, or probe attempts.

```python
from giskard_hub.types import Task, TaskStatus, TaskPriority
```

.create() → Task project_id str Required Project ID.
description str Required What needs to be done.
priority TaskPriority | None "low", "medium", or "high".
status TaskStatus | None "open", "in_progress", or "resolved".
assignee_ids list[str] User IDs to assign.
evaluation_result_id str | None Link to a specific evaluation result.
dataset_test_case_id str | None Link to a specific test case.
probe_attempt_id str | None Link to a specific scan probe attempt.
disable_test bool Disable the linked test case.
hide_result bool Hide the linked evaluation result.
.retrieve() → Task Retrieve a task by its ID.
task_id str Required .update() → Task Update an existing task’s metadata and assignees.
task_id str Required status TaskStatus | None "open", "in_progress", or "resolved". priority TaskPriority | None "low", "medium", or "high". description str | None assignee_ids list[str] | None set_test_case_status str | None .list() → list[Task] List all tasks for a project, ordered by creation date descending.
project_id str | None .delete() → None Delete a task by its ID.
task_id str Required .bulk_delete() → None Delete multiple tasks at once.
task_ids list[str] Required

hub.test_cases
Create, update, and manage individual test cases within datasets. Sub-resource: hub.test_cases.comments.

```python
from giskard_hub.types import TestCase, TestCaseComment, ChatMessageWithMetadata
```

.create() → TestCase Create a new test case with conversation messages and optional checks.
dataset_id str Required messages Iterable[ChatMessageParam] Required Conversation messages as [{"role": "user", "content": "..."}]. Should not include the final assistant response.
checks Iterable[CheckConfigParam] Checks to apply: [{"identifier": "correctness", "params": {"reference": "..."}}].
demo_output str | ChatMessageWithMetadataParam | None Expected output for display only — not used during evaluation.
status "active" | "draft" | None tags list[str] .retrieve() → TestCase Retrieve a test case by its ID.
test_case_id str Required .update() → TestCase Update an existing test case’s messages, checks, tags, or status.
test_case_id str Required messages Iterable[ChatMessageParam] | None checks Iterable[CheckConfigParam] | None demo_output str | ChatMessageWithMetadataParam | None status "active" | "draft" | None tags list[str] | None dataset_id str | None .delete() → None Delete a test case by its ID.
test_case_id str Required .bulk_delete() → None test_case_ids list[str] Required IDs of test cases to delete.
.bulk_update() → list[TestCase] Update multiple test cases at once. Returns the updated test cases.
test_case_ids list[str] Required status Literal["active", "draft"] | None disabled_checks list[str] | None enabled_checks list[str] | None added_tags list[str] | None removed_tags list[str] | None .bulk_move() → None Move or copy test cases to another dataset.
test_case_ids list[str] Required target_dataset_id str Required duplicate bool

hub.test_cases.comments
.add() → TestCaseComment test_case_id str Required Test case ID.
content str Required Comment text.
.edit() → TestCaseComment comment_id str Required Comment ID.
test_case_id str Required Test case ID.
content str Required Updated text.
.delete() → None comment_id str Required Comment ID.
test_case_id str Required Test case ID.
hub.playground_chats
Access conversations captured from the Hub's interactive playground UI.

```python
from giskard_hub.types import PlaygroundChat
```

.list() → list[PlaygroundChat] project_id str Required Project ID.
include list[Literal["agent"]] | None Embed related resources (["agent"]).
limit int | None Maximum results.
offset int | None Results offset.
.retrieve() → PlaygroundChat chat_id str Required Chat ID.
include list[Literal["agent"]] | None Embed related resources (["agent"]).
.delete() → None Delete a playground chat by its ID.
chat_id str Required .bulk_delete() → None Delete multiple playground chats at once.
chat_ids list[str] Required

hub.audit_logs
Query the audit trail for compliance reporting, change history, and debugging. Every create, update, and delete action is recorded.

```python
from giskard_hub.types import Audit, AuditDisplay
```

.search() → list[Audit] | tuple[list[Audit], APIPaginatedMetadata] Search audit events with free-text queries, filters, and pagination. Pass include_metadata=True for tuple[list[Audit], APIPaginatedMetadata].
query str | None Free-text search query.
filters AuditFiltersParam | None Filter criteria (see filter keys below).
order_by list[AuditOrderByParam] | None Sorting criteria.
limit int Maximum results.
offset int Results offset.
include_metadata bool Default: False Include pagination metadata. If true, returns a tuple of (results, metadata).
Filter keys:
| Key | Type | Example |
|---|---|---|
| project_id | list filter | {"selected_options": ["project-id"]} |
| entity_type | list filter | {"selected_options": ["agent", "evaluation"]} |
| action | list filter | {"selected_options": ["create", "delete"]} |
| user_id | list filter | {"selected_options": ["user-id"]} |
| created_at | date range | {"from_": "2025-01-01T00:00:00Z", "to_": "2025-12-31T23:59:59Z"} |
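Combining these keys, a complete filters payload is a nested dict. The IDs and dates below are illustrative placeholders:

```python
# Illustrative filters payload for hub.audit_logs.search(); IDs are made up.
filters = {
    "project_id": {"selected_options": ["project-id"]},
    "entity_type": {"selected_options": ["agent", "evaluation"]},
    "action": {"selected_options": ["create", "delete"]},
    "created_at": {"from_": "2025-01-01T00:00:00Z", "to_": "2025-12-31T23:59:59Z"},
}
# Would be passed as: hub.audit_logs.search(filters=filters, include_metadata=True)
print(sorted(filters))  # ['action', 'created_at', 'entity_type', 'project_id']
```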
.list_entities() → list[AuditDisplay] | tuple[list[AuditDisplay], APIPaginatedMetadata] List audit history for a specific resource, including diffs of each change. Pass include_metadata=True for pagination metadata.
entity_id str Required entity_type str Required Entity type (e.g. "project", "agent", "evaluation"). limit int offset int include_metadata bool Default: False

Types
All Python types referenced by the methods above.
Core types
Shared building blocks used by every resource. The *Param variants are TypedDicts used in request bodies.
role str Sender role: typically "user", "assistant", or "system".
content str Message text.
role str Required Sender role: typically "user", "assistant", or "system".
content str Required Message text.
role str Sender role.
content str Message text.
metadata dict[str, object] | None Arbitrary metadata attached to the message.
role str Required Sender role.
content str Required Message text.
metadata dict[str, object] | None Arbitrary metadata attached to the message.
name str Header name.
value str Header value.
name str Required Header name.
value str Required Header value.
message str Error message returned by the agent or runtime.
details dict[str, object] | None Optional structured error context.
message str Required Error message returned by the agent or runtime.
details dict[str, object] Optional structured error context.
id str Unique identifier.
email str User email address.
name str | None Display name, if set.
id str Unique identifier.
name str Display name.
state TaskState Current state.
current int Items processed so far.
total int Total items to process.
error str | None Error message if the task failed.
"running" Task is in progress. "finished" Task completed successfully. "error" Task failed. "canceled" Task was canceled. "skipped" Task was skipped.
count int Number of items returned in this page.
offset int Offset of the first item in this page.
limit int Maximum page size requested.
total int Total number of items across all pages.
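These fields support a conventional offset/limit pagination loop over any search method that returns (results, metadata) when include_metadata=True. In the sketch below, fetch_page is a stub standing in for such a call (it is not an SDK function):

```python
def fetch_page(offset, limit):
    """Stub for a paginated search; returns (items, metadata) like include_metadata=True."""
    all_items = [f"item-{i}" for i in range(7)]  # pretend server-side collection
    page = all_items[offset:offset + limit]
    meta = {"count": len(page), "offset": offset, "limit": limit, "total": len(all_items)}
    return page, meta

def fetch_all(limit=3):
    """Walk pages until offset reaches total."""
    items, offset = [], 0
    while True:
        page, meta = fetch_page(offset, limit)
        items.extend(page)
        offset += meta["count"]
        if meta["count"] == 0 or offset >= meta["total"]:
            break
    return items

print(len(fetch_all()))  # 7
```

The same loop works with a real call such as search_test_cases or audit_logs.search by swapping in the SDK method for fetch_page.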
Agent types
id str Unique identifier.
name str Display name.
description str | None Human-readable description.
url str HTTP endpoint URL.
project_id str Parent project ID.
supported_languages list[str] Language codes the agent supports.
headers dict[str, str] HTTP headers sent with every request.
stateful bool Whether the agent is stateful.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
id str Unique identifier.
name str Display name.
response ChatMessage | None The agent’s response message.
error ExecutionError | None Error details if the agent call failed.
metadata dict[str, object] | None Arbitrary metadata returned by the agent.
response ChatMessageParam | None The agent’s response message.
error ExecutionErrorParam | None Error details if the agent call failed.
metadata dict[str, object] Arbitrary metadata returned by the agent.
stateful bool Whether the agent was detected as stateful.
name str Agent name (used for local evaluations).
description str | None Optional description.
name str Required Agent name.
description str | None Optional description.
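For example, the supported_languages field on an Agent can drive client-side filtering. A sketch over agent-like dicts (the data is illustrative, not fetched from a Hub):

```python
agents = [
    {"name": "support-bot", "supported_languages": ["en", "fr"]},
    {"name": "sales-bot", "supported_languages": ["en"]},
    {"name": "docs-bot", "supported_languages": ["de"]},
]

def agents_supporting(agents: list[dict], lang: str) -> list[str]:
    """Names of agents whose supported_languages include the given code."""
    return [a["name"] for a in agents if lang in a["supported_languages"]]

print(agents_supporting(agents, "en"))  # → ['support-bot', 'sales-bot']
```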
Check types
id str Unique identifier.
built_in bool Whether this is a built-in check.
identifier str Reusable identifier string.
name str Display name.
description str | None Human-readable description.
project_id str Parent project ID.
params dict[str, Any] Check-specific configuration. Shape depends on the check type — see Check type params.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
name str Check identifier.
display_name str | None Human-readable name.
status TaskState Execution status.
passed bool | None Whether the check passed.
error str | None Error message if execution failed.
reason str | None LLM judge’s reasoning (for LLM-based checks).
annotations list[OutputAnnotation] | None Annotated spans in the agent’s response.
identifier str Check identifier.
enabled bool | None Whether the check is enabled.
params dict[str, Any] Check-specific parameters (without the type discriminator).
identifier str Required Check identifier to apply.
enabled bool Whether the check is enabled.
params dict[str, Any] Check-specific parameters.
text str The annotated substring.
label str Label assigned to the span.
start_char_index int Start position in the response (character offset).
end_char_index int End position in the response (character offset).
type "output" | "context" Whether the annotation references the agent’s output or its retrieved context.
json_path str JSONPath expression to evaluate against the agent’s output metadata.
expected_value bool | float | str The value the JSONPath should resolve to.
expected_value_type "string" | "number" | "boolean" Expected primitive type of the resolved value.
alias union TypeAlias for the union of CorrectnessParamsParam, ConformityParamsParam, GroundednessParamsParam, StringMatchParamsParam, MetadataParamsParam, and SemanticSimilarityParamsParam. See Check type params for the concrete shapes.
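Putting the fields above together, a CheckConfigParam is a plain dict with a required identifier plus optional enabled and params keys. A minimal sketch (the helper, the "correctness" identifier usage, and the params shape are illustrative; see Check type params for the concrete shapes):

```python
def make_check_config(identifier: str, enabled: bool = True, **params) -> dict:
    """Build a CheckConfigParam-shaped dict; params are check-specific."""
    config = {"identifier": identifier, "enabled": enabled}
    if params:
        config["params"] = params
    return config

# Hypothetical check-specific param for illustration only.
check = make_check_config("correctness", reference="The refund window is 30 days.")
```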
Dataset and test case types
id str Unique identifier.
name str Display name.
description str | None Human-readable description.
project_id str Parent project ID.
status TaskProgress Async operation status (for generated datasets).
tags list[str] All tags used across test cases.
state TaskState Computed from status.state — e.g. "finished", "running".
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
id str Unique identifier.
name str Display name.
dataset_id str Dataset to subset.
tags list[str] | None Restrict to test cases matching these tags.
target_type "dataset" | None Discriminator for criterion unions.
id str Unique identifier.
dataset_id str Parent dataset ID.
messages list[ChatMessage] Conversation messages.
demo_output AgentOutput | None Expected output (display only — not used during evaluation).
checks list[CheckConfig] Configured checks.
comments list[TestCaseComment] Annotations attached to this test case.
tags list[str] Tags for filtering.
status "active" | "draft" Test case status.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
id str Unique identifier.
id str Unique identifier.
content str Comment text.
user UserReference Author of the comment.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
id "created_at" | "id" | "status" | "updated_at" Required Column to sort by.
desc bool Sort descending when true.
alias dict Dict mapping a column name to a filter value. Valid columns: "metrics", "status", "tags".
Evaluation types
id str Unique identifier.
name str Display name.
agent AgentReference | MinimalAgent | Agent The evaluated agent.
dataset Dataset | DatasetReference The dataset used.
criteria DatasetSubset | None Subset of the dataset used as evaluation criteria.
project_id str Parent project ID.
local bool Whether this is a local evaluation.
metrics list[Metric] Aggregated pass/fail metrics per check.
failure_categories dict[str, int] Counts of results per failure category identifier.
tags list[Metric] Per-tag aggregated metrics.
status TaskProgress Async operation status.
state TaskState Computed from status.state — "finished", "running", "error".
old_evaluation_id str | None ID of the previous evaluation this one is based on.
scheduled_evaluation_id str | None ID of the scheduled evaluation that produced this run.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
id str Unique identifier.
name str Display name.
name str Check identifier (e.g. "correctness", "global").
display_name str | None Human-readable name.
passed int | None Number of test cases that passed.
failed int | None Number of test cases that failed.
errored int | None Number of test cases that errored.
total int | None Total test cases evaluated.
success_rate float | None Pass rate as a float between 0.0 and 1.0.
id str Unique identifier.
evaluation_id str Parent evaluation ID.
test_case TestCase | TestCaseReference The test case.
test_case_exists bool | None Whether the test case still exists.
state TaskState Result state: "finished", "running", "error".
results list[CheckResult] Per-check outcomes.
output AgentOutput | None The agent’s actual response.
error str | None Error message if the agent call failed.
failure_category FailureCategoryResult | None Assigned failure classification.
hidden bool Whether this result is hidden from the default view.
divergence_warnings list[DivergenceWarning] | None Divergence warnings detected during multi-turn evaluation.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
turn int The conversation turn where divergence was detected.
expected str The expected message content.
actual str The actual message content received.
identifier str Stable identifier (e.g. "hallucination").
title str Display title.
description str Human-readable description.
identifier str Required Stable identifier.
title str Required Display title.
description str Required Human-readable description.
id str Unique identifier.
category FailureCategory | None The assigned failure category.
status TaskState | None Classification status.
error str | None Error message if classification failed.
id "failure_category_name" | "id" | "sample_success" | "status" | "visibility" Required Column to sort by.
desc bool Sort descending when true.
alias dict Dict mapping a column name to a filter value. Valid columns: "failure_category_name", "metrics", "sample_success", "status", "tags", "visibility".
Scan types
id str Unique identifier.
agent AgentReference | Agent The scanned agent.
project_id str Parent project ID.
knowledge_base KnowledgeBaseReference | KnowledgeBase | None Linked knowledge base, if the scan was grounded.
grade "A" | "B" | "C" | "D" | None Overall grade.
status TaskProgress Async operation status.
state TaskState Computed from status.state.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
id str Unique probe identifier.
name str Probe display name.
desc str Human-readable description.
tags list[str] Tags applied to this probe.
id str Unique identifier.
title str Display title.
description str Human-readable description.
owasp_id str | None Mapping to the OWASP LLM Top 10, if applicable.
id str Unique identifier.
name str Probe display name.
category str Probe category.
description str Human-readable description.
probe_lidar_id str LIDAR probe identifier.
tags list[str] Tags applied to this probe.
scan_id str Parent scan ID.
metrics list[ScanProbeMetric] | None Aggregated severity counts.
status TaskProgress Async operation status.
state TaskState Convenience accessor for status.state.
severity Severity Severity level.
count int Number of attempts at this severity.
id str Unique identifier.
probe_id str Parent probe ID.
messages list[ChatMessageWithMetadata] Conversation messages exchanged with the agent.
metadata dict[str, object] Arbitrary metadata about the attempt.
reason str Why this attempt was generated.
severity Severity Severity assigned to the attempt outcome.
review_status ReviewStatus Reviewer-assigned status.
error ScanProbeAttemptError | None Error details if the attempt failed to execute.
message str Error message.
SAFE 0 No vulnerability found.
MINOR 10 Minor issue.
MAJOR 20 Significant issue.
CRITICAL 30 Critical vulnerability.
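Because the severity levels above are ordered integer values, attempts can be thresholded numerically. A sketch mirroring the documented values (the enum here is a local stand-in, not imported from the SDK):

```python
from enum import IntEnum

class Severity(IntEnum):  # mirrors the documented values
    SAFE = 0
    MINOR = 10
    MAJOR = 20
    CRITICAL = 30

attempts = [Severity.SAFE, Severity.MAJOR, Severity.CRITICAL, Severity.MINOR]
serious = [s for s in attempts if s >= Severity.MAJOR]
print([s.name for s in serious])  # → ['MAJOR', 'CRITICAL']
```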
"pending" Awaiting review. "ignored" Reviewer dismissed the finding. "acknowledged" Reviewer acknowledged the finding. "corrected" The underlying issue has been fixed. id str Unique identifier.
Knowledge base types
id str Unique identifier.
name str Display name.
description str | None Human-readable description.
filename str | None Original upload filename.
project_id str Parent project ID.
n_documents int Number of indexed documents.
topics list[Topic] Discovered topics.
status TaskProgress Async indexing status.
state TaskState Computed from status.state.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
id str Unique identifier.
name str Display name.
id str Unique identifier.
name str Topic name.
knowledge_base_id str Parent knowledge base ID.
document_count int | None Number of documents in this topic.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
id str Unique identifier.
knowledge_base_id str Parent knowledge base ID.
snippet str Truncated content snippet.
content str Computed alias of snippet (the truncated content shown in search results).
topic_id str | None Topic ID, if classified.
topic_name str | None Topic display name.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
id str Unique identifier.
knowledge_base_id str Parent knowledge base ID.
content str Full document content.
topic_id str | None Topic ID, if classified.
topic_name str | None Topic display name.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
id "created_at" | "updated_at" | "topic_id" Required Column to sort by.
desc bool Sort descending when true.
alias dict Dict mapping a column name to a filter value. Valid columns: "topic_id".
Project and scenario types
id str Unique identifier.
name str Display name.
description str Human-readable description.
failure_categories list[FailureCategory] Project-level failure classifications.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
id str Unique identifier.
name str Scenario name.
description str | None Scenario description.
rules list[str] Rules the generated conversations should follow.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
conversation list[dict[str, Any]] Generated preview conversation.
generated_rules list[str] | None Rules inferred from the scenario description.
Scheduled evaluation types
"daily" Run every day. "weekly" Run on a specific day each week. "monthly" Run on a specific day each month.
alias union TypeAlias for SuccessExecutionStatus | ErrorExecutionStatus | None.
alias union TypeAlias for SuccessExecutionStatusParam | ErrorExecutionStatusParam.
evaluation_id str ID of the evaluation produced by the execution.
status "success" Always "success".
evaluation_id str Required ID of the evaluation produced by the execution.
status "success" Always "success".
error_message str Description of what went wrong.
status "error" Always "error".
error_message str Required Description of what went wrong.
status "error" Always "error".
id str Unique identifier.
name str Display name.
project_id str Parent project ID.
agent_id str Agent to evaluate.
dataset_id str Dataset to evaluate against.
frequency FrequencyOption "daily", "weekly", or "monthly".
time str Time of day in HH:MM format (UTC).
day_of_week int | None Weekly only: 1 (Monday) through 7 (Sunday).
day_of_month int | None Monthly only: 1 through 28.
tags list[str] Tags used to filter test cases.
run_count int Number of times each test case is run per execution.
paused bool Whether the schedule is currently paused.
last_execution_at datetime | None Timestamp of the most recent execution.
last_execution_status LastExecutionStatus Status of the most recent execution.
evaluations list[EvaluationReference] Evaluation runs produced by this schedule.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
Task types
id str Unique identifier.
description str Task description.
status TaskStatus Current status.
priority TaskPriority | None Priority level.
project_id str Parent project ID.
created_by User User who created the task.
assignees list[User] Assigned users.
references list[TestCaseEvaluationReference | ScanProbeAttemptReference | TestCaseReference] Linked resources (evaluation results, test cases, or probe attempts).
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
"open" Newly created, not yet picked up. "in_progress" Being worked on. "resolved" Closed. "low" Low priority. "medium" Medium priority. "high" High priority. Playground chat types
id str Unique identifier.
project_id str Parent project ID.
user UserReference | None The user who started the chat.
agent AgentReference | Agent | None The agent that responded.
messages list[ChatMessageWithMetadata] Conversation messages.
created_at datetime Creation timestamp.
updated_at datetime Last update timestamp.
Audit types
id str Unique identifier.
action "insert" | "update" | "delete" Action performed on the entity.
entity_id str UUID of the affected entity.
entity_type str Type of the affected entity (e.g. "agent", "evaluation").
user_id str | None User who performed the action, if recorded.
project_id str | None Project the entity belongs to, if applicable.
diff list[AuditDiffItem] | None Field-level changes for update actions.
metadata dict[str, object] | None Arbitrary metadata captured with the event.
created_at datetime When the action occurred.
id str Unique identifier.
action "insert" | "update" | "delete" Action performed.
user_id str User who performed the action.
user_name str | None User display name.
diffs list[AuditDisplayDiffItem] Pre-formatted diff items for display.
real_change_count int Number of fields that actually changed.
summary_fields list[str] Field names highlighted in the summary.
created_at datetime When the action occurred.
kind "added" | "removed" | "changed" Kind of change.
field str Field path.
old_value Any | None Previous value (for removed/changed).
new_value Any | None New value (for added/changed).
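An AuditDiffItem describes one field-level change, so a list of them can be replayed against a snapshot to reconstruct the post-update state. A sketch for flat (non-nested) field paths; the diff data is illustrative:

```python
def apply_diff(entity: dict, diffs: list[dict]) -> dict:
    """Replay field-level diff items (flat field paths only) onto a copy."""
    updated = dict(entity)
    for d in diffs:
        if d["kind"] in ("added", "changed"):
            updated[d["field"]] = d["new_value"]
        elif d["kind"] == "removed":
            updated.pop(d["field"], None)
    return updated

before = {"name": "support-bot", "url": "https://old.example.com"}
diffs = [
    {"kind": "changed", "field": "url", "old_value": "https://old.example.com",
     "new_value": "https://new.example.com"},
    {"kind": "added", "field": "description", "old_value": None, "new_value": "v2"},
]
print(apply_diff(before, diffs))
```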
kind "added" | "removed" | "changed" | "skip" Kind of change for display.
scope str The scope of the change.
root str Root field name.
label str | None Display label for the changed field.
before_str str | None Pre-formatted previous value.
after_str str | None Pre-formatted new value.
skip_count int | None Number of skipped items if kind="skip".
id "action" | "created_at" | "entity_type" | "project_id" | "user_id" Required Column to sort by.
desc bool Sort descending when true.
alias dict Dict mapping a column name to a filter value. Valid columns: "action", "created_at", "entity_type", "project_id", "user_id".
Error types
All exceptions inherit from HubClientError and are importable from the root package.
```python
from giskard_hub import (
    HubClientError,            # Base exception for all SDK errors
    APIStatusError,            # Base for HTTP status errors (has .status_code, .response)
    APITimeoutError,           # Request timed out
    APIConnectionError,        # Could not connect to the Hub
    BadRequestError,           # 400
    AuthenticationError,       # 401 — invalid or missing API key
    PermissionDeniedError,     # 403 — insufficient permissions
    NotFoundError,             # 404 — resource does not exist
    ConflictError,             # 409 — resource conflict
    UnprocessableEntityError,  # 422 — validation error
    RateLimitError,            # 429 — too many requests
    InternalServerError,       # 500+ — server error
)
```

```python
from giskard_hub import HubClient, NotFoundError, AuthenticationError

hub = HubClient()

try:
    agent = hub.agents.retrieve("nonexistent-id")
except NotFoundError as e:
    print(f"Agent not found: {e}")
except AuthenticationError:
    print("Check your API key")
```

Advanced patterns
Pagination
Methods that support pagination accept limit and offset. Pass include_metadata=True to get an APIPaginatedMetadata object:
```python
results, metadata = hub.evaluations.results.search(
    "evaluation-id",
    limit=50,
    offset=0,
    include_metadata=True,
)
print(f"Page: {metadata.count} of {metadata.total} (offset {metadata.offset})")
```

Raw response access
```python
response = hub.with_raw_response.agents.retrieve("agent-id")
print(response.status_code)
agent = response.parse()
```

Retries and timeouts
```python
hub.with_options(max_retries=5, timeout=300.0).evaluations.create(...)
```

Custom HTTP client
```python
from giskard_hub import HubClient, DefaultHttpxClient

hub = HubClient(
    http_client=DefaultHttpxClient(proxy="http://proxy.example.com:8080"),
)
```

Debug logging
```shell
export GISKARD_HUB_LOG=debug
```

Common extra parameters
Every method accepts these optional keyword arguments for per-request customization:
extra_headers dict[str, str] | None Additional HTTP headers for this request.
extra_query dict[str, object] | None Additional query parameters.
extra_body object | None Additional JSON body fields.
timeout float | httpx.Timeout | None Override the default timeout for this request.
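For instance, these parameters could be combined on a single call. Since a sketch here cannot reach a live Hub, it only assembles the keyword arguments; the commented-out call shows where they would go (the header name and query key are illustrative):

```python
request_options = {
    "extra_headers": {"X-Request-Source": "nightly-ci"},  # illustrative header
    "extra_query": {"verbose": True},                     # illustrative query key
    "timeout": 10.0,  # override the client default for this request only
}

# With a configured client these pass straight through, e.g.:
# projects = hub.projects.list(**request_options)
```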