Create test datasets

This section will guide you through creating your own test datasets programmatically.

Let’s start by initializing the Hub client. If you haven’t installed the SDK or connected to the Hub yet, take a look at the Quickstart & setup section first.

from giskard_hub import HubClient

hub = HubClient()

You can now use the hub.datasets client to control the Giskard Hub!

Datasets

A dataset is a collection of chat test cases (conversations) used to evaluate your agents. We allow manual test creation for fine-grained control, but since generative AI agents can encounter an infinite number of scenarios, automated test case generation is often necessary, especially when you don’t have any chat transcripts to import.

Create a dataset

Manual creation

If you don’t have a dataset already, you can create one manually.

dataset = hub.datasets.create(
    # The ID of the project where the dataset will be created
    project_id="<PROJECT_ID>",
    name="Production Data",
    description="This dataset contains chats that "
    "are automatically sampled from the production environment.",
)

print(dataset.id)
# "666030a0d41f357fd061374c"

Automated generation

Alternatively, you can generate a dataset automatically using the following methods:

Detect Security Vulnerabilities by Generating Synthetic Tests

Generate synthetic test cases based on adversarial queries to detect security failures, like stereotypes & discrimination or prompt injection.

Detect Business Failures by Generating Synthetic Tests

Generate synthetic test cases based on document-based queries and knowledge bases to detect business failures, like hallucinations or refusal to answer questions.

Import Existing Datasets

Import existing test datasets from a JSONL or CSV file, obtained from another tool, like Giskard Open Source.

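The exact file format is described in the import documentation linked above. As an illustration, here is a minimal sketch that converts a CSV export from another tool into chat test cases via the hub.chat_test_cases.create() method documented later on this page. The column names (question, reference_answer) are assumptions about your export, not a required schema:

```python
import csv
import io


def csv_rows_to_payloads(csv_text, dataset_id):
    """Turn CSV rows with `question` and `reference_answer` columns
    into chat test case payloads (the column names are assumptions)."""
    payloads = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        payloads.append({
            "dataset_id": dataset_id,
            # A single user message per test case
            "messages": [{"role": "user", "content": row["question"]}],
            # Use the reference answer as a correctness check
            "checks": [{
                "identifier": "correctness",
                "params": {"reference": row["reference_answer"]},
            }],
        })
    return payloads


csv_text = """question,reference_answer
What is your return policy?,You can return any item within 30 days.
"""

payloads = csv_rows_to_payloads(csv_text, "<DATASET_ID>")

# With a live Hub client:
# for payload in payloads:
#     hub.chat_test_cases.create(**payload)
```

This keeps the conversion step pure and testable, while the actual upload reuses the create call shown in the Chat Test Cases section.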

Retrieve a dataset

If you already have a dataset, you can retrieve it by its ID:

dataset = hub.datasets.retrieve("<DATASET_ID>")

Update a dataset

You can update a dataset using the hub.datasets.update() method. Here’s a basic example:

dataset = hub.datasets.update("<DATASET_ID>", name="My updated dataset")

Alternatively, you can update a dataset by managing its chat test cases.

Delete a dataset

You can delete a dataset using the hub.datasets.delete() method. Here’s a basic example:

hub.datasets.delete("<DATASET_ID>")

Chat Test Cases

A chat test case (conversation) is a collection of messages together with evaluation checks (e.g., the expected answer, or rules that the agent must follow when responding).

Create a chat test case

The parameters for creating a chat test case are:

  • dataset_id (required): The ID of the dataset where the chat test case will be created.

  • messages (required): A list of messages, without the last assistant answer. Each message is a dictionary with keys role and content.

  • demo_output (optional): A dictionary with the last assistant answer (role, content, and optional metadata).

  • tags (optional): A list of tags you can use to categorize and organize the chat test cases

  • checks (optional): A list of checks. For more information on checks, see the Manage your checks section.

Note

Do not include the last assistant answer in the list of messages. During evaluation, we pass the chat test case to your agent and expect it to generate an assistant answer; the newly generated answer is then evaluated against the checks.

If you want to show the last assistant answer to the user, you can include it in the chat test case as demo_output. This way, it will be shown in the dataset but not used in the evaluation.

hub.chat_test_cases.create(
    dataset_id=dataset.id,

    # A list of messages, without the last assistant answer
    messages=[
        {"role": "user", "content": "Hi, I have problems with the laptop I bought from you."},
        {"role": "assistant", "content": "I'm sorry to hear that. What seems to be the problem?"},
        {"role": "user", "content": "The battery is not charging."},
    ],

    # We can place a recorded answer as `demo_output` (optional)
    demo_output={
        "role": "assistant",
        "content": "I see. Have you tried to restart the laptop?",
        "metadata": {"category": "laptop", "subcategory": "battery", "resolved": False},
    },

    # Tags (optional)
    tags=["customer-support"],

    # Evaluation checks (optional)
    checks=[
        {"identifier": "correctness", "params": {"reference": "I see, could you please give me the model number of the laptop?"}},
        {"identifier": "conformity", "params": {"rules": ["The assistant should employ a polite and friendly tone."]}},
        {
            "identifier": "metadata",
            "params": {
                "json_path_rules": [
                    {"json_path": "$.category", "expected_value": "laptop", "expected_value_type": "string"},
                    {"json_path": "$.subcategory", "expected_value": "battery", "expected_value_type": "string"},
                    {"json_path": "$.resolved", "expected_value": False, "expected_value_type": "boolean"},
                ]
            },
        },
        {"identifier": "semantic_similarity", "params": {"reference": "I see, could you please give me the model number of the laptop?", "threshold": 0.8}},
    ]
)

Retrieve chat test cases

You can also retrieve existing chat test cases for editing or deletion.

For example, you may want to programmatically assign annotations to chat test cases, or update them with new data.

# Retrieve all chat test cases
chat_test_cases = hub.chat_test_cases.list(dataset_id=dataset.id)

# Or simply
chat_test_cases = dataset.chat_test_cases

Update a chat test case

After retrieving the chat test cases, we can update them. For example, let’s say we want to add the tag “tech” to all chat test cases whose first user message contains the word “laptop”:

# Update the chat test cases
for chat_test_case in chat_test_cases:
    if "laptop" in chat_test_case.messages[0].content:
        # This will only update the tags, without changing the other fields
        hub.chat_test_cases.update(
            chat_test_case.id,
            tags=chat_test_case.tags + ["tech"]
        )

Delete a chat test case

Finally, you can delete chat test cases that you no longer need. For example:

chat_test_case_to_delete = dataset.chat_test_cases[0]

hub.chat_test_cases.delete(chat_test_case_to_delete.id)
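You can combine the list and delete calls shown above for bulk cleanup. Here is a sketch that collects the IDs of all test cases carrying a given tag before deleting them; the “deprecated” tag name is a made-up example:

```python
def ids_with_tag(chat_test_cases, tag):
    """Collect the IDs of chat test cases that carry the given tag."""
    return [c.id for c in chat_test_cases if tag in (c.tags or [])]


# With a live Hub client, reusing the list/delete calls shown above
# (the "deprecated" tag is a hypothetical example):
# for chat_test_case_id in ids_with_tag(
#     hub.chat_test_cases.list(dataset_id=dataset.id), "deprecated"
# ):
#     hub.chat_test_cases.delete(chat_test_case_id)
```

Selecting the IDs first lets you review the list before committing to the deletion, which is worthwhile given that deletion is permanent.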

Warning

Deleting a chat test case is permanent and cannot be undone. Make sure you’re not using the chat test case in any active evaluations before deleting it.