What is Giskard Checks?
Giskard Checks is a lightweight Python library for testing and evaluating non-deterministic applications such as LLM-based systems.
Introduction
Giskard Checks provides a flexible and powerful framework for testing AI applications including RAG systems, agents, summarization models, and more. Whether you’re building chatbots, question-answering systems, or complex multi-step workflows, Giskard Checks helps you ensure quality and reliability.
Key Features
- Built-in Check Library: Ready-to-use checks including LLM-as-a-judge evaluations, string matching, equality assertions, and more
- Flexible Testing Framework: Support for both single-turn and multi-turn scenarios with stateful trace management
- Type-Safe & Modern: Built on Pydantic for full type safety and validation
- Async-First: Native async/await support for efficient concurrent testing
- Highly Customizable: Easy extension points for custom checks and interaction patterns
- Serializable Results: Immutable, JSON-serializable results for easy storage and analysis
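To give a flavor of how these features fit together, here is a minimal, stand-alone sketch of the check idea using only the standard library. The names (`StringMatchCheck`, `CheckResult`) and the use of frozen dataclasses are illustrative assumptions, not the Giskard Checks API, which builds on Pydantic models instead:

```python
import asyncio
import json
from dataclasses import dataclass, asdict

# Hypothetical names for illustration; not the actual Giskard Checks API.
@dataclass(frozen=True)  # immutable result, like the library's serializable results
class CheckResult:
    name: str
    passed: bool
    details: str

@dataclass(frozen=True)
class StringMatchCheck:
    expected: str

    async def run(self, output: str) -> CheckResult:
        # Async-first: checks like this can be gathered concurrently
        # over many model outputs.
        passed = self.expected.lower() in output.lower()
        return CheckResult(
            name="string_match",
            passed=passed,
            details=f"expected substring {self.expected!r}",
        )

async def main() -> None:
    check = StringMatchCheck(expected="Paris")
    results = await asyncio.gather(
        check.run("The capital of France is Paris."),
        check.run("I do not know."),
    )
    # Results are plain data, so they serialize to JSON for storage/analysis.
    print(json.dumps([asdict(r) for r in results], indent=2))

asyncio.run(main())
```

The frozen dataclass stands in for the immutability and JSON-serializability the library advertises; the real library adds Pydantic validation and richer check types on top of the same pattern.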
New here? Start with Install & Configure, then follow Your First Test for a guided lesson or Quickstart for a single end-to-end example.
Quick Links
- Installation: Install the library and configure your LLM provider
- Quickstart: Your first scenario in under 5 minutes
- Tutorials: Step-by-step learning arc from your first test to reusable test suites
- How-to Guides: Task-oriented guides for pytest, custom checks, CI/CD, spy, and more
- API Reference: Complete API documentation for all checks, scenarios, and utilities
- Explanation: Core concepts, async design, JSONPath, and when to use which check
- Use Cases: Worked examples for RAG systems, agents, chatbots, and content moderation
Use Cases
Giskard Checks is designed for:
- RAG Evaluation: Test groundedness, relevance, and context usage in retrieval-augmented generation systems
- Agent Testing: Validate multi-step agent workflows with tool calls and complex reasoning
- Quality Assurance: Ensure consistent output quality across model updates and deployments
- LLM Guardrails: Implement safety checks, content moderation, and compliance validation
- Regression Testing: Track model behavior changes over time with reproducible test suites
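As a flavor of the guardrails use case, here is a tiny stand-alone sketch of a keyword-based moderation check in plain Python. This is illustrative only (the blocked-term list and `moderate` helper are assumptions, not the library's API); the real library ships richer checks, including LLM-as-a-judge evaluations, for this purpose:

```python
from dataclasses import dataclass

# Hypothetical example terms; a real deployment would use a curated policy list.
BLOCKED_TERMS = {"password", "ssn", "credit card"}

@dataclass(frozen=True)
class ModerationResult:
    passed: bool
    flagged: list

def moderate(output: str) -> ModerationResult:
    """Flag any blocked term appearing in the model output."""
    text = output.lower()
    flagged = sorted(t for t in BLOCKED_TERMS if t in text)
    return ModerationResult(passed=not flagged, flagged=flagged)

print(moderate("Sure, here is my credit card number."))
```

Simple lexical checks like this are cheap to run on every output, while judge-based checks can be reserved for the cases lexical rules cannot decide.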