What is Giskard Checks?
Giskard Checks is a lightweight Python library for testing and evaluating non-deterministic applications such as LLM-based systems.
Introduction
Giskard Checks provides a flexible and powerful framework for testing AI applications including RAG systems, agents, summarization models, and more. Whether you’re building chatbots, question-answering systems, or complex multi-step workflows, Giskard Checks helps you ensure quality and reliability.
Key Features
- Built-in Check Library: Ready-to-use checks including LLM-as-a-judge evaluations, string matching, equality assertions, and more
- Flexible Testing Framework: Support for both single-turn and multi-turn scenarios with stateful trace management
- Type-Safe & Modern: Built on Pydantic for full type safety and validation
- Async-First: Native async/await support for efficient concurrent testing
- Highly Customizable: Easy extension points for custom checks and interaction patterns
- Serializable Results: Immutable, JSON-serializable results for easy storage and analysis
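To give a flavor of how these features fit together, here is a minimal, stand-alone sketch of the check idea using only the standard library. The names (`StringMatchCheck`, `CheckResult`) and the use of frozen dataclasses are illustrative assumptions, not the Giskard Checks API, which builds on Pydantic models instead:

```python
import asyncio
import json
from dataclasses import dataclass, asdict

# Hypothetical names for illustration; not the actual Giskard Checks API.
@dataclass(frozen=True)  # immutable result, like the library's serializable results
class CheckResult:
    name: str
    passed: bool
    details: str

@dataclass(frozen=True)
class StringMatchCheck:
    expected: str

    async def run(self, output: str) -> CheckResult:
        # Async-first: checks like this can be gathered concurrently
        # over many model outputs.
        passed = self.expected.lower() in output.lower()
        return CheckResult(
            name="string_match",
            passed=passed,
            details=f"expected substring {self.expected!r}",
        )

async def main() -> None:
    check = StringMatchCheck(expected="Paris")
    results = await asyncio.gather(
        check.run("The capital of France is Paris."),
        check.run("I do not know."),
    )
    # Results are plain data, so they serialize to JSON for storage/analysis.
    print(json.dumps([asdict(r) for r in results], indent=2))

asyncio.run(main())
```

The frozen dataclass stands in for the immutability and JSON-serializability the library advertises; the real library adds Pydantic validation and richer check types on top of the same pattern.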
New here? Start with Install & Configure, then follow Your First Test for a guided lesson or Quickstart for a single end-to-end example.
Quick Links
- Installation: Install the library and configure your LLM provider
- Quickstart: Your first scenario in under 5 minutes
- Tutorials: Step-by-step learning arc from your first test to reusable test suites
- How-to Guides: Task-oriented guides for pytest, custom checks, CI/CD, spy, and more
- API Reference: Complete API documentation for all checks, scenarios, and utilities
- Explanation: Core concepts, async design, JSONPath, and when to use which check
- Use Cases: Worked examples for RAG systems, agents, chatbots, and content moderation
Use Cases
Giskard Checks is designed for:
- RAG Evaluation: Test groundedness, relevance, and context usage in retrieval-augmented generation systems
- Agent Testing: Validate multi-step agent workflows with tool calls and complex reasoning
- Quality Assurance: Ensure consistent output quality across model updates and deployments
- LLM Guardrails: Implement safety checks, content moderation, and compliance validation
- Regression Testing: Track model behavior changes over time with reproducible test suites
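As a flavor of the guardrails use case, here is a tiny stand-alone sketch of a keyword-based moderation check in plain Python. This is illustrative only (the blocked-term list and `moderate` helper are assumptions, not the library's API); the real library ships richer checks, including LLM-as-a-judge evaluations, for this purpose:

```python
from dataclasses import dataclass

# Hypothetical example terms; a real deployment would use a curated policy list.
BLOCKED_TERMS = {"password", "ssn", "credit card"}

@dataclass(frozen=True)
class ModerationResult:
    passed: bool
    flagged: list

def moderate(output: str) -> ModerationResult:
    """Flag any blocked term appearing in the model output."""
    text = output.lower()
    flagged = sorted(t for t in BLOCKED_TERMS if t in text)
    return ModerationResult(passed=not flagged, flagged=flagged)

print(moderate("Sure, here is my credit card number."))
```

Simple lexical checks like this are cheap to run on every output, while judge-based checks can be reserved for the cases lexical rules cannot decide.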