Welcome to Giskard
Welcome to Giskard! This section will help you understand what Giskard is, choose the right offering for your needs, and get started quickly.
- Giskard Hub - Our enterprise platform for LLM agent testing with team collaboration and continuous red teaming. It offers both a user-friendly UI for business users and a powerful SDK for technical users.
- Giskard Open Source - Our open-source Python library for LLM testing and evaluation. It offers a programmatic interface for technical users, with core testing capabilities to get started.
- Giskard Research - Our research on AI safety and security
Giskard Hub
Giskard Hub is our enterprise platform for LLM agent testing with advanced team collaboration and continuous red teaming. It provides tools for business users and developers to test and evaluate agents in production environments (a minimal SDK sketch follows the feature list), including:
- Team collaboration - Real-time collaboration with shared workspaces, collaborative annotation workflows, and role-based access control for seamless team coordination
- Continuous red teaming - Ongoing detection of new vulnerabilities with automated scanning and monitoring capabilities
- Access control - Manage who can see what data and run which tests across your organization
- Dataset management - Centralized storage and versioning of test cases for consistent testing
- Custom failure categories - Define and categorize your own failure types beyond standard security and business logic issues
- Enterprise compliance features - 2FA, audit logs, SSO, and enterprise-grade security controls
- Custom business checks - Create and deploy your own specialized testing logic and validation rules
- Alerting - Get notified when issues are detected with configurable notification systems
- Scheduled evaluations - Run agent evaluations on a cron-based schedule for continuous monitoring
- Knowledge bases - Store and manage domain knowledge to enhance testing scenarios
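For technical users, the Hub is also scriptable through its Python SDK. Below is a minimal sketch of what a programmatic workflow could look like; the `HubClient` class comes from the `giskard_hub` package, but the URL, API key, and the specific resource methods shown (`datasets.create`, `evaluate`) are illustrative assumptions, so check the SDK reference for the exact API.

```python
# Minimal sketch of a Giskard Hub SDK workflow. The resource and method
# names below are illustrative assumptions; consult the SDK reference
# for the exact API.
from giskard_hub import HubClient

# Hypothetical credentials: point the client at your Hub instance.
hub = HubClient(hub_url="https://hub.example.com", api_key="<your-api-key>")

# Create a dataset of test cases for an agent (hypothetical arguments).
dataset = hub.datasets.create(
    project_id="<project-id>",
    name="checkout-agent-regression",
    description="Adversarial and business-logic test cases",
)

# Launch an evaluation of the agent against that dataset
# (hypothetical method and arguments).
evaluation = hub.evaluate(
    model_id="<agent-id>",
    dataset_id=dataset.id,
)
print(evaluation)
```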
Open source
Giskard Open Source is a Python library for LLM testing and evaluation. It is available on GitHub and formed the basis of our course Red Teaming LLM Applications on DeepLearning.AI.
The library provides a set of tools for testing and evaluating LLMs, including:
- Automated detection of security vulnerabilities using LLM Scan (a minimal sketch follows this list).
- Automated detection of business logic failures using the RAG Evaluation Toolkit (RAGET), also sketched below.
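To make the scan workflow concrete, here is a minimal sketch of LLM Scan, assuming a placeholder `ask_agent` function standing in for your own agent; `giskard.Model` and `giskard.scan` follow the library's documented interface, and the scan relies on a configured LLM client (e.g. an OpenAI API key) to generate adversarial inputs.

```python
import pandas as pd
import giskard

def ask_agent(question: str) -> str:
    """Placeholder: call your own LLM agent and return its answer."""
    ...

# Wrap the agent so Giskard can call it on a dataframe of questions.
def predict(df: pd.DataFrame) -> list:
    return [ask_agent(q) for q in df["question"]]

model = giskard.Model(
    model=predict,
    model_type="text_generation",
    name="Demo agent",
    description="Answers customer questions about our store.",
    feature_names=["question"],
)

# Run the automated vulnerability scan and export an HTML report.
scan_results = giskard.scan(model)
scan_results.to_html("scan_report.html")
```

RAGET follows a similar pattern: build a knowledge base from your documents, generate a synthetic test set grounded in it, and evaluate your agent's answers against that test set. The sketch below uses placeholder documents and a hypothetical answer function reusing `ask_agent` from above.

```python
import pandas as pd
from giskard.rag import KnowledgeBase, evaluate, generate_testset

# Placeholder documents: in practice, load your own knowledge base.
docs = pd.DataFrame({"text": [
    "Our store ships worldwide.",
    "Returns are free within 30 days.",
]})
knowledge_base = KnowledgeBase(docs)

# Generate a synthetic test set grounded in the knowledge base.
testset = generate_testset(
    knowledge_base,
    num_questions=10,
    agent_description="A chatbot answering questions about our store",
)

# Evaluate the agent's answers; `history` holds prior conversation turns.
def answer_fn(question: str, history=None) -> str:
    return ask_agent(question)  # hypothetical agent call from the scan sketch

report = evaluate(answer_fn, testset=testset, knowledge_base=knowledge_base)
report.to_html("rag_report.html")
```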
Unsure about the difference between Open Source and Hub? Check out our comparison guide to learn more about the different features.
Open research
Giskard Research contributes to AI safety and security research, helping to surface and understand the latest advances in the field. Some of this work has been funded by the European Commission and Bpifrance, and we have collaborated with leading AI research organizations such as the AI Incident Database and Google DeepMind.
Papers: Phare (arXiv) | RealHarm (arXiv)