AI Testing & Evaluation Glossary

This glossary defines key terms and concepts used throughout the Giskard documentation. Understanding these terms will help you navigate the documentation and use Giskard effectively.

The glossary is organized into several key areas: core concepts that form the foundation of AI testing, testing and evaluation methodologies, security vulnerabilities that can compromise AI systems, business failures that affect operational effectiveness, and essential concepts for access control, integration, and compliance.

Core concepts

Project A container for organizing related models, datasets, checks, and evaluations within Giskard Hub.

Model A trained machine learning model, particularly Large Language Models (LLMs) that process and generate text.

Agent An AI system or agent that can perform tasks autonomously, often using tools and following specific instructions.

Tool A function or capability that an agent can use to perform tasks, often provided by external services or APIs.

Dataset A collection of test cases, examples, or data points used to evaluate model performance and behavior.

Test Case A specific input-output pair or scenario used to evaluate model behavior and performance.

Check A specific test or validation rule that evaluates a particular aspect of model behavior (e.g., correctness, security, fairness).

Evaluation The process of testing a model against a dataset to assess its performance, safety, and compliance.

Testing and evaluation

AI Business Failures AI system failures that affect business logic, including hallucinations, omission, contradiction, and moderation issues.

AI Security Vulnerabilities AI system failures that affect security, including prompt injection, harmful content generation, and information disclosure.

Adversarial Testing Testing methodology that intentionally tries to break or exploit models using carefully crafted inputs designed to trigger failures.

Human-in-the-Loop Combining automated testing with human expertise and judgment.

Regression Testing Ensuring that new changes don't break existing functionality.

Continuous Red Teaming Automated, ongoing security testing that continuously monitors for new threats and vulnerabilities.

Security vulnerabilities

Prompt Injection A security vulnerability where malicious input manipulates the model's behavior or extracts sensitive information.

Harmful Content Generation Production of violent, illegal, or inappropriate material by AI models.

Information Disclosure Leaking sensitive data or private information from training data or user interactions.

Output Formatting Issues Manipulation of response structure for malicious purposes or poor output formatting.

Robustness Issues Vulnerability to adversarial inputs or edge cases causing inconsistent behavior.

Access and permissions

Access Rights Permissions that control what users can see and do within the Giskard Hub platform.

Role-Based Access Control (RBAC) A security model that assigns permissions based on user roles rather than individual user accounts.

Scope The level of access a user has, which can be global (platform-wide) or limited to specific projects or resources.

Permission A specific action or operation that a user is allowed to perform, such as creating projects or running evaluations.

Integration and workflows

SDK (Software Development Kit) A collection of tools and libraries that allow developers to integrate Giskard functionality into their applications and workflows.

API (Application Programming Interface) A set of rules and protocols that allows different software applications to communicate and exchange data.

Business and compliance

Compliance Adherence to laws, regulations, and industry standards that govern data privacy, security, and ethical AI use.

Audit Trail A chronological record of all actions, changes, and access attempts within a system for compliance and security purposes.

Governance The framework of policies, procedures, and controls that ensure responsible and ethical use of AI systems.

Stakeholder Individuals or groups with an interest in the performance, safety, and compliance of AI systems.

Getting help

Giskard Hub? Check our Hub UI guide for practical examples
Open Source? Explore our Open Source docs for technical details