Attack categories
Comprehensive guide to AI security vulnerabilities and attack patterns tested by Giskard’s vulnerability scan.
The vulnerability scan uses specialized probes (structured adversarial tests) to stress-test AI systems and uncover weaknesses before malicious actors do. Each probe is designed to expose specific vulnerabilities in AI agents, from harmful content generation to unauthorized system access.
This catalog organizes vulnerabilities by risk category and provides detailed information about:
Attack patterns and techniques
Specific probes used for testing
Detection indicators
Mitigation strategies
Risk levels and business impact
Use this guide to understand the security landscape for AI systems and make informed decisions about which vulnerabilities to prioritize in your testing.
Overview
At Giskard, we use probes to stress-test AI systems and uncover vulnerabilities before malicious actors do. A probe is a structured adversarial test designed to expose weaknesses in an AI agent, such as harmful content generation, data leakage, or unauthorized tool execution. By simulating real-world attacks, probes help teams identify and fix risks early—reducing both security threats and business failures.
Below you’ll find the full catalog of probes, organized by vulnerability category. Each category includes a short explanation and detailed information about the corresponding probes.
Probes that attempt to bypass safety measures and generate dangerous, illegal, or harmful content across various categories
Probes: 17
Probes designed to extract system prompts, configuration details, or other internal information
Probes: 2
Attacks that attempt to manipulate AI agents through carefully crafted input prompts to override original instructions
Probes: 12
Attacks aimed at extracting sensitive information, personal data, or confidential content from AI systems
Probes: 4
Attempts to extract or infer information from the AI model’s training data
Probes: 1
Probes testing whether AI agents can be manipulated to perform actions beyond their intended scope or with inappropriate permissions
Probes: 6
Tests for AI systems providing false, inconsistent, or fabricated information
Probes: 4
Probes that attempt to cause resource exhaustion or performance degradation
Probes: 2
Tests for reputational risks and brand damage scenarios
Probes: 2
Probes targeting potential legal and financial liabilities
Probes: 1
Probes that test whether AI agents can be manipulated to provide professional advice outside their intended scope
Probes: 2