Attack categories

Comprehensive guide to AI security vulnerabilities and attack patterns tested by Giskard’s vulnerability scan.

The vulnerability scan uses specialized probes (structured adversarial tests) to stress-test AI systems and uncover weaknesses before malicious actors do. Each probe is designed to expose specific vulnerabilities in AI agents, from harmful content generation to unauthorized system access.

This catalog organizes vulnerabilities by risk category and provides detailed information about:

  • Attack patterns and techniques

  • Specific probes used for testing

  • Detection indicators

  • Mitigation strategies

  • Risk levels and business impact

Use this guide to understand the security landscape for AI systems and make informed decisions about which vulnerabilities to prioritize in your testing.

Overview

At Giskard, we use probes to stress-test AI systems and uncover vulnerabilities before malicious actors do. A probe is a structured adversarial test designed to expose weaknesses in an AI agent, such as harmful content generation, data leakage, or unauthorized tool execution. By simulating real-world attacks, probes help teams identify and fix risks early—reducing both security threats and business failures.

Below you’ll find the full catalog of probes, organized by vulnerability category. Each category includes a short explanation and detailed information about the corresponding probes.

Harmful Content Generation

Probes that attempt to bypass safety measures and generate dangerous, illegal, or harmful content across various categories

Probes: 17

Harmful Content Generation
Internal Information Exposure

Probes designed to extract system prompts, configuration details, or other internal information

Probes: 2

Internal Information Exposure
Prompt Injection

Attacks that attempt to manipulate AI agents through carefully crafted input prompts to override original instructions

Probes: 12

Prompt Injection
Data Privacy Exfiltration

Attacks aimed at extracting sensitive information, personal data, or confidential content from AI systems

Probes: 4

Data Privacy Exfiltration
Training Data Extraction

Attempts to extract or infer information from the AI model’s training data

Probes: 1

Training Data Extraction
Excessive Agency

Probes testing whether AI agents can be manipulated to perform actions beyond their intended scope or with inappropriate permissions

Probes: 6

Excessive Agency
Hallucination

Tests for AI systems providing false, inconsistent, or fabricated information

Probes: 4

Hallucination
Denial Of Service

Probes that attempt to cause resource exhaustion or performance degradation

Probes: 2

Denial Of Service
Brand Damaging And Reputation

Tests for reputational risks and brand damage scenarios

Probes: 2

Brand Damaging And Reputation
Legal And Financial Risk

Probes targeting potential legal and financial liabilities

Probes: 1

Legal And Financial Risk
Misguidance And Unauthorized Advice

Probes that test whether AI agents can be manipulated to provide professional advice outside their intended scope

Probes: 2

Misguidance And Unauthorized Advice