Scan Vulnerabilities

The Giskard library provides an automatic scan that detects safety and security vulnerabilities affecting your LLM-based agents.

How does it work?

From a plain-language description of your agent, an LLM generates adversarial scenarios tailored to it and runs them against your agent over single-turn and multi-turn conversations. Then, a second LLM acts as a judge, deciding whether each response reveals a vulnerability.

Differently from benchmarks that evaluate a foundation model in a generic way, Giskard’s scan performs an in-depth, domain-specific assessment of your agent, based on the description you provide.

Which attacks does it run?

Today, the scan generates three families of attacks, mapped to the OWASP LLM Top 10 and the vulnerability categories used across Giskard:

Attack	What it does	Vulnerability category
Prompt injection	Hides an injected instruction inside realistic content to see whether the agent obeys it instead of its original instructions.	Prompt Injection (OWASP LLM01)
Single-turn adversarial	Sends direct requests that test for harmful or unauthorized content such as stereotypes and discrimination, illegal activities, CBRN material, copyright, misinformation, and unqualified financial, medical, or legal advice.	Harmful Content Generation, Misguidance & Unauthorized Advice
GOAT multi-turn jailbreak	Uses an attacker LLM that adapts over several turns, chaining strategies such as refusal suppression, persona modification, and hypothetical framing to push the agent toward objectives it should refuse.	Harmful Content Generation

These cover high-impact attack surfaces today, and the library keeps adding more. See the roadmap for what is coming, and the vulnerability categories catalog for the complete list of categories.

What data is sent to Language Model providers?

The scan uses an LLM both to generate adversarial scenarios and to judge your agent’s answers. During the scan, these models receive your agent’s description and the responses it produces. However, they never receive anything you do not pass to the agent. In both cases, you choose the provider and model (see Before starting).

Will the scan work in any language?

Yes. Pass the languages your agent is expected to handle through the languages argument (BCP-47 codes such as "en", "fr", or "es"), and the scenarios will be generated in those languages. Since generation is handled by the LLM you configure, pick a model with strong support for your target languages for the best results.

Before starting

First, install the scan and choose the model that will generate and judge the scenarios. Since giskard-scan pulls in the rest of the library, you only need to add the extra for the provider you use. Pick yours below:

pip install "giskard-scan[openai]"

from giskard.agents.generators import GiskardLLMGenerator
from giskard.checks import set_default_generator

llm_judge = GiskardLLMGenerator(model="openai/gpt-4o")
set_default_generator(llm_judge)

Set OPENAI_API_KEY in your environment.

pip install "giskard-scan[anthropic]"

from giskard.agents.generators import GiskardLLMGenerator
from giskard.checks import set_default_generator

llm_judge = GiskardLLMGenerator(model="anthropic/claude-sonnet-4-20250514")
set_default_generator(llm_judge)

Set ANTHROPIC_API_KEY in your environment.

pip install "giskard-scan[google]"

from giskard.agents.generators import GiskardLLMGenerator
from giskard.checks import set_default_generator

llm_judge = GiskardLLMGenerator(model="gemini/gemini-2.5-flash")
set_default_generator(llm_judge)

Set GEMINI_API_KEY (or GOOGLE_API_KEY) in your environment.

LiteLLM reaches any provider through a single "<provider>/<model-name>" string, so it is the most flexible option.

pip install "giskard-scan[litellm]"

from giskard.agents.generators import LiteLLMGenerator
from giskard.checks import set_default_generator

llm_judge = LiteLLMGenerator(model="<provider>/<model-name>")
set_default_generator(llm_judge)

For example mistral/mistral-large-latest, bedrock/anthropic.claude-3-sonnet-20240229-v1:0, or ollama/qwen2.5. For the full list of providers, see LiteLLM’s provider conventions and set the matching API key in your environment.

set_default_generator makes your chosen model the default for generating and judging every scenario in the scan.

Step 1: Wrap your model

The scan talks to your agent through a single entry point, an async function that takes a typed input and returns a typed output. Both types are Pydantic models, which makes the contract explicit and validated.

Since multi-turn attacks call your agent once per turn, with only the new message as input, the right wrapper depends on how your agent keeps track of the conversation. Pick the pattern that matches yours:

If your agent answers each message independently, wrap it directly:

from pydantic import BaseModel


class AgentInput(BaseModel):
    question: str


class AgentOutput(BaseModel):
    answer: str


async def my_agent(inputs: AgentInput) -> AgentOutput:
    # Call your own LLM app, chain, or agent here
    answer = await my_llm_app(inputs.question)
    return AgentOutput(answer=answer)

If your agent stores the conversation on its side (for example a LangGraph checkpointer or a session-based API), it needs the same thread id on every turn of a conversation. To get one, subclass Trace with a generated thread_id field and declare a trace parameter. Giskard creates the trace when a conversation starts and preserves its fields across turns, so each conversation gets its own stable id:

from uuid import uuid4
from pydantic import BaseModel, Field
from giskard.checks import Trace


class AgentInput(BaseModel):
    question: str


class AgentOutput(BaseModel):
    answer: str


class AgentTrace(Trace[AgentInput, AgentOutput]):
    thread_id: str = Field(default_factory=lambda: str(uuid4()))


async def my_agent(inputs: AgentInput, trace: AgentTrace) -> AgentOutput:
    # The same thread_id is kept for every turn of this conversation
    answer = await my_llm_app(inputs.question, thread_id=trace.thread_id)
    return AgentOutput(answer=answer)

If your agent is stateless and expects the full message history on every call, declare a trace parameter. During the scan, trace.interactions holds the previous turns of the current conversation, so you can rebuild the history and append the new message:

from pydantic import BaseModel
from giskard.checks import Trace


class AgentInput(BaseModel):
    question: str


class AgentOutput(BaseModel):
    answer: str


async def my_agent(
    inputs: AgentInput, trace: Trace[AgentInput, AgentOutput]
) -> AgentOutput:
    # Rebuild the conversation history from the previous turns
    messages = []
    for interaction in trace.interactions:
        messages.append({"role": "user", "content": interaction.inputs.question})
        messages.append({"role": "assistant", "content": interaction.outputs.answer})
    messages.append({"role": "user", "content": inputs.question})

    answer = await my_llm_app(messages)
    return AgentOutput(answer=answer)

This is the only integration code you need. In fact, anything callable from Python (a RAG pipeline, an agent, or a remote API) can be wrapped this way.

Step 2: Scan your model

Pass your wrapped agent, a plain-language description, and the languages it handles to vulnerability_scan. It generates the adversarial suite, runs every scenario, prints a grouped report, and returns the result:

from giskard.scan import vulnerability_scan

suite_result = await vulnerability_scan(
    target=my_agent,
    description="A Q&A agent that answers questions about our product.",
    languages=["en"],
)

While the suite runs, Giskard shows live progress for each scenario, with a count of how many passed and failed:

Giskard scan running, with a progress bar over the scenarios, per-scenario rows, and a passed and failed count

The description is what the LLM uses to generate domain-specific scenarios, so the more precisely you describe your agent’s purpose and boundaries, the more relevant the findings.

For every failed scenario, the report shows the judge’s verdict and the full conversation trace that triggered it, so you can see exactly how the agent was manipulated:

A failed scan scenario showing the judge's verdict and the multi-turn conversation trace of inputs and outputs

What’s next?

Save your suite

Generating scenarios uses an LLM, so it is good practice to generate the suite once and reuse it. The result exposes the suite that produced it via suite_result.suite. Since Suite is a Pydantic model, you can serialize it to JSON and store it (commit it to your repository or keep it as a build artifact):

from pathlib import Path

Path("scan_suite.json").write_text(suite_result.suite.model_dump_json())

Run the suite in CI/CD

In your pipeline, load the saved suite and run it against your agent. This time, the scenarios are not regenerated: only the judging step calls the LLM, so remember to configure a judge in CI as well. The run then returns a result you can export as a JUnit XML report for your CI test dashboard:

from pathlib import Path
from giskard.checks import Suite

suite = Suite.model_validate_json(Path("scan_suite.json").read_text())

suite_result = await suite.run(target=my_agent)
suite_result.to_junit_xml("scan_results.xml")

Re-run the same scenarios on another model

The saved suite can be pointed at any target. For example, once you have fixed an issue or shipped a new version, run the exact same scenarios against the new agent to confirm that the vulnerabilities are gone and that nothing regressed:

suite_result = await suite.run(target=my_other_agent)

Advanced usage

You can customize the scan by passing options directly to vulnerability_scan.

Run only specific scenarios

By default, the scan runs all of its built-in generators. To focus on a single class of vulnerability, pass the generators you want via the lower-level generate_suite API:

from giskard.scan import generate_suite, PromptInjectionScenarioGenerator

suite = await generate_suite(
    description="A Q&A agent that answers questions about our product.",
    languages=["en"],
    generators=[PromptInjectionScenarioGenerator()],
)

suite_result = await suite.run(target=my_agent)

Make the scan faster

Limit the total number of scenarios with max_scenarios, and cap concurrent execution with max_concurrency:

suite_result = await vulnerability_scan(
    target=my_agent,
    description="A Q&A agent that answers questions about our product.",
    languages=["en"],
    max_scenarios=20,
    max_concurrency=10,
)

Build a broader suite with a coding agent

The method scan.generate_suite builds a suite from the built-in scenarios. To go further, the Scenario Generator skill turns your coding agent into a red-teamer. Describe your agent and the failure modes you care about, and the skill writes or extends a runnable suite with adversarial scenarios and layered checks tailored to your case. You then run it with suite.run(...) exactly as above.

npx skills add Giskard-AI/giskard-skills --skill scenario-generator

For example, prompt your agent with “red-team my support bot for PII leaks and competitor mentions”. You can browse the full set of skills at Giskard Skills.

Use the Giskard Hub

The scan on this page runs locally and is driven by code. When you need more than that, the Giskard Hub, our enterprise platform, manages the complete red teaming workflow: through the web interface, the Python SDK, or the API, you launch a more advanced scan (55+ probes), get a security grade for your agent, and turn the findings into test datasets that your whole team, including business experts, can review and annotate. On top of that, continuous red teaming keeps testing your deployed agent against emerging threats, catching vulnerabilities and regressions before they can be exploited.

Giskard Hub scan report showing a security grade, issue counts by severity, and results broken down by vulnerability category with OWASP tags

For a complete picture of what the Hub adds, read the Open Source vs Hub comparison, or talk to our team ↗ to see it in action.

Roadmap

Our direction is to make giskard.scan the single entry point for red teaming in the open-source library. To get there, we are continuously expanding the attack library with new vulnerability categories and richer multi-turn attacks, and extending the same approach to RAG and agent evaluation. As that coverage grows, everything on this page keeps working unchanged, since new scenarios are picked up automatically by the vulnerability_scan call you already wrote.

Troubleshooting

If you encounter any issues, join our Discord community and ask in the #general channel.