
Scan Vulnerabilities

The Giskard library provides an automatic scan that detects safety and security vulnerabilities affecting your LLM-based agents.

From a plain-language description of your agent, an LLM generates adversarial scenarios tailored to it and runs them against your agent over single-turn and multi-turn conversations. Then, a second LLM acts as a judge, deciding whether each response reveals a vulnerability.

Unlike benchmarks that evaluate a foundation model generically, Giskard’s scan performs an in-depth, domain-specific assessment of your agent based on the description you provide.

Today, the scan generates three families of attacks, mapped to the OWASP LLM Top 10 and the vulnerability categories used across Giskard:

| Attack | What it does | Vulnerability category |
| --- | --- | --- |
| Prompt injection | Hides an injected instruction inside realistic content to see whether the agent obeys it instead of its original instructions. | Prompt Injection (OWASP LLM01) |
| Single-turn adversarial | Sends direct requests that test for harmful or unauthorized content such as stereotypes and discrimination, illegal activities, CBRN material, copyright, misinformation, and unqualified financial, medical, or legal advice. | Harmful Content Generation, Misguidance & Unauthorized Advice |
| GOAT multi-turn jailbreak | Uses an attacker LLM that adapts over several turns, chaining strategies such as refusal suppression, persona modification, and hypothetical framing to push the agent toward objectives it should refuse. | Harmful Content Generation |

These cover high-impact attack surfaces today, and the library keeps adding more. See the roadmap for what is coming, and the vulnerability categories catalog for the complete list of categories.

What data is sent to Language Model providers?


The scan uses an LLM both to generate adversarial scenarios and to judge your agent’s answers, and in both cases you choose the provider and model (see Before starting). During the scan, these models receive your agent’s description and the responses it produces. They never receive anything you do not pass to the agent.

The scan can generate scenarios in several languages. Pass the languages your agent is expected to handle through the languages argument (BCP-47 codes such as "en", "fr", or "es"), and the scenarios will be generated in those languages. Since generation is handled by the LLM you configure, pick a model with strong support for your target languages for the best results.

First, install the scan and choose the model that will generate and judge the scenarios. Since giskard-scan pulls in the rest of the library, you only need to add the extra for the provider you use. Pick yours below:

pip install "giskard-scan[openai]"

from giskard.agents.generators import GiskardLLMGenerator
from giskard.checks import set_default_generator
llm_judge = GiskardLLMGenerator(model="openai/gpt-4o")
set_default_generator(llm_judge)

Set OPENAI_API_KEY in your environment.

set_default_generator makes your chosen model the default for generating and judging every scenario in the scan.

The scan talks to your agent through a single entry point, an async function that takes a typed input and returns a typed output. Both types are Pydantic models, which makes the contract explicit and validated.

Since multi-turn attacks call your agent once per turn, with only the new message as input, the right wrapper depends on how your agent keeps track of the conversation. Pick the pattern that matches yours:

If your agent answers each message independently, wrap it directly:

from pydantic import BaseModel


class AgentInput(BaseModel):
    question: str


class AgentOutput(BaseModel):
    answer: str


async def my_agent(inputs: AgentInput) -> AgentOutput:
    # Call your own LLM app, chain, or agent here
    answer = await my_llm_app(inputs.question)
    return AgentOutput(answer=answer)

This is the only integration code you need. In fact, anything callable from Python (a RAG pipeline, an agent, or a remote API) can be wrapped this way.
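As a sketch of that claim, the wrapper below adapts a blocking, synchronous function to the async contract by running it in a worker thread with `asyncio.to_thread`. The `blocking_app` function and the `my_sync_agent` name are hypothetical stand-ins for your own code; the `AgentInput` and `AgentOutput` models mirror the ones above.

```python
import asyncio

from pydantic import BaseModel


class AgentInput(BaseModel):
    question: str


class AgentOutput(BaseModel):
    answer: str


def blocking_app(question: str) -> str:
    # Hypothetical stand-in for a synchronous pipeline or HTTP client call.
    return f"You asked: {question}"


async def my_sync_agent(inputs: AgentInput) -> AgentOutput:
    # Run the blocking call in a worker thread so the event loop stays free
    # while the call is in progress.
    answer = await asyncio.to_thread(blocking_app, inputs.question)
    return AgentOutput(answer=answer)
```

If your application already exposes an async client, you can await it directly and skip the thread.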

Describe your agent, generate a suite of scenarios, then run it against your wrapped model:

from giskard.scan import generate_suite

# Generate adversarial scenarios from a plain-language description
suite = await generate_suite(
    description="A Q&A agent that answers questions about our product.",
    languages=["en"],
)

# Run every scenario against your agent
suite_result = await suite.run(target=my_agent)

# Print the report to your console
suite_result.print_report()

While the suite runs, Giskard shows live progress for each scenario, with a count of how many passed and failed:

[Screenshot: Giskard scan running, with a progress bar over the scenarios, per-scenario rows, and a passed and failed count]

The description is what the LLM uses to generate domain-specific scenarios, so the more precisely you describe your agent’s purpose and boundaries, the more relevant the findings.

For every failed scenario, the report shows the judge’s verdict and the full conversation trace that triggered it, so you can see exactly how the agent was manipulated:

[Screenshot: a failed scan scenario showing the judge's verdict and the multi-turn conversation trace of inputs and outputs]

Generating scenarios uses an LLM, so it is good practice to generate the suite once and reuse it. Since a Suite is a Pydantic model, you can serialize it to JSON and store it (commit it to your repository or keep it as a build artifact):

from pathlib import Path
Path("scan_suite.json").write_text(suite.model_dump_json())
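One way to apply the "generate once, reuse" advice is a small caching helper, sketched below with the standard library only. The `load_or_generate` name and its signature are hypothetical and not part of Giskard; the helper simply returns the JSON stored at `path` if the file exists, and otherwise awaits the coroutine you pass in and saves its result.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable


async def load_or_generate(
    path: Path, generate: Callable[[], Awaitable[str]]
) -> str:
    """Return the cached suite JSON, generating and saving it on first use."""
    if path.exists():
        # Reuse the stored suite: no generation call is made.
        return path.read_text(encoding="utf-8")
    suite_json = await generate()
    path.write_text(suite_json, encoding="utf-8")
    return suite_json
```

With the calls shown on this page, `generate` could be a coroutine that awaits `generate_suite(...)` and returns `suite.model_dump_json()`.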

In your pipeline, load the saved suite and run it against your agent. This time, the scenarios are not regenerated: only the judging step calls the LLM, so remember to configure a judge in CI as well. The run then returns a result you can export as a JUnit XML report for your CI test dashboard:

from pathlib import Path
from giskard.checks import Suite
suite = Suite.model_validate_json(Path("scan_suite.json").read_text())
suite_result = await suite.run(target=my_agent)
suite_result.to_junit_xml("scan_results.xml")
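If your CI system does not read JUnit reports natively, a short script can gate the build on the exported file. The sketch below assumes the report follows the common JUnit layout, in which a failed test case carries a `<failure>` or `<error>` child element; the exact structure written by `to_junit_xml` is not described here, so check it against a real report first. The `count_junit_failures` helper and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_junit_failures(xml_text: str) -> int:
    """Count <testcase> elements that contain a <failure> or <error> child."""
    root = ET.fromstring(xml_text)
    return sum(
        1
        for case in root.iter("testcase")
        if case.find("failure") is not None or case.find("error") is not None
    )


# Hand-written sample in the conventional JUnit shape, not real Giskard output.
sample_report = """
<testsuites>
  <testsuite name="scan" tests="2" failures="1">
    <testcase name="scenario-a"/>
    <testcase name="scenario-b"><failure message="judge: vulnerable"/></testcase>
  </testsuite>
</testsuites>
"""
print(count_junit_failures(sample_report))  # -> 1
```

A CI step could read `scan_results.xml`, pass its contents to this function, and exit with a non-zero status when the count is above zero.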

Re-run the same scenarios on another model


The same suite can be pointed at any target. For example, once you have fixed an issue or shipped a new version, run the exact same scenarios against the new agent to confirm that the vulnerabilities are gone and that nothing regressed:

suite_result = await suite.run(target=my_other_agent)

You can customize the scan by passing options to generate_suite and suite.run.

By default, the scan runs all of its built-in scenarios. To focus on a single class of vulnerability, pass the generators you want:

from giskard.scan import generate_suite, PromptInjectionScenarioGenerator

suite = await generate_suite(
    description="A Q&A agent that answers questions about our product.",
    languages=["en"],
    generators=[PromptInjectionScenarioGenerator()],
)
suite_result = await suite.run(target=my_agent)

Limit the total number of scenarios with max_scenarios, and run them concurrently with parallel and max_concurrency:

suite = await generate_suite(
    description="A Q&A agent that answers questions about our product.",
    languages=["en"],
    max_scenarios=20,
)
suite_result = await suite.run(
    target=my_agent,
    parallel=True,
    max_concurrency=10,
)
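To give an intuition for what a concurrency cap means for your agent, the sketch below runs several stand-in jobs under an `asyncio.Semaphore` and records the peak number in flight. It illustrates the general bounded-concurrency pattern only and is not Giskard's implementation; `run_bounded` and `fake_scenario` are hypothetical names.

```python
import asyncio


async def run_bounded(n_tasks: int, max_concurrency: int) -> int:
    """Run n_tasks stand-in jobs, at most max_concurrency at once.

    Returns the highest number of jobs observed in flight at the same time.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def fake_scenario() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for a call to the agent
            in_flight -= 1

    await asyncio.gather(*(fake_scenario() for _ in range(n_tasks)))
    return peak


peak = asyncio.run(run_bounded(n_tasks=20, max_concurrency=10))
print(peak <= 10)  # the cap is never exceeded
```

If your agent calls a rate-limited API or holds shared state, a lower cap is the safer starting point.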

The generate_suite function builds a suite from the built-in scenarios. To go further, the Scenario Generator skill turns your coding agent into a red-teamer. Describe your agent and the failure modes you care about, and the skill writes or extends a runnable suite with adversarial scenarios and layered checks tailored to your case. You then run it with suite.run(...) exactly as above.

npx skills add Giskard-AI/giskard-skills --skill scenario-generator

For example, prompt your agent with “red-team my support bot for PII leaks and competitor mentions”. You can browse the full set of skills at Giskard Skills.

The scan on this page runs locally and is driven by code. When you need more than that, the Giskard Hub, our enterprise platform, manages the complete red teaming workflow: through the web interface, the Python SDK, or the API, you launch a more advanced scan (55+ probes), get a security grade for your agent, and turn the findings into test datasets that your whole team, including business experts, can review and annotate. On top of that, continuous red teaming keeps testing your deployed agent against emerging threats, catching vulnerabilities and regressions before they can be exploited.

[Screenshot: Giskard Hub scan report showing a security grade, issue counts by severity, and results broken down by vulnerability category with OWASP tags]

For a complete picture of what the Hub adds, read the Open Source vs Hub comparison, or talk to our team to see it in action.

Our direction is to make giskard.scan the single entry point for red teaming in the open-source library. To get there, we are continuously expanding the attack library with new vulnerability categories and richer multi-turn attacks, and extending the same approach to RAG and agent evaluation. As that coverage grows, everything on this page keeps working unchanged, since new scenarios are picked up automatically by the generate_suite and suite.run calls you already wrote.

If you encounter any issues, join our Discord community and ask in the #general channel.