Release Notes
Below you will find the release notes for each version of Giskard Hub UI. Each entry covers new features, improvements, and bug fixes included in that release.
2.5.0 (2026-03-31)
Section titled â2.5.0 (2026-03-31)âWe are releasing a new version of the Hub that introduces a unified data table experience with saveable table views, stateful agent support for multi-turn conversations, server-side pagination for large evaluation and dataset pages, fine-grained probe selection for scans, and a new TokenBreak security probe.
Whatâs new?
Section titled âWhatâs new?âUnified data table UX All major data tables now share a consistent, modernized interface with sticky headers and a floating bulk actions bar. A new âtable viewsâ feature lets you save, restore, and manage your table configurations â including filters, sorting, and column visibility â so you can switch between different workflows without reconfiguring the table each time.
Stateful agent support Agents can now track conversation history on the server, enabling multi-turn conversations with persistent context. This helps you build and test agents that maintain context across multiple turns, improving reliability for conversational workflows.
Paginated evaluation results and datasets Evaluation results and dataset test cases now use server-side pagination. You can efficiently browse, filter, and compare results without slowdowns or excessive memory use, even in large projects.
Scan creation by probe IDs The scan creation page now lets you select individual probes from a searchable checklist, in addition to the existing category-based selection. This gives you more precise control over which probes to include in a scan run.
New security probe Enhanced scanning capabilities with a new built-in probe:
- TokenBreak (OWASP LLM 01 - Prompt Injection) - This probe tests whether your agent can be manipulated through obfuscated prompt injection. It embeds malicious instructions inside legitimate-looking user messages, then prepends characters to sensitive trigger words (e.g. âignoreâ â âAignoreâ) to evade input classifiers while remaining interpretable by the underlying language model. The technique exploits the gap between how safety filters tokenize text and how LLMs process it. Based on the TokenBreak attack research (paper). Supports English, French, Italian, and German.
Whatâs fixed?
Section titled âWhatâs fixed?â-
Severity-based probe ordering - Probe attempts and results are now ordered by severity, with the most critical issues appearing first. This helps you quickly identify and address the most important scan findings.
-
Color accessibility - UI color contrast and badge styling have been improved for both light and dark themes, ensuring better readability and compliance with accessibility standards.
-
Filter validation - Column filters are now properly formatted for backend search endpoints, preventing validation errors on audit, dataset, knowledge base, and evaluation pages.
-
Date range filter restore - Restoring saved table views or using URL parameters with date range filters now works as expected, displaying the correct results.
-
XLSX export reliability - Free-text fields in XLSX evaluation exports are now sanitized to remove illegal XML control characters, and exports are processed in the background to keep the app responsive.
-
Parallel page loading - Data fetches on several pages now run in parallel, resulting in noticeably faster page loads across the Hub.
2.4.0 (2026-02-24)
Section titled â2.4.0 (2026-02-24)âWe are releasing a new version of the Hub that introduces a server-backed dataset table for managing large datasets, enhanced Playground interactivity, a redesigned three-step groundedness evaluation pipeline, annotation highlights in the compare view, and safer markdown rendering in scan results.
Whatâs new?
Section titled âWhatâs new?âServer-backed dataset management Dataset test cases now load via a server-backed table, significantly improving performance for large datasets. You can filter, sort, and paginate through thousands of test cases without slowing down your browser. A new bulk action preview shows you exactly how many items will be affected before you apply changes, making large-scale edits safer and more predictable.
Enhanced Playground interactivity The Playground now features quick actions to streamline your prompt engineering workflow. You can instantly remove the last conversation turn, re-generate the assistantâs previous answer, and toggle between âPrettyâ and âRawâ Markdown rendering. Your display preference is saved in your browser, ensuring a consistent experience across sessions.
Advanced Groundedness pipeline Groundedness evaluation has been upgraded to a three-step pipeline that extracts evidence, evaluates individual claims, and re-checks borderline cases for higher accuracy. You can now view detailed, per-claim groundedness reasons directly in the results and comparison views, presented in a clear Markdown format. Confidence scoring has also been improved for more consistent results.
Annotation highlights in compare view The conversation comparison view now displays metric annotations â such as groundedness failures â directly in the response and context panels. This helps you understand evaluation results and the reasons behind metric failures without leaving the compare flow.
Re-run checks on local evaluations You can now re-run checks on local evaluation results without re-querying the model, preserving the existing output. This helps you iterate on check configurations more efficiently and safely.
Scan grade tooltip A tooltip explaining scan grades is now available on the dashboard and scan history views. You can hover over a grade to quickly understand what it means without navigating away from the summary screen.
Safe markdown rendering in scans Markdown rendering in scan results now blocks images and external links unless they match an approved domain list. This prevents adversarial agents from embedding tracking pixels or exfiltration links in their responses. Blocked content is displayed as plain text with a tooltip.
Whatâs fixed?
Section titled âWhatâs fixed?â-
Large Knowledge Base uploads - Uploading Knowledge Base files larger than 10 MB no longer fails.
-
Evaluation chart accuracy - The evaluation dashboard pie chart now correctly displays all status segments, and the bar chart visually distinguishes passed bars.
-
Playground cursor position - Editing messages in the Playground chat input now preserves the cursor position, even when typing in the middle of the text.
-
Agent description handling - Forms no longer error when an agent has no description set.
2.3.0 (2026-02-03)
Section titled â2.3.0 (2026-02-03)âWe are releasing a new version of the Hub that brings significant improvements to productivity and user experience. This release introduces bulk conversation management for faster dataset organization, a redesigned dashboard with side-by-side evaluations and scans, automated agent description generation, enhanced knowledge base browsing with dedicated chunk navigation, and improved list views across all resources. Weâve also added two new security probes (Domain Misguidance and Reasoning Denial of Service) and delivered major performance improvements for large evaluation runs.
Whatâs new?
Section titled âWhatâs new?âBulk conversation management You can now perform bulk operations on multiple conversations at once, including deleting conversations, exporting data (JSON format), modifying tags, updating checks, changing status, and moving conversations between projects. This helps you organize large conversation datasets faster and with fewer manual actions.
Redesigned dashboard The project dashboard now features a two-column layout displaying Evaluations and Scans side-by-side, with richer visual summaries including a scans grade gauge and an interactive issue-category treemap. Clicking on specific results takes you directly to the corresponding scan run for deeper analysis. This helps you get a faster, more actionable overview of both security and quality metrics and accelerates the diagnostic process without leaving the dashboard.

Automated agent description When setting up a new agent, you can now automatically generate its description with a single click. The system probes your agent and drafts a description that accurately depicts the agentâs tone, abilities, and constraints, which you can then review and edit as needed. This helps you create comprehensive agent documentation faster than writing from scratch and ensures you have a good description of the agent domain and abilities.
Faster evaluation results The evaluation results page now loads faster and remains responsive even for large evaluation runs with thousands of results. You can navigate between results while keeping your filters active. This helps you review large evaluations without slowdowns or delays.
Improved knowledge base browsing The Knowledge Base documents page now loads faster and includes improved search with topic filtering. A new dedicated chunk detail page lets you navigate between chunks with next/previous buttons while preserving your search and filter settings. This helps you find and review KB content faster, especially in large knowledge bases.
Enhanced list views Agents, Datasets, Checks, and Knowledge Bases now display in a compact list layout with search, sort, and filters (where applicable). Your search and filter settings are preserved when you share links or reload the page. This helps you search, find, and share items faster without losing your place.

Brand fonts update The Hub UI now uses the new brand typography with improved readability and visual consistency.
New security probes Enhanced scanning capabilities with two new built-in probes:
- Domain Misguidance (Misguidance & Unauthorized Advice) - A new dynamic attack probe that adapts its follow-up questions based on the agentâs previous answers to probe for weaknesses and see if the chatbot can be led into providing harmful or out-of-scope guidance. Similar to TAP (Tree-of-Attacks Prompting), this probe uses dynamic logic to detect potential misguidance vulnerabilities.
- Reasoning Denial of Service (OWASP LLM10 - Denial of Service) - This probe targets agents relying on reasoning models to detect availability vulnerabilities. Evaluation is done by comparing the resource consumption (latency and token count) of standard questions against obfuscated variations that require a reasoning step. Significant performance degradation on the obfuscated prompts indicates a vulnerability to reasoning-induced resource exhaustion.
Whatâs fixed?
Section titled âWhatâs fixed?â-
Dataset and test case access - Users without full permissions can now still view datasets and test cases. Check details are displayed when available based on your permission level. This helps limited-permission users review and work with datasets without needing extra access.
-
Scheduled evaluation validation - Improved validation when creating or updating scheduled evaluations to ensure proper configuration. This helps prevent misconfigured schedules that could fail or run at unexpected times.
-
Knowledge base import handling - KB imports now skip empty or missing documents and keep topics aligned with the remaining documents. This helps imports succeed when source data contains incomplete rows.
-
Pretty view rendering - Improved the âPrettyâ conversation view rendering, especially whitespace handling. This helps conversations display more cleanly and remain readable.
-
Task permissions - Fixed permission issues on tasks to ensure proper access control.
-
Chart tooltips - Improved tooltip rendering in evaluation charts.
2.2.1 (2025-12-17)
Section titled â2.2.1 (2025-12-17)âWeâre releasing a fix for an issue where tasks could not be saved while in edit mode.
Whatâs fixed?
Section titled âWhatâs fixed?â- Task editing - Fixed an issue that prevented tasks from being saved while in edit mode.
2.2.0 (2025-12-16)
Section titled â2.2.0 (2025-12-16)âWe are releasing a new version of the Hub UI that introduces scenario-based generation, bulk move operations from evaluations, improved list displays with search and filters, and two new probes. This helps you create more targeted test cases, efficiently build golden datasets, and better navigate your Hub resources.
Whatâs new?
Section titled âWhatâs new?âScenario-based generation Scenario-based generation replaces the previous Adversarial generation in the Giskard Hub. You can now choose between three test generation modes in the Hub: LLM vulnerability scanner (run 50+ probes), Knowledge base generation, and Scenario-based. Users are able to create more targeted, business-specific tests without editing the agent description. This helps to generate more realistic test cases. Users provide a description and rules. For example: Persona using slang/emojis asking about loans; rules enforce professional tone and refusal to do interest calculations.
Bulk move from evaluations You can now select specific conversations directly from an evaluation run and move or duplicate them into a specific dataset. This simplifies the process of curating high-quality examples for regression testing.
Better display of lists To improve navigability, we have introduced a search bar and dedicated filters across the platform. You can now easily search and filter through datasets, agents, knowledge bases, and checks, making it faster to locate specific assets in complex workspaces.
Enhanced Scans Improved scanning capabilities with new probes and better rendering:
- New Built-in Probes - Two new built-in probes added to the scanning toolkit
- ChatInject (OWASP LLM 01 - Prompt Injection) - This probe tests whether agents can be manipulated through malicious instructions formatted to match their native chat templates. Unlike traditional plain-text injection attacks, ChatInject exploits the structured role-based formatting (system, user, assistant tags) that agents use internally. By wrapping attack payloads with forged chat template tokens, mimicking the modelâs own instruction hierarchy, attackers can bypass defenses that rely on role priority. The probe includes a multi-turn variant that sends persuasive conversation, delimited with adequate separation tokens, inside one message to confuse the agent under test. This technique achieves significantly higher success rates than standard injection methods and transfers effectively across models, even when the target modelâs exact template structure is unknown.
- CoT Forgery (OWASP LLM 01 - Prompt Injection) - This probe implements the Chain-of-Thought (CoT) forgery attack strategy, which appends realistic and compliant reasoning traces to harmful requests that mimic the format and tone of legitimate reasoning steps, causing the model to continue the compliant reasoning pattern and answer requests it should refuse.
- Improved Markdown rendering - Enhanced Markdown rendering in Scan results
Whatâs fixed?
Section titled âWhatâs fixed?â- Permission fix for âAdd checksâ button - Fixed permission issue for test case creation
- Permission fix for âAdd taskâ button - Fixed permission issue for task creation
- Better handling of LiteLLM-specific embedding exceptions - Improved error handling for embedding generation errors
- Improved Scan error handling - Enhanced error handling for vulnerability scan errors
- Export fixes:
- Added missing parameters to export options
- Fixed metadata display in evaluation results
2.1.2 (2025-12-04)
Section titled â2.1.2 (2025-12-04)âWeâre releasing a hotfix that addresses a critical security vulnerability affecting the frontend stack.
Whatâs fixed?
Section titled âWhatâs fixed?â-
Security patch - Upgraded Next.js to 15.5.7 and React to 19.2.1 to remediate CVE-2025-55182.
References:
2.1.1 (2025-11-25)
Section titled â2.1.1 (2025-11-25)âWeâre releasing an emergency hotfix to resolve a critical issue that completely blocked outbound email delivery.
Whatâs fixed?
Section titled âWhatâs fixed?â- Email delivery restoration - Fixed a critical issue that prevented all outbound email delivery. Email sending has been restored for all notifications and system emails.
2.1.0 (2025-11-24)
Section titled â2.1.0 (2025-11-24)âWe are releasing a new version of the Hub UI that introduces audit logs, a new task system, enhanced scans, and improved UI and weâve added two more probes, called Harmful Misguidance and Agentic Tool Extraction. This helps you manage your evaluation process, track your scans, and improve the collaboration with your team.
Whatâs new?
Section titled âWhatâs new?âTask management This new feature enables teams to organize, track, and collaborate on test corrections directly within Giskard Hub. For more information, see Task management:
- Create Tasks from Failures - Create tasks directly from failed evaluations or scans
- Assign Owners with Notifications - Assign task owners with email notifications (opt-out available)
- Auto-Draft Bad Tests - When you create a task from a bad test, the system will propose to automatically mark it as draft
- Hide Noisy Results - You can now hide false positive results while still tracking them with tasks. For more information, see Modify test cases
- Prioritize Tasks - Set the priority of the task based on the importance of the work to be done
Draft Conversations & Datasets Draft mode lets you iterate privately on test cases without affecting live evaluations. Drafts are excluded from dashboards, reports, scheduled runs, and success rates, so your production metrics stay clean while you experiment. For more information, see Task management:
- Draft/Published Toggle - A toggle with a helpful explanation to draft or publish a conversation
- Draft Filter in Tables - Draft filter added to dataset tables with conversation status labels
- Consistent Exclusion - Drafted conversations are excluded from evaluation runs
Enhanced Scans Improved scanning capabilities and usability. For more information, see Scans:
- New Built-in Probes - Two new built-in probes added to the scanning toolkit
- Harmful Misguidance - This probe tests whether an AI agent validates or encourages dangerous behaviors when users implicitly seek affirmation for harmful actions. The probe generates domain-specific scenarios where users frame risky intentions positively or indirectly, like a banking customer asking about investing retirement savings in volatile assets, or someone asking about quitting their job while carrying substantial debt. The probe automatically adapts attack scenarios to agentâs use case, testing whether it can detect and respond appropriately when vulnerable users frame dangerous decisions as normal choices.
- Agentic Tool Extraction - This probe implements an agentic reconnaissance workflow that systematically queries the agent to discover and enumerate available tools, functions, and capabilities, exposing the agentâs internal configuration and expanding the attack surface for targeted exploitation
- JSON Export - Export scan results in JSON format directly from the UI. See Review scan results for more details
- UI Improvements - Various improvements for better readability and stability
Audit & History Audit history allows you to track all changes across the Hub, which allows you to understand project history and helps with regulatory compliance. For more information, see Event logs:
- Change Timelines - View change timelines for all major entities (projects, datasets, checks, models, tasks, scansâŚ)
- Human-Readable Summaries - Clear, human-readable summaries of all updates
- Project-Wide Search - Search audits across the whole project
UI & Content Improvements Enhanced user experience throughout the Hub:
- Markdown Support - Descriptions and error messages now support Markdown formatting
- Better Navigation - Clearer labels, improved empty states, and more consistent navigation
Whatâs fixed?
Section titled âWhatâs fixed?â- Email Reliability - More robust TLS handling for outbound email
2.0.1 (2025-10-24)
Section titled â2.0.1 (2025-10-24)âWeâre releasing a focused update that enhances the user experience with a refreshed interface, improved error handling, and better reliability across the platform.
Whatâs new?
Section titled âWhatâs new?âRefreshed Hub Theme & Colors The Hub now features a cleaner, more modern look with updated colors and improved visual consistency throughout the interface.
Clearer, Friendlier Error Pages Error messages are now more user-friendly and provide clearer guidance on how to resolve issues, making troubleshooting easier for users.
Improved Scan Experience Enhanced the scanning workflow with several key improvements:
- Toggle Select/Unselect Probes - Easier probe management with intuitive selection controls
- Better Issue Visualization - Improved display of scan results and vulnerability details
- Knowledge Base Display - Relevant knowledge base information is now shown when applicable during scans
Updated Login Page and Smoother Navigation Streamlined authentication flow with improved login page design and enhanced navigation throughout the application.
Whatâs fixed?
Section titled âWhatâs fixed?â- Topic Filtering on Knowledge Base Page - Fixed issues with filtering functionality on the Knowledge Base page
- Database Issues with Forbidden Characters - Resolved problems caused by special characters in database operations
- Large Document Generation - Fixed failures that occurred when generating large documents
- âPermissionâ Renamed to User Management - Updated terminology for better clarity and consistency
2.0.0 (2025-09-25)
Section titled â2.0.0 (2025-09-25)âWeâre releasing an upgraded LLM vulnerability scanner in Giskard Hub, specifically designed to secure conversational AI agents in production environments. This enterprise version deploys autonomous red teaming agents that conduct dynamic, multi-turn attacks across dozens of vulnerability categories covering more than 40 probes.
Whatâs new?
Section titled âWhatâs new?âComprehensive LLM Vulnerabilities Coverage The scanner covers LLM vulnerabilities across established OWASP categories and business failures:
- Prompt Injection (OWASP LLM 01) - Attacks that manipulate AI agents through carefully crafted prompts
- Training Data Extraction (OWASP LLM 02) - Attempts to extract or infer information from the AI modelâs training data
- Data Privacy Exfiltration (OWASP LLM 05) - Attacks aimed at extracting sensitive information
- Excessive Agency (OWASP LLM 06) - Tests whether AI agents can be manipulated beyond their intended scope
- Hallucination & Misinformation (OWASP LLM 09) - Tests for false, inconsistent, or fabricated information
- Denial of Service (OWASP LLM 10) - Attacks that attempt to cause resource exhaustion
- Internal Information Exposure - Attempts to extract system prompts and configuration details
- Harmful Content Generation - Probes that bypass safety measures
- Brand Damage & Reputation - Tests for reputational risks
- Legal & Financial Risk - Attacks exposing deployers to liabilities
- Unauthorized Professional Advice - Tests for advice outside intended scope
Business Alignment Evaluates both security vulnerabilities and business failures, automatically validating business logic by generating expected outputs from knowledge bases.
Domain-specific Attacks Adapts testing methodologies to agent-specific contexts using bot descriptions, tools specification, and knowledge bases for realistic evaluation.
Multi-turn Attack Simulation Implements dynamic multi-turn testing that simulates realistic conversation flows, detecting context-dependent vulnerabilities that emerge through conversation history.
Adaptive AI Red Teaming Adjusts attack strategies based on agent resistance, escalating tactics or pivoting approaches when encountering defenses.
Root-cause Analysis Every detected vulnerability includes detailed explanations of attack methodology and severity scoring for prioritized remediation.
Continuous Red Teaming Detected vulnerabilities automatically convert into reusable tests for continuous validation and integration into golden datasets.
Whatâs changed?
Section titled âWhatâs changed?â- Removed support for importing and exporting knowledge bases (KB) in CSV format. Only JSON and JSONL formats are now supported for KB import/export.
- In the client library version 2.0.0, legacy functions have been deprecated and removed. Notably, the previous âconversationsâ functionality has been replaced by âchat_test_casesâ to improve clarity and consistency across the product.
Whatâs fixed?
Section titled âWhatâs fixed?â- Fixed an issue with document embedding when handling a single large document.
- Resolved a bug related to access of notification preferences, ensuring all users have appropriate access regardless of their permissions.
- Corrected a problem where new environment creation did not set the Keycloak secret correctly.
- Fixed mismatches between displayed statistics and actual items in evaluation lists.
- Addressed a bug affecting failure category editing.
- Fixed incorrect styling on the âmove conversationâ button.
- Resolved issues with failure categories not functioning properly when using a local model.
How to get started?
Section titled âHow to get started?â- Configure vulnerability scope - Select specific vulnerability categories relevant to your use case
- Execute the scan - The system runs hundreds of probes across security and business logic areas
- Analyze results by severity - Results are organized by criticality for prioritized review
- Review individual probes - Each probe provides detailed attack descriptions and explanations
- Turn into continuous tests - Successful probes can convert into tests for continuous validation
This release enables detection of sophisticated attacks that evolve across multiple conversation turns, automatically generating attacks, analyzing system responses, and modifying approaches to help correct agents with re-executable tests.