
RAG Evaluation Toolkit on a Banking Supervisory Process Agent

Before starting

In this notebook, we build a Banking Supervisory Process Agent using llama_index and the gpt-3.5-turbo model. We then use giskard to evaluate both the underlying model and the RAG system.

To perform these evaluations, we use features such as scan, generate_testset and evaluate, which require an LLM client. By default, the client is set to use OpenAI’s models (e.g. gpt-4 and text-embedding-ada-002). If you want to use another provider (e.g. Ollama, Gemini, Azure, etc.) or change the models, please refer to Setting up the LLM Client for more information.
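With the default OpenAI setup, both the llama_index LLM wrapper and the Giskard client need an API key. The following is a minimal sketch, assuming the key is supplied through the standard `OPENAI_API_KEY` environment variable; the helper name `require_api_key` is hypothetical and not part of either library.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `var_name`, or raise a clear error.

    Hypothetical helper: it only checks the environment and does not
    contact any provider.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the cells below."
        )
    return key
```

Calling `require_api_key()` at the top of the notebook fails fast, rather than partway through a scan or a test set generation.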

Install dependencies and download the Banking Supervision report

[ ]:
!pip install "giskard[llm]" --upgrade
!pip install llama-index PyMuPDF
[ ]:
!wget "https://www.bankingsupervision.europa.eu/ecb/pub/pdf/ssm.supervisory_guides202401_manual.en.pdf" -O "banking_supervision_report.pdf"

Build RAG Agent on the Banking Supervision report

[22]:
import pandas as pd
import warnings
pd.set_option("display.max_colwidth", 400)
warnings.filterwarnings('ignore')
[35]:
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.file import PyMuPDFReader
from llama_index.core.base.llms.types import ChatMessage, MessageRole
from llama_index.llms.openai import OpenAI

loader = PyMuPDFReader()
documents = loader.load(file_path="./banking_supervision_report.pdf")
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
[36]:
splitter = SentenceSplitter(chunk_size=512)
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])
chat_engine = index.as_chat_engine(llm=llm)
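To give an intuition for what a `chunk_size` budget does, here is a simplified sketch of sentence-aware chunking. It is not the `SentenceSplitter` implementation: it assumes whitespace-separated words instead of tokens, applies no overlap between chunks, and the function name `pack_sentences` is hypothetical.

```python
import re


def pack_sentences(text: str, max_words: int) -> list[str]:
    """Greedily group whole sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` becomes its own oversized chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n_words > max_words:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Keeping sentences whole means each retrieved chunk is readable on its own, at the cost of chunks that vary in length.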

Let’s test the Agent

[51]:
str(chat_engine.chat("What is SSM?"))
[51]:
'SSM stands for Single Supervisory Mechanism.'

Scan LLM vulnerabilities

As a first step, we run a scan on the chatbot model. The scan helps identify potential vulnerabilities in the model that the agent is built on.

[38]:
def model_predict(df: pd.DataFrame):
    return [chat_engine.chat(question).response for question in df["question"]]
[ ]:
from giskard import Model

giskard_model = Model(
    model=model_predict,
    model_type="text_generation",
    name="Banking Supervision Question Answering",
    description="A model that answers questions about ECB Banking Supervision report",
    feature_names=["question"]
)
[ ]:
from giskard import scan

scan_report = scan(giskard_model)
[ ]:
# Scan report
display(scan_report)

Generate a test set on the Banking Supervision report

[41]:
from giskard.rag import KnowledgeBase, generate_testset, QATestset

text_nodes = splitter(documents)
knowledge_base_df = pd.DataFrame([node.text for node in text_nodes], columns=["text"])
knowledge_base = KnowledgeBase(knowledge_base_df)
[ ]:
testset = generate_testset(knowledge_base,
                           num_questions=100,
                           agent_description="A chatbot answering questions about banking supervision procedures and methodologies.",
                           language="en")
[43]:
# Save the testset
testset.save("banking_supervision_testset.jsonl")

# Load the testset
testset = QATestset.load("banking_supervision_testset.jsonl")
[44]:
testset.to_pandas().head(5)
[44]:
question reference_answer reference_context conversation_history metadata
id
1f427c44-3a77-4020-ac1b-3d37f3ac9fc7 What are the conditions under which the ECB or NCA may initiate a withdrawal of a credit institution's authorisation? The ECB or the NCA may initiate a withdrawal of a credit institution's authorisation if the institution can no longer meet prudential requirements, cannot be relied on to fulfil its obligations towards its creditors, or commits a severe breach of applicable AML/CFT obligations. Document 89: Supervisory Manual – Supervision of all supervised entities \n \n53 \nFigure 11 \nOverview of withdrawal process \n \n \n \nIf the supervised entity has requested the withdrawal of its authorisation, for example \nbecause it no longer conducts any banking business, the NCA and the ECB assess \nwhether the applicable preconditions for the withdrawal of authorisation according to \n... [] {'question_type': 'simple', 'seed_document_id': 89, 'topic': 'ECB Banking Supervision'}
dcb57559-b028-4f1e-aba3-711b2e393d44 What is the role of the coordinator in the supplementary supervision of financial conglomerates? The coordinator is responsible for coordinating and carrying out the supplementary supervision of the regulated entities in a financial conglomerate, ensuring appropriate and regular stress testing while avoiding duplication or substitution of sectoral supervision. Document 53: The coordinator is responsible for coordinating and carrying out the supplementary \nsupervision of the regulated entities in a financial conglomerate. In cooperation with \nother relevant competent authorities, the coordinator ensures appropriate and \nregular stress testing of financial conglomerates, while avoiding duplication or \nsubstitution of the sectoral supervision. \nTh... [] {'question_type': 'simple', 'seed_document_id': 53, 'topic': 'Single Supervisory Mechanism'}
2fc3f180-be37-4a86-b8e7-a8b5ede76e92 What actions can the Investigation Unit (IU) take if a suspected breach is referred to them? The IU can initiate an investigation by exercising the powers granted to the ECB by the SSM Regulation, which include requesting documents, examining relevant books and records, requesting explanations, holding interviews, and conducting on-site inspections. The IU may also request the NCAs to use their investigatory powers under applicable national law. Document 185: Supervisory Manual – Supervision of significant institutions \n \n107 \nIn cases where sanctions need to be taken at the national level (i.e. breaches of \nnational law implementing EU directives, breaches committed by natural persons, or \nnon-pecuniary penalties), the IU prepares a proposal for a complete draft ECB \ndecision requesting that the relevant NCA open proceedings. \... [] {'question_type': 'simple', 'seed_document_id': 185, 'topic': 'European Banking Supervision'}
8da35bb5-2fde-47b2-8a93-d3c9f4f24d91 What is the role of the ECB in the crisis management of less significant institutions (LSIs)? In the crisis management of LSIs, the ECB's role involves taking decisions on common procedures, such as the withdrawal of banking authorisations. The ECB may also increase its oversight activities, directly collect data, participate in or conduct on-site inspections, and potentially become involved in direct supervision to ensure high supervisory standards or if an LSI's status changes to a s... Document 198: Supervisory Manual – Supervision of less significant institutions \n \n115 \n5.1.4 \nThe role of the ECB in crisis management \nIn accordance with the SSM Framework Regulation, the responsibility for managing \nLSIs during a crisis lies with the NCAs and other relevant authorities at the national \nlevel. That said, when managing any crisis situation involving an LSI, the informa... [] {'question_type': 'simple', 'seed_document_id': 198, 'topic': 'European Banking Supervision'}
c0b9bade-f110-4fe1-9927-52a7437bffeb What happens if a quorum of 50% in the Supervisory Board for emergency situations is not met? If a quorum of 50% in the Supervisory Board for emergency situations is not met, the meeting will be closed and an extraordinary meeting will be held soon afterwards. The invitation letter to the extraordinary meeting should announce that the decisions will be taken without regard for the quorum. Document 38: Supervisory Manual – Functioning of the Single Supervisory Mechanism \n \n24 \n• \nif an NCA which is concerned by the decision has different views regarding the \nobjection, the NCA may request mediation; \n• \nif no request for mediation is submitted, the Supervisory Board may amend the \ndraft decision in order to incorporate the comments of the Governing Council; \n• \nif the ... [] {'question_type': 'simple', 'seed_document_id': 38, 'topic': 'ECB Banking Supervision'}
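Before running the evaluation, it can help to check how the generated questions are distributed across question types. The following is a minimal sketch, assuming each line of the saved JSONL file is a JSON object with a `metadata` field shaped like the `metadata` column above; the function name `count_question_types` is hypothetical and not part of giskard.

```python
import json
from collections import Counter
from typing import Iterable


def count_question_types(jsonl_lines: Iterable[str]) -> Counter:
    """Count test questions per `metadata["question_type"]`.

    Assumes one JSON object per line; records without a question type
    are counted under "unknown".
    """
    counts: Counter = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        metadata = record.get("metadata") or {}
        counts[metadata.get("question_type", "unknown")] += 1
    return counts
```

An open file handle can be passed directly, for example the handle returned by `open("banking_supervision_testset.jsonl")`, since iterating over it yields one line at a time.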

Evaluate and Diagnose the Agent

Next, we use the RAG Evaluation Toolkit (RAGET) to evaluate the agent’s performance on the Banking Supervision report and diagnose its issues.

[45]:
from giskard.rag import evaluate, RAGReport
from giskard.rag.metrics.ragas_metrics import ragas_context_recall, ragas_context_precision
[ ]:
def answer_fn(question, history=None):
    # Convert the history (a list of {"role", "content"} dicts) into ChatMessage objects;
    # an empty or missing history yields an empty chat_history.
    chat_history = [
        ChatMessage(
            role=MessageRole.USER if msg["role"] == "user" else MessageRole.ASSISTANT,
            content=msg["content"],
        )
        for msg in history or []
    ]
    answer = chat_engine.chat(question, chat_history=chat_history)
    return str(answer)

rag_report = evaluate(answer_fn,
                testset=testset,
                knowledge_base=knowledge_base,
                metrics=[ragas_context_recall, ragas_context_precision])
[ ]:
# Save the RAG report
rag_report.save("banking_supervision_report")

# Load the RAG report
rag_report = RAGReport.load("banking_supervision_report")
[ ]:
# RAG report
display(rag_report.to_html(embed=True))

RAGET question types

Each question type assesses a few RAG components. This makes it possible to localize weaknesses in the RAG Agent and give feedback to the developers.

| Question type | Description | Example | Targeted RAG components |
|---|---|---|---|
| Simple | Simple questions generated from an excerpt of the knowledge base | What is the purpose of the holistic approach in the SREP? | Generator, Retriever |
| Complex | Questions made more complex by paraphrasing | In what capacity and with what frequency do NCAs contribute to the formulation and scheduling of supervisory activities, especially concerning the organization of on-site missions? | Generator |
| Distracting | Questions made to confuse the retrieval part of the RAG with a distracting element that comes from the knowledge base but is irrelevant to the question | Under what conditions does the ECB levy fees to cover the costs of its supervisory tasks, particularly in the context of financial conglomerates requiring cross-sector supervision? | Generator, Retriever, Rewriter |
| Situational | Questions including user context, to evaluate the ability of the generator to produce an answer relevant to that context | As a bank manager looking to understand the appeal process for a regulatory decision made by the ECB, could you explain what role the ABoR plays in the supervisory decision review process? | Generator |
| Double | Questions with two distinct parts, to evaluate the capabilities of the query rewriter of the RAG | What role does the SSM Secretariat Division play in the decision-making process of the ECB’s supervisory tasks, and which directorates general are involved in the preparation of draft decisions for supervised entities in the ECB Banking Supervision? | Generator, Rewriter |
| Conversational | Questions asked as part of a conversation: the first message describes the context of the question asked in the last message; also tests the rewriter | (1) I am interested in the sources used for the assessment of risks and vulnerabilities in ECB Banking Supervision. (2) What are these sources? | Rewriter, Routing |
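The mapping from question types to targeted components can be used to localize weaknesses. The following sketch shows one simple way to do so, assuming a plain average of per-question-type correctness over the types that target each component. This is an illustration only and not necessarily how giskard computes its component scores; the names `TARGETED_COMPONENTS` and `component_scores` are hypothetical.

```python
from statistics import mean

# Question type -> targeted RAG components, as listed in the table above.
TARGETED_COMPONENTS = {
    "simple": ("Generator", "Retriever"),
    "complex": ("Generator",),
    "distracting": ("Generator", "Retriever", "Rewriter"),
    "situational": ("Generator",),
    "double": ("Generator", "Rewriter"),
    "conversational": ("Rewriter", "Routing"),
}


def component_scores(correctness_by_type: dict[str, float]) -> dict[str, float]:
    """Average correctness over the question types that target each component.

    `correctness_by_type` maps a question type to its fraction of correct
    answers (between 0 and 1). Unknown question types are ignored.
    """
    per_component: dict[str, list[float]] = {}
    for q_type, score in correctness_by_type.items():
        for component in TARGETED_COMPONENTS.get(q_type, ()):
            per_component.setdefault(component, []).append(score)
    return {name: mean(scores) for name, scores in per_component.items()}
```

Under this scheme, a low score on distracting questions combined with a high score on complex and situational questions would point at the Retriever or Rewriter rather than the Generator.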