RAG Evaluation Toolkit on an IPCC Climate Agent¶
Install dependencies and download the IPCC report¶
[ ]:
!pip install "giskard[llm]" --upgrade
!pip install llama-index PyMuPDF
[ ]:
!wget "https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf" -O "ipcc_report.pdf"
The testset generation can take up to several minutes. If you want to test RAGET on the IPCC report, we prepared a demo testset along with the evaluation report of the RAG model build in this notebook. These can be downloaded with the following command:
[ ]:
!git clone https://github.com/Giskard-AI/raget_demo.git
Build RAG Agent on the IPCC report¶
[1]:
import pandas as pd
import warnings
pd.set_option("display.max_colwidth", 400)
warnings.filterwarnings('ignore')
[2]:
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.file import PyMuPDFReader
from llama_index.core.base.llms.types import ChatMessage, MessageRole
loader = PyMuPDFReader()
ipcc_documents = loader.load(file_path="./ipcc_report.pdf")
[3]:
splitter = SentenceSplitter(chunk_size=512)
index = VectorStoreIndex.from_documents(ipcc_documents, transformations=[splitter])
chat_engine = index.as_chat_engine()
Let’s test the Agent¶
[4]:
str(chat_engine.chat("How much will the global temperature rise by 2100?"))
[4]:
'The projected global temperature increase by the year 2100 is 3.2 degrees Celsius, with a range of 2.2 to 3.5 degrees Celsius.'
Generate a test set on the IPCC report¶
[5]:
from giskard.rag import KnowledgeBase, generate_testset, QATestset
text_nodes = splitter(ipcc_documents)
knowledge_base_df = pd.DataFrame([node.text for node in text_nodes], columns=["text"])
knowledge_base = KnowledgeBase(knowledge_base_df)
[ ]:
testset = generate_testset(knowledge_base,
num_questions=120,
agent_description="A chatbot answering questions about the IPCC report")
[6]:
testset = QATestset.load("ipcc_testset.jsonl")
[ ]:
testset.to_pandas().head(5)
question | reference_answer | reference_context | conversation_history | metadata | |
---|---|---|---|---|---|
id | |||||
450623f7-e644-4bfa-88d5-90f31dd15d99 | What are the consequences of global warming exceeding 2°C for climate resilient development in some regions and sub-regions? | Climate resilient development will not be possible in some regions and sub-regions if global warming exceeds 2°C. | Document 196: Accelerated and equitable mitigation and adaptation bring benefits from avoiding damages from climate \nchange and are critical to achieving sustainable development (high confidence). Climate resilient development138 \npathways are progressively constrained by every increment of further warming (very high confidence). There is a \nrapidly closing window of opportunity to secure a li... | [] | {'question_type': 'simple', 'seed_document_id': 196, 'topic': 'Climate Change Action'} |
79f98d3d-766b-4cbf-800f-03e87966e3e5 | What is the projected decline in coral reefs with a global warming of 1.5°C? | Coral reefs are projected to decline by a further 70–90% at 1.5°C of global warming. | Document 123: 71\nLong-Term Climate and Development Futures\nSection 3\n3.1.2 Impacts and Related Risks\nFor a given level of warming, many climate-related risks are \nassessed to be higher than in AR5 (high confidence). Levels of \nrisk120 for all Reasons for Concern121 (RFCs) are assessed to become high \nto very high at lower global warming levels compared to what was \nassessed in AR5 (high... | [] | {'question_type': 'simple', 'seed_document_id': 123, 'topic': 'Climate Change Risks'} |
1ee224a2-62af-4877-b172-baec006512e6 | What is the expected uncertainty range in the total potential for mitigation options according to the IPCC report? | The uncertainty in the total potential is typically 25–50%. | Document 251: Where a gradual colour transition is shown, the breakdown of the potential into cost categories is not well known or depends heavily on factors such \nas geographical location, resource availability, and regional circumstances, and the colours indicate the range of estimates. The uncertainty in the total potential is typically 25–50%. \nWhen interpreting this figure, the following... | [] | {'question_type': 'simple', 'seed_document_id': 251, 'topic': 'Climate Change Action'} |
16264bd2-510a-4368-a9d6-0a5fef7feb65 | What is the effect of increasing cumulative net CO2 emissions on the effectiveness of natural land and ocean carbon sinks? | The proportion of emissions taken up by land and ocean decreases with increasing cumulative net CO2 emissions. | Document 166: While \nnatural land and ocean carbon sinks are projected to take up, in absolute \nterms, a progressively larger amount of CO2 under higher compared to \nlower CO2 emissions scenarios, they become less effective, that is, the \nproportion of emissions taken up by land and ocean decreases with \nincreasing cumulative net CO2 emissions (high confidence). Additional \necosystem resp... | [] | {'question_type': 'simple', 'seed_document_id': 166, 'topic': 'Climate Change Projections'} |
c31c6857-c505-45ef-98e5-aa524c4b05e7 | What does hatching represent on the maps depicting changes in maize yield and fisheries catch potential? | Hatching indicates areas where less than 70% of the climate-crop model combinations agree on the sign of impact for maize yield, and where the two climate-fisheries models disagree in the direction of change for fisheries catch potential. | Document 135: Interquartile ranges of WGLs by 2081–2100 \nunder RCP2.6, RCP4.5 and RCP8.5. The presented index is consistent with common features found in many indices included within WGI and WGII assessments. (c) Impacts \non food production: (c1) Changes in maize yield at projected GWLs of 1.6°C to 2.4°C (2.0°C), 3.3°C to 4.8°C (4.1°C) and 3.9°C to 6.0°C (4.9°C). Median yield changes \nfrom ... | [] | {'question_type': 'simple', 'seed_document_id': 135, 'topic': 'Climate Change Assessment'} |
Evaluate and Diagnose the Agent¶
[8]:
from giskard.rag import evaluate, RAGReport
from giskard.rag.metrics.ragas_metrics import ragas_context_recall, ragas_context_precision
[9]:
def answer_fn(question, history=None):
if history:
answer = chat_engine.chat(question, chat_history=[ChatMessage(role=MessageRole.USER if msg["role"] =="user" else MessageRole.ASSISTANT,
content=msg["content"]) for msg in history])
else:
answer = chat_engine.chat(question, chat_history=[])
return str(answer)
report = evaluate(answer_fn,
testset=testset,
knowledge_base=knowledge_base,
metrics=[ragas_context_recall, ragas_context_precision])
[9]:
report = RAGReport.load("ipcc_report")
[10]:
display(report.to_html(embed=True))
[11]:
report.save("ipcc_report")
RAGET question types¶
Each question type assesses a few RAG components. This makes it possible to localize weaknesses in the RAG Agent and give feedback to the developers.
Question type |
Description |
Example |
Targeted RAG components |
---|---|---|---|
Simple |
Simple questions generated from an excerpt of the knowledge base |
How much will the global temperature rise by 2100? |
|
Complex |
Questions made more complex by paraphrasing |
How much will the temperature rise in a century? |
|
Distracting |
Questions made to confuse the retrieval part of the RAG with a distracting element from the knowledge base but irrelevant to the question |
Renewable energy are cheaper but how much will the global temperature rise by 2100? |
|
Situational |
Questions including user context to evaluate the ability of the generation to produce relevant answer according to the context |
I want to take personal actions to reduce my carbon footprint and I wonder how much will the global temperature rise by 2100? |
|
Double |
Questions with two distinct parts to evaluate the capabilities of the query rewriter of the RAG |
How much will the global temperature rise by 2100 and what is the main source of Greenhouse Gases? |
|
Conversational |
Questions made as part of a conversation, first message describe the context of the question that is ask in the last message, also tests the rewriter |
|
|