Open In Colab View Notebook on GitHub

RAG Evaluation Toolkit on an IPCC Climate Agent

Install dependencies and download the IPCC report

[ ]:
!pip install "giskard[llm]" --upgrade
!pip install llama-index PyMuPDF
[ ]:
!wget "https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf" -O "ipcc_report.pdf"

The testset generation can take up to several minutes. If you want to test RAGET on the IPCC report, we prepared a demo testset along with the evaluation report of the RAG model build in this notebook. These can be downloaded with the following command:

[ ]:
!git clone https://github.com/Giskard-AI/raget_demo.git

Build RAG Agent on the IPCC report

[1]:
import pandas as pd
import warnings
pd.set_option("display.max_colwidth", 400)
warnings.filterwarnings('ignore')
[2]:
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.file import PyMuPDFReader
from llama_index.core.base.llms.types import ChatMessage, MessageRole

loader = PyMuPDFReader()
ipcc_documents = loader.load(file_path="./ipcc_report.pdf")
[3]:
splitter = SentenceSplitter(chunk_size=512)
index = VectorStoreIndex.from_documents(ipcc_documents, transformations=[splitter])
chat_engine = index.as_chat_engine()

drawing

Let’s test the Agent

[4]:
str(chat_engine.chat("How much will the global temperature rise by 2100?"))
[4]:
'The projected global temperature increase by the year 2100 is 3.2 degrees Celsius, with a range of 2.2 to 3.5 degrees Celsius.'

Generate a test set on the IPCC report

[5]:
from giskard.rag import KnowledgeBase, generate_testset, QATestset

text_nodes = splitter(ipcc_documents)
knowledge_base_df = pd.DataFrame([node.text for node in text_nodes], columns=["text"])
knowledge_base = KnowledgeBase(knowledge_base_df)
[ ]:
testset = generate_testset(knowledge_base,
                           num_questions=120,
                           agent_description="A chatbot answering questions about the IPCC report")
[6]:
testset = QATestset.load("ipcc_testset.jsonl")
[ ]:
testset.to_pandas().head(5)
question reference_answer reference_context conversation_history metadata
id
450623f7-e644-4bfa-88d5-90f31dd15d99 What are the consequences of global warming exceeding 2°C for climate resilient development in some regions and sub-regions? Climate resilient development will not be possible in some regions and sub-regions if global warming exceeds 2°C. Document 196: Accelerated and equitable mitigation and adaptation bring benefits from avoiding damages from climate \nchange and are critical to achieving sustainable development (high confidence). Climate resilient development138 \npathways are progressively constrained by every increment of further warming (very high confidence). There is a \nrapidly closing window of opportunity to secure a li... [] {'question_type': 'simple', 'seed_document_id': 196, 'topic': 'Climate Change Action'}
79f98d3d-766b-4cbf-800f-03e87966e3e5 What is the projected decline in coral reefs with a global warming of 1.5°C? Coral reefs are projected to decline by a further 70–90% at 1.5°C of global warming. Document 123: 71\nLong-Term Climate and Development Futures\nSection 3\n3.1.2 Impacts and Related Risks\nFor a given level of warming, many climate-related risks are \nassessed to be higher than in AR5 (high confidence). Levels of \nrisk120 for all Reasons for Concern121 (RFCs) are assessed to become high \nto very high at lower global warming levels compared to what was \nassessed in AR5 (high... [] {'question_type': 'simple', 'seed_document_id': 123, 'topic': 'Climate Change Risks'}
1ee224a2-62af-4877-b172-baec006512e6 What is the expected uncertainty range in the total potential for mitigation options according to the IPCC report? The uncertainty in the total potential is typically 25–50%. Document 251: Where a gradual colour transition is shown, the breakdown of the potential into cost categories is not well known or depends heavily on factors such \nas geographical location, resource availability, and regional circumstances, and the colours indicate the range of estimates. The uncertainty in the total potential is typically 25–50%. \nWhen interpreting this figure, the following... [] {'question_type': 'simple', 'seed_document_id': 251, 'topic': 'Climate Change Action'}
16264bd2-510a-4368-a9d6-0a5fef7feb65 What is the effect of increasing cumulative net CO2 emissions on the effectiveness of natural land and ocean carbon sinks? The proportion of emissions taken up by land and ocean decreases with increasing cumulative net CO2 emissions. Document 166: While \nnatural land and ocean carbon sinks are projected to take up, in absolute \nterms, a progressively larger amount of CO2 under higher compared to \nlower CO2 emissions scenarios, they become less effective, that is, the \nproportion of emissions taken up by land and ocean decreases with \nincreasing cumulative net CO2 emissions (high confidence). Additional \necosystem resp... [] {'question_type': 'simple', 'seed_document_id': 166, 'topic': 'Climate Change Projections'}
c31c6857-c505-45ef-98e5-aa524c4b05e7 What does hatching represent on the maps depicting changes in maize yield and fisheries catch potential? Hatching indicates areas where less than 70% of the climate-crop model combinations agree on the sign of impact for maize yield, and where the two climate-fisheries models disagree in the direction of change for fisheries catch potential. Document 135: Interquartile ranges of WGLs by 2081–2100 \nunder RCP2.6, RCP4.5 and RCP8.5. The presented index is consistent with common features found in many indices included within WGI and WGII assessments. (c) Impacts \non food production: (c1) Changes in maize yield at projected GWLs of 1.6°C to 2.4°C (2.0°C), 3.3°C to 4.8°C (4.1°C) and 3.9°C to 6.0°C (4.9°C). Median yield changes \nfrom ... [] {'question_type': 'simple', 'seed_document_id': 135, 'topic': 'Climate Change Assessment'}

Evaluate and Diagnose the Agent

[8]:
from giskard.rag import evaluate, RAGReport
from giskard.rag.metrics.ragas_metrics import ragas_context_recall, ragas_context_precision
[9]:
def answer_fn(question, history=None):
    if history:
        answer = chat_engine.chat(question, chat_history=[ChatMessage(role=MessageRole.USER if msg["role"] =="user" else MessageRole.ASSISTANT,
                                                          content=msg["content"]) for msg in history])
    else:
        answer = chat_engine.chat(question, chat_history=[])
    return str(answer)

report = evaluate(answer_fn,
                testset=testset,
                knowledge_base=knowledge_base,
                metrics=[ragas_context_recall, ragas_context_precision])
[9]:
report = RAGReport.load("ipcc_report")
[10]:
display(report.to_html(embed=True))
[11]:
report.save("ipcc_report")

RAGET question types

Each question type assesses a few RAG components. This makes it possible to localize weaknesses in the RAG Agent and give feedback to the developers.

Question type

Description

Example

Targeted RAG components

Simple

Simple questions generated from an excerpt of the knowledge base

How much will the global temperature rise by 2100?

Generator, Retriever

Complex

Questions made more complex by paraphrasing

How much will the temperature rise in a century?

Generator

Distracting

Questions made to confuse the retrieval part of the RAG with a distracting element from the knowledge base but irrelevant to the question

Renewable energy are cheaper but how much will the global temperature rise by 2100?

Generator, Retriever, Rewriter

Situational

Questions including user context to evaluate the ability of the generation to produce relevant answer according to the context

I want to take personal actions to reduce my carbon footprint and I wonder how much will the global temperature rise by 2100?

Generator

Double

Questions with two distinct parts to evaluate the capabilities of the query rewriter of the RAG

How much will the global temperature rise by 2100 and what is the main source of Greenhouse Gases?

Generator, Rewriter

Conversational

Questions made as part of a conversation, first message describe the context of the question that is ask in the last message, also tests the rewriter

  • I want to know more about the global temperature evolution by 2100. - How high will it be?

Rewriter, Routing