Open In Colab View Notebook on GitHub

RAG Evaluation Toolkit on an IPCC Climate Agentยถ

Install dependencies and download the IPCC reportยถ

[ ]:
!pip install "giskard[llm]" --upgrade
!pip install llama-index PyMuPDF
[ ]:
!wget "https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf" -O "ipcc_report.pdf"

Build RAG Agent on the IPCC reportยถ

[1]:
import pandas as pd
import warnings
pd.set_option("display.max_colwidth", 400)
warnings.filterwarnings('ignore')
[2]:
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.file import PyMuPDFReader
from llama_index.core.base.llms.types import ChatMessage, MessageRole

loader = PyMuPDFReader()
ipcc_documents = loader.load(file_path="./ipcc_report.pdf")
[3]:
splitter = SentenceSplitter(chunk_size=512)
index = VectorStoreIndex.from_documents(ipcc_documents, transformations=[splitter])
chat_engine = index.as_chat_engine()

drawing

Letโ€™s test the Agentยถ

[4]:
str(chat_engine.chat("How much will the global temperature rise by 2100?"))
[4]:
'The predicted global temperature rise by 2100 is 3.2 degrees Celsius, with a range of 2.2 to 3.5 degrees Celsius.'

Generate a test set on the IPCC reportยถ

[ ]:
from giskard.rag import KnowledgeBase, generate_testset, QATestset

text_nodes = splitter(ipcc_documents)
knowledge_base_df = pd.DataFrame([node.text for node in text_nodes], columns=["text"])
knowledge_base = KnowledgeBase(knowledge_base_df)
[ ]:
testset = generate_testset(knowledge_base,
                           num_questions=120,
                           agent_description="A chatbot answering questions about the IPCC report")
[7]:
# Save the testset
testset.save("ipcc_testset.jsonl")

# Load the testset
testset = QATestset.load("ipcc_testset.jsonl")
[8]:
testset.to_pandas().head(5)
[8]:
question reference_answer reference_context conversation_history metadata
id
1cacb231-b6e3-44aa-a315-79aa43cff369 When is the best estimate of reaching 1.5ยฐC of global warming according to most scenarios? The best estimate of reaching 1.5ยฐC of global warming lies in the first half of the 2030s in most of the considered scenarios and modelled pathways. Document 116: The best estimate of reaching 1.5ยฐC of global \nwarming lies in the ๏ฌrst half of the 2030s in most of the considered \nscenarios and modelled pathways114. In the very low GHG emissions \nscenario (SSP1-1.9), CO2 emissions reach net zero around 2050 and the \nbest-estimate end-of-century warming is 1.4ยฐC, after a temporary overshoot \n(see Section 3.3.4) of no more than 0.1ยฐC abov... [] {'question_type': 'simple', 'seed_document_id': 116, 'topic': 'Climate Change Mitigation Scenarios'}
d785c257-4c44-443c-99dd-7d72a296da9f What are the projected global emissions for 2030 based on policies implemented by the end of 2020? The median projected global emissions for 2030 based on policies implemented by the end of 2020 are 57 GtCO2-eq/yr, with a range of 52โ€“60 GtCO2-eq/yr. Document 82: Emissions projections for 2030 and gross differences in emissions are based on emissions of 52โ€“56 GtCO2-eq yrโ€“1 in 2019 as assumed in underlying model \nstudies97. (medium con๏ฌdence) {WGIII Table SPM.1} (Table 3.1, Cross-Section Box.2) \n95 \nAbatement here refers to human interventions that reduce the amount of GHGs that are released from fossil fuel infrastructure to the atmosph... [] {'question_type': 'simple', 'seed_document_id': 82, 'topic': 'Global Greenhouse Gas Emissions and Climate Policy'}
0646700a-9617-4dad-9a12-f84a6048ca9d What are some key barriers to the implementation of adaptation options in vulnerable sectors? Key barriers include limited resources, lack of private-sector and civic engagement, insufficient mobilisation of finance, lack of political commitment, limited research and/or slow and low uptake of adaptation science, and a low sense of urgency. Document 95: 62\nSection 2\nSection 1\nSection 2\n๏ฌre-adapted ecosystems, or hard defences against ๏ฌ‚ooding) and human \nsettlements (e.g. stranded assets and vulnerable communities that \ncannot afford to shift away or adapt and require an increase in social \nsafety nets). Maladaptation especially affects marginalised and vulnerable \ngroups adversely (e.g., Indigenous Peoples, ethnic minorit... [] {'question_type': 'simple', 'seed_document_id': 95, 'topic': 'Others'}
0d78955c-f9c8-41ad-9ba4-a2670da4e63c What are some irreversible changes projected due to continued GHG emissions? Many changes due to past and future GHG emissions are irreversible on centennial to millennial time scales, especially in the ocean, ice sheets, and global sea level. Document 118: {WGI SPM D.1.7, WGI Box TS.7} (Cross-Section Box.2)\nContinued GHG emissions will further affect all major climate \nsystem components, and many changes will be irreversible on \ncentennial to millennial time scales. Many changes in the climate \nsystem become larger in direct relation to increasing global warming. \nWith every additional increment of global warming, changes in \... [] {'question_type': 'simple', 'seed_document_id': 118, 'topic': 'Others'}
00d34731-7f09-446d-80fb-40c0b20b547a What are some options for scaling up mitigation and adaptation in developing regions according to the context? Options include increased levels of public finance and publicly mobilised private finance flows from developed to developing countries, increasing the use of public guarantees to reduce risks and leverage private flows at lower cost, local capital markets development, and building greater trust in international cooperation processes. Document 291: Accelerated support \nfrom developed countries and multilateral institutions is a critical \nenabler to enhance mitigation and adaptation action and can address \ninequities in ๏ฌnance, including its costs, terms and conditions, and \neconomic vulnerability to climate change. Scaled-up public grants for \nmitigation and adaptation funding for vulnerable regions, e.g., in Sub-\nSah... [] {'question_type': 'simple', 'seed_document_id': 291, 'topic': 'Climate Change Mitigation and Adaptation'}

Evaluate and Diagnose the Agentยถ

[9]:
from giskard.rag import evaluate, RAGReport, AgentAnswer
from giskard.rag.metrics.ragas_metrics import ragas_context_recall, ragas_context_precision
[ ]:
def answer_fn(question, history=None):
    if history:
        answer = chat_engine.chat(
            question,
            chat_history=[
                ChatMessage(
                    role=MessageRole.USER if msg["role"] == "user" else MessageRole.ASSISTANT,
                    content=msg["content"]
                ) for msg in history
            ]
        )
    else:
        answer = chat_engine.chat(question, chat_history=[])

    return AgentAnswer(
        message=answer.response,
        documents=[source.content for source in answer.sources]
    )

report = evaluate(answer_fn,
                testset=testset,
                knowledge_base=knowledge_base,
                metrics=[ragas_context_recall, ragas_context_precision])
[11]:
# Save the RAG report
report.save("ipcc_report")

# Load the RAG report
report = RAGReport.load("ipcc_report")
[12]:
display(report.to_html(embed=True))

RAGET question typesยถ

Each question type assesses a few RAG components. This makes it possible to localize weaknesses in the RAG Agent and give feedback to the developers.

Question type

Description

Example

Targeted RAG components

Simple

Simple questions generated from an excerpt of the knowledge base

How much will the global temperature rise by 2100?

Generator, Retriever

Complex

Questions made more complex by paraphrasing

How much will the temperature rise in a century?

Generator

Distracting

Questions made to confuse the retrieval part of the RAG with a distracting element from the knowledge base but irrelevant to the question

Renewable energy are cheaper but how much will the global temperature rise by 2100?

Generator, Retriever, Rewriter

Situational

Questions including user context to evaluate the ability of the generation to produce relevant answer according to the context

I want to take personal actions to reduce my carbon footprint and I wonder how much will the global temperature rise by 2100?

Generator

Double

Questions with two distinct parts to evaluate the capabilities of the query rewriter of the RAG

How much will the global temperature rise by 2100 and what is the main source of Greenhouse Gases?

Generator, Rewriter

Conversational

Questions made as part of a conversation, first message describe the context of the question that is ask in the last message, also tests the rewriter

  • I want to know more about the global temperature evolution by 2100. - How high will it be?

Rewriter, Routing