RAG Evaluation Toolkit on an IPCC Climate Agent
Install dependencies and download the IPCC report
[ ]:
!pip install "giskard[llm]" --upgrade
!pip install llama-index PyMuPDF
[ ]:
!wget "https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf" -O "ipcc_report.pdf"
Build RAG Agent on the IPCC report
[1]:
import pandas as pd
import warnings
pd.set_option("display.max_colwidth", 400)
warnings.filterwarnings('ignore')
[2]:
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.file import PyMuPDFReader
from llama_index.core.base.llms.types import ChatMessage, MessageRole
loader = PyMuPDFReader()
ipcc_documents = loader.load(file_path="./ipcc_report.pdf")
[3]:
splitter = SentenceSplitter(chunk_size=512)
index = VectorStoreIndex.from_documents(ipcc_documents, transformations=[splitter])
chat_engine = index.as_chat_engine()
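To make the chunking step more concrete, here is a minimal sketch of sentence-aware chunking in plain Python. It is not the `SentenceSplitter` implementation: it counts words rather than tokens, ignores any overlap between chunks, and the helper name `split_into_chunks` and its `max_words` parameter are hypothetical. It only illustrates the general idea of packing whole sentences into size-limited chunks before indexing.

```python
import re


def split_into_chunks(text, max_words=40):
    """Greedily pack whole sentences into chunks of at most max_words words.

    Simplified illustration only: words stand in for tokens and no overlap
    between chunks is produced.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current chunk when this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = (
    "Global warming will continue to increase in the near term. "
    "Deep and rapid emission cuts would slow it. "
    "Adaptation options exist but have limits."
)
chunks = split_into_chunks(sample, max_words=15)
for chunk in chunks:
    print(len(chunk.split()), "words:", chunk)
```

A sentence longer than the budget still ends up in a chunk of its own in this sketch, so the limit is a target rather than a hard guarantee.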
Let's test the Agent
[4]:
str(chat_engine.chat("How much will the global temperature rise by 2100?"))
[4]:
'The predicted global temperature rise by 2100 is 3.2 degrees Celsius, with a range of 2.2 to 3.5 degrees Celsius.'
Generate a test set on the IPCC report
[ ]:
from giskard.rag import KnowledgeBase, generate_testset, QATestset
text_nodes = splitter(ipcc_documents)
knowledge_base_df = pd.DataFrame([node.text for node in text_nodes], columns=["text"])
knowledge_base = KnowledgeBase(knowledge_base_df)
[ ]:
testset = generate_testset(knowledge_base,
                           num_questions=120,
                           agent_description="A chatbot answering questions about the IPCC report")
[7]:
# Save the testset
testset.save("ipcc_testset.jsonl")
# Load the testset
testset = QATestset.load("ipcc_testset.jsonl")
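Before running an evaluation, it can help to check how the generated questions are distributed across question types. The sketch below uses only the standard library and assumes the saved file holds one JSON object per line with a `metadata` field containing `question_type`, as the test set preview suggests. That file layout is an assumption, and `count_question_types` is a hypothetical helper, not part of Giskard.

```python
import json
from collections import Counter


def count_question_types(lines):
    """Count question types in an iterable of JSON Lines records.

    Assumes each non-empty line is a JSON object whose "metadata" field
    holds a "question_type" key (an assumption about the saved format).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        metadata = record.get("metadata") or {}
        counts[metadata.get("question_type", "unknown")] += 1
    return counts


# Made-up records standing in for the real file contents.
sample_lines = [
    '{"question": "Q1", "metadata": {"question_type": "simple"}}',
    '{"question": "Q2", "metadata": {"question_type": "simple"}}',
    '{"question": "Q3", "metadata": {"question_type": "double"}}',
    '{"question": "Q4"}',
]
type_counts = count_question_types(sample_lines)
print(dict(type_counts))
```

If the assumption about the format holds, the same function could be applied to the real file by passing an open file handle, for example `count_question_types(open("ipcc_testset.jsonl"))`.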
[8]:
testset.to_pandas().head(5)
[8]:
id | question | reference_answer | reference_context | conversation_history | metadata |
---|---|---|---|---|---|
1cacb231-b6e3-44aa-a315-79aa43cff369 | When is the best estimate of reaching 1.5°C of global warming according to most scenarios? | The best estimate of reaching 1.5°C of global warming lies in the first half of the 2030s in most of the considered scenarios and modelled pathways. | Document 116: The best estimate of reaching 1.5°C of global \nwarming lies in the first half of the 2030s in most of the considered \nscenarios and modelled pathways114. In the very low GHG emissions \nscenario (SSP1-1.9), CO2 emissions reach net zero around 2050 and the \nbest-estimate end-of-century warming is 1.4°C, after a temporary overshoot \n(see Section 3.3.4) of no more than 0.1°C abov... | [] | {'question_type': 'simple', 'seed_document_id': 116, 'topic': 'Climate Change Mitigation Scenarios'} |
d785c257-4c44-443c-99dd-7d72a296da9f | What are the projected global emissions for 2030 based on policies implemented by the end of 2020? | The median projected global emissions for 2030 based on policies implemented by the end of 2020 are 57 GtCO2-eq/yr, with a range of 52–60 GtCO2-eq/yr. | Document 82: Emissions projections for 2030 and gross differences in emissions are based on emissions of 52–56 GtCO2-eq yr−1 in 2019 as assumed in underlying model \nstudies97. (medium confidence) {WGIII Table SPM.1} (Table 3.1, Cross-Section Box.2) \n95 \nAbatement here refers to human interventions that reduce the amount of GHGs that are released from fossil fuel infrastructure to the atmosph... | [] | {'question_type': 'simple', 'seed_document_id': 82, 'topic': 'Global Greenhouse Gas Emissions and Climate Policy'} |
0646700a-9617-4dad-9a12-f84a6048ca9d | What are some key barriers to the implementation of adaptation options in vulnerable sectors? | Key barriers include limited resources, lack of private-sector and civic engagement, insufficient mobilisation of finance, lack of political commitment, limited research and/or slow and low uptake of adaptation science, and a low sense of urgency. | Document 95: 62\nSection 2\nSection 1\nSection 2\nfire-adapted ecosystems, or hard defences against flooding) and human \nsettlements (e.g. stranded assets and vulnerable communities that \ncannot afford to shift away or adapt and require an increase in social \nsafety nets). Maladaptation especially affects marginalised and vulnerable \ngroups adversely (e.g., Indigenous Peoples, ethnic minorit... | [] | {'question_type': 'simple', 'seed_document_id': 95, 'topic': 'Others'} |
0d78955c-f9c8-41ad-9ba4-a2670da4e63c | What are some irreversible changes projected due to continued GHG emissions? | Many changes due to past and future GHG emissions are irreversible on centennial to millennial time scales, especially in the ocean, ice sheets, and global sea level. | Document 118: {WGI SPM D.1.7, WGI Box TS.7} (Cross-Section Box.2)\nContinued GHG emissions will further affect all major climate \nsystem components, and many changes will be irreversible on \ncentennial to millennial time scales. Many changes in the climate \nsystem become larger in direct relation to increasing global warming. \nWith every additional increment of global warming, changes in \... | [] | {'question_type': 'simple', 'seed_document_id': 118, 'topic': 'Others'} |
00d34731-7f09-446d-80fb-40c0b20b547a | What are some options for scaling up mitigation and adaptation in developing regions according to the context? | Options include increased levels of public finance and publicly mobilised private finance flows from developed to developing countries, increasing the use of public guarantees to reduce risks and leverage private flows at lower cost, local capital markets development, and building greater trust in international cooperation processes. | Document 291: Accelerated support \nfrom developed countries and multilateral institutions is a critical \nenabler to enhance mitigation and adaptation action and can address \ninequities in finance, including its costs, terms and conditions, and \neconomic vulnerability to climate change. Scaled-up public grants for \nmitigation and adaptation funding for vulnerable regions, e.g., in Sub-\nSah... | [] | {'question_type': 'simple', 'seed_document_id': 291, 'topic': 'Climate Change Mitigation and Adaptation'} |
Evaluate and Diagnose the Agent
[9]:
from giskard.rag import evaluate, RAGReport, AgentAnswer
from giskard.rag.metrics.ragas_metrics import ragas_context_recall, ragas_context_precision
[ ]:
def answer_fn(question, history=None):
    if history:
        answer = chat_engine.chat(
            question,
            chat_history=[
                ChatMessage(
                    role=MessageRole.USER if msg["role"] == "user" else MessageRole.ASSISTANT,
                    content=msg["content"]
                ) for msg in history
            ]
        )
    else:
        answer = chat_engine.chat(question, chat_history=[])
    return AgentAnswer(
        message=answer.response,
        documents=[source.content for source in answer.sources]
    )

report = evaluate(answer_fn,
                  testset=testset,
                  knowledge_base=knowledge_base,
                  metrics=[ragas_context_recall, ragas_context_precision])
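For intuition about the two retrieval metrics passed to `evaluate`, the sketch below computes toy versions of context recall and context precision from hand-supplied relevance judgments. It assumes the commonly documented definitions: recall as the share of reference-answer statements supported by the retrieved context, and precision as a rank-aware average of precision@k over the ranks that hold a relevant chunk. In the real metrics these judgments come from an LLM; the function names and the boolean inputs here are illustrative assumptions, not the RAGAS or Giskard API.

```python
def context_recall(statement_supported):
    """Share of reference-answer statements supported by the retrieved context.

    statement_supported: list of booleans, one per statement in the reference
    answer (hand-labelled here; judged by an LLM in the real metric).
    """
    if not statement_supported:
        return 0.0
    return sum(statement_supported) / len(statement_supported)


def context_precision(chunk_relevant):
    """Rank-aware precision over retrieved chunks, in retrieval order.

    Averages precision@k over the ranks k that hold a relevant chunk, so
    relevant chunks ranked early score higher than the same chunks ranked late.
    """
    hits, total = 0, 0.0
    for k, relevant in enumerate(chunk_relevant, start=1):
        if relevant:
            hits += 1
            total += hits / k  # precision@k at a relevant rank
    return total / hits if hits else 0.0


recall = context_recall([True, True, False])         # 2 of 3 statements supported
precision_early = context_precision([True, False, True])
precision_late = context_precision([False, True, True])
print(round(recall, 3), round(precision_early, 3), round(precision_late, 3))
```

Both precision examples retrieve two relevant chunks out of three, yet the first scores higher because a relevant chunk sits at rank 1.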
[11]:
# Save the RAG report
report.save("ipcc_report")
# Load the RAG report
report = RAGReport.load("ipcc_report")
[12]:
display(report.to_html(embed=True))
RAGET question types
Each question type assesses a few RAG components. This makes it possible to localize weaknesses in the RAG Agent and give feedback to the developers.
Question type | Description | Example | Targeted RAG components |
---|---|---|---|
Simple | Simple questions generated from an excerpt of the knowledge base | How much will the global temperature rise by 2100? | |
Complex | Questions made more complex by paraphrasing | How much will the temperature rise in a century? | |
Distracting | Questions designed to confuse the retrieval part of the RAG with a distracting element that comes from the knowledge base but is irrelevant to the question | Renewable energy is cheaper but how much will the global temperature rise by 2100? | |
Situational | Questions that include user context, to evaluate the ability of the generation to produce a relevant answer according to that context | I want to take personal actions to reduce my carbon footprint and I wonder how much will the global temperature rise by 2100? | |
Double | Questions with two distinct parts, to evaluate the capabilities of the query rewriter of the RAG | How much will the global temperature rise by 2100 and what is the main source of Greenhouse Gases? | |
Conversational | Questions asked as part of a conversation: the first message describes the context of the question that is asked in the last message. Also tests the rewriter | | |
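The idea that each question type stresses a few components can be sketched as a simple aggregation: attribute every answered question to the components its type targets, then compare per-component correctness. The mapping `COMPONENTS_BY_TYPE` below is an illustrative assumption, not taken from this page, and the results are made up. The report may well compute its component scores differently.

```python
from collections import defaultdict

# Hypothetical mapping from question type to targeted components.
# This is an assumption for illustration, not a value from the table.
COMPONENTS_BY_TYPE = {
    "simple": ["generator", "retriever", "router"],
    "complex": ["generator"],
    "distracting": ["generator", "retriever", "rewriter"],
    "situational": ["generator"],
    "double": ["generator", "rewriter"],
    "conversational": ["rewriter"],
}


def component_scores(results):
    """Aggregate correctness per component.

    results: iterable of (question_type, is_correct) pairs.
    Returns a dict mapping each component to its share of correct answers.
    """
    totals = defaultdict(lambda: [0, 0])  # component -> [correct, asked]
    for question_type, is_correct in results:
        for component in COMPONENTS_BY_TYPE.get(question_type, []):
            totals[component][0] += int(is_correct)
            totals[component][1] += 1
    return {c: correct / asked for c, (correct, asked) in totals.items()}


# Made-up evaluation results for illustration.
results = [
    ("simple", True), ("simple", True), ("complex", True),
    ("double", False), ("conversational", False), ("distracting", True),
]
scores = component_scores(results)
weakest = min(scores, key=scores.get)
print(scores, "weakest:", weakest)
```

With these made-up results the failures concentrate on double and conversational questions, so the component they share in the assumed mapping gets the lowest score.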