Knowledge Base

class giskard.rag.knowledge_base.KnowledgeBase(data: DataFrame, columns: Sequence[str] | None = None, seed: int | None = None, llm_client: LLMClient | None = None, embedding_model: BaseEmbedding | None = None, min_topic_size: int | None = None, chunk_size: int = 2048)

A class to handle the knowledge base and the associated vector store.

Parameters:
  • data (pd.DataFrame) – A dataframe containing the whole knowledge base.

  • columns (Sequence[str], optional) – The list of columns from the knowledge base to consider. If not specified, all columns of the knowledge base dataframe will be concatenated to produce a single document. Example: if your knowledge base consists of FAQ data with columns “Q” and “A”, each row is formatted into a single document “Q: [question]\nA: [answer]” to generate questions.

  • seed (int, optional) – The seed to use for random number generation.

  • llm_client (LLMClient, optional) – The LLM client to use for question generation. If not specified, a default OpenAI client will be used.

  • embedding_model (BaseEmbedding, optional) – The Giskard embedding model to use for the knowledge base. By default, the Giskard default model is used, which is OpenAI “text-embedding-ada-002”.

  • min_topic_size (int, optional) – The minimum number of documents required to form a topic inside the knowledge base.

  • chunk_size (int = 2048) – The number of documents to embed in a single batch.
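To make the role of chunk_size concrete, here is a minimal sketch of splitting a list of documents into batches of at most chunk_size items before embedding. The helper iter_batches is hypothetical and not part of giskard; it only illustrates the batching idea described above, and the library's actual implementation may differ.

```python
from typing import Iterator, List, Sequence


def iter_batches(documents: Sequence[str], chunk_size: int = 2048) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `chunk_size` documents.

    Hypothetical helper for illustration only; not a giskard function.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(documents), chunk_size):
        # Each slice holds up to `chunk_size` documents; the last may be shorter.
        yield list(documents[start:start + chunk_size])


docs = [f"document {i}" for i in range(5)]
batches = list(iter_batches(docs, chunk_size=2))
for batch in batches:
    print(len(batch), batch)
```

A larger chunk_size means fewer batches, each carrying more documents.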

classmethod from_pandas(df: DataFrame, columns: Sequence[str] | None = None, **kwargs) → KnowledgeBase

Create a KnowledgeBase from a pandas DataFrame.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing the knowledge base.

  • columns (Sequence[str], optional) – The list of columns from the knowledge base to consider. If not specified, all columns of the knowledge base dataframe will be concatenated to produce a single document. Example: if your knowledge base consists of FAQ data with columns “Q” and “A”, each row is formatted into a single document “Q: [question]\nA: [answer]” to generate questions.

  • kwargs – Additional settings for the knowledge base (see __init__).
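The row formatting described for the columns parameter can be sketched in plain Python. This is an illustration under stated assumptions: format_row is a hypothetical helper, not a giskard function, and it assumes each selected column is rendered as "name: value" with the columns joined by a newline, as in the FAQ example above.

```python
from typing import Dict, Optional, Sequence


def format_row(row: Dict[str, str], columns: Optional[Sequence[str]] = None) -> str:
    """Join the selected columns of one row into a single document string.

    Hypothetical helper for illustration only; not a giskard function.
    """
    # With no explicit selection, every column of the row is used.
    selected = list(columns) if columns is not None else list(row.keys())
    return "\n".join(f"{name}: {row[name]}" for name in selected)


faq_row = {
    "Q": "How do I reset my password?",
    "A": "Use the account settings page.",
}
document = format_row(faq_row, columns=["Q", "A"])
print(document)
```

Passing a subset such as columns=["A"] would keep only that column in the resulting document.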

class giskard.rag.knowledge_base.Document(document: Dict[str, str], doc_id: str | None = None, features: Sequence | None = None, topic_id: int | None = None)

A class to wrap the elements of the knowledge base into a unified format.