arxiv:2406.13663

Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation

Published on Jun 19

Submitted by gsarti on Jun 21


Authors: Jirui Qi, Gabriele Sarti, Raquel Fernández

Abstract

Ensuring the verifiability of model answers is a fundamental challenge for retrieval-augmented generation (RAG) in the question answering (QA) domain. Recently, self-citation prompting was proposed to make large language models (LLMs) generate citations to supporting documents along with their answers. However, self-citing LLMs often struggle to match the required format, refer to non-existent sources, and fail to faithfully reflect their context usage throughout generation. In this work, we present MIRAGE (Model Internals-based RAG Explanations), a plug-and-play approach using model internals for faithful answer attribution in RAG applications. MIRAGE detects context-sensitive answer tokens and pairs them with the retrieved documents contributing to their prediction via saliency methods. We evaluate our proposed approach on a multilingual extractive QA dataset, finding high agreement with human answer attribution. On open-ended QA, MIRAGE achieves citation quality and efficiency comparable to self-citation while also allowing finer-grained control of attribution parameters. Our qualitative evaluation highlights the faithfulness of MIRAGE's attributions and underscores the promising application of model internals for RAG answer attribution.
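The two steps named in the abstract (detecting context-sensitive answer tokens, then pairing them with retrieved documents via saliency) can be illustrated with a toy sketch. This is not the paper's implementation: the KL-divergence shift measure, the fixed threshold, the top-k selection rule, the function names, and all numbers below are assumptions chosen for illustration. In a real pipeline the distributions and saliency scores would come from the model itself.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.
    Assumes every q[i] is strictly positive wherever p[i] is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def detect_context_sensitive(with_ctx, without_ctx, threshold):
    """Step 1 (assumed metric): keep the indices of answer tokens whose
    predictive distribution shifts by more than `threshold` when the
    retrieved documents are present in the prompt."""
    return [
        i
        for i, (p, q) in enumerate(zip(with_ctx, without_ctx))
        if kl_divergence(p, q) > threshold
    ]

def attribute_to_documents(saliency, token_indices, top_k=1):
    """Step 2 (assumed rule): for each selected token, cite the `top_k`
    documents with the highest saliency score."""
    citations = {}
    for i in token_indices:
        ranked = sorted(saliency[i], key=saliency[i].get, reverse=True)
        citations[i] = ranked[:top_k]
    return citations

# Hypothetical toy data: four answer tokens over a two-word vocabulary.
answer_tokens = ["The", "capital", "is", "Paris"]
with_ctx = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1], [0.95, 0.05]]
without_ctx = [[0.9, 0.1], [0.7, 0.3], [0.9, 0.1], [0.2, 0.8]]

# Hypothetical per-token saliency mass aggregated over each retrieved document.
saliency = [
    {"doc1": 0.3, "doc2": 0.4, "doc3": 0.3},
    {"doc1": 0.4, "doc2": 0.3, "doc3": 0.3},
    {"doc1": 0.3, "doc2": 0.3, "doc3": 0.4},
    {"doc1": 0.7, "doc2": 0.2, "doc3": 0.1},
]

sensitive = detect_context_sensitive(with_ctx, without_ctx, threshold=0.5)
citations = attribute_to_documents(saliency, sensitive, top_k=1)

for i, docs in citations.items():
    print(f"{answer_tokens[i]!r} is attributed to {docs}")
```

Raising or lowering `threshold` and `top_k` changes how many tokens are flagged and how many documents each one cites, which is the kind of attribution-parameter control the abstract refers to.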


Community

gsarti (paper author, paper submitter), Jun 21

Our experiments were conducted using the attribute-context API of the Inseq library, available here: https://inseq.org/en/latest/main_classes/cli.html#attribute-context


