Apply for community grant: Academic project (gpu)
This is an academic research project to understand and fix hallucinations in LLM outputs.
Title: Fine-grained Hallucination Detection and Editing for Language Models
Authors: Abhika Mishra, Akari Asai, Vidhisha Balachandran, Yizhong Wang, Yulia Tsvetkov, Graham Neubig, Hannaneh Hajishirzi
Author affiliations: UW, CMU, AI2
Abstract:
Large language models (LLMs) are prone to generate diverse factually incorrect statements, which are widely called hallucinations. Current approaches predominantly focus on coarse-grained automatic hallucination detection or editing, overlooking nuanced error levels.
In this paper, we propose a novel task---automatic fine-grained hallucination detection---and present a comprehensive taxonomy encompassing six hierarchically defined types of hallucination. To facilitate evaluation, we introduce a new benchmark that includes fine-grained human judgments on two LLM outputs across various domains, leveraging external web documents. Our analysis reveals that ChatGPT and Llama 2-Chat exhibit hallucinations in 60% and 75% of their outputs, respectively, and a majority of these hallucinations fall into categories that have been underexplored.
As an initial step to address this, we train FAVA, a retrieval-augmented LM by carefully designing synthetic data generations and fine-tuning an expert LM to detect and correct fine-grained hallucinations.
On our benchmark, our automatic and human evaluations show that FAVA significantly outperforms ChatGPT on fine-grained hallucination detection by up to 38%, though a large room for future improvement still exists.
FAVA's suggested edits can also improve the factuality of LLM-generated text, resulting in 5-10% FactScore improvements.
Hi @akariasai, we have assigned a GPU to this Space. Note that GPU grants are provided temporarily and might be removed after some time if usage is very low.
To learn more about GPUs in Spaces, please check out https://huggingface.co/docs/hub/spaces-gpus
Thank you so much for your quick response!
@akariasai
One question:
The model does not detect errors when no reference is provided.
The paper mentions that you retrieve documents, but does the model "fava-uw/fava-model" implement a retriever?