Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment
Abstract
Aligning language models (LMs) with human-annotated preference data is a crucial step in obtaining practical and performant LM-based systems. However, multilingual human preference data are difficult to obtain at scale, making it challenging to extend this framework to diverse languages. In this work, we evaluate a simple approach for zero-shot cross-lingual alignment, where a reward model is trained on preference data in one source language and directly applied to other target languages. On summarization and open-ended dialog generation, we show that this method is consistently successful under comprehensive evaluation settings, including human evaluation: cross-lingually aligned models are preferred by humans over unaligned models on up to >70% of evaluation instances. Moreover, we find that a different-language reward model sometimes yields better-aligned models than a same-language reward model. We also identify best practices for when there is no language-specific data even for supervised finetuning, another component of the alignment pipeline.
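To make the transfer recipe concrete, below is a minimal sketch (not the authors' released code) of consuming a source-language reward model via best-of-n reranking: candidates are sampled from a target-language policy and scored by a reward model trained only on, e.g., English preference data. The checkpoint names `my-org/multilingual-sft-lm` and `my-org/en-preference-reward-model` are hypothetical placeholders.

```python
# Sketch of zero-shot cross-lingual reward transfer via best-of-n reranking.
# Model names are hypothetical placeholders, not the paper's checkpoints.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

POLICY = "my-org/multilingual-sft-lm"         # target-language SFT policy (assumed)
REWARD = "my-org/en-preference-reward-model"  # RM trained on English prefs (assumed)

policy_tok = AutoTokenizer.from_pretrained(POLICY)
policy = AutoModelForCausalLM.from_pretrained(POLICY).eval()
rm_tok = AutoTokenizer.from_pretrained(REWARD)
reward_model = AutoModelForSequenceClassification.from_pretrained(
    REWARD, num_labels=1  # scalar reward head
).eval()


def best_of_n(prompt: str, n: int = 8, max_new_tokens: int = 128) -> str:
    """Sample n target-language candidates, score each with the
    source-language reward model, and return the highest-scoring one."""
    inputs = policy_tok(prompt, return_tensors="pt")
    with torch.no_grad():
        outs = policy.generate(
            **inputs,
            do_sample=True,
            top_p=0.9,
            num_return_sequences=n,
            max_new_tokens=max_new_tokens,
            pad_token_id=policy_tok.eos_token_id,
        )
    # Strip the prompt tokens so only the continuations are scored.
    candidates = policy_tok.batch_decode(
        outs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # The RM never saw target-language preference data; we rely on its
    # cross-lingual generalization, which is the paper's central claim.
    scores = []
    with torch.no_grad():
        for cand in candidates:
            rm_inputs = rm_tok(prompt, cand, return_tensors="pt", truncation=True)
            scores.append(reward_model(**rm_inputs).logits.squeeze().item())
    return candidates[max(range(n), key=lambda i: scores[i])]


# Example: align a German dialog response using only English preference data.
print(best_of_n("Frage: Wie funktioniert ein Wärmetauscher?\nAntwort:"))
```

Best-of-n reranking is just one way to use the transferred reward signal; the same scores could equally drive RLHF-style policy optimization against the source-language reward model.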
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Prior Constraints-based Reward Model Training for Aligning Large Language Models (2024)
- ALaRM: Align Language Models via Hierarchical Rewards Modeling (2024)
- SumTra: A Differentiable Pipeline for Few-Shot Cross-Lingual Summarization (2024)
- Adaptive Cross-lingual Text Classification through In-Context One-Shot Demonstrations (2024)
- Cross-lingual Transfer or Machine Translation? On Data Augmentation for Monolingual Semantic Textual Similarity (2024)