---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-2-7b-hf
datasets:
- tatsu-lab/alpaca_farm
pipeline_tag: text-generation
---
|
|
|
This is the backbone SFT model used in the paper "[DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging](https://arxiv.org/abs/2407.01470)".
|
|
|
Detailed training and evaluation information is available at https://api.wandb.ai/links/merge_exp/2qs92v6f.
|
|
|
For further details about this model, please refer to our paper.
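Since this model was fine-tuned on the AlpacaFarm SFT data, prompts in the Alpaca instruction format are a reasonable starting point. The snippet below is a minimal sketch assuming the standard Alpaca template (not confirmed by this card; verify against the paper and training logs before relying on it).

```python
# Minimal sketch: build an Alpaca-style prompt for this SFT model.
# Assumption: the model was trained with the standard Alpaca template
# used by tatsu-lab/alpaca_farm; this is not confirmed by the model card.

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Format a user instruction with the (assumed) Alpaca SFT template."""
    return ALPACA_TEMPLATE.format(instruction=instruction)

prompt = build_prompt("Explain model merging in one sentence.")
```

The formatted prompt can then be passed to a standard `transformers` text-generation pipeline loaded with this model.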
|
|
|
If you find this model useful, please cite our paper:
|
```
@article{lin2024dogerm,
  title={DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging},
  author={Lin, Tzu-Han and Li, Chen-An and Lee, Hung-yi and Chen, Yun-Nung},
  journal={arXiv preprint arXiv:2407.01470},
  year={2024}
}
```