---
license: llama3
---

# LLaMA-3-8B-SFR-RM-R

This is the reward model (RM) for Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R. It is a vanilla Bradley-Terry (BT) reward model.

## Model Releases

- [SFT model](https://huggingface.co/Salesforce/SFR-SFT-LLaMA-3-8B-R)
- [Reward model](https://huggingface.co/Salesforce/SFR-RM-LLaMA-3-8B-R)
- [RLHF model](https://huggingface.co/Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R)

## Citation

Please cite our technical report if you find our model useful for your research or product.

```bibtex
@misc{dong2024rlhf,
  title={RLHF Workflow: From Reward Modeling to Online RLHF},
  author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
  year={2024},
  eprint={2405.07863},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```
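
## Bradley-Terry Objective (Sketch)

The model description calls this a vanilla BT reward model. As a rough illustration of what that objective looks like, the sketch below computes the Bradley-Terry preference probability and the matching negative log-likelihood for one pair of scalar rewards. This is not the authors' training code: the function names `sigmoid`, `bt_preference_probability`, and `bt_loss` are hypothetical, and the rewards are assumed to be plain floats already produced by a reward model.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bt_preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response beats the rejected one."""
    return sigmoid(reward_chosen - reward_rejected)


def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the preference: -log(sigmoid(r_chosen - r_rejected))."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Hypothetical reward scores for a preferred and a dispreferred response.
    r_chosen, r_rejected = 2.0, 1.0
    print(bt_preference_probability(r_chosen, r_rejected))
    print(bt_loss(r_chosen, r_rejected))
```

Under this objective, only the difference between the two rewards matters, so the loss shrinks as the chosen response is scored further above the rejected one.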