---
datasets:
- baidu/TARA
license: mit
language:
- en
library_name: transformers
---

Official checkpoint for [Tool-Augmented Reward Modeling (ICLR 2024 spotlight)](https://openreview.net/pdf?id=d94x0gWTUX).

# Model Description

Themis is a tool-augmented preference model that addresses the limitations of conventional reward models by empowering them with access to external environments, including calculators and search engines. It was introduced in the [ICLR 2024 paper](https://arxiv.org/pdf/2310.01045.pdf) and first released in this [repository](https://github.com/ernie-research/Tool-Augmented-Reward-Model). Themis-7b is trained on [TARA](https://huggingface.co/datasets/baidu/TARA) and achieves a noteworthy overall improvement of 17.7% across eight tasks in preference ranking.

## 🔥 News

* **9 February, 2024:** 🎉 We released the official codebase and model weights of [`baidu/Themis-7b`](https://huggingface.co/baidu/Themis-7b). Stay tuned! 🔥
* **16 January, 2024:** 🎉 Our work was accepted to [ICLR 2024](https://iclr.cc/Conferences/2024) as a **spotlight**! ✨

# Citation

```text
@inproceedings{tarm-2024-ernie,
  author    = {Lei Li and Yekun Chai and Shuohuan Wang and Yu Sun and Hao Tian and Ningyu Zhang and Hua Wu},
  title     = {Tool-Augmented Reward Modeling},
  booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
  year      = {2024},
  url       = {https://openreview.net/forum?id=d94x0gWTUX},
}
```
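# Usage

A minimal usage sketch, assuming the checkpoint loads through the standard `transformers` auto classes (the card declares `library_name: transformers`). The prompt template and the reward-head class shown here are illustrative assumptions, not the official inference code; see the [repository](https://github.com/ernie-research/Tool-Augmented-Reward-Model) for the canonical pipeline.

```python
# Hypothetical sketch: the prompt format below is an illustrative assumption,
# not the template Themis was actually trained with.

def build_preference_prompt(question: str, answer: str) -> str:
    """Format a (question, answer) pair for reward scoring."""
    return f"Question: {question}\nAnswer: {answer}\n"


# Loading the checkpoint with the generic transformers API (the reward-head
# auto class is an assumption; consult the official codebase for the exact one):
#
# from transformers import AutoTokenizer, AutoModelForSequenceClassification
#
# tokenizer = AutoTokenizer.from_pretrained("baidu/Themis-7b")
# model = AutoModelForSequenceClassification.from_pretrained("baidu/Themis-7b")
#
# inputs = tokenizer(build_preference_prompt("What is 2+2?", "4"),
#                    return_tensors="pt")
# reward = model(**inputs).logits  # higher score = more preferred answer
```

In a preference-ranking setting, each candidate answer would be scored this way and the candidate with the highest reward selected.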