---
datasets:
- baidu/TARA
license: mit
language:
- en
library_name: transformers
---

Official checkpoint for [Tool-Augmented Reward Modeling (ICLR 2024 spotlight)](https://openreview.net/pdf?id=d94x0gWTUX).

# Model Description

Themis is a tool-augmented preference model that addresses the limitations of conventional reward models by empowering them with access to external environments, including calculators and search engines. It was introduced in the [ICLR 2024 paper](https://arxiv.org/pdf/2310.01045.pdf) and first released in this [repository](https://github.com/ernie-research/Tool-Augmented-Reward-Model). Themis-7b is trained on [TARA](https://huggingface.co/datasets/baidu/TARA) and achieves a noteworthy overall improvement of 17.7% across eight tasks in preference ranking.

## 🔥 News

* **9 February, 2024:** 🎉 We released the official codebase and model weights of [`baidu/Themis-7b`](https://huggingface.co/baidu/Themis-7b). Stay tuned! 🔥
* **16 January, 2024:** 🎉 Our work was accepted to [ICLR 2024](https://iclr.cc/Conferences/2024) as a **spotlight**! ✨

# Citation

```text
@inproceedings{tarm-2024-ernie,
  author    = {Lei Li and Yekun Chai and Shuohuan Wang and Yu Sun and Hao Tian and Ningyu Zhang and Hua Wu},
  title     = {Tool-Augmented Reward Modeling},
  booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
  year      = {2024},
  url       = {https://openreview.net/forum?id=d94x0gWTUX},
}
```
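# Usage

A minimal usage sketch, assuming the checkpoint loads through the standard `transformers` auto classes (the card declares `library_name: transformers`). The prompt template and the reward-head class shown here are illustrative assumptions, not the official inference code; see the [repository](https://github.com/ernie-research/Tool-Augmented-Reward-Model) for the canonical pipeline.

```python
# Hypothetical sketch: the prompt format below is an illustrative assumption,
# not the template Themis was actually trained with.

def build_preference_prompt(question: str, answer: str) -> str:
    """Format a (question, answer) pair for reward scoring."""
    return f"Question: {question}\nAnswer: {answer}\n"


# Loading the checkpoint with the generic transformers API (the reward-head
# auto class is an assumption; consult the official codebase for the exact one):
#
# from transformers import AutoTokenizer, AutoModelForSequenceClassification
#
# tokenizer = AutoTokenizer.from_pretrained("baidu/Themis-7b")
# model = AutoModelForSequenceClassification.from_pretrained("baidu/Themis-7b")
#
# inputs = tokenizer(build_preference_prompt("What is 2+2?", "4"),
#                    return_tensors="pt")
# reward = model(**inputs).logits  # higher score = more preferred answer
```

In a preference-ranking setting, each candidate answer would be scored this way and the candidate with the highest reward selected.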