# Storm-7B

- **Developed by**: [Jie Liu](https://jieliu.site/) \\(^{*1,2}\\), [Zhanhui Zhou](https://scholar.google.com/citations?user=SbACfYQAAAAJ&hl=zh-CN) \\(^{*2}\\), [Jiaheng Liu](https://liujiaheng.github.io/) \\(^{2}\\), [Xingyuan Bu](https://scholar.google.com.hk/citations?user=cqYaRhUAAAAJ&hl=zh-CN) \\(^{2}\\), [Chao Yang](https://scholar.google.com/citations?user=5KRbHPMAAAAJ&hl=zh-CN) \\(^{2}\\), [Han-Sen Zhong](https://scholar.google.com.hk/citations?user=X_ZfX8sAAAAJ&hl=zh-CN) \\(^{\dag 2}\\), [Wanli Ouyang](https://wlouyang.github.io/) \\(^{1,2}\\).
- \\(^{1}\\)MMLab, The Chinese University of Hong Kong   \\(^{2}\\)Shanghai AI Laboratory
- Paper: [Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level](https://arxiv.org/pdf/2406.11817)
- Finetuned from model: [openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106)
- Dataset: [berkeley-nest/Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar)
- Reward Model: [Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B)
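Since Storm-7B is finetuned from openchat-3.5-0106, it presumably expects the base model's OpenChat "GPT4 Correct" chat format. A minimal sketch of single-turn prompt construction follows; the exact template is an assumption inherited from the base model, so consult the openchat-3.5-0106 model card before relying on it:

```python
# Sketch: build a single-turn prompt in the OpenChat "GPT4 Correct" style
# used by openchat-3.5-0106 (assumed to carry over to Storm-7B).

def build_openchat_prompt(user_message: str) -> str:
    """Return a single-turn prompt string; the model's reply should
    follow the trailing 'GPT4 Correct Assistant:' marker."""
    return (
        f"GPT4 Correct User: {user_message}<|end_of_turn|>"
        "GPT4 Correct Assistant:"
    )

if __name__ == "__main__":
    print(build_openchat_prompt("What is iterative length-regularized DPO?"))
```

In practice, prefer the tokenizer's built-in chat template (`tokenizer.apply_chat_template`) over hand-built strings, since it stays in sync with the model's special tokens.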