Zhenru committed on
Commit
33ca91c
1 Parent(s): 7c82a23

Create README.md

  1. README.md +108 -0
README.md ADDED
@@ -0,0 +1,108 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ pipeline_tag: text-classification
+ ---
+
+
+ # Qwen2.5-Math-RM-72B
+
+ ## Introduction
+ Qwen2.5-Math-RM-72B is designed to guide the Qwen2.5-Math model throughout training by providing fine-grained feedback on the quality of reasoning and intermediate steps, which in turn supports more robust model improvements.
+
+
+ Key Highlights:
+
+ - Multilingual and Multi-Mode Support: Offers preference signals in two languages (Chinese and English) and two reasoning modes (Chain-of-Thought and Tool-integrated Reasoning), enhancing versatility.
+
+ - Model Training Guide:
+   - Training Data Enhancement: Selects training data by combining reward model scoring with Rejection Sampling, incrementally improving the quality of responses.
+   - Reinforcement Learning Training: Integrates seamlessly into reinforcement learning training and provides an effective reward signal, further improving model performance.
+
+ - Inference Boosting:
+   - Best of N: We sample N responses and choose the one the reward model scores highest, trading additional inference time for better results. For example, Qwen2.5-Math-1.5B-Instruct obtains 83.9 on MATH in the RM@8 setting, surpassing the 83.6 that Qwen2.5-Math-7B-Instruct achieves with greedy decoding.
+   - Comparison with majority voting (Maj@N): RM@N scores are substantially better than Maj@N scores across almost all benchmarks and models.
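
The difference between RM@N and Maj@N selection can be illustrated with a minimal sketch. It assumes each sampled response has already been reduced to a final answer string and a reward score; the function names `rm_at_n` and `maj_at_n` and the sample scores are hypothetical and only serve the illustration.

```python
from collections import Counter


def rm_at_n(candidates):
    """Return the answer of the candidate with the highest reward score.

    `candidates` is a list of (final_answer, reward_score) pairs.
    """
    return max(candidates, key=lambda pair: pair[1])[0]


def maj_at_n(candidates):
    """Return the most frequent final answer, ignoring reward scores."""
    counts = Counter(answer for answer, _ in candidates)
    return counts.most_common(1)[0][0]


# Made-up scores for four sampled responses to one question.
samples = [("17", 0.4), ("18", 3.1), ("17", -0.2), ("16", -1.5)]

print(rm_at_n(samples))   # picks the single best-scored response
print(maj_at_n(samples))  # picks the answer that appears most often
```

In this toy case the two strategies disagree: majority voting follows the repeated answer, while RM@N follows the one response the reward model rates highest.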
+
+
+ ## Model Details
+
+ For more details, please refer to our [blog post](https://qwenlm.github.io/blog/qwen2-math/) and [GitHub repo](https://github.com/QwenLM/Qwen2-Math).
+
+
+ ## Requirements
+ * `transformers>=4.40.0` for Qwen2.5-Math models. The latest version is recommended.
+
+ > [!Warning]
+ > <div align="center">
+ > <b>
+ > 🚨 This is a must because `transformers` has integrated the Qwen2.5 code since `4.37.0`.
+ > </b>
+ > </div>
+
+ For GPU memory requirements and the corresponding throughput, see the comparable Qwen2.5 results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).
+
+ ## Quick Start
+
+ > [!Important]
+ >
+ > **Qwen2.5-Math-RM-72B** is a reward model typically used to provide feedback on the quality of reasoning and intermediate steps. It serves in Rejection Sampling, reinforcement learning training, and RM@N.
+
+ ### 🤗 Hugging Face Transformers
+
+ The following code snippet shows how to use Qwen2.5-Math-RM-72B with `transformers`:
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ model_name = "Qwen/Qwen2.5-Math-RM-72B"
+ device = "auto"  # device map used to place the model weights
+
+ # Load the reward model in bfloat16 and switch it to evaluation mode.
+ model = AutoModel.from_pretrained(
+     model_name,
+     device_map=device,
+     torch_dtype=torch.bfloat16,
+     trust_remote_code=True,
+ ).eval()
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+
+ # One conversation to score: system prompt, question, and candidate response.
+ chat = [
+     {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
+     {"role": "user", "content": "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"},
+     {"role": "assistant", "content": "To determine how much Janet makes from selling the duck eggs at the farmers' market, we need to follow these steps:\n\n1. Calculate the total number of eggs laid by the ducks each day.\n2. Determine how many eggs Janet eats and bakes for herself each day.\n3. Find out how many eggs are left to be sold.\n4. Calculate the revenue from selling the remaining eggs at $2 per egg.\n\nLet's start with the first step:\n\n1. Janet's ducks lay 16 eggs per day.\n\nNext, we calculate how many eggs Janet eats and bakes for herself each day:\n\n2. Janet eats 3 eggs for breakfast every morning.\n3. Janet bakes 4 eggs for her friends every day.\n\nSo, the total number of eggs Janet eats and bakes for herself each day is:\n\\[ 3 + 4 = 7 \\text{ eggs} \\]\n\nNow, we find out how many eggs are left to be sold:\n\\[ 16 - 7 = 9 \\text{ eggs} \\]\n\nFinally, we calculate the revenue from selling the remaining eggs at $2 per egg:\n\\[ 9 \\times 2 = 18 \\text{ dollars} \\]\n\nTherefore, Janet makes \\(\\boxed{18}\\) dollars every day at the farmers' market."}
+ ]  # reward score for this example: 3.75
+
+ # Render the conversation with the chat template, without a generation prompt.
+ conversation_str = tokenizer.apply_chat_template(
+     chat,
+     tokenize=False,
+     add_generation_prompt=False
+ )
+
+ # The template already contains the special tokens, so none are added here.
+ input_ids = tokenizer.encode(
+     conversation_str,
+     return_tensors="pt",
+     add_special_tokens=False
+ ).to(model.device)
+
+ outputs = model(input_ids=input_ids)
+ print(outputs[0])  # the reward score
+ ```
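
For RM@N or Rejection Sampling, the same steps are typically repeated for several candidate responses to one question. The sketch below wraps the snippet above in two hypothetical helpers, `score_response` and `rank_responses`; it assumes `model` and `tokenizer` were loaded as shown above and that `outputs[0]` holds a single reward value for the one input sequence.

```python
SYSTEM_PROMPT = "Please reason step by step, and put your final answer within \\boxed{}."


def score_response(model, tokenizer, question, response):
    """Score one (question, response) pair with the reward model."""
    chat = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
        {"role": "assistant", "content": response},
    ]
    conversation_str = tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=False
    )
    input_ids = tokenizer.encode(
        conversation_str, return_tensors="pt", add_special_tokens=False
    ).to(model.device)
    outputs = model(input_ids=input_ids)
    # Assumption: outputs[0] contains exactly one element, the reward score.
    return outputs[0].item()


def rank_responses(model, tokenizer, question, responses):
    """Return (response, score) pairs sorted from highest to lowest score."""
    scored = [
        (response, score_response(model, tokenizer, question, response))
        for response in responses
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The first element of the returned list is the RM@N choice. With the real 72B model, wrapping the call in `torch.no_grad()` is advisable so that no activations are kept for backpropagation.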
+
+ ### 🤖 ModelScope
+ We strongly advise users, especially those in mainland China, to use ModelScope. `snapshot_download` can help you solve issues concerning downloading checkpoints.
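
A minimal sketch of that download path, assuming the `modelscope` package is installed and that the checkpoint is published on ModelScope under the same ID as on Hugging Face (the ID and the helper name `download_checkpoint` are assumptions, not confirmed by this README):

```python
def download_checkpoint(model_id="Qwen/Qwen2.5-Math-RM-72B", cache_dir=None):
    """Download a checkpoint from ModelScope and return its local directory."""
    # Imported lazily so that the helper can be defined without modelscope installed.
    from modelscope import snapshot_download

    return snapshot_download(model_id, cache_dir=cache_dir)


# Example usage (downloads the full checkpoint, so it is left commented out):
# model_dir = download_checkpoint()
# model = AutoModel.from_pretrained(model_dir, trust_remote_code=True)
```

The returned directory can be passed to `from_pretrained` in place of the model name.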
+
+
+ ## Citation
+
+ If you find our work helpful, feel free to give us a citation.
+
+ ```
+ @article{yang2024qwen2,
+   title={Qwen2 technical report},
+   author={Yang, An and Yang, Baosong and Hui, Binyuan and Zheng, Bo and Yu, Bowen and Zhou, Chang and Li, Chengpeng and Li, Chengyuan and Liu, Dayiheng and Huang, Fei and others},
+   journal={arXiv preprint arXiv:2407.10671},
+   year={2024}
+ }
+ ```