Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -33,6 +33,21 @@ NuminaMath 72B CoT is the model from Stage 1 and was fine-tuned on [AI-MO/Numina
  - **License:** Tongyi Qianwen
  - **Finetuned from model:** [Qwen/Qwen2-72B](https://huggingface.co/Qwen/Qwen2-72B)
 
+ ## Model performance
+
+ | | | NuminaMath-72B-CoT | NuminaMath-72B-TIR | Qwen2-72B-Instruct | Llama3-70B-Instruct | Claude-3.5-Sonnet | GPT-4o-0513 |
+ | --- | --- | :---: | :---: | :---: | :---: | :---: | :---: |
+ | **GSM8k** | 0-shot | 91.4% | 91.5% | 91.1% | 93.0% | **96.4%** | 95.8% |
+ | Grade school math |
+ | **MATH** | 0-shot | 68.0% | 75.8% | 59.7% | 50.4% | 71.1% | **76.6%** |
+ | Math problem-solving |
+ | **AMC 2023** | 0-shot | 21/40 | **24/40** | 19/40 | 13/40 | 17/40 | 20/40 |
+ | Competition-level math | maj@64 | 24/40 | **34/40** | 21/40 | 13/40 | - | - |
+ | **AIME 2024** | 0-shot | 1/30 | **5/30** | 3/30 | 0/30 | 2/30 | 2/30 |
+ | Competition-level math | maj@64 | 3/30 | **12/30** | 4/30 | 2/30 | - | - |
+
+ *Table: Comparison of various open weight and proprietary language models on different math benchmarks. All scores except those for NuminaMath-72B-TIR are reported without tool-integrated reasoning.*
+
  ### Model Sources
 
  <!-- Provide the basic links for the model. -->