AI-MO
/

NuminaMath-72B-CoT


add results table

#1

by benlipkin - opened Jul 20

base: refs/heads/main

←

from: refs/pr/1


README.md +15 -0


@@ -33,6 +33,21 @@ NuminaMath 72B CoT is the model from Stage 1 and was fine-tuned on [AI-MO/Numina
 - **License:** Tongyi Qianwen
 - **Finetuned from model:** [Qwen/Qwen2-72B](https://huggingface.co/Qwen/Qwen2-72B)
+## Model performance
+| | | NuminaMath-72B-CoT | NuminaMath-72B-TIR | Qwen2-72B-Instruct | Llama3-70B-Instruct | Claude-3.5-Sonnet | GPT-4o-0513 |
+| --- | --- | :---: | :---: | :---: | :---: | :---: | :---: |
+| **GSM8k** | 0-shot | 91.4% | 91.5% | 91.1% | 93.0% | **96.4%** | 95.8% |
+| Grade school math |
+| **MATH** | 0-shot | 68.0% | 75.8% | 59.7% | 50.4% | 71.1% | **76.6%** |
+| Math problem-solving |
+| **AMC 2023** | 0-shot | 21/40 | **24/40** | 19/40 | 13/40 | 17/40 | 20/40 |
+| Competition-level math | maj@64 | 24/40 | **34/40** | 21/40 | 13/40 | - | - |
+| **AIME 2024** | 0-shot | 1/30 | **5/30** | 3/30 | 0/30 | 2/30 | 2/30 |
+| Competition-level math | maj@64 | 3/30 | **12/30** | 4/30 | 2/30 | - | - |
+*Table: Comparison of various open weight and proprietary language models on different math benchmarks. All scores except those for NuminaMath-72B-TIR are reported without tool-integrated reasoning.*
 ### Model Sources
 <!-- Provide the basic links for the model. -->
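
A note on the `maj@64` rows in the added table: the label conventionally means that 64 completions are sampled per problem and the most frequent final answer is the one that gets scored. The PR does not describe the evaluation harness, so the following is only a minimal sketch under that assumption. `majority_vote` and `maj_at_k_score` are hypothetical names, and the answers are assumed to be already extracted and normalized to comparable strings.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(answers: Sequence[Hashable]) -> Hashable:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # most_common orders equal counts by first occurrence (Python 3.7+).
    return Counter(answers).most_common(1)[0][0]


def maj_at_k_score(
    samples_per_problem: Sequence[Sequence[Hashable]],
    references: Sequence[Hashable],
) -> int:
    """Count problems whose majority-voted answer equals the reference.

    Assumption: each inner sequence holds the k sampled final answers
    for one problem (k = 64 for the maj@64 rows in the table).
    """
    if len(samples_per_problem) != len(references):
        raise ValueError("one reference answer is required per problem")
    return sum(
        majority_vote(samples) == reference
        for samples, reference in zip(samples_per_problem, references)
    )


if __name__ == "__main__":
    # Toy data with k = 3 samples per problem, purely illustrative.
    samples = [["72", "72", "68"], ["5", "7", "7"], ["1", "2", "3"]]
    refs = ["72", "7", "2"]
    print(f"{maj_at_k_score(samples, refs)}/{len(refs)}")
```

Under this reading, a cell such as `34/40` would be the value returned by `maj_at_k_score` over the problem count, while the 0-shot rows score a single completion per problem.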