luofuli committed on
Commit 594d86b
1 Parent(s): 5941330

Update README.md

Files changed (1)
  1. README.md +13 -8
README.md CHANGED
@@ -58,15 +58,20 @@ For model details, please visit [DeepSeek-V2 page](https://github.com/deepseek-a
 
  DeepSeek-V2.5 better aligns with human preferences and has been optimized in various aspects, including writing and instruction following:
 
- - ArenaHard winrate increased from 68.3% to 76.3%
- - AlpacaEval 2.0 LC winrate increased from 46.61% to 50.52%
- - MT-Bench score increased from 8.84 to 9.02
- - AlignBench score increased from 7.88 to 8.04
+ | Metric | DeepSeek-V2-0628 | DeepSeek-Coder-V2-0724 | DeepSeek-V2.5 |
+ |------------------------|------------------|------------------------|---------------|
+ | AlpacaEval 2.0 | 46.6 | 44.5 | 50.5 |
+ | ArenaHard | 68.3 | 66.3 | 76.2 |
+ | AlignBench | 7.88 | 7.91 | 8.04 |
+ | MT-Bench | 8.85 | 8.91 | 9.02 |
+ | HumanEval python | 84.5 | 87.2 | 89 |
+ | HumanEval Multi | 73.8 | 74.8 | 73.8 |
+ | LiveCodeBench(01-09) | 36.6 | 39.7 | 41.8 |
+ | Aider | 69.9 | 72.9 | 72.2 |
+ | SWE-verified | N/A | 19 | 16.8 |
+ | DS-FIM-Eval | N/A | 73.2 | 78.3 |
+ | DS-Arena-Code | N/A | 49.5 | 63.1 |
 
- DeepSeek-V2.5 further enhances code generation capabilities, optimizing for common programming application scenarios, and achieving the following results on benchmarks:
-
- - HumanEval: 89%
- - LiveCodeBench (January - September): 41%
 
  ## 2. How to run locally
 