Raincleared committed on
Commit
e71389a
1 Parent(s): e0434f5

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -86,14 +86,14 @@ The evaluation results on the above benchmarks demonstrate the advantage of ProS
 | Vanilla ReLU-7B | 66.04 | 21.31 | 70.73 | 73.22 | 11.22 | 49.22 | 36.11 | 28.01 | 41.40 |
 | Shifted ReLU-7B | 69.59 | 20.50 | 70.09 | 73.17 | 13.87 | 48.54 | 35.20 | 27.94 | 41.33 |
 | Fixed \\(L_1\\)-7B | 91.46 | 18.85 | 66.01 | 55.39 | 2.27 | 32.28 | 31.40 | 26.48 | 33.24 |
-| **ProSparse-7B**\\(^\*\\) | 88.11 | 19.47 | 66.29 | 63.33 | 12.74 | 45.21 | 33.59 | 27.55 | 38.31 |
+| **ProSparse-7B**\* | 88.11 | 19.47 | 66.29 | 63.33 | 12.74 | 45.21 | 33.59 | 27.55 | 38.31 |
 | **ProSparse-7B** | 89.32 | 19.42 | 66.27 | 63.50 | 12.13 | 45.48 | 34.99 | 27.46 | 38.46 |
 | Original-13B | - | 20.19 | 72.58 | 71.55 | 22.21 | 54.69 | 37.89 | 29.33 | 44.06 |
 | ReluLLaMA-13B | 71.56 | 20.19 | 70.44 | 73.29 | 18.50 | 50.58 | 37.97 | 28.22 | 42.74 |
-| **ProSparse-13B**\\(^\*\\) | 87.97 | 29.03 | 69.75 | 67.54 | 25.40 | 54.78 | 40.20 | 28.76 | 45.07 |
+| **ProSparse-13B**\* | 87.97 | 29.03 | 69.75 | 67.54 | 25.40 | 54.78 | 40.20 | 28.76 | 45.07 |
 | **ProSparse-13B** | 88.80 | 28.42 | 69.76 | 66.91 | 26.31 | 54.35 | 39.90 | 28.67 | 44.90 |
 
-**Notes**: "Original" refers to the original Swish-activated LLaMA2 versions. ReluLLaMA-7B and ReluLLaMA-13B are available at [7B](https://huggingface.co/SparseLLM/ReluLLaMA-7B) and [13B](https://huggingface.co/SparseLLM/ReluLLaMA-13B) respectively. "ProSparse-7B\\(^\*\\)" and "ProSparse-13B\\(^\*\\)" denote the ProSparse versions without activation threshold shifting.
+**Notes**: "Original" refers to the original Swish-activated LLaMA2 versions. ReluLLaMA-7B and ReluLLaMA-13B are available at [7B](https://huggingface.co/SparseLLM/ReluLLaMA-7B) and [13B](https://huggingface.co/SparseLLM/ReluLLaMA-13B) respectively. "ProSparse-7B\*" and "ProSparse-13B\*" denote the ProSparse versions without activation threshold shifting.
 
 ### Inference Acceleration Effects
 
@@ -113,10 +113,10 @@ The acceleration effects of LLMs with different sparsity are displayed as follow
 | ReluLLaMA-7B | 66.98 | 90.89 | 58.95 | 11.37 | 67.12 | 1.35 | 63.00 | 1.32 |
 | Vanilla ReLU-7B | 66.04 | 87.72 | 72.57 | 12.04 | 67.85 | 1.33 | 63.28 | 1.31 |
 | Fixed \\(L_1\\)-7B | 91.46 | 94.51 | 82.85 | 19.62 | 40.99 | 2.21 | 54.19 | 1.53 |
-| **ProSparse-7B**\\(^\*\\) | 88.11 | 93.46 | 75.24 | 16.30 | 46.66 | 1.94 | 55.56 | 1.49 |
+| **ProSparse-7B**\* | 88.11 | 93.46 | 75.24 | 16.30 | 46.66 | 1.94 | 55.56 | 1.49 |
 | **ProSparse-7B** | 89.32 | 92.34 | 78.75 | - | 45.38 | 2.00 | 55.05 | 1.51 |
 | ReluLLaMA-13B | 71.56 | 86.41 | 71.93 | 6.59 | 69.92 | 1.88 | 75.47 | 1.51 |
-| **ProSparse-13B**\\(^\*\\) | 87.97 | 91.02 | 77.93 | 8.67 | 55.29 | 2.38 | 67.50 | 1.68 |
+| **ProSparse-13B**\* | 87.97 | 91.02 | 77.93 | 8.67 | 55.29 | 2.38 | 67.50 | 1.68 |
 | **ProSparse-13B** | 88.80 | 91.11 | 78.28 | - | 53.78 | 2.44 | 66.73 | 1.70 |
 
 **Notes**: Fixed \\(L_1\\) suffers from severe performance degradation. ProSparse with Activation Threshold Shifting is not supported by PowerInfer. "Time" means the average wall-clock time (us) cost by each step with our sparse GPU operators, and "Speedup" is the speedup ratio to the setting without operators. The average time for step (2) and (3) without sparse GPU operators is about **90.55 and 82.92 (us) for 7B, 131.36 and 113.68 (us) for 13B** respectively under all sparsity.
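The "Speedup" columns in the acceleration table can be reproduced from the quoted baseline times: speedup is the no-operator step time divided by the time with sparse GPU operators. A minimal sketch checking a few rows (numbers taken from the tables above; the helper name `speedup` is just for illustration):

```python
# Speedup ratio = avg step time WITHOUT sparse GPU operators
#               / avg step time WITH sparse GPU operators.
# Baseline times per the notes: (step (2), step (3)) in microseconds.
BASELINE_US = {
    "7B": (90.55, 82.92),
    "13B": (131.36, 113.68),
}

def speedup(baseline_us: float, accelerated_us: float) -> float:
    """Speedup relative to the no-operator setting, rounded as in the table."""
    return round(baseline_us / accelerated_us, 2)

# ProSparse-7B row: step (2) takes 45.38 us, step (3) takes 55.05 us.
print(speedup(BASELINE_US["7B"][0], 45.38))   # table reports 2.00
print(speedup(BASELINE_US["7B"][1], 55.05))   # table reports 1.51

# ProSparse-13B row: 53.78 us and 66.73 us.
print(speedup(BASELINE_US["13B"][0], 53.78))  # table reports 2.44
print(speedup(BASELINE_US["13B"][1], 66.73))  # table reports 1.70
```

The small discrepancies (e.g. 45.38 × 2.00 = 90.76 vs. the stated 90.55 us baseline) are consistent with the reported figures being rounded averages.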