Experiment 🧪,Model 🤗,Prefill (s),Per Token (s),Decode (tokens/s),Energy (tokens/kWh),Memory (MB),Backend 🏭,Precision 📥,Quantization 🗜️,Attention 👁️,Kernel ⚛️,Architecture 🏛️,End-to-End (s),Open LLM Score (%),Params (B)
4bit-gptq-exllama-v2-eager,Qwen/Qwen1.5-110B,2.486,2.3411865234375,0.398,2661.495,65311.037,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Qwen2ForCausalLM,149.921,75.42,110
4bit-gptq-exllama-v1-fa2,Qwen/Qwen1.5-110B,2.513,2.368027587890625,0.421,2633.025,65311.036,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Qwen2ForCausalLM,151.799,75.42,110
4bit-gptq-exllama-v1-eager,Qwen/Qwen1.5-110B,2.515,2.3592529296875,0.424,2662.679,65311.037,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Qwen2ForCausalLM,151.175,75.42,110
4bit-gptq-exllama-v2-sdpa,Qwen/Qwen1.5-110B,2.499,2.349084716796875,0.425,2666.191,65311.036,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,Qwen2ForCausalLM,150.5,75.42,110
4bit-gptq-exllama-v1-sdpa,Qwen/Qwen1.5-110B,2.48,2.33512646484375,0.428,2664.976,65311.036,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,Qwen2ForCausalLM,149.561,75.42,110
4bit-bnb-fa2,Qwen/Qwen1.5-110B,4.467,0.2968944702148438,3.363,23268.535,65013.93,pytorch,float16,BnB.4bit,FAv2,No Kernel,Qwen2ForCausalLM,23.206,75.42,110
4bit-bnb-eager,Qwen/Qwen1.5-110B,4.446,0.2606571655273437,3.835,25487.07,65014.062,pytorch,float16,BnB.4bit,Eager,No Kernel,Qwen2ForCausalLM,20.873,75.42,110
4bit-bnb-sdpa,Qwen/Qwen1.5-110B,4.436,0.2596505737304687,3.848,26017.077,65013.93,pytorch,float16,BnB.4bit,SDPA,No Kernel,Qwen2ForCausalLM,20.818,75.42,110
4bit-gptq-exllama-v1-fa2,Qwen/Qwen1.5-72B,1.659,1.5589693603515624,0.64,3969.019,45374.151,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Qwen2ForCausalLM,100.136,72.91,72
4bit-gptq-exllama-v2-fa2,Qwen/Qwen1.5-72B,1.64,1.544901611328125,0.647,4022.375,45374.151,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Qwen2ForCausalLM,98.929,72.91,72
4bit-gptq-exllama-v2-sdpa,Qwen/Qwen1.5-72B,1.645,1.5385220947265623,0.65,4074.884,45374.151,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,Qwen2ForCausalLM,98.609,72.91,72
4bit-gptq-exllama-v1-sdpa,Qwen/Qwen1.5-72B,1.64,1.5355289306640625,0.651,4083.113,45374.151,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,Qwen2ForCausalLM,98.385,72.91,72
4bit-gptq-exllama-v1-eager,Qwen/Qwen1.5-72B,1.641,1.5341466064453124,0.652,4086.555,45374.152,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Qwen2ForCausalLM,98.294,72.91,72
4bit-gptq-exllama-v2-eager,Qwen/Qwen1.5-72B,1.641,1.5337093505859376,0.652,4073.93,45374.152,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Qwen2ForCausalLM,98.238,72.91,72
8bit-bnb-eager,Qwen/Qwen1.5-72B,0.266,0.2631034851074219,3.788,30333.894,77840.722,pytorch,float16,BnB.8bit,Eager,No Kernel,Qwen2ForCausalLM,16.875,72.91,72
8bit-bnb-fa2,Qwen/Qwen1.5-72B,0.263,0.2624276428222656,3.795,30225.887,77841.345,pytorch,float16,BnB.8bit,FAv2,No Kernel,Qwen2ForCausalLM,16.843,72.91,72
8bit-bnb-sdpa,Qwen/Qwen1.5-72B,0.259,0.2576486511230468,3.847,31114.81,77841.345,pytorch,float16,BnB.8bit,SDPA,No Kernel,Qwen2ForCausalLM,16.578,72.91,72
4bit-bnb-fa2,Qwen/Qwen1.5-72B,2.914,0.209786880493164,4.759,34328.499,44278.471,pytorch,float16,BnB.4bit,FAv2,No Kernel,Qwen2ForCausalLM,16.154,72.91,72
4bit-bnb-eager,Qwen/Qwen1.5-72B,2.907,0.1775267791748047,5.625,38909.536,44278.602,pytorch,float16,BnB.4bit,Eager,No Kernel,Qwen2ForCausalLM,14.101,72.91,72
4bit-bnb-sdpa,Qwen/Qwen1.5-72B,2.884,0.1753354187011718,5.697,39707.214,44278.471,pytorch,float16,BnB.4bit,SDPA,No Kernel,Qwen2ForCausalLM,13.939,72.91,72
4bit-gptq-exllama-v1-fa2,Qwen/Qwen1.5-32B,0.759,0.7142164306640625,1.4,8742.68,21326.311,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Unknown,45.777,70.47,32
4bit-gptq-exllama-v2-fa2,Qwen/Qwen1.5-32B,0.758,0.7138324584960938,1.4,8749.008,21326.311,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Unknown,45.745,70.47,32
4bit-gptq-exllama-v1-eager,Qwen/Qwen1.5-32B,0.749,0.697112548828125,1.435,8958.627,21326.312,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Unknown,44.666,70.47,32
4bit-gptq-exllama-v2-eager,Qwen/Qwen1.5-32B,0.748,0.69574755859375,1.437,8980.23,21326.312,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Unknown,44.571,70.47,32
4bit-gptq-exllama-v1-sdpa,Qwen/Qwen1.5-32B,0.744,0.6952366333007812,1.438,8995.519,21326.311,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,Unknown,44.558,70.47,32
4bit-gptq-exllama-v2-sdpa,Qwen/Qwen1.5-32B,0.742,0.6944102172851563,1.44,9019.264,21326.311,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,Unknown,44.505,70.47,32
8bit-bnb-eager,Qwen/Qwen1.5-32B,0.215,0.214687744140625,4.64,37808.933,35661.209,pytorch,float16,BnB.8bit,Eager,No Kernel,Unknown,13.77,70.47,32
8bit-bnb-fa2,Qwen/Qwen1.5-32B,0.212,0.212853759765625,4.679,38609.39,35661.209,pytorch,float16,BnB.8bit,FAv2,No Kernel,Unknown,13.657,70.47,32
8bit-bnb-sdpa,Qwen/Qwen1.5-32B,0.207,0.2074357757568359,4.803,39412.893,35661.209,pytorch,float16,BnB.8bit,SDPA,No Kernel,Unknown,13.313,70.47,32
4bit-bnb-fa2,Qwen/Qwen1.5-32B,1.231,0.1302108154296875,7.65,57873.387,21184.84,pytorch,float16,BnB.4bit,FAv2,No Kernel,Unknown,9.455,70.47,32
4bit-bnb-eager,Qwen/Qwen1.5-32B,1.221,0.1254154205322265,7.962,58872.682,21184.971,pytorch,float16,BnB.4bit,Eager,No Kernel,Unknown,9.122,70.47,32
4bit-bnb-sdpa,Qwen/Qwen1.5-32B,1.216,0.1207705612182617,8.237,61166.327,21184.84,pytorch,float16,BnB.4bit,SDPA,No Kernel,Unknown,8.858,70.47,32
bfloat16-sdpa,Qwen/Qwen1.5-32B,0.113,0.0539330558776855,18.422,114101.162,66512.805,pytorch,bfloat16,Unquantized,SDPA,No Kernel,Unknown,3.525,70.47,32
4bit-gptq-exllama-v1-fa2,Qwen/Qwen1.5-32B,0.759,0.7142164306640625,1.4,8742.68,21326.311,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Qwen2ForCausalLM,45.777,70.39,32
4bit-gptq-exllama-v2-fa2,Qwen/Qwen1.5-32B,0.758,0.7138324584960938,1.4,8749.008,21326.311,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Qwen2ForCausalLM,45.745,70.39,32
4bit-gptq-exllama-v1-eager,Qwen/Qwen1.5-32B,0.749,0.697112548828125,1.435,8958.627,21326.312,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Qwen2ForCausalLM,44.666,70.39,32
4bit-gptq-exllama-v2-eager,Qwen/Qwen1.5-32B,0.748,0.69574755859375,1.437,8980.23,21326.312,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Qwen2ForCausalLM,44.571,70.39,32
4bit-gptq-exllama-v1-sdpa,Qwen/Qwen1.5-32B,0.744,0.6952366333007812,1.438,8995.519,21326.311,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,Qwen2ForCausalLM,44.558,70.39,32
4bit-gptq-exllama-v2-sdpa,Qwen/Qwen1.5-32B,0.742,0.6944102172851563,1.44,9019.264,21326.311,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,Qwen2ForCausalLM,44.505,70.39,32
8bit-bnb-eager,Qwen/Qwen1.5-32B,0.215,0.214687744140625,4.64,37808.933,35661.209,pytorch,float16,BnB.8bit,Eager,No Kernel,Qwen2ForCausalLM,13.77,70.39,32
8bit-bnb-fa2,Qwen/Qwen1.5-32B,0.212,0.212853759765625,4.679,38609.39,35661.209,pytorch,float16,BnB.8bit,FAv2,No Kernel,Qwen2ForCausalLM,13.657,70.39,32
8bit-bnb-sdpa,Qwen/Qwen1.5-32B,0.207,0.2074357757568359,4.803,39412.893,35661.209,pytorch,float16,BnB.8bit,SDPA,No Kernel,Qwen2ForCausalLM,13.313,70.39,32
4bit-bnb-fa2,Qwen/Qwen1.5-32B,1.231,0.1302108154296875,7.65,57873.387,21184.84,pytorch,float16,BnB.4bit,FAv2,No Kernel,Qwen2ForCausalLM,9.455,70.39,32
4bit-bnb-eager,Qwen/Qwen1.5-32B,1.221,0.1254154205322265,7.962,58872.682,21184.971,pytorch,float16,BnB.4bit,Eager,No Kernel,Qwen2ForCausalLM,9.122,70.39,32
4bit-bnb-sdpa,Qwen/Qwen1.5-32B,1.216,0.1207705612182617,8.237,61166.327,21184.84,pytorch,float16,BnB.4bit,SDPA,No Kernel,Qwen2ForCausalLM,8.858,70.39,32
bfloat16-sdpa,Qwen/Qwen1.5-32B,0.113,0.0539330558776855,18.422,114101.162,66512.805,pytorch,bfloat16,Unquantized,SDPA,No Kernel,Qwen2ForCausalLM,3.525,70.39,32
float32-eager,internlm/internlm2-20b,0.673,0.0580843505859375,17.204,107334.978,81737.513,pytorch,float32,Unquantized,Eager,No Kernel,Unknown,4.333,69.75,20
bfloat16-eager,internlm/internlm2-20b,0.085,0.0520796165466308,18.595,131877.402,40915.737,pytorch,bfloat16,Unquantized,Eager,No Kernel,Unknown,3.48,69.75,20
float16-eager,internlm/internlm2-20b,0.083,0.0429731826782226,22.885,147138.442,40915.713,pytorch,float16,Unquantized,Eager,No Kernel,Unknown,2.843,69.75,20
float16-fa2,internlm/internlm2-20b,0.08,0.0387727355957031,25.159,141982.091,40909.85,pytorch,float16,Unquantized,FAv2,No Kernel,Unknown,2.554,69.75,20
bfloat16-fa2,internlm/internlm2-20b,0.08,0.0382894096374511,25.999,164064.681,40909.85,pytorch,bfloat16,Unquantized,FAv2,No Kernel,Unknown,2.496,69.75,20
4bit-gptq-exllama-v1-fa2,01-ai/Yi-34B,0.806,0.7479408569335938,1.337,8398.815,20339.706,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,LlamaForCausalLM,47.925,69.42,34
4bit-gptq-exllama-v1-eager,01-ai/Yi-34B,0.802,0.7413575439453125,1.349,8446.395,20339.707,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,LlamaForCausalLM,47.518,69.42,34
4bit-gptq-exllama-v2-fa2,01-ai/Yi-34B,0.797,0.7391539306640625,1.353,8502.111,20339.706,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,LlamaForCausalLM,47.378,69.42,34
4bit-gptq-exllama-v1-sdpa,01-ai/Yi-34B,0.793,0.7362805786132812,1.358,8507.675,20339.706,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,LlamaForCausalLM,47.181,69.42,34
4bit-gptq-exllama-v2-sdpa,01-ai/Yi-34B,0.791,0.7357869873046875,1.359,8497.601,20339.706,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,LlamaForCausalLM,47.141,69.42,34
4bit-gptq-exllama-v2-eager,01-ai/Yi-34B,0.795,0.7347783813476563,1.361,8496.697,20339.707,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,LlamaForCausalLM,47.088,69.42,34
8bit-bnb-fa2,01-ai/Yi-34B,0.222,0.2120785980224609,4.63,40260.489,35777.361,pytorch,float16,BnB.8bit,FAv2,No Kernel,LlamaForCausalLM,13.934,69.42,34
8bit-bnb-sdpa,01-ai/Yi-34B,0.207,0.2068643798828125,4.797,39300.981,35784.527,pytorch,float16,BnB.8bit,SDPA,No Kernel,LlamaForCausalLM,13.35,69.42,34
8bit-bnb-eager,01-ai/Yi-34B,0.208,0.2074746856689453,4.803,39386.631,35784.558,pytorch,float16,BnB.8bit,Eager,No Kernel,LlamaForCausalLM,13.307,69.42,34
4bit-bnb-eager,01-ai/Yi-34B,1.265,0.1208412170410156,8.253,60268.352,20257.332,pytorch,float16,BnB.4bit,Eager,No Kernel,LlamaForCausalLM,8.88,69.42,34
4bit-bnb-fa2,01-ai/Yi-34B,1.263,0.1186662368774414,8.336,61121.736,20257.201,pytorch,float16,BnB.4bit,FAv2,No Kernel,LlamaForCausalLM,8.823,69.42,34
4bit-bnb-sdpa,01-ai/Yi-34B,1.259,0.1156628494262695,8.549,61089.017,20257.201,pytorch,float16,BnB.4bit,SDPA,No Kernel,LlamaForCausalLM,8.539,69.42,34
bfloat16-eager,01-ai/Yi-34B,0.139,0.0625111045837402,15.957,98579.367,69113.77,pytorch,bfloat16,Unquantized,Eager,No Kernel,LlamaForCausalLM,4.082,69.42,34
float16-eager,01-ai/Yi-34B,0.139,0.0617041931152343,16.108,100686.062,69113.741,pytorch,float16,Unquantized,Eager,No Kernel,LlamaForCausalLM,4.031,69.42,34
float16-sdpa,01-ai/Yi-34B,0.135,0.0574320640563964,17.306,108446.829,69113.726,pytorch,float16,Unquantized,SDPA,No Kernel,LlamaForCausalLM,3.771,69.42,34
bfloat16-sdpa,01-ai/Yi-34B,0.131,0.0570808334350585,17.41,109196.143,69113.726,pytorch,bfloat16,Unquantized,SDPA,No Kernel,LlamaForCausalLM,3.737,69.42,34
bfloat16-fa2,01-ai/Yi-34B,0.129,0.0554455032348632,17.872,110149.156,69106.595,pytorch,bfloat16,Unquantized,FAv2,No Kernel,LlamaForCausalLM,3.642,69.42,34
float16-fa2,01-ai/Yi-34B,0.131,0.0554065933227539,17.94,111925.191,69106.595,pytorch,float16,Unquantized,FAv2,No Kernel,LlamaForCausalLM,3.637,69.42,34
4bit-gptq-exllama-v2-fa2,Qwen/Qwen1.5-14B,0.322,0.302266357421875,3.307,20680.95,11417.443,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Qwen2ForCausalLM,19.376,66.7,14
4bit-gptq-exllama-v1-fa2,Qwen/Qwen1.5-14B,0.319,0.298787841796875,3.343,20946.25,11417.443,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Qwen2ForCausalLM,19.169,66.7,14
4bit-gptq-exllama-v2-eager,Qwen/Qwen1.5-14B,0.318,0.2930442199707031,3.411,21276.966,11417.444,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Qwen2ForCausalLM,18.787,66.7,14
4bit-gptq-exllama-v1-eager,Qwen/Qwen1.5-14B,0.317,0.29292236328125,3.412,21318.34,11417.444,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Qwen2ForCausalLM,18.781,66.7,14
4bit-gptq-exllama-v2-sdpa,Qwen/Qwen1.5-14B,0.315,0.292917236328125,3.413,21325.519,11417.443,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,Qwen2ForCausalLM,18.772,66.7,14
4bit-gptq-exllama-v1-sdpa,Qwen/Qwen1.5-14B,0.316,0.2926960754394531,3.415,21284.671,11417.443,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,Qwen2ForCausalLM,18.757,66.7,14
8bit-bnb-eager,Qwen/Qwen2-beta-14B,0.134,0.1311662139892578,7.571,63129.015,17162.983,pytorch,float16,BnB.8bit,Eager,No Kernel,Unknown,8.407,66.7,14
8bit-bnb-fa2,Qwen/Qwen2-beta-14B,0.132,0.1310904388427734,7.616,63784.45,17162.139,pytorch,float16,BnB.8bit,FAv2,No Kernel,Unknown,8.392,66.7,14
8bit-bnb-eager,Qwen/Qwen1.5-14B,0.133,0.1306890258789062,7.632,63327.692,17162.983,pytorch,float16,BnB.8bit,Eager,No Kernel,Qwen2ForCausalLM,8.375,66.7,14
8bit-bnb-fa2,Qwen/Qwen1.5-14B,0.131,0.13016064453125,7.672,62918.239,17162.139,pytorch,float16,BnB.8bit,FAv2,No Kernel,Qwen2ForCausalLM,8.334,66.7,14
8bit-bnb-sdpa,Qwen/Qwen2-beta-14B,0.128,0.1272565765380859,7.839,63963.619,17162.139,pytorch,float16,BnB.8bit,SDPA,No Kernel,Unknown,8.167,66.7,14
8bit-bnb-sdpa,Qwen/Qwen1.5-14B,0.127,0.1261967391967773,7.885,65452.264,17162.139,pytorch,float16,BnB.8bit,SDPA,No Kernel,Qwen2ForCausalLM,8.115,66.7,14
4bit-bnb-fa2,Qwen/Qwen2-beta-14B,0.511,0.0763965454101562,12.979,102068.536,11094.619,pytorch,float16,BnB.4bit,FAv2,No Kernel,Unknown,5.359,66.7,14
4bit-bnb-eager,Qwen/Qwen1.5-14B,0.502,0.0766648330688476,12.995,101028.853,11093.767,pytorch,float16,BnB.4bit,Eager,No Kernel,Qwen2ForCausalLM,5.355,66.7,14
4bit-bnb-fa2,Qwen/Qwen1.5-14B,0.511,0.0764078063964843,13.067,100869.91,11094.619,pytorch,float16,BnB.4bit,FAv2,No Kernel,Qwen2ForCausalLM,5.319,66.7,14
4bit-bnb-eager,Qwen/Qwen2-beta-14B,0.504,0.0760985565185546,13.091,101519.305,11093.767,pytorch,float16,BnB.4bit,Eager,No Kernel,Unknown,5.303,66.7,14
4bit-bnb-sdpa,Qwen/Qwen1.5-14B,0.501,0.0729886703491211,13.606,106419.525,11094.619,pytorch,float16,BnB.4bit,SDPA,No Kernel,Qwen2ForCausalLM,5.134,66.7,14
4bit-bnb-sdpa,Qwen/Qwen2-beta-14B,0.5,0.0723947525024414,13.785,105456.899,11094.619,pytorch,float16,BnB.4bit,SDPA,No Kernel,Unknown,5.064,66.7,14
float32-sdpa,Qwen/Qwen2-beta-14B,0.472,0.0421468162536621,23.705,148566.494,59131.042,pytorch,float32,Unquantized,SDPA,No Kernel,Unknown,3.128,66.7,14
float32-eager,Qwen/Qwen2-beta-14B,0.473,0.0414248962402343,24.118,150604.059,59131.042,pytorch,float32,Unquantized,Eager,No Kernel,Unknown,3.083,66.7,14
float16-fa2,Qwen/Qwen2-beta-14B,0.057,0.03829248046875,25.962,167208.938,29628.641,pytorch,float16,Unquantized,FAv2,No Kernel,Unknown,2.477,66.7,14
bfloat16-fa2,Qwen/Qwen2-beta-14B,0.056,0.0382750701904296,26.024,171028.502,29628.641,pytorch,bfloat16,Unquantized,FAv2,No Kernel,Unknown,2.474,66.7,14
bfloat16-eager,Qwen/Qwen2-beta-14B,0.053,0.033952766418457,29.407,186517.422,29627.777,pytorch,bfloat16,Unquantized,Eager,No Kernel,Unknown,2.193,66.7,14
float16-eager,Qwen/Qwen2-beta-14B,0.054,0.0333055992126464,29.785,187842.717,29628.641,pytorch,float16,Unquantized,Eager,No Kernel,Unknown,2.161,66.7,14
bfloat16-sdpa,Qwen/Qwen2-beta-14B,0.05,0.0308305912017822,32.262,199491.508,29628.641,pytorch,bfloat16,Unquantized,SDPA,No Kernel,Unknown,1.996,66.7,14
float16-sdpa,Qwen/Qwen2-beta-14B,0.052,0.0307169284820556,32.425,202153.056,29628.641,pytorch,float16,Unquantized,SDPA,No Kernel,Unknown,1.988,66.7,14
4bit-gptq-exllama-v2-fa2,Qwen/Qwen1.5-7B,0.178,0.1636229095458984,6.104,38185.725,7110.584,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Qwen2ForCausalLM,10.495,61.76,7
4bit-gptq-exllama-v1-fa2,Qwen/Qwen1.5-7B,0.176,0.1622589416503906,6.156,38429.543,7110.584,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Qwen2ForCausalLM,10.402,61.76,7
4bit-gptq-exllama-v2-sdpa,Qwen/Qwen1.5-7B,0.169,0.1540648956298828,6.488,40418.403,7110.584,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,Qwen2ForCausalLM,9.882,61.76,7
4bit-gptq-exllama-v2-eager,Qwen/Qwen1.5-7B,0.169,0.1534781494140625,6.515,40703.915,7110.585,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Qwen2ForCausalLM,9.84,61.76,7
4bit-gptq-exllama-v1-eager,Qwen/Qwen1.5-7B,0.17,0.153154556274414,6.523,40746.829,7110.585,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Qwen2ForCausalLM,9.829,61.76,7
4bit-gptq-exllama-v1-sdpa,Qwen/Qwen1.5-7B,0.168,0.1531156463623046,6.53,40810.499,7110.584,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,Qwen2ForCausalLM,9.817,61.76,7
8bit-bnb-fa2,Qwen/Qwen1.5-7B,0.106,0.1043712005615234,9.529,81397.173,10046.34,pytorch,float16,BnB.8bit,FAv2,No Kernel,Qwen2ForCausalLM,6.703,61.76,7
8bit-bnb-eager,Qwen/Qwen1.5-7B,0.105,0.1032509460449218,9.667,81417.862,10046.34,pytorch,float16,BnB.8bit,Eager,No Kernel,Qwen2ForCausalLM,6.618,61.76,7
8bit-bnb-sdpa,Qwen/Qwen1.5-7B,0.102,0.1015767059326171,9.782,83354.464,10046.34,pytorch,float16,BnB.8bit,SDPA,No Kernel,Qwen2ForCausalLM,6.528,61.76,7
4bit-bnb-fa2,Qwen/Qwen1.5-7B,0.291,0.0622233581542968,16.001,129775.678,6859.561,pytorch,float16,BnB.4bit,FAv2,No Kernel,Qwen2ForCausalLM,4.223,61.76,7
4bit-bnb-eager,Qwen/Qwen1.5-7B,0.283,0.0611809272766113,16.29,133802.283,6859.693,pytorch,float16,BnB.4bit,Eager,No Kernel,Qwen2ForCausalLM,4.153,61.76,7
4bit-bnb-sdpa,Qwen/Qwen1.5-7B,0.281,0.0579102706909179,17.241,140087.9,6859.561,pytorch,float16,BnB.4bit,SDPA,No Kernel,Qwen2ForCausalLM,3.923,61.76,7
bfloat16-eager,Qwen/Qwen1.5-7B,0.033,0.027060224533081,36.685,256436.742,16416.242,pytorch,bfloat16,Unquantized,Eager,No Kernel,Qwen2ForCausalLM,1.749,61.76,7
float16-eager,Qwen/Qwen1.5-7B,0.035,0.0266495990753173,37.386,254931.518,16416.242,pytorch,float16,Unquantized,Eager,No Kernel,Qwen2ForCausalLM,1.717,61.76,7
float32-sdpa,Qwen/Qwen1.5-7B,0.269,0.0251545600891113,39.712,248273.588,32662.329,pytorch,float32,Unquantized,SDPA,No Kernel,Qwen2ForCausalLM,1.856,61.76,7
bfloat16-sdpa,Qwen/Qwen1.5-7B,0.031,0.024498176574707,40.266,273241.258,16416.242,pytorch,bfloat16,Unquantized,SDPA,No Kernel,Qwen2ForCausalLM,1.582,61.76,7
float16-sdpa,Qwen/Qwen1.5-7B,0.033,0.0246415367126464,40.4,271307.112,16416.242,pytorch,float16,Unquantized,SDPA,No Kernel,Qwen2ForCausalLM,1.59,61.76,7
4bit-gptq-exllama-v1-fa2,Deci/DeciLM-7B,0.177,0.1591111755371093,6.242,39234.618,4542.986,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,DeciLMForCausalLM,10.212,61.55,7
4bit-gptq-exllama-v1-eager,Deci/DeciLM-7B,0.176,0.1585008697509765,6.28,39403.846,4542.986,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,DeciLMForCausalLM,10.168,61.55,7
4bit-gptq-exllama-v2-eager,Deci/DeciLM-7B,0.176,0.1583861694335937,6.301,39448.689,4542.986,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,DeciLMForCausalLM,10.16,61.55,7
4bit-gptq-exllama-v2-fa2,Deci/DeciLM-7B,0.177,0.1581578216552734,6.323,39550.391,4542.986,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,DeciLMForCausalLM,10.141,61.55,7
8bit-bnb-eager,Deci/DeciLM-7B,0.112,0.1085685729980468,9.145,77902.18,7514.465,pytorch,float16,BnB.8bit,Eager,No Kernel,DeciLMForCausalLM,6.994,61.55,7
8bit-bnb-fa2,Deci/DeciLM-7B,0.111,0.1073848342895507,9.272,79804.194,7514.465,pytorch,float16,BnB.8bit,FAv2,No Kernel,DeciLMForCausalLM,6.905,61.55,7
4bit-bnb-eager,Deci/DeciLM-7B,0.285,0.0612147216796875,16.165,128174.485,4557.528,pytorch,float16,BnB.4bit,Eager,No Kernel,DeciLMForCausalLM,4.181,61.55,7
4bit-bnb-fa2,Deci/DeciLM-7B,0.283,0.0611655693054199,16.223,129606.147,4557.528,pytorch,float16,BnB.4bit,FAv2,No Kernel,DeciLMForCausalLM,4.156,61.55,7
float16-fa2,Deci/DeciLM-7B,0.035,0.0276981754302978,35.979,246453.59,14290.687,pytorch,float16,Unquantized,FAv2,No Kernel,DeciLMForCausalLM,1.781,61.55,7
bfloat16-fa2,Deci/DeciLM-7B,0.036,0.0275711994171142,36.273,254203.902,14290.687,pytorch,bfloat16,Unquantized,FAv2,No Kernel,DeciLMForCausalLM,1.772,61.55,7
bfloat16-eager,Deci/DeciLM-7B,0.036,0.0274544639587402,36.278,252890.35,14290.687,pytorch,bfloat16,Unquantized,Eager,No Kernel,DeciLMForCausalLM,1.769,61.55,7
float16-eager,Deci/DeciLM-7B,0.035,0.0274513912200927,36.347,254214.391,14290.687,pytorch,float16,Unquantized,Eager,No Kernel,DeciLMForCausalLM,1.765,61.55,7
float32-eager,Deci/DeciLM-7B,0.26,0.0250234870910644,39.901,248383.54,28529.571,pytorch,float32,Unquantized,Eager,No Kernel,DeciLMForCausalLM,1.838,61.55,7
8bit-bnb-eager,TencentARC/Mistral_Pro_8B_v0.1,0.133,0.1332080688476562,7.477,63386.861,10056.919,pytorch,float16,BnB.8bit,Eager,No Kernel,MistralForCausalLM,8.529,61.06,8
8bit-bnb-fa2,TencentARC/Mistral_Pro_8B_v0.1,0.133,0.1326766052246093,7.52,63638.018,10056.901,pytorch,float16,BnB.8bit,FAv2,No Kernel,MistralForCausalLM,8.504,61.06,8
8bit-bnb-sdpa,TencentARC/Mistral_Pro_8B_v0.1,0.129,0.1305528259277343,7.583,63655.197,10056.901,pytorch,float16,BnB.8bit,SDPA,No Kernel,MistralForCausalLM,8.447,61.06,8
4bit-bnb-fa2,TencentARC/Mistral_Pro_8B_v0.1,0.378,0.0774328308105468,12.921,105176.391,6130.076,pytorch,float16,BnB.4bit,FAv2,No Kernel,MistralForCausalLM,5.245,61.06,8
4bit-bnb-eager,TencentARC/Mistral_Pro_8B_v0.1,0.368,0.0762798080444336,13.049,105807.92,6130.207,pytorch,float16,BnB.4bit,Eager,No Kernel,MistralForCausalLM,5.189,61.06,8
4bit-bnb-sdpa,TencentARC/Mistral_Pro_8B_v0.1,0.364,0.0726702117919921,13.705,108685.174,6130.076,pytorch,float16,BnB.4bit,SDPA,No Kernel,MistralForCausalLM,4.942,61.06,8
bfloat16-fa2,TencentARC/Mistral_Pro_8B_v0.1,0.064,0.0494766082763671,18.898,192652.736,18774.938,pytorch,bfloat16,Unquantized,FAv2,No Kernel,MistralForCausalLM,3.182,61.06,8
float16-fa2,TencentARC/Mistral_Pro_8B_v0.1,0.052,0.0361553916931152,27.636,160155.483,18774.938,pytorch,float16,Unquantized,FAv2,No Kernel,MistralForCausalLM,2.33,61.06,8
float16-eager,TencentARC/Mistral_Pro_8B_v0.1,0.045,0.0354969596862793,28.055,199077.816,18774.948,pytorch,float16,Unquantized,Eager,No Kernel,MistralForCausalLM,2.282,61.06,8
bfloat16-eager,TencentARC/Mistral_Pro_8B_v0.1,0.047,0.03498291015625,28.49,151470.549,18774.964,pytorch,bfloat16,Unquantized,Eager,No Kernel,MistralForCausalLM,2.25,61.06,8
bfloat16-sdpa,TencentARC/Mistral_Pro_8B_v0.1,0.044,0.0321003532409668,28.582,214674.752,18774.938,pytorch,bfloat16,Unquantized,SDPA,No Kernel,MistralForCausalLM,2.086,61.06,8
float16-sdpa,TencentARC/Mistral_Pro_8B_v0.1,0.042,0.0322201614379882,30.637,214042.402,18774.938,pytorch,float16,Unquantized,SDPA,No Kernel,MistralForCausalLM,2.078,61.06,8
float32-eager,TencentARC/Mistral_Pro_8B_v0.1,0.335,0.0316262397766113,31.508,197831.613,37534.53,pytorch,float32,Unquantized,Eager,No Kernel,MistralForCausalLM,2.333,61.06,8
float32-sdpa,TencentARC/Mistral_Pro_8B_v0.1,0.33,0.0313978881835937,31.77,198934.234,37534.494,pytorch,float32,Unquantized,SDPA,No Kernel,MistralForCausalLM,2.31,61.06,8
float32-eager,internlm/internlm-20b,0.705,0.0793333740234375,12.56,78971.296,82203.53,pytorch,float32,Unquantized,Eager,No Kernel,InternLMForCausalLM,5.712,59.55,20
float16-eager,internlm/internlm-20b,0.081,0.0755814437866211,13.025,99846.325,41420.788,pytorch,float16,Unquantized,Eager,No Kernel,InternLMForCausalLM,4.957,59.55,20
bfloat16-fa2,internlm/internlm-20b,0.075,0.0668876800537109,15.024,103362.431,41420.787,pytorch,bfloat16,Unquantized,FAv2,No Kernel,InternLMForCausalLM,4.286,59.55,20
bfloat16-eager,internlm/internlm-20b,0.081,0.0650844192504882,15.33,99111.479,41442.261,pytorch,bfloat16,Unquantized,Eager,No Kernel,InternLMForCausalLM,4.184,59.55,20
float16-fa2,internlm/internlm-20b,0.078,0.0617318382263183,16.104,103591.136,41420.787,pytorch,float16,Unquantized,FAv2,No Kernel,InternLMForCausalLM,3.98,59.55,20
8bit-bnb-eager,Qwen/Qwen1.5-4B,0.139,0.1358602294921875,7.319,64939.622,5789.886,pytorch,float16,BnB.8bit,Eager,No Kernel,Qwen2ForCausalLM,8.747,57.05,3
8bit-bnb-fa2,Qwen/Qwen1.5-4B,0.13,0.1299681243896484,7.674,67641.747,5789.886,pytorch,float16,BnB.8bit,FAv2,No Kernel,Qwen2ForCausalLM,8.333,57.05,3
8bit-bnb-sdpa,Qwen/Qwen1.5-4B,0.128,0.1278269424438476,7.793,68389.147,5789.886,pytorch,float16,BnB.8bit,SDPA,No Kernel,Qwen2ForCausalLM,8.209,57.05,3
4bit-gptq-exllama-v2-fa2,Qwen/Qwen1.5-4B,0.112,0.1010135040283203,9.871,62441.263,4389.693,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Qwen2ForCausalLM,6.483,57.05,3
4bit-gptq-exllama-v1-fa2,Qwen/Qwen1.5-4B,0.111,0.1008404464721679,9.899,63499.758,4389.693,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Qwen2ForCausalLM,6.467,57.05,3
4bit-gptq-exllama-v2-eager,Qwen/Qwen1.5-4B,0.099,0.0861163482666015,11.597,72553.687,4389.694,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Qwen2ForCausalLM,5.528,57.05,3
4bit-gptq-exllama-v1-eager,Qwen/Qwen1.5-4B,0.098,0.0855541763305664,11.664,73434.956,4389.694,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Qwen2ForCausalLM,5.492,57.05,3
4bit-gptq-exllama-v2-sdpa,Qwen/Qwen1.5-4B,0.097,0.0848998413085937,11.755,73710.179,4389.693,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,Qwen2ForCausalLM,5.451,57.05,3
4bit-gptq-exllama-v1-sdpa,Qwen/Qwen1.5-4B,0.097,0.0848783340454101,11.758,73660.092,4389.693,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,Qwen2ForCausalLM,5.45,57.05,3
4bit-bnb-fa2,Qwen/Qwen1.5-4B,0.157,0.0778475494384765,12.779,108045.366,4291.035,pytorch,float16,BnB.4bit,FAv2,No Kernel,Qwen2ForCausalLM,5.097,57.05,3
4bit-bnb-eager,Qwen/Qwen1.5-4B,0.142,0.075971580505371,13.076,114089.801,4291.293,pytorch,float16,BnB.4bit,Eager,No Kernel,Qwen2ForCausalLM,4.95,57.05,3
4bit-bnb-sdpa,Qwen/Qwen1.5-4B,0.141,0.0751001586914062,13.151,114564.921,4291.035,pytorch,float16,BnB.4bit,SDPA,No Kernel,Qwen2ForCausalLM,4.913,57.05,3
8bit-bnb-eager,Qwen/Qwen1.5-MoE-A2.7B,0.741,0.6937733154296875,1.399,12842.708,15921.993,pytorch,float16,BnB.8bit,Eager,No Kernel,Unknown,44.485,56.03,14
8bit-bnb-fa2,Qwen/Qwen1.5-MoE-A2.7B,0.683,0.6790184936523438,1.471,12733.116,15921.207,pytorch,float16,BnB.8bit,FAv2,No Kernel,Unknown,43.541,56.03,14
8bit-bnb-sdpa,Qwen/Qwen1.5-MoE-A2.7B,0.668,0.6668635864257813,1.493,13064.707,15921.207,pytorch,float16,BnB.8bit,SDPA,No Kernel,Unknown,42.798,56.03,14
4bit-bnb-eager,Qwen/Qwen1.5-MoE-A2.7B,0.665,0.6048717041015625,1.625,14351.577,8963.124,pytorch,float16,BnB.4bit,Eager,No Kernel,Unknown,39.454,56.03,14
4bit-bnb-fa2,Qwen/Qwen1.5-MoE-A2.7B,0.634,0.5769584350585938,1.728,14927.261,8963.124,pytorch,float16,BnB.4bit,FAv2,No Kernel,Unknown,37.038,56.03,14
4bit-bnb-sdpa,Qwen/Qwen1.5-MoE-A2.7B,0.618,0.57547265625,1.732,15221.615,8963.124,pytorch,float16,BnB.4bit,SDPA,No Kernel,Unknown,36.985,56.03,14
float16-fa2,Qwen/Qwen1.5-MoE-A2.7B,0.319,0.2504622039794922,3.939,33652.697,29029.726,pytorch,float16,Unquantized,FAv2,No Kernel,Unknown,16.154,56.03,14
bfloat16-fa2,Qwen/Qwen1.5-MoE-A2.7B,0.318,0.2515455932617187,3.953,33955.497,29029.726,pytorch,bfloat16,Unquantized,FAv2,No Kernel,Unknown,16.261,56.03,14
float16-sdpa,Qwen/Qwen1.5-MoE-A2.7B,0.319,0.2492241973876953,3.969,33500.653,29029.726,pytorch,float16,Unquantized,SDPA,No Kernel,Unknown,16.187,56.03,14
bfloat16-sdpa,Qwen/Qwen1.5-MoE-A2.7B,0.317,0.2492610626220703,3.987,34518.439,29029.726,pytorch,bfloat16,Unquantized,SDPA,No Kernel,Unknown,16.058,56.03,14
4bit-gptq-exllama-v1-eager,01-ai/Yi-6B,0.148,0.1335818176269531,7.48,46668.673,4383.673,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,LlamaForCausalLM,8.571,54.08,6
4bit-gptq-exllama-v2-eager,01-ai/Yi-6B,0.148,0.1333135375976562,7.5,47031.732,4383.673,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,LlamaForCausalLM,8.549,54.08,6
4bit-gptq-exllama-v2-sdpa,01-ai/Yi-6B,0.147,0.1325373382568359,7.54,47165.737,4383.672,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,LlamaForCausalLM,8.502,54.08,6
4bit-gptq-exllama-v1-sdpa,01-ai/Yi-6B,0.145,0.1321912384033203,7.56,47302.247,4383.672,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,LlamaForCausalLM,8.479,54.08,6
4bit-gptq-exllama-v1-fa2,01-ai/Yi-6B,0.146,0.1322506256103515,7.565,47209.478,4383.672,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,LlamaForCausalLM,8.474,54.08,6
4bit-gptq-exllama-v2-fa2,01-ai/Yi-6B,0.145,0.1320038452148437,7.576,47258.355,4383.672,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,LlamaForCausalLM,8.46,54.08,6
8bit-bnb-fa2,01-ai/Yi-6B,0.122,0.1202708511352539,8.294,71936.758,6883.612,pytorch,float16,BnB.8bit,FAv2,No Kernel,LlamaForCausalLM,7.695,54.08,6
8bit-bnb-sdpa,01-ai/Yi-6B,0.121,0.1159710693359375,8.476,72438.094,6883.612,pytorch,float16,BnB.8bit,SDPA,No Kernel,LlamaForCausalLM,7.517,54.08,6
8bit-bnb-eager,01-ai/Yi-6B,0.117,0.1180047378540039,8.531,74501.828,6883.612,pytorch,float16,BnB.8bit,Eager,No Kernel,LlamaForCausalLM,7.498,54.08,6
4bit-bnb-eager,01-ai/Yi-6B,0.239,0.0685793304443359,14.205,117514.54,4344.191,pytorch,float16,BnB.4bit,Eager,No Kernel,LlamaForCausalLM,4.593,54.08,6
4bit-bnb-fa2,01-ai/Yi-6B,0.237,0.0677509155273437,14.478,116945.05,4344.06,pytorch,float16,BnB.4bit,FAv2,No Kernel,LlamaForCausalLM,4.521,54.08,6
4bit-bnb-sdpa,01-ai/Yi-6B,0.237,0.0656271362304687,14.777,119850.176,4344.06,pytorch,float16,BnB.4bit,SDPA,No Kernel,LlamaForCausalLM,4.468,54.08,6
float16-eager,01-ai/Yi-6B,0.037,0.0342794227600097,29.199,225420.285,12315.695,pytorch,float16,Unquantized,Eager,No Kernel,LlamaForCausalLM,2.196,54.08,6
bfloat16-eager,01-ai/Yi-6B,0.035,0.0330751991271972,30.201,222509.193,12315.695,pytorch,bfloat16,Unquantized,Eager,No Kernel,LlamaForCausalLM,2.113,54.08,6
float16-sdpa,01-ai/Yi-6B,0.032,0.0304189434051513,32.575,238157.461,12315.695,pytorch,float16,Unquantized,SDPA,No Kernel,LlamaForCausalLM,1.964,54.08,6
float32-eager,01-ai/Yi-6B,0.216,0.0301455364227294,33.033,218482.721,24528.503,pytorch,float32,Unquantized,Eager,No Kernel,LlamaForCausalLM,2.115,54.08,6
bfloat16-sdpa,01-ai/Yi-6B,0.031,0.0300677127838134,33.055,241680.777,12315.695,pytorch,bfloat16,Unquantized,SDPA,No Kernel,LlamaForCausalLM,1.923,54.08,6
float16-fa2,01-ai/Yi-6B,0.03,0.0293744640350341,33.598,242398.98,12315.695,pytorch,float16,Unquantized,FAv2,No Kernel,LlamaForCausalLM,1.91,54.08,6
bfloat16-fa2,01-ai/Yi-6B,0.03,0.0291368961334228,34.172,253843.644,12315.695,pytorch,bfloat16,Unquantized,FAv2,No Kernel,LlamaForCausalLM,1.867,54.08,6
float32-sdpa,01-ai/Yi-6B,0.213,0.0278220806121826,35.817,231139.557,24528.467,pytorch,float32,Unquantized,SDPA,No Kernel,LlamaForCausalLM,1.968,54.08,6
4bit-gptq-exllama-v1-eager,01-ai/Yi-6B,0.148,0.1335818176269531,7.48,46668.673,4383.673,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,LlamaForCausalLM,8.571,54.02,6
4bit-gptq-exllama-v2-eager,01-ai/Yi-6B,0.148,0.1333135375976562,7.5,47031.732,4383.673,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,LlamaForCausalLM,8.549,54.02,6
4bit-gptq-exllama-v2-sdpa,01-ai/Yi-6B,0.147,0.1325373382568359,7.54,47165.737,4383.672,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,LlamaForCausalLM,8.502,54.02,6
4bit-gptq-exllama-v1-sdpa,01-ai/Yi-6B,0.145,0.1321912384033203,7.56,47302.247,4383.672,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,LlamaForCausalLM,8.479,54.02,6
4bit-gptq-exllama-v1-fa2,01-ai/Yi-6B,0.146,0.1322506256103515,7.565,47209.478,4383.672,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,LlamaForCausalLM,8.474,54.02,6
4bit-gptq-exllama-v2-fa2,01-ai/Yi-6B,0.145,0.1320038452148437,7.576,47258.355,4383.672,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,LlamaForCausalLM,8.46,54.02,6
8bit-bnb-fa2,01-ai/Yi-6B,0.122,0.1202708511352539,8.294,71936.758,6883.612,pytorch,float16,BnB.8bit,FAv2,No Kernel,LlamaForCausalLM,7.695,54.02,6
8bit-bnb-sdpa,01-ai/Yi-6B,0.121,0.1159710693359375,8.476,72438.094,6883.612,pytorch,float16,BnB.8bit,SDPA,No Kernel,LlamaForCausalLM,7.517,54.02,6
8bit-bnb-eager,01-ai/Yi-6B,0.117,0.1180047378540039,8.531,74501.828,6883.612,pytorch,float16,BnB.8bit,Eager,No Kernel,LlamaForCausalLM,7.498,54.02,6
4bit-bnb-eager,01-ai/Yi-6B,0.239,0.0685793304443359,14.205,117514.54,4344.191,pytorch,float16,BnB.4bit,Eager,No Kernel,LlamaForCausalLM,4.593,54.02,6
4bit-bnb-fa2,01-ai/Yi-6B,0.237,0.0677509155273437,14.478,116945.05,4344.06,pytorch,float16,BnB.4bit,FAv2,No Kernel,LlamaForCausalLM,4.521,54.02,6
4bit-bnb-sdpa,01-ai/Yi-6B,0.237,0.0656271362304687,14.777,119850.176,4344.06,pytorch,float16,BnB.4bit,SDPA,No Kernel,LlamaForCausalLM,4.468,54.02,6
float16-eager,01-ai/Yi-6B,0.037,0.0342794227600097,29.199,225420.285,12315.695,pytorch,float16,Unquantized,Eager,No Kernel,LlamaForCausalLM,2.196,54.02,6
bfloat16-eager,01-ai/Yi-6B,0.035,0.0330751991271972,30.201,222509.193,12315.695,pytorch,bfloat16,Unquantized,Eager,No Kernel,LlamaForCausalLM,2.113,54.02,6
float16-sdpa,01-ai/Yi-6B,0.032,0.0304189434051513,32.575,238157.461,12315.695,pytorch,float16,Unquantized,SDPA,No Kernel,LlamaForCausalLM,1.964,54.02,6
float32-eager,01-ai/Yi-6B,0.216,0.0301455364227294,33.033,218482.721,24528.503,pytorch,float32,Unquantized,Eager,No Kernel,LlamaForCausalLM,2.115,54.02,6
bfloat16-sdpa,01-ai/Yi-6B,0.031,0.0300677127838134,33.055,241680.777,12315.695,pytorch,bfloat16,Unquantized,SDPA,No Kernel,LlamaForCausalLM,1.923,54.02,6
float16-fa2,01-ai/Yi-6B,0.03,0.0293744640350341,33.598,242398.98,12315.695,pytorch,float16,Unquantized,FAv2,No Kernel,LlamaForCausalLM,1.91,54.02,6
bfloat16-fa2,01-ai/Yi-6B,0.03,0.0291368961334228,34.172,253843.644,12315.695,pytorch,bfloat16,Unquantized,FAv2,No Kernel,LlamaForCausalLM,1.867,54.02,6
float32-sdpa,01-ai/Yi-6B,0.213,0.0278220806121826,35.817,231139.557,24528.467,pytorch,float32,Unquantized,SDPA,No Kernel,LlamaForCausalLM,1.968,54.02,6
float32-eager,microsoft/phi-1_5,0.059,0.0195164165496826,49.575,345539.322,5949.832,pytorch,float32,Unquantized,Eager,No Kernel,PhiForCausalLM,1.332,47.69,1
bfloat16-eager,microsoft/phi-1_5,0.02,0.0190894088745117,52.267,378590.349,3023.634,pytorch,bfloat16,Unquantized,Eager,No Kernel,PhiForCausalLM,1.224,47.69,1
float16-eager,microsoft/phi-1_5,0.02,0.0188282871246337,52.677,402076.824,3023.634,pytorch,float16,Unquantized,Eager,No Kernel,PhiForCausalLM,1.213,47.69,1
float16-fa2,microsoft/phi-1_5,0.018,0.0177858562469482,55.95,438770.278,3022.613,pytorch,float16,Unquantized,FAv2,No Kernel,PhiForCausalLM,1.142,47.69,1
bfloat16-fa2,microsoft/phi-1_5,0.018,0.0170956802368164,57.77,450439.226,3022.613,pytorch,bfloat16,Unquantized,FAv2,No Kernel,PhiForCausalLM,1.103,47.69,1
float16-sdpa,microsoft/phi-1_5,0.018,0.0170618877410888,58.596,470189.603,3022.613,pytorch,float16,Unquantized,SDPA,No Kernel,PhiForCausalLM,1.092,47.69,1
bfloat16-sdpa,microsoft/phi-1_5,0.017,0.0164577274322509,59.988,455983.243,3022.613,pytorch,bfloat16,Unquantized,SDPA,No Kernel,PhiForCausalLM,1.062,47.69,1
float32-sdpa,microsoft/phi-1_5,0.056,0.0159713277816772,62.342,450982.812,5949.832,pytorch,float32,Unquantized,SDPA,No Kernel,PhiForCausalLM,1.059,47.69,1
8bit-bnb-eager,Qwen/Qwen1.5-1.8B,0.081,0.0797747192382812,12.476,111433.433,3158.448,pytorch,float16,BnB.8bit,Eager,No Kernel,Qwen2ForCausalLM,5.13,46.55,1
8bit-bnb-fa2,Qwen/Qwen1.5-1.8B,0.08,0.0789882888793945,12.631,113806.724,3158.448,pytorch,float16,BnB.8bit,FAv2,No Kernel,Qwen2ForCausalLM,5.073,46.55,1
8bit-bnb-sdpa,Qwen/Qwen1.5-1.8B,0.078,0.0767518692016601,12.941,114968.89,3158.448,pytorch,float16,BnB.8bit,SDPA,No Kernel,Qwen2ForCausalLM,4.943,46.55,1
4bit-gptq-exllama-v1-eager,Qwen/Qwen1.5-1.8B,0.05,0.0484075508117675,20.584,143671.29,2628.77,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Qwen2ForCausalLM,3.103,46.55,1
4bit-gptq-exllama-v1-fa2,Qwen/Qwen1.5-1.8B,0.051,0.048449535369873,20.594,144025.889,2628.769,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Qwen2ForCausalLM,3.104,46.55,1
4bit-gptq-exllama-v2-fa2,Qwen/Qwen1.5-1.8B,0.051,0.0480430068969726,20.76,145713.086,2628.769,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Qwen2ForCausalLM,3.074,46.55,1
4bit-gptq-exllama-v2-eager,Qwen/Qwen1.5-1.8B,0.049,0.0479569931030273,20.776,146534.596,2628.77,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Qwen2ForCausalLM,3.074,46.55,1
4bit-bnb-eager,Qwen/Qwen1.5-1.8B,0.064,0.0466513938903808,21.322,184823.08,2585.787,pytorch,float16,BnB.4bit,Eager,No Kernel,Qwen2ForCausalLM,3.01,46.55,1
4bit-gptq-exllama-v2-sdpa,Qwen/Qwen1.5-1.8B,0.049,0.0467271690368652,21.353,145872.178,2628.769,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,Qwen2ForCausalLM,2.992,46.55,1
4bit-bnb-fa2,Qwen/Qwen1.5-1.8B,0.073,0.0465735664367675,21.391,187922.909,2585.787,pytorch,float16,BnB.4bit,FAv2,No Kernel,Qwen2ForCausalLM,3.008,46.55,1
4bit-gptq-exllama-v1-sdpa,Qwen/Qwen1.5-1.8B,0.048,0.0465264625549316,21.416,147962.283,2628.769,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,Qwen2ForCausalLM,2.984,46.55,1
4bit-bnb-sdpa,Qwen/Qwen1.5-1.8B,0.062,0.0441620483398437,22.452,196584.268,2585.787,pytorch,float16,BnB.4bit,SDPA,No Kernel,Qwen2ForCausalLM,2.862,46.55,1
bfloat16-fa2,Qwen/Qwen1.5-1.8B,0.022,0.0207626247406005,47.657,376911.509,4408.408,pytorch,bfloat16,Unquantized,FAv2,No Kernel,Qwen2ForCausalLM,1.335,46.55,1
float16-fa2,Qwen/Qwen1.5-1.8B,0.021,0.0204472312927246,48.812,383317.645,4408.408,pytorch,float16,Unquantized,FAv2,No Kernel,Qwen2ForCausalLM,1.31,46.55,1
bfloat16-eager,Qwen/Qwen1.5-1.8B,0.022,0.0204482555389404,48.834,384089.46,4408.408,pytorch,bfloat16,Unquantized,Eager,No Kernel,Qwen2ForCausalLM,1.312,46.55,1
float16-eager,Qwen/Qwen1.5-1.8B,0.022,0.020242431640625,49.096,380746.463,4408.408,pytorch,float16,Unquantized,Eager,No Kernel,Qwen2ForCausalLM,1.299,46.55,1
float32-eager,Qwen/Qwen1.5-1.8B,0.06,0.0188979206085205,52.458,402115.515,8597.293,pytorch,float32,Unquantized,Eager,No Kernel,Qwen2ForCausalLM,1.254,46.55,1
float16-sdpa,Qwen/Qwen1.5-1.8B,0.02,0.0187688961029052,52.789,414137.925,4408.408,pytorch,float16,Unquantized,SDPA,No Kernel,Qwen2ForCausalLM,1.209,46.55,1
bfloat16-sdpa,Qwen/Qwen1.5-1.8B,0.019,0.0185108470916748,53.864,426916.746,4408.408,pytorch,bfloat16,Unquantized,SDPA,No Kernel,Qwen2ForCausalLM,1.188,46.55,1
float32-sdpa,Qwen/Qwen1.5-1.8B,0.058,0.01723801612854,57.76,432794.785,8597.293,pytorch,float32,Unquantized,SDPA,No Kernel,Qwen2ForCausalLM,1.147,46.55,1
4bit-bnb-eager,facebook/opt-66b,2.647,0.1709598693847656,5.837,40640.181,37434.811,pytorch,float16,BnB.4bit,Eager,No Kernel,OPTForCausalLM,13.443,42.78,66
8bit-bnb-eager,facebook/opt-66b,0.173,0.1674977264404296,5.937,43032.007,68003.561,pytorch,float16,BnB.8bit,Eager,No Kernel,OPTForCausalLM,10.795,42.78,66
4bit-bnb-fa2,facebook/opt-66b,2.647,0.1575536651611328,6.33,43499.097,37434.68,pytorch,float16,BnB.4bit,FAv2,No Kernel,OPTForCausalLM,12.585,42.78,66
8bit-bnb-eager,Salesforce/codegen-16B-nl,0.086,0.085394432067871,11.686,94229.855,17381.431,pytorch,float16,BnB.8bit,Eager,No Kernel,CodeGenForCausalLM,5.467,42.59,16
float32-eager,Salesforce/codegen-16B-nl,0.576,0.0487710723876953,20.452,128257.079,65363.832,pytorch,float32,Unquantized,Eager,No Kernel,CodeGenForCausalLM,3.649,42.59,16
bfloat16-eager,Salesforce/codegen-16B-nl,0.064,0.0363489265441894,27.48,175303.81,32792.184,pytorch,bfloat16,Unquantized,Eager,No Kernel,CodeGenForCausalLM,2.36,42.59,16
float16-eager,Salesforce/codegen-16B-nl,0.063,0.0355614738464355,28.047,178300.329,32792.184,pytorch,float16,Unquantized,Eager,No Kernel,CodeGenForCausalLM,2.306,42.59,16
8bit-bnb-eager,facebook/opt-30b,0.127,0.1246095352172851,7.971,64435.961,31446.286,pytorch,float16,BnB.8bit,Eager,No Kernel,OPTForCausalLM,8.002,41.99,30
8bit-bnb-fa2,facebook/opt-30b,0.117,0.1175275497436523,8.492,67604.764,31450.479,pytorch,float16,BnB.8bit,FAv2,No Kernel,OPTForCausalLM,7.521,41.99,30
4bit-bnb-eager,facebook/opt-30b,1.321,0.0912087020874023,10.938,78225.591,17680.925,pytorch,float16,BnB.4bit,Eager,No Kernel,OPTForCausalLM,7.075,41.99,30
4bit-bnb-fa2,facebook/opt-30b,1.307,0.0774225921630859,12.891,90273.361,17680.794,pytorch,float16,BnB.4bit,FAv2,No Kernel,OPTForCausalLM,6.191,41.99,30
float16-eager,facebook/opt-30b,0.12,0.048503807067871,20.586,128680.81,60836.515,pytorch,float16,Unquantized,Eager,No Kernel,OPTForCausalLM,3.179,41.99,30
bfloat16-eager,facebook/opt-30b,0.11,0.04767232131958,20.935,130625.548,60836.515,pytorch,bfloat16,Unquantized,Eager,No Kernel,OPTForCausalLM,3.116,41.99,30
float16-fa2,facebook/opt-30b,0.114,0.0442951698303222,22.529,140361.802,60837.496,pytorch,float16,Unquantized,FAv2,No Kernel,OPTForCausalLM,2.909,41.99,30
bfloat16-fa2,facebook/opt-30b,0.103,0.0439941101074218,22.715,141619.806,60837.496,pytorch,bfloat16,Unquantized,FAv2,No Kernel,OPTForCausalLM,2.876,41.99,30
4bit-gptq-exllama-v2-eager,EleutherAI/gpt-neox-20b,0.479,0.4443187255859375,2.25,14044.206,13715.589,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,GPTNeoXForCausalLM,28.483,41.69,20
4bit-gptq-exllama-v1-eager,EleutherAI/gpt-neox-20b,0.478,0.4436869201660156,2.253,14084.383,13715.589,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,GPTNeoXForCausalLM,28.438,41.69,20
4bit-gptq-exllama-v1-fa2,EleutherAI/gpt-neox-20b,0.469,0.437781494140625,2.284,14277.95,13715.588,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,GPTNeoXForCausalLM,28.05,41.69,20
4bit-gptq-exllama-v2-fa2,EleutherAI/gpt-neox-20b,0.468,0.4377108459472656,2.285,14296.128,13715.588,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,GPTNeoXForCausalLM,28.048,41.69,20
8bit-bnb-eager,EleutherAI/gpt-neox-20b,0.122,0.1186959381103515,8.298,69928.382,22536.283,pytorch,float16,BnB.8bit,Eager,No Kernel,GPTNeoXForCausalLM,7.672,41.69,20
8bit-bnb-fa2,EleutherAI/gpt-neox-20b,0.101,0.1027123184204101,9.496,78557.68,22540.222,pytorch,float16,BnB.8bit,FAv2,No Kernel,GPTNeoXForCausalLM,6.733,41.69,20
4bit-bnb-eager,EleutherAI/gpt-neox-20b,0.82,0.0790169601440429,12.42,94346.195,13411.544,pytorch,float16,BnB.4bit,Eager,No Kernel,GPTNeoXForCausalLM,5.881,41.69,20
4bit-bnb-fa2,EleutherAI/gpt-neox-20b,0.805,0.0695562210083007,14.07,103611.797,13411.544,pytorch,float16,BnB.4bit,FAv2,No Kernel,GPTNeoXForCausalLM,5.292,41.69,20
float32-eager,EleutherAI/gpt-neox-20b,0.755,0.0624803848266601,15.975,100223.669,84145.78,pytorch,float32,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,4.692,41.69,20
float16-eager,EleutherAI/gpt-neox-20b,0.086,0.0409323501586914,24.356,153164.052,42460.724,pytorch,float16,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,2.667,41.69,20
bfloat16-eager,EleutherAI/gpt-neox-20b,0.085,0.0407807998657226,24.449,153893.023,42460.724,pytorch,bfloat16,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,2.658,41.69,20
float16-fa2,EleutherAI/gpt-neox-20b,0.079,0.0363407363891601,27.409,171626.76,42461.861,pytorch,float16,Unquantized,FAv2,No Kernel,GPTNeoXForCausalLM,2.374,41.69,20
bfloat16-fa2,EleutherAI/gpt-neox-20b,0.076,0.0360693778991699,27.643,173550.304,42461.861,pytorch,bfloat16,Unquantized,FAv2,No Kernel,GPTNeoXForCausalLM,2.352,41.69,20
4bit-gptq-exllama-v2-fa2,EleutherAI/gpt-j-6b,0.154,0.1440143432617187,6.939,43376.583,4531.242,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,GPTJForCausalLM,9.23,40.1,6
4bit-gptq-exllama-v1-fa2,EleutherAI/gpt-j-6b,0.155,0.1437634582519531,6.946,43391.488,4531.684,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,GPTJForCausalLM,9.224,40.1,6
4bit-gptq-exllama-v1-eager,EleutherAI/gpt-j-6b,0.149,0.1367132110595703,7.305,45621.944,4531.243,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,GPTJForCausalLM,8.768,40.1,6
4bit-gptq-exllama-v2-eager,EleutherAI/gpt-j-6b,0.149,0.1364398040771484,7.32,45648.538,4531.243,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,GPTJForCausalLM,8.75,40.1,6
8bit-bnb-fa2,EleutherAI/gpt-j-6b,0.099,0.0991825942993164,9.947,85196.754,6915.153,pytorch,float16,BnB.8bit,FAv2,No Kernel,GPTJForCausalLM,6.422,40.1,6
8bit-bnb-eager,EleutherAI/gpt-j-6b,0.099,0.09643212890625,10.218,97333.755,6910.556,pytorch,float16,BnB.8bit,Eager,No Kernel,GPTJForCausalLM,6.33,40.1,6
4bit-bnb-fa2,EleutherAI/gpt-j-6b,0.247,0.0656670684814453,15.046,122819.256,4430.536,pytorch,float16,BnB.4bit,FAv2,No Kernel,GPTJForCausalLM,4.42,40.1,6
4bit-bnb-eager,EleutherAI/gpt-j-6b,0.243,0.0626647033691406,15.756,132340.74,4430.536,pytorch,float16,BnB.4bit,Eager,No Kernel,GPTJForCausalLM,4.18,40.1,6
bfloat16-fa2,EleutherAI/gpt-j-6b,0.039,0.0384942092895507,26.187,198075.458,12548.118,pytorch,bfloat16,Unquantized,FAv2,No Kernel,GPTJForCausalLM,2.455,40.1,6
float16-fa2,EleutherAI/gpt-j-6b,0.038,0.0371865615844726,26.834,198370.271,12548.118,pytorch,float16,Unquantized,FAv2,No Kernel,GPTJForCausalLM,2.377,40.1,6
float16-eager,EleutherAI/gpt-j-6b,0.033,0.0316887035369873,31.36,244059.2,12543.514,pytorch,float16,Unquantized,Eager,No Kernel,GPTJForCausalLM,2.041,40.1,6
bfloat16-eager,EleutherAI/gpt-j-6b,0.033,0.0299622402191162,33.289,253350.114,12543.514,pytorch,bfloat16,Unquantized,Eager,No Kernel,GPTJForCausalLM,1.925,40.1,6
float32-eager,EleutherAI/gpt-j-6b,0.218,0.028654592514038,34.538,221958.982,24932.502,pytorch,float32,Unquantized,Eager,No Kernel,GPTJForCausalLM,2.042,40.1,6
8bit-bnb-eager,facebook/opt-13b,0.107,0.1061560287475586,8.819,71610.738,13822.812,pytorch,float16,BnB.8bit,Eager,No Kernel,OPTForCausalLM,6.823,40.06,13
8bit-bnb-fa2,facebook/opt-13b,0.099,0.0993361892700195,10.049,82083.427,13833.288,pytorch,float16,BnB.8bit,FAv2,No Kernel,OPTForCausalLM,6.355,40.06,13
4bit-bnb-eager,facebook/opt-13b,0.507,0.0630169601440429,15.494,124413.509,7922.799,pytorch,float16,BnB.4bit,Eager,No Kernel,OPTForCausalLM,4.493,40.06,13
4bit-bnb-fa2,facebook/opt-13b,0.498,0.0582451210021972,16.996,132053.433,7922.668,pytorch,float16,BnB.4bit,FAv2,No Kernel,OPTForCausalLM,4.197,40.06,13
float32-eager,facebook/opt-13b,0.449,0.0397056007385253,25.116,157631.732,52468.032,pytorch,float32,Unquantized,Eager,No Kernel,OPTForCausalLM,2.953,40.06,13
float16-eager,facebook/opt-13b,0.048,0.026040319442749,38.242,238581.578,26239.663,pytorch,float16,Unquantized,Eager,No Kernel,OPTForCausalLM,1.691,40.06,13
bfloat16-eager,facebook/opt-13b,0.047,0.0253767681121826,39.2,244893.072,26239.663,pytorch,bfloat16,Unquantized,Eager,No Kernel,OPTForCausalLM,1.647,40.06,13
float16-fa2,facebook/opt-13b,0.043,0.0213923835754394,46.507,288914.942,26238.909,pytorch,float16,Unquantized,FAv2,No Kernel,OPTForCausalLM,1.395,40.06,13
bfloat16-fa2,facebook/opt-13b,0.042,0.0213329925537109,46.743,290583.447,26238.909,pytorch,bfloat16,Unquantized,FAv2,No Kernel,OPTForCausalLM,1.389,40.06,13
4bit-bnb-eager,Salesforce/codegen-6B-nl,0.288,0.056576000213623,17.621,140262.324,5007.212,pytorch,float16,BnB.4bit,Eager,No Kernel,CodeGenForCausalLM,3.862,40.0,6
float16-eager,Salesforce/codegen-6B-nl,0.038,0.0347740173339843,28.681,209899.419,14645.241,pytorch,float16,Unquantized,Eager,No Kernel,CodeGenForCausalLM,2.244,40.0,6
bfloat16-eager,Salesforce/codegen-6B-nl,0.037,0.0343040008544921,29.055,214394.519,14645.241,pytorch,bfloat16,Unquantized,Eager,No Kernel,CodeGenForCausalLM,2.203,40.0,6
float32-eager,Salesforce/codegen-6B-nl,0.259,0.0322273292541503,30.885,196412.307,29113.257,pytorch,float32,Unquantized,Eager,No Kernel,CodeGenForCausalLM,2.298,40.0,6
8bit-bnb-eager,facebook/opt-6.7b,0.086,0.0839935989379882,10.948,100489.959,7223.648,pytorch,float16,BnB.8bit,Eager,No Kernel,OPTForCausalLM,5.439,39.08,6
8bit-bnb-fa2,facebook/opt-6.7b,0.082,0.0793855972290039,11.169,104262.084,7223.73,pytorch,float16,BnB.8bit,FAv2,No Kernel,OPTForCausalLM,5.084,39.08,6
4bit-bnb-eager,facebook/opt-6.7b,0.288,0.0505968627929687,19.8,159028.281,4334.81,pytorch,float16,BnB.4bit,Eager,No Kernel,OPTForCausalLM,3.487,39.08,6
4bit-bnb-fa2,facebook/opt-6.7b,0.279,0.046334976196289,21.382,168441.405,4334.679,pytorch,float16,BnB.4bit,FAv2,No Kernel,OPTForCausalLM,3.215,39.08,6
float32-eager,facebook/opt-6.7b,0.257,0.0233758716583251,42.745,267623.607,27312.573,pytorch,float32,Unquantized,Eager,No Kernel,OPTForCausalLM,1.73,39.08,6
float16-eager,facebook/opt-6.7b,0.03,0.0173230075836181,57.528,367011.15,13661.26,pytorch,float16,Unquantized,Eager,No Kernel,OPTForCausalLM,1.124,39.08,6
bfloat16-eager,facebook/opt-6.7b,0.029,0.0168355846405029,58.993,379198.767,13661.26,pytorch,bfloat16,Unquantized,Eager,No Kernel,OPTForCausalLM,1.095,39.08,6
float16-fa2,facebook/opt-6.7b,0.026,0.0138690557479858,72.47,463073.715,13661.255,pytorch,float16,Unquantized,FAv2,No Kernel,OPTForCausalLM,0.903,39.08,6
bfloat16-fa2,facebook/opt-6.7b,0.025,0.0133150720596313,74.635,470586.38,13661.255,pytorch,bfloat16,Unquantized,FAv2,No Kernel,OPTForCausalLM,0.866,39.08,6
4bit-gptq-exllama-v2-eager,EleutherAI/pythia-12b,0.277,0.2572042236328125,3.887,24331.385,8459.203,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,GPTNeoXForCausalLM,16.485,38.82,12
4bit-gptq-exllama-v1-eager,EleutherAI/pythia-12b,0.276,0.2561628112792968,3.9,24371.271,8459.203,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,GPTNeoXForCausalLM,16.43,38.82,12
4bit-gptq-exllama-v1-fa2,EleutherAI/pythia-12b,0.269,0.251293701171875,3.98,24888.757,8459.212,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,GPTNeoXForCausalLM,16.099,38.82,12
4bit-gptq-exllama-v2-fa2,EleutherAI/pythia-12b,0.268,0.2509199371337891,3.987,24897.149,8459.212,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,GPTNeoXForCausalLM,16.073,38.82,12
8bit-bnb-eager,EleutherAI/pythia-12b,0.098,0.0914063339233398,10.762,89600.155,13413.403,pytorch,float16,BnB.8bit,Eager,No Kernel,GPTNeoXForCausalLM,5.945,38.82,12
8bit-bnb-fa2,EleutherAI/pythia-12b,0.09,0.0880455703735351,11.187,97378.632,13415.798,pytorch,float16,BnB.8bit,FAv2,No Kernel,GPTNeoXForCausalLM,5.702,38.82,12
4bit-bnb-eager,EleutherAI/pythia-12b,0.46,0.0602019844055175,16.443,130730.615,8235.73,pytorch,float16,BnB.4bit,Eager,No Kernel,GPTNeoXForCausalLM,4.264,38.82,12
4bit-bnb-fa2,EleutherAI/pythia-12b,0.443,0.0583598098754882,16.87,133771.859,8236.778,pytorch,float16,BnB.4bit,FAv2,No Kernel,GPTNeoXForCausalLM,4.18,38.82,12
float32-eager,EleutherAI/pythia-12b,0.414,0.0398981132507324,24.858,157882.54,48751.627,pytorch,float32,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,2.947,38.82,12
float16-eager,EleutherAI/pythia-12b,0.05,0.0286873607635498,34.549,196233.439,24655.994,pytorch,float16,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,1.863,38.82,12
bfloat16-eager,EleutherAI/pythia-12b,0.051,0.0285890560150146,34.83,198155.967,24655.994,pytorch,bfloat16,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,1.855,38.82,12
8bit-bnb-eager,Qwen/Qwen1.5-0.5B,0.08,0.0787793884277343,12.605,114219.704,1096.74,pytorch,float16,BnB.8bit,Eager,No Kernel,Qwen2ForCausalLM,5.081,38.62,0
8bit-bnb-fa2,Qwen/Qwen1.5-0.5B,0.079,0.0788326416015625,12.617,114250.058,1096.74,pytorch,float16,BnB.8bit,FAv2,No Kernel,Qwen2ForCausalLM,5.072,38.62,0
8bit-bnb-sdpa,Qwen/Qwen1.5-0.5B,0.078,0.0770662384033203,12.909,115418.922,1096.74,pytorch,float16,BnB.8bit,SDPA,No Kernel,Qwen2ForCausalLM,4.96,38.62,0
4bit-bnb-eager,Qwen/Qwen1.5-0.5B,0.061,0.0485150718688964,20.232,183674.306,943.535,pytorch,float16,BnB.4bit,Eager,No Kernel,Qwen2ForCausalLM,3.137,38.62,0
4bit-gptq-exllama-v2-fa2,Qwen/Qwen1.5-0.5B,0.05,0.0493352966308593,20.483,176558.827,943.923,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Qwen2ForCausalLM,3.111,38.62,0
4bit-bnb-fa2,Qwen/Qwen1.5-0.5B,0.057,0.0466257934570312,20.996,196043.867,943.535,pytorch,float16,BnB.4bit,FAv2,No Kernel,Qwen2ForCausalLM,2.992,38.62,0
4bit-gptq-exllama-v1-sdpa,Qwen/Qwen1.5-0.5B,0.048,0.0474982414245605,21.121,181895.806,943.923,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV1,Qwen2ForCausalLM,3.037,38.62,0
4bit-gptq-exllama-v1-fa2,Qwen/Qwen1.5-0.5B,0.048,0.0471377906799316,21.164,177193.053,943.923,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Qwen2ForCausalLM,3.016,38.62,0
4bit-gptq-exllama-v1-eager,Qwen/Qwen1.5-0.5B,0.048,0.0470497283935546,21.184,177138.897,943.923,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Qwen2ForCausalLM,3.017,38.62,0
4bit-gptq-exllama-v2-eager,Qwen/Qwen1.5-0.5B,0.048,0.0469975051879882,21.241,176911.57,943.923,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Qwen2ForCausalLM,3.006,38.62,0
4bit-gptq-exllama-v2-sdpa,Qwen/Qwen1.5-0.5B,0.046,0.0457164802551269,21.693,183335.303,943.923,pytorch,float16,GPTQ.4bit,SDPA,GPTQ.ExllamaV2,Qwen2ForCausalLM,2.93,38.62,0
4bit-bnb-sdpa,Qwen/Qwen1.5-0.5B,0.056,0.0447283210754394,22.236,200195.425,943.535,pytorch,float16,BnB.4bit,SDPA,No Kernel,Qwen2ForCausalLM,2.892,38.62,0
bfloat16-eager,Qwen/Qwen1.5-0.5B,0.021,0.0204687366485595,46.827,409237.923,1426.272,pytorch,bfloat16,Unquantized,Eager,No Kernel,Qwen2ForCausalLM,1.313,38.62,0
bfloat16-fa2,Qwen/Qwen1.5-0.5B,0.021,0.0206673927307128,48.387,410697.889,1426.272,pytorch,bfloat16,Unquantized,FAv2,No Kernel,Qwen2ForCausalLM,1.325,38.62,0
float16-fa2,Qwen/Qwen1.5-0.5B,0.021,0.0202301445007324,48.979,408205.709,1426.272,pytorch,float16,Unquantized,FAv2,No Kernel,Qwen2ForCausalLM,1.297,38.62,0
float16-eager,Qwen/Qwen1.5-0.5B,0.021,0.0199096317291259,49.469,416151.969,1426.272,pytorch,float16,Unquantized,Eager,No Kernel,Qwen2ForCausalLM,1.276,38.62,0
float32-eager,Qwen/Qwen1.5-0.5B,0.026,0.0194457607269287,51.953,445261.413,2600.839,pytorch,float32,Unquantized,Eager,No Kernel,Qwen2ForCausalLM,1.251,38.62,0
float16-sdpa,Qwen/Qwen1.5-0.5B,0.019,0.0184360961914062,53.969,444305.757,1426.272,pytorch,float16,Unquantized,SDPA,No Kernel,Qwen2ForCausalLM,1.181,38.62,0
bfloat16-sdpa,Qwen/Qwen1.5-0.5B,0.018,0.0182169609069824,54.749,452141.343,1426.272,pytorch,bfloat16,Unquantized,SDPA,No Kernel,Qwen2ForCausalLM,1.167,38.62,0
float32-sdpa,Qwen/Qwen1.5-0.5B,0.024,0.0169492473602294,58.642,474112.7,2600.839,pytorch,float32,Unquantized,SDPA,No Kernel,Qwen2ForCausalLM,1.092,38.62,0
4bit-gptq-exllama-v2-eager,EleutherAI/pythia-6.7b,0.164,0.1514475555419921,6.599,41202.814,5239.773,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Unknown,9.704,38.06,6
4bit-gptq-exllama-v1-eager,EleutherAI/pythia-6.7b,0.163,0.1503180847167968,6.64,41349.32,5239.773,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Unknown,9.642,38.06,6
4bit-gptq-exllama-v2-fa2,EleutherAI/pythia-6.7b,0.157,0.1464412231445312,6.835,42738.233,5239.772,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Unknown,9.376,38.06,6
4bit-gptq-exllama-v1-fa2,EleutherAI/pythia-6.7b,0.157,0.1463849029541015,6.838,42772.663,5239.772,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Unknown,9.372,38.06,6
8bit-bnb-eager,EleutherAI/pythia-6.7b,0.073,0.0713328628540039,13.956,118077.572,8000.245,pytorch,float16,BnB.8bit,Eager,No Kernel,Unknown,4.584,38.06,6
8bit-bnb-fa2,EleutherAI/pythia-6.7b,0.068,0.0687493133544921,14.493,125061.836,8002.259,pytorch,float16,BnB.8bit,FAv2,No Kernel,Unknown,4.413,38.06,6
4bit-bnb-eager,EleutherAI/pythia-6.7b,0.28,0.0504422416687011,19.438,160030.209,5084.626,pytorch,float16,BnB.4bit,Eager,No Kernel,Unknown,3.493,38.06,6
4bit-bnb-fa2,EleutherAI/pythia-6.7b,0.269,0.0449669113159179,22.172,178628.43,5084.625,pytorch,float16,BnB.4bit,FAv2,No Kernel,Unknown,3.106,38.06,6
8bit-bnb-eager,EleutherAI/pythia-2.7b,0.082,0.0814673919677734,12.123,109438.11,3631.826,pytorch,float16,BnB.8bit,Eager,No Kernel,Unknown,5.259,37.09,2
8bit-bnb-fa2,EleutherAI/pythia-2.7b,0.074,0.073444351196289,13.383,117673.854,3632.818,pytorch,float16,BnB.8bit,FAv2,No Kernel,Unknown,4.743,37.09,2
4bit-gptq-exllama-v2-eager,EleutherAI/pythia-2.7b,0.079,0.0713359375,13.983,88336.712,2494.102,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Unknown,4.577,37.09,2
4bit-gptq-exllama-v1-eager,EleutherAI/pythia-2.7b,0.078,0.0711086044311523,14.034,88575.102,2494.102,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Unknown,4.561,37.09,2
4bit-gptq-exllama-v1-fa2,EleutherAI/pythia-2.7b,0.07,0.0636241912841796,15.69,98174.172,2494.1,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Unknown,4.085,37.09,2
4bit-gptq-exllama-v2-fa2,EleutherAI/pythia-2.7b,0.07,0.0634490890502929,15.714,98487.865,2494.1,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Unknown,4.076,37.09,2
4bit-bnb-eager,EleutherAI/pythia-2.7b,0.125,0.0529807357788085,18.687,157320.029,2358.103,pytorch,float16,BnB.4bit,Eager,No Kernel,Unknown,3.507,37.09,2
4bit-bnb-fa2,EleutherAI/pythia-2.7b,0.114,0.0483450889587402,20.16,174315.707,2358.103,pytorch,float16,BnB.4bit,FAv2,No Kernel,Unknown,3.218,37.09,2
8bit-bnb-fa2,facebook/opt-2.7b,0.079,0.0791275482177734,10.881,63573.38,3080.719,pytorch,float16,BnB.8bit,FAv2,No Kernel,OPTForCausalLM,5.062,36.74,2
8bit-bnb-eager,facebook/opt-2.7b,0.085,0.0831027221679687,11.997,102075.651,3079.772,pytorch,float16,BnB.8bit,Eager,No Kernel,OPTForCausalLM,5.338,36.74,2
4bit-bnb-eager,facebook/opt-2.7b,0.122,0.0488171501159667,20.457,170561.982,1840.677,pytorch,float16,BnB.4bit,Eager,No Kernel,OPTForCausalLM,3.203,36.74,2
4bit-bnb-fa2,facebook/opt-2.7b,0.114,0.0462039031982421,21.547,154257.192,1840.546,pytorch,float16,BnB.4bit,FAv2,No Kernel,OPTForCausalLM,3.029,36.74,2
float16-eager,facebook/opt-2.7b,0.019,0.0166922245025634,57.574,314407.304,5540.556,pytorch,float16,Unquantized,Eager,No Kernel,OPTForCausalLM,1.088,36.74,2
bfloat16-eager,facebook/opt-2.7b,0.019,0.0163471355438232,61.368,451310.36,5540.556,pytorch,bfloat16,Unquantized,Eager,No Kernel,OPTForCausalLM,1.052,36.74,2
float32-eager,facebook/opt-2.7b,0.104,0.0163532791137695,61.369,419443.59,11168.211,pytorch,float32,Unquantized,Eager,No Kernel,OPTForCausalLM,1.131,36.74,2
float16-fa2,facebook/opt-2.7b,0.014,0.0134553604125976,74.002,526915.438,5540.548,pytorch,float16,Unquantized,FAv2,No Kernel,OPTForCausalLM,0.864,36.74,2
bfloat16-fa2,facebook/opt-2.7b,0.014,0.0131655683517456,75.814,526560.825,5540.548,pytorch,bfloat16,Unquantized,FAv2,No Kernel,OPTForCausalLM,0.844,36.74,2
4bit-bnb-eager,facebook/xglm-7.5B,0.289,0.0512839698791503,19.234,147550.761,6018.104,pytorch,float16,BnB.4bit,Eager,No Kernel,XGLMForCausalLM,3.531,36.38,7
float32-eager,facebook/xglm-7.5B,0.283,0.0253132801055908,39.342,241439.149,30815.491,pytorch,float32,Unquantized,Eager,No Kernel,XGLMForCausalLM,1.88,36.38,7
float16-eager,facebook/xglm-7.5B,0.033,0.0182353916168212,54.67,346520.776,15412.54,pytorch,float16,Unquantized,Eager,No Kernel,XGLMForCausalLM,1.185,36.38,7
bfloat16-eager,facebook/xglm-7.5B,0.032,0.0177520637512207,55.846,351184.551,15412.54,pytorch,bfloat16,Unquantized,Eager,No Kernel,XGLMForCausalLM,1.159,36.38,7
8bit-bnb-fa2,EleutherAI/gpt-neo-2.7B,0.095,0.0941107177734375,10.445,92094.012,3211.978,pytorch,float16,BnB.8bit,FAv2,No Kernel,GPTNeoForCausalLM,6.123,36.2,2
8bit-bnb-eager,EleutherAI/gpt-neo-2.7B,0.093,0.0917739486694336,10.858,94364.895,3216.625,pytorch,float16,BnB.8bit,Eager,No Kernel,GPTNeoForCausalLM,5.891,36.2,2
4bit-gptq-exllama-v1-eager,EleutherAI/gpt-neo-2.7B,0.08,0.0716933135986328,13.899,88974.602,2079.903,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,GPTNeoForCausalLM,4.603,36.2,2
4bit-gptq-exllama-v2-eager,EleutherAI/gpt-neo-2.7B,0.079,0.071201789855957,14.009,88628.153,2079.903,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,GPTNeoForCausalLM,4.568,36.2,2
4bit-gptq-exllama-v1-fa2,EleutherAI/gpt-neo-2.7B,0.07,0.0631685104370117,15.815,98451.236,2079.897,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,GPTNeoForCausalLM,4.051,36.2,2
4bit-gptq-exllama-v2-fa2,EleutherAI/gpt-neo-2.7B,0.07,0.0629032974243164,15.881,98943.3,2079.897,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,GPTNeoForCausalLM,4.034,36.2,2
4bit-bnb-eager,EleutherAI/gpt-neo-2.7B,0.122,0.0542955513000488,18.227,156511.963,1986.218,pytorch,float16,BnB.4bit,Eager,No Kernel,GPTNeoForCausalLM,3.557,36.2,2
4bit-bnb-fa2,EleutherAI/gpt-neo-2.7B,0.112,0.0513587188720703,19.02,159217.192,1986.087,pytorch,float16,BnB.4bit,FAv2,No Kernel,GPTNeoForCausalLM,3.44,36.2,2
bfloat16-eager,EleutherAI/gpt-neo-2.7B,0.022,0.0210319366455078,47.132,355023.389,5677.722,pytorch,bfloat16,Unquantized,Eager,No Kernel,GPTNeoForCausalLM,1.351,36.2,2
float16-eager,EleutherAI/gpt-neo-2.7B,0.022,0.020853759765625,47.851,360554.821,5677.722,pytorch,float16,Unquantized,Eager,No Kernel,GPTNeoForCausalLM,1.335,36.2,2
float32-eager,EleutherAI/gpt-neo-2.7B,0.103,0.0190218238830566,52.437,359668.741,11304.033,pytorch,float32,Unquantized,Eager,No Kernel,GPTNeoForCausalLM,1.301,36.2,2
float16-fa2,EleutherAI/gpt-neo-2.7B,0.017,0.0169615364074707,58.707,429652.396,5675.077,pytorch,float16,Unquantized,FAv2,No Kernel,GPTNeoForCausalLM,1.089,36.2,2
bfloat16-fa2,EleutherAI/gpt-neo-2.7B,0.017,0.0167864322662353,58.838,441408.61,5675.077,pytorch,bfloat16,Unquantized,FAv2,No Kernel,GPTNeoForCausalLM,1.075,36.2,2
bfloat16-eager,microsoft/rho-math-1b-v0.1,0.024,0.0233349113464355,42.896,352872.568,2279.027,pytorch,bfloat16,Unquantized,Eager,No Kernel,LlamaForCausalLM,1.495,34.99,1
float16-eager,microsoft/rho-math-1b-v0.1,0.024,0.0220405769348144,45.158,355023.905,2279.42,pytorch,float16,Unquantized,Eager,No Kernel,LlamaForCausalLM,1.413,34.99,1
float32-eager,microsoft/rho-math-1b-v0.1,0.044,0.0200570888519287,49.071,366733.618,4492.921,pytorch,float32,Unquantized,Eager,No Kernel,LlamaForCausalLM,1.308,34.99,1
float32-sdpa,microsoft/rho-math-1b-v0.1,0.042,0.0181954555511474,54.752,408059.01,4492.869,pytorch,float32,Unquantized,SDPA,No Kernel,LlamaForCausalLM,1.19,34.99,1
8bit-bnb-eager,EleutherAI/pythia-1.4b,0.061,0.0587837448120117,16.777,146210.57,2004.071,pytorch,float16,BnB.8bit,Eager,No Kernel,GPTNeoXForCausalLM,3.79,34.75,1
8bit-bnb-fa2,EleutherAI/pythia-1.4b,0.055,0.055150592803955,17.844,154626.702,1999.766,pytorch,float16,BnB.8bit,FAv2,No Kernel,GPTNeoXForCausalLM,3.59,34.75,1
4bit-bnb-eager,EleutherAI/pythia-1.4b,0.067,0.0400046081542968,24.431,215851.465,1406.95,pytorch,float16,BnB.4bit,Eager,No Kernel,GPTNeoXForCausalLM,2.655,34.75,1
4bit-gptq-exllama-v2-eager,EleutherAI/pythia-1.4b,0.045,0.0405729293823242,24.598,161068.566,1491.577,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,GPTNeoXForCausalLM,2.601,34.75,1
4bit-gptq-exllama-v1-eager,EleutherAI/pythia-1.4b,0.044,0.0403845138549804,24.708,163614.962,1491.577,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,GPTNeoXForCausalLM,2.589,34.75,1
4bit-bnb-fa2,EleutherAI/pythia-1.4b,0.058,0.0371742706298828,26.269,230222.307,1406.95,pytorch,float16,BnB.4bit,FAv2,No Kernel,GPTNeoXForCausalLM,2.461,34.75,1
4bit-gptq-exllama-v2-fa2,EleutherAI/pythia-1.4b,0.037,0.033570816040039,29.737,188311.256,1491.576,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,GPTNeoXForCausalLM,2.153,34.75,1
4bit-gptq-exllama-v1-fa2,EleutherAI/pythia-1.4b,0.037,0.03340185546875,29.916,188381.362,1491.576,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,GPTNeoXForCausalLM,2.142,34.75,1
bfloat16-eager,EleutherAI/pythia-1.4b,0.022,0.02065305519104,46.501,420375.655,3188.153,pytorch,bfloat16,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,1.372,34.75,1
float16-eager,EleutherAI/pythia-1.4b,0.02,0.0190382080078125,52.361,401349.858,3188.153,pytorch,float16,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,1.22,34.75,1
bfloat16-fa2,EleutherAI/pythia-1.4b,0.018,0.0176005115509033,54.601,479466.127,3189.192,pytorch,bfloat16,Unquantized,FAv2,No Kernel,GPTNeoXForCausalLM,1.166,34.75,1
float32-eager,EleutherAI/pythia-1.4b,0.053,0.0180305919647216,55.097,427059.397,6138.652,pytorch,float32,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,1.195,34.75,1
float16-fa2,EleutherAI/pythia-1.4b,0.017,0.0165365753173828,59.578,441552.856,3189.192,pytorch,float16,Unquantized,FAv2,No Kernel,GPTNeoXForCausalLM,1.064,34.75,1
8bit-bnb-eager,EleutherAI/pythia-1.3b,0.058,0.0571002883911132,17.3,153607.944,2004.071,pytorch,float16,BnB.8bit,Eager,No Kernel,Unknown,3.668,34.46,1
8bit-bnb-fa2,EleutherAI/pythia-1.3b,0.056,0.0556001281738281,17.781,160446.358,1999.766,pytorch,float16,BnB.8bit,FAv2,No Kernel,Unknown,3.615,34.46,1
4bit-gptq-exllama-v1-eager,EleutherAI/pythia-1.3b,0.045,0.0406988792419433,24.471,162898.334,1491.577,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Unknown,2.619,34.46,1
4bit-gptq-exllama-v2-eager,EleutherAI/pythia-1.3b,0.045,0.0406333427429199,24.504,162882.971,1491.577,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Unknown,2.609,34.46,1
4bit-bnb-eager,EleutherAI/pythia-1.3b,0.067,0.0396871681213378,24.768,217022.229,1406.95,pytorch,float16,BnB.4bit,Eager,No Kernel,Unknown,2.59,34.46,1
4bit-bnb-fa2,EleutherAI/pythia-1.3b,0.057,0.0347484169006347,28.298,237695.398,1406.95,pytorch,float16,BnB.4bit,FAv2,No Kernel,Unknown,2.277,34.46,1
4bit-gptq-exllama-v2-fa2,EleutherAI/pythia-1.3b,0.037,0.0334622726440429,29.676,187372.614,1491.576,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Unknown,2.152,34.46,1
4bit-gptq-exllama-v1-fa2,EleutherAI/pythia-1.3b,0.037,0.0334131202697753,29.839,187598.604,1491.576,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Unknown,2.144,34.46,1
bfloat16-eager,EleutherAI/pythia-1.3b,0.019,0.0182722568511962,54.429,438010.527,3188.153,pytorch,bfloat16,Unquantized,Eager,No Kernel,Unknown,1.174,34.46,1
float16-eager,EleutherAI/pythia-1.3b,0.019,0.0182794246673584,54.462,436015.522,3188.153,pytorch,float16,Unquantized,Eager,No Kernel,Unknown,1.171,34.46,1
float32-eager,EleutherAI/pythia-1.3b,0.053,0.0171397113800048,58.014,435230.688,6138.652,pytorch,float32,Unquantized,Eager,No Kernel,Unknown,1.133,34.46,1
bfloat16-fa2,EleutherAI/pythia-1.3b,0.017,0.016449535369873,59.87,481869.771,3189.192,pytorch,bfloat16,Unquantized,FAv2,No Kernel,Unknown,1.069,34.46,1
float16-fa2,EleutherAI/pythia-1.3b,0.016,0.0157440004348754,63.345,487602.529,3189.192,pytorch,float16,Unquantized,FAv2,No Kernel,Unknown,1.011,34.46,1
float32-eager,facebook/xglm-4.5B,0.168,0.0249886722564697,39.254,273440.39,18903.143,pytorch,float32,Unquantized,Eager,No Kernel,XGLMForCausalLM,1.763,34.31,5
float16-eager,facebook/xglm-4.5B,0.03,0.0255989761352539,39.38,294573.831,9490.407,pytorch,float16,Unquantized,Eager,No Kernel,XGLMForCausalLM,1.644,34.31,5
bfloat16-eager,facebook/xglm-4.5B,0.028,0.0238663673400878,41.593,282201.582,9490.407,pytorch,bfloat16,Unquantized,Eager,No Kernel,XGLMForCausalLM,1.533,34.31,5
8bit-bnb-eager,EleutherAI/gpt-neo-1.3B,0.07,0.0691435546875,14.374,124044.621,1668.145,pytorch,float16,BnB.8bit,Eager,No Kernel,GPTNeoForCausalLM,4.455,33.58,1
8bit-bnb-fa2,EleutherAI/gpt-neo-1.3B,0.066,0.0653578262329101,15.18,131650.44,1666.097,pytorch,float16,BnB.8bit,FAv2,No Kernel,GPTNeoForCausalLM,4.196,33.58,1
4bit-gptq-exllama-v2-eager,EleutherAI/gpt-neo-1.3B,0.047,0.0414115829467773,23.973,157244.726,1168.485,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,GPTNeoForCausalLM,2.66,33.58,1
4bit-gptq-exllama-v1-eager,EleutherAI/gpt-neo-1.3B,0.047,0.0414136314392089,24.035,161898.336,1168.485,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,GPTNeoForCausalLM,2.662,33.58,1
4bit-bnb-eager,EleutherAI/gpt-neo-1.3B,0.067,0.0408975372314453,24.039,206590.25,1117.748,pytorch,float16,BnB.4bit,Eager,No Kernel,GPTNeoForCausalLM,2.674,33.58,1
4bit-gptq-exllama-v2-fa2,EleutherAI/gpt-neo-1.3B,0.038,0.0374609909057617,26.4,174804.699,1168.484,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,GPTNeoForCausalLM,2.42,33.58,1
4bit-bnb-fa2,EleutherAI/gpt-neo-1.3B,0.057,0.0374231033325195,26.5,227909.248,1117.617,pytorch,float16,BnB.4bit,FAv2,No Kernel,GPTNeoForCausalLM,2.42,33.58,1
4bit-gptq-exllama-v1-fa2,EleutherAI/gpt-neo-1.3B,0.038,0.0371292152404785,26.834,175035.933,1168.484,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,GPTNeoForCausalLM,2.378,33.58,1
float16-eager,EleutherAI/gpt-neo-1.3B,0.017,0.0154593276977539,64.567,482897.311,2885.485,pytorch,float16,Unquantized,Eager,No Kernel,GPTNeoForCausalLM,0.992,33.58,1
bfloat16-eager,EleutherAI/gpt-neo-1.3B,0.017,0.015388671875,64.702,492113.382,2885.485,pytorch,bfloat16,Unquantized,Eager,No Kernel,GPTNeoForCausalLM,0.985,33.58,1
float32-eager,EleutherAI/gpt-neo-1.3B,0.053,0.0148541440963745,67.233,506216.396,5626.042,pytorch,float32,Unquantized,Eager,No Kernel,GPTNeoForCausalLM,0.986,33.58,1
float16-fa2,EleutherAI/gpt-neo-1.3B,0.013,0.0125967359542846,78.565,607014.857,2884.409,pytorch,float16,Unquantized,FAv2,No Kernel,GPTNeoForCausalLM,0.808,33.58,1
bfloat16-fa2,EleutherAI/gpt-neo-1.3B,0.013,0.0125788164138793,79.193,602191.64,2884.409,pytorch,bfloat16,Unquantized,FAv2,No Kernel,GPTNeoForCausalLM,0.806,33.58,1
4bit-gptq-exllama-v1-eager,EleutherAI/polyglot-ko-12.8b,0.308,0.2859171752929687,3.496,21859.014,8808.924,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,GPTNeoXForCausalLM,18.322,33.33,13
4bit-gptq-exllama-v2-eager,EleutherAI/polyglot-ko-12.8b,0.308,0.2855116882324219,3.501,21895.97,8809.118,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,GPTNeoXForCausalLM,18.299,33.33,13
4bit-gptq-exllama-v2-fa2,EleutherAI/polyglot-ko-12.8b,0.3,0.279900146484375,3.575,22348.191,8808.933,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,GPTNeoXForCausalLM,17.932,33.33,13
4bit-gptq-exllama-v1-fa2,EleutherAI/polyglot-ko-12.8b,0.299,0.2790911865234375,3.581,22362.209,8808.933,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,GPTNeoXForCausalLM,17.886,33.33,13
8bit-bnb-fa2,EleutherAI/polyglot-ko-12.8b,0.094,0.0925911026000976,10.652,89780.254,14373.646,pytorch,float16,BnB.8bit,FAv2,No Kernel,GPTNeoXForCausalLM,5.998,33.33,13
8bit-bnb-eager,EleutherAI/polyglot-ko-12.8b,0.094,0.0915855331420898,10.911,93144.898,14370.872,pytorch,float16,BnB.8bit,Eager,No Kernel,GPTNeoXForCausalLM,5.878,33.33,13
4bit-bnb-eager,EleutherAI/polyglot-ko-12.8b,0.505,0.0626718711853027,15.64,120462.207,8556.998,pytorch,float16,BnB.4bit,Eager,No Kernel,GPTNeoXForCausalLM,4.526,33.33,13
4bit-bnb-fa2,EleutherAI/polyglot-ko-12.8b,0.491,0.0563199996948242,17.643,131995.089,8556.997,pytorch,float16,BnB.4bit,FAv2,No Kernel,GPTNeoXForCausalLM,4.06,33.33,13
float32-eager,EleutherAI/polyglot-ko-12.8b,0.454,0.0427304954528808,23.352,145631.002,53088.639,pytorch,float32,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,3.147,33.33,13
bfloat16-eager,EleutherAI/polyglot-ko-12.8b,0.057,0.0305438728332519,32.533,205417.545,26864.106,pytorch,bfloat16,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,1.984,33.33,13
float16-eager,EleutherAI/polyglot-ko-12.8b,0.056,0.0305868797302246,32.546,203009.187,26864.106,pytorch,float16,Unquantized,Eager,No Kernel,GPTNeoXForCausalLM,1.986,33.33,13
bfloat16-fa2,EleutherAI/polyglot-ko-12.8b,0.05,0.0256614398956298,38.823,243811.333,26866.388,pytorch,bfloat16,Unquantized,FAv2,No Kernel,GPTNeoXForCausalLM,1.671,33.33,13
float16-fa2,EleutherAI/polyglot-ko-12.8b,0.049,0.0256194553375244,38.83,241693.597,26866.388,pytorch,float16,Unquantized,FAv2,No Kernel,GPTNeoXForCausalLM,1.668,33.33,13
8bit-bnb-eager,EleutherAI/pythia-410m,0.058,0.0563548164367675,17.422,154115.627,771.487,pytorch,float16,BnB.8bit,Eager,No Kernel,GPTNeoXForCausalLM,3.689,31.55,0
8bit-bnb-fa2,EleutherAI/pythia-410m,0.053,0.0530503692626953,18.696,166297.453,775.103,pytorch,float16,BnB.8bit,FAv2,No Kernel,GPTNeoXForCausalLM,3.428,31.55,0
4bit-bnb-eager,EleutherAI/pythia-410m,0.045,0.0379965438842773,25.81,231240.067,622.611,pytorch,float16,BnB.4bit,Eager,No Kernel,GPTNeoXForCausalLM,2.453,31.55,0
4bit-bnb-fa2,EleutherAI/pythia-410m,0.042,0.0362158088684082,27.379,247916.895,622.924,pytorch,float16,BnB.4bit,FAv2,No Kernel,GPTNeoXForCausalLM,2.345,31.55,0
4bit-gptq-exllama-v2-eager,EleutherAI/pythia-410m,0.035,0.0342548484802246,29.117,238539.315,644.02,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,GPTNeoXForCausalLM,2.193,31.55,0
4bit-gptq-exllama-v1-eager,EleutherAI/pythia-410m,0.035,0.0340756492614746,29.258,235878.211,644.02,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,GPTNeoXForCausalLM,2.18,31.55,0
4bit-gptq-exllama-v1-fa2,EleutherAI/pythia-410m,0.032,0.031797248840332,31.358,253074.129,644.019,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,GPTNeoXForCausalLM,2.035,31.55,0
4bit-gptq-exllama-v2-fa2,EleutherAI/pythia-410m,0.032,0.0313794555664062,31.724,256472.321,644.019,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,GPTNeoXForCausalLM,2.01,31.55,0
8bit-bnb-eager,facebook/opt-350m,0.065,0.063287296295166,15.599,136832.94,446.104,pytorch,float16,BnB.8bit,Eager,No Kernel,OPTForCausalLM,4.065,30.01,0
8bit-bnb-fa2,facebook/opt-350m,0.061,0.0607703056335449,16.393,141474.474,446.104,pytorch,float16,BnB.8bit,FAv2,No Kernel,OPTForCausalLM,3.909,30.01,0
4bit-bnb-eager,facebook/opt-350m,0.048,0.0381931533813476,26.358,228702.72,298.885,pytorch,float16,BnB.4bit,Eager,No Kernel,OPTForCausalLM,2.433,30.01,0
4bit-bnb-fa2,facebook/opt-350m,0.044,0.0354037742614746,28.14,240528.176,298.884,pytorch,float16,BnB.4bit,FAv2,No Kernel,OPTForCausalLM,2.284,30.01,0
float16-eager,facebook/opt-350m,0.014,0.0123540477752685,80.154,654730.729,749.22,pytorch,float16,Unquantized,Eager,No Kernel,OPTForCausalLM,0.796,30.01,0
bfloat16-eager,facebook/opt-350m,0.013,0.0120238075256347,82.913,673691.11,749.22,pytorch,bfloat16,Unquantized,Eager,No Kernel,OPTForCausalLM,0.772,30.01,0
float32-eager,facebook/opt-350m,0.021,0.0118620157241821,84.195,507037.826,1491.807,pytorch,float32,Unquantized,Eager,No Kernel,OPTForCausalLM,0.77,30.01,0
bfloat16-fa2,facebook/opt-350m,0.011,0.0102732801437377,95.412,794640.402,749.216,pytorch,bfloat16,Unquantized,FAv2,No Kernel,OPTForCausalLM,0.669,30.01,0
float16-fa2,facebook/opt-350m,0.011,0.0100710401535034,98.814,821606.22,749.216,pytorch,float16,Unquantized,FAv2,No Kernel,OPTForCausalLM,0.647,30.01,0
float16-eager,facebook/xglm-564M,0.018,0.0141875195503234,68.478,624894.821,1324.762,pytorch,float16,Unquantized,Eager,No Kernel,XGLMForCausalLM,0.935,29.55,0
float32-eager,facebook/xglm-564M,0.028,0.0125404157638549,79.162,559873.28,2642.978,pytorch,float32,Unquantized,Eager,No Kernel,XGLMForCausalLM,0.82,29.55,0
bfloat16-eager,facebook/xglm-564M,0.014,0.0123084802627563,80.473,667864.13,1324.762,pytorch,bfloat16,Unquantized,Eager,No Kernel,XGLMForCausalLM,0.79,29.55,0
8bit-bnb-eager,EleutherAI/gpt-neo-125m,0.037,0.036387840270996,26.831,241510.078,271.428,pytorch,float16,BnB.8bit,Eager,No Kernel,GPTNeoForCausalLM,2.386,29.47,0
8bit-bnb-fa2,EleutherAI/gpt-neo-125m,0.034,0.0346511344909667,28.303,251476.298,270.703,pytorch,float16,BnB.8bit,FAv2,No Kernel,GPTNeoForCausalLM,2.265,29.47,0
4bit-bnb-eager,EleutherAI/gpt-neo-125m,0.027,0.0216524791717529,44.66,398142.414,229.949,pytorch,float16,BnB.4bit,Eager,No Kernel,GPTNeoForCausalLM,1.419,29.47,0
4bit-gptq-exllama-v2-eager,EleutherAI/gpt-neo-125m,0.023,0.0218941440582275,46.168,411474.529,242.383,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,GPTNeoForCausalLM,1.407,29.47,0
4bit-gptq-exllama-v1-eager,EleutherAI/gpt-neo-125m,0.022,0.0210493431091308,46.909,403596.387,242.383,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,GPTNeoForCausalLM,1.359,29.47,0
4bit-bnb-fa2,EleutherAI/gpt-neo-125m,0.025,0.0198512649536132,48.346,422284.363,229.306,pytorch,float16,BnB.4bit,FAv2,No Kernel,GPTNeoForCausalLM,1.31,29.47,0
4bit-gptq-exllama-v2-fa2,EleutherAI/gpt-neo-125m,0.02,0.0192552967071533,51.475,433415.328,242.382,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,GPTNeoForCausalLM,1.235,29.47,0
4bit-gptq-exllama-v1-fa2,EleutherAI/gpt-neo-125m,0.02,0.019095552444458,52.071,424968.084,242.382,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,GPTNeoForCausalLM,1.226,29.47,0
float16-eager,EleutherAI/gpt-neo-125m,0.009,0.0082959361076354,119.461,1006877.327,363.873,pytorch,float16,Unquantized,Eager,No Kernel,GPTNeoForCausalLM,0.533,29.47,0
bfloat16-eager,EleutherAI/gpt-neo-125m,0.009,0.008196096420288,121.822,1003006.109,363.873,pytorch,bfloat16,Unquantized,Eager,No Kernel,GPTNeoForCausalLM,0.526,29.47,0
float32-eager,EleutherAI/gpt-neo-125m,0.01,0.0079923200607299,124.599,1064993.47,657.858,pytorch,float32,Unquantized,Eager,No Kernel,GPTNeoForCausalLM,0.514,29.47,0
float16-fa2,EleutherAI/gpt-neo-125m,0.007,0.0067799038887023,145.789,1233644.886,363.871,pytorch,float16,Unquantized,FAv2,No Kernel,GPTNeoForCausalLM,0.436,29.47,0
bfloat16-fa2,EleutherAI/gpt-neo-125m,0.007,0.0066672639846801,149.464,1249611.184,363.871,pytorch,bfloat16,Unquantized,FAv2,No Kernel,GPTNeoForCausalLM,0.427,29.47,0
8bit-bnb-eager,facebook/opt-125m,0.032,0.0316375045776367,31.45,271342.806,220.298,pytorch,float16,BnB.8bit,Eager,No Kernel,OPTForCausalLM,2.028,29.15,0
8bit-bnb-fa2,facebook/opt-125m,0.03,0.0306268157958984,32.544,284216.679,219.642,pytorch,float16,BnB.8bit,FAv2,No Kernel,OPTForCausalLM,1.962,29.15,0
4bit-bnb-eager,facebook/opt-125m,0.025,0.0195266551971435,50.783,444506.087,178.504,pytorch,float16,BnB.4bit,Eager,No Kernel,OPTForCausalLM,1.27,29.15,0
4bit-bnb-fa2,facebook/opt-125m,0.023,0.0185784320831298,53.029,481425.024,178.503,pytorch,float16,BnB.4bit,FAv2,No Kernel,OPTForCausalLM,1.199,29.15,0
bfloat16-fa2,facebook/opt-125m,0.012,0.0114432001113891,87.225,1088969.19,312.411,pytorch,bfloat16,Unquantized,FAv2,No Kernel,OPTForCausalLM,0.74,29.15,0
float16-eager,facebook/opt-125m,0.008,0.0068095998764038,144.454,892300.781,312.416,pytorch,float16,Unquantized,Eager,No Kernel,OPTForCausalLM,0.437,29.15,0
float32-eager,facebook/opt-125m,0.009,0.006565887928009,151.239,838505.545,601.352,pytorch,float32,Unquantized,Eager,No Kernel,OPTForCausalLM,0.423,29.15,0
bfloat16-eager,facebook/opt-125m,0.007,0.0065126399993896,153.138,1254978.241,312.416,pytorch,bfloat16,Unquantized,Eager,No Kernel,OPTForCausalLM,0.418,29.15,0
float16-fa2,facebook/opt-125m,0.006,0.0058081278800964,169.275,1492424.358,312.411,pytorch,float16,Unquantized,FAv2,No Kernel,OPTForCausalLM,0.372,29.15,0
8bit-bnb-eager,EleutherAI/pythia-160m,0.031,0.0315013122558593,31.118,269934.929,376.32,pytorch,float16,BnB.8bit,Eager,No Kernel,GPTNeoXForCausalLM,2.031,29.02,0
8bit-bnb-fa2,EleutherAI/pythia-160m,0.031,0.0284067840576171,34.413,304622.847,375.329,pytorch,float16,BnB.8bit,FAv2,No Kernel,GPTNeoXForCausalLM,1.842,29.02,0
4bit-bnb-eager,EleutherAI/pythia-160m,0.024,0.0205936641693115,47.497,416727.952,327.544,pytorch,float16,BnB.4bit,Eager,No Kernel,GPTNeoXForCausalLM,1.355,29.02,0
4bit-gptq-exllama-v2-eager,EleutherAI/pythia-160m,0.02,0.0202199039459228,48.236,414443.1,340.903,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,GPTNeoXForCausalLM,1.299,29.02,0
4bit-gptq-exllama-v1-eager,EleutherAI/pythia-160m,0.021,0.0198952960968017,50.093,432007.015,340.903,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,GPTNeoXForCausalLM,1.265,29.02,0
4bit-gptq-exllama-v1-fa2,EleutherAI/pythia-160m,0.021,0.0189532165527343,51.969,456213.069,340.902,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,GPTNeoXForCausalLM,1.237,29.02,0
4bit-bnb-fa2,EleutherAI/pythia-160m,0.022,0.0183429126739501,53.354,469893.169,327.815,pytorch,float16,BnB.4bit,FAv2,No Kernel,GPTNeoXForCausalLM,1.191,29.02,0
4bit-gptq-exllama-v2-fa2,EleutherAI/pythia-160m,0.017,0.0163215351104736,60.986,506696.229,340.902,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,GPTNeoXForCausalLM,1.046,29.02,0
8bit-bnb-fa2,EleutherAI/pythia-70m,0.015,0.0151193599700927,63.768,607650.688,197.947,pytorch,float16,BnB.8bit,FAv2,No Kernel,Unknown,0.999,28.93,0
8bit-bnb-eager,EleutherAI/pythia-70m,0.015,0.0149831676483154,65.775,618526.295,197.768,pytorch,float16,BnB.8bit,Eager,No Kernel,Unknown,0.976,28.93,0
4bit-bnb-eager,EleutherAI/pythia-70m,0.012,0.0100229120254516,99.119,862794.177,188.553,pytorch,float16,BnB.4bit,Eager,No Kernel,Unknown,0.644,28.93,0
4bit-bnb-fa2,EleutherAI/pythia-70m,0.012,0.0097966079711914,99.976,851225.469,188.745,pytorch,float16,BnB.4bit,FAv2,No Kernel,Unknown,0.637,28.93,0
4bit-gptq-exllama-v1-eager,EleutherAI/pythia-70m,0.01,0.0094412803649902,104.628,876577.936,193.845,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV1,Unknown,0.609,28.93,0
4bit-gptq-exllama-v2-eager,EleutherAI/pythia-70m,0.01,0.0093061122894287,107.004,914642.949,193.845,pytorch,float16,GPTQ.4bit,Eager,GPTQ.ExllamaV2,Unknown,0.595,28.93,0
4bit-gptq-exllama-v2-fa2,EleutherAI/pythia-70m,0.009,0.0086067199707031,115.952,991460.772,193.844,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV2,Unknown,0.551,28.93,0
4bit-gptq-exllama-v1-fa2,EleutherAI/pythia-70m,0.009,0.0085657596588134,116.298,998412.513,193.844,pytorch,float16,GPTQ.4bit,FAv2,GPTQ.ExllamaV1,Unknown,0.549,28.93,0
float16-fa2,openai-community/gpt2,0.007,0.0067194881439208,147.371,1212590.352,328.799,pytorch,float16,Unquantized,FAv2,No Kernel,GPT2LMHeadModel,0.432,28.53,0