dahara1 committed on
Commit
fe212b7
1 Parent(s): 4065b4d

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -67,7 +67,8 @@ Also, the score may change as a result of more tuning.
 
  * **Japanese benchmark**
 
- - *We used [Stability-AI/lm-evaluation-harness + gptq patch](https://github.com/webbigdata-jp/lm-evaluation-harness) for evaluation.*
+ - *We used [patched Stability-AI/lm-evaluation-harness](https://github.com/webbigdata-jp/lm-evaluation-harness) for evaluation.*
+ - [Stability-AI/lm-evaluation-harness](https://github.com/webbigdata-jp/lm-evaluation-harness) + [gakada's AutoGPTQ path](https://github.com/EleutherAI/lm-evaluation-harness/pull/519)*
  - *The 4-task average accuracy is based on results of JCommonsenseQA-1.1, JNLI-1.1, MARC-ja-1.1, and JSQuAD-1.1.*
  - *model loading is performed with gptq_use_triton=True, and evaluation is performed with template version 0.3 using the few-shot in-context learning.*
  - *The number of few-shots is 3,3,3,2.*
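
Reviewer note: the README bullets in this hunk describe a 4-task average and a per-task few-shot count. The sketch below shows one way that average could be computed. It assumes an unweighted mean over one headline score per task, and it assumes the few-shot counts 3,3,3,2 follow the task order given in the README; neither is confirmed by the diff. The function name `four_task_average` and the scores are hypothetical placeholders, not measured results.

```python
from statistics import mean
from typing import Mapping

# Task names as listed in the README bullet.
TASKS = ("JCommonsenseQA-1.1", "JNLI-1.1", "MARC-ja-1.1", "JSQuAD-1.1")

# Assumption: the few-shot counts "3,3,3,2" map to the tasks in the order above.
FEW_SHOTS = dict(zip(TASKS, (3, 3, 3, 2)))


def four_task_average(scores: Mapping[str, float]) -> float:
    """Unweighted mean of one headline score per task (an assumed definition)."""
    missing = [task for task in TASKS if task not in scores]
    if missing:
        raise KeyError(f"missing scores for: {missing}")
    return mean(scores[task] for task in TASKS)


if __name__ == "__main__":
    # Hypothetical placeholder scores, NOT results from any model.
    placeholder_scores = {
        "JCommonsenseQA-1.1": 50.0,
        "JNLI-1.1": 60.0,
        "MARC-ja-1.1": 70.0,
        "JSQuAD-1.1": 80.0,
    }
    print(four_task_average(placeholder_scores))
```

Raising on a missing task, rather than averaging whatever is present, keeps a partial run from being reported as a 4-task figure.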