leaderboard-pr-bot committed
Commit e78137c
1 parent: bfb99ff

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
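The bot's change edits the README's YAML front matter, i.e. the metadata block enclosed by the leading `---` markers. As a minimal sketch of how such a block can be pulled out of a model card programmatically (the `extract_front_matter` helper and the sample text are illustrative, not part of the bot):

```python
def extract_front_matter(readme_text: str) -> str:
    """Return the YAML front matter between the leading '---' fences, or '' if absent."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Join everything between the opening and closing fences.
            return "\n".join(lines[1:i])
    return ""  # no closing fence found


# A shortened sample in the shape of this model card's header.
sample = """---
license: other
language:
- ja
- en
---

# KARAKURI LM
"""
print(extract_front_matter(sample))
```

The extracted text can then be handed to any YAML parser to read or rewrite keys such as `model-index`, which is what this PR appends to.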

Files changed (1)
  1. README.md +118 -13
README.md CHANGED

```diff
@@ -1,18 +1,18 @@
 ---
-license: other
-datasets:
-- OpenAssistant/oasst2
-- nvidia/HelpSteer
 language:
 - ja
 - en
+license: other
 library_name: transformers
-base_model: karakuri-ai/karakuri-lm-70b-v0.1
-pipeline_tag: conversational
 tags:
 - llama
 - llama-2
 - steerlm
+datasets:
+- OpenAssistant/oasst2
+- nvidia/HelpSteer
+base_model: karakuri-ai/karakuri-lm-70b-v0.1
+pipeline_tag: conversational
 model-index:
 - name: karakuri-ai/karakuri-lm-70b-chat-v0.1
   results:
@@ -24,22 +24,113 @@ model-index:
       type: unknown
     metrics:
     - type: unknown
-      name: score
       value: 6.609375
+      name: score
+    - type: unknown
+      value: 6.43125
+      name: score
     source:
       url: https://huggingface.co/spaces/lmsys/mt-bench
   - task:
       type: text-generation
       name: Text Generation
     dataset:
-      name: MT-Bench-jp
-      type: unknown
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
     metrics:
-    - type: unknown
-      name: score
-      value: 6.43125
+    - type: acc_norm
+      value: 61.52
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 83.13
+      name: normalized accuracy
     source:
-      url: https://api.wandb.ai/links/wandb-japan/6ff86bp3
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 59.35
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 51.39
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 78.37
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 40.41
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=karakuri-ai/karakuri-lm-70b-chat-v0.1
+      name: Open LLM Leaderboard
 ---
 
 # KARAKURI LM
@@ -169,3 +260,17 @@ Subject to the license above, and except for commercial purposes, you are free t
 If you plan to use KARAKURI LM for commercial purposes, please contact us beforehand. You are not authorized to use KARAKURI LM for commercial purposes unless we expressly grant you such rights.
 
 If you have any questions regarding the interpretation of above terms, please also feel free to contact us.
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_karakuri-ai__karakuri-lm-70b-chat-v0.1)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |62.36|
+|AI2 Reasoning Challenge (25-Shot)|61.52|
+|HellaSwag (10-Shot) |83.13|
+|MMLU (5-Shot) |59.35|
+|TruthfulQA (0-shot) |51.39|
+|Winogrande (5-shot) |78.37|
+|GSM8k (5-shot) |40.41|
+
```
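The `Avg.` row in the added table is the unweighted mean of the six benchmark scores. A quick check of the arithmetic:

```python
# Benchmark scores added by this PR (from the Open LLM Leaderboard).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.52,
    "HellaSwag (10-Shot)": 83.13,
    "MMLU (5-Shot)": 59.35,
    "TruthfulQA (0-shot)": 51.39,
    "Winogrande (5-shot)": 78.37,
    "GSM8k (5-shot)": 40.41,
}

# The leaderboard average is the plain mean, rounded to two decimals.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 62.36
```

This reproduces the 62.36 reported in the table.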