Update README.md
README.md CHANGED
@@ -128,12 +128,13 @@ print(predict_score) # 1
 ```
 
 ### **Heatmap Visualize**
-
-
-
+[eng->eng] We randomly sampled a 200-example eval set from the [training data](https://huggingface.co/datasets/prometheus-eval/Feedback-Collection), extracted scores from the model-generated feedback, and compared them to the reference scores. The training and test data are not separated, so this only shows how well the model fit its training data.
+
+[ko->ko] We sampled a 200-example eval set from this [test set](https://huggingface.co/datasets/nayohan/feedback-collection-ko-chat/viewer/default/test); llama3-8b-it-prometheus-ko was trained only on the train split.
 
 - prometheus-7b-v1.0 (English train -> English inference) # 3 failed to output a score, total 197
 - llama3-8b-it-prometheus-ko (Korean train -> Korean inference) # total 200
+
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6152b4b9ecf3ca6ab820e325/ssZRGTysyiOZD4ttNOD4s.png)
 
 ### **Citation**
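The added [eng->eng]/[ko->ko] notes describe the pipeline behind the heatmap: sample ~200 items, parse the judge model's predicted score out of its generated feedback, and cross-tabulate predictions against the reference scores. Below is a minimal sketch of that pipeline, not the authors' script. It assumes a Prometheus-style `[RESULT] <score>` marker in the output (consistent with the `predict_score` example earlier in the README) and that generations with no parsable score are simply dropped, matching the "3 failed to output a score" note; `extract_score` and `score_heatmap` are hypothetical helper names.

```python
# Sketch: extract predicted scores and plot a reference-vs-predicted heatmap.
import re
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np


def extract_score(generated_text: str):
    """Pull the 1-5 score out of a judge completion; return None if parsing fails."""
    match = re.search(r"\[RESULT\]\s*([1-5])", generated_text)  # assumed output format
    return int(match.group(1)) if match else None


def score_heatmap(gold_scores, generated_texts):
    """Cross-tabulate reference vs. predicted scores and plot the 5x5 count matrix."""
    pairs = [
        (gold, pred)
        for gold, text in zip(gold_scores, generated_texts)
        if (pred := extract_score(text)) is not None  # drop generations with no parsable score
    ]
    counts = Counter(pairs)

    matrix = np.zeros((5, 5), dtype=int)
    for (gold, pred), n in counts.items():
        matrix[gold - 1, pred - 1] = n

    fig, ax = plt.subplots()
    ax.imshow(matrix, cmap="Blues")
    ax.set_xticks(range(5))
    ax.set_xticklabels([str(i) for i in range(1, 6)])
    ax.set_yticks(range(5))
    ax.set_yticklabels([str(i) for i in range(1, 6)])
    ax.set_xlabel("predicted score")
    ax.set_ylabel("reference score")
    for i in range(5):
        for j in range(5):
            ax.text(j, i, str(matrix[i, j]), ha="center", va="center")
    fig.tight_layout()
    return matrix, fig


# Toy usage: two parsed generations and one failure (excluded, like the 3/200 eng->eng failures).
gold = [5, 3, 1]
outputs = ["...feedback... [RESULT] 5", "...feedback... [RESULT] 2", "no score emitted"]
matrix, fig = score_heatmap(gold, outputs)
print(matrix.sum())  # 2 scored generations
```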