---
language:
- en
- ko
pipeline_tag: text-generation
---

# komt: korean multi task instruction tuning model

![multi task instruction tuning.jpg](https://github.com/davidkim205/komt/assets/16680469/c7f6ade7-247e-4b62-a94f-47e19abea68e)

Following the success of ChatGPT, numerous large language models have emerged in an attempt to catch up with its capabilities.
When it comes to Korean, however, many of these models still struggle to provide accurate answers or to generate fluent Korean text.
This work addresses these challenges by introducing a multi-task instruction tuning technique that leverages supervised datasets from various tasks to create training data for large language models (LLMs).

## Model Details

* **Model Developers**: davidkim (Changyeon Kim)
* **Repository**: https://github.com/davidkim205/komt
* **Quantization methods**: q4_0, q4_1, q5_0, q5_1, q2_k, q3_k, q3_k_m, q3_k_l, q4_k, q4_k_s, q4_k_m, q5_k, q5_k_s, q5_k_m, q8_0 (see the build sketch below)
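
The GGUF files listed above follow llama.cpp's quantization naming. As a rough sketch of how such a file can be produced (illustrative paths; the conversion script and quantize binary have been renamed in newer llama.cpp releases, e.g. `convert_hf_to_gguf.py` and `llama-quantize`), the typical workflow looks like this:

```
# Rough sketch, not the exact commands used for this release.
cd llama.cpp

# 1. Convert the Hugging Face checkpoint to a full-precision GGUF file.
python3 convert.py /path/to/komt-mistral-7b-v1 \
    --outtype f16 --outfile ./models/komt-mistral-7b-v1/ggml-model-f16.gguf

# 2. Quantize it with one of the listed methods, e.g. q4_1.
./quantize ./models/komt-mistral-7b-v1/ggml-model-f16.gguf \
           ./models/komt-mistral-7b-v1/ggml-model-q4_1.gguf q4_1
```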

## Training

Refer to https://github.com/davidkim205/komt for training details.

## Usage

```
cd llama.cpp
make -j && ./main -m ./models/komt-mistral-7b-v1/ggml-model-q4_1.gguf -p "[INST]인삼은 어떤 효과가 있는가요? [/INST]"
```

The prompt asks what effects ginseng has. The model answers in Korean; translated into English, the output reads:

```
[INST]What effects does ginseng have? [/INST]Ginseng contains about 600 substances. Among them, the three effects known to have been confirmed by many studies are as follows.
1. Blood pressure regulation: Ginseng has shown an effect on regulating blood pressure in various experiments; in particular, researchers in one Chinese study observed a 40% reduction in blood pressure using ginseng.
2. Gastrointestinal improvement: Ginseng can at least partially relieve gastrointestinal conditions that cause suffering from pain and similar symptoms, which has been confirmed in various experiments.
3. Immune enhancement: Ginseng is effective at strengthening the immune system, and in Korea it has been used since 2014 as the final evaluation for the Ministry of Food and Drug Safety's export certification of food and drug products.
With these benefits, ginseng is also used as an ingredient in widely consumed health foods. [end of text]
```
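
For longer or more controlled generations, the same `main` binary accepts the usual llama.cpp sampling and context options. The invocation below is illustrative only; exact flag names and availability depend on your llama.cpp version and build (for example, `-ngl` requires a GPU-enabled build):

```
# Illustrative invocation with common llama.cpp `main` options:
#   -n     maximum number of tokens to generate
#   -c     context window size
#   --temp sampling temperature
#   -ngl   number of layers to offload to the GPU (GPU builds only)
./main -m ./models/komt-mistral-7b-v1/ggml-model-q4_1.gguf \
       -p "[INST]인삼은 어떤 효과가 있는가요? [/INST]" \
       -n 512 -c 2048 --temp 0.7 -ngl 99
```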

## Evaluation

For an objective evaluation, we initially used EleutherAI's lm-evaluation-harness but obtained unsatisfactory results. We therefore evaluated the models with ChatGPT as a judge, following the approach described in [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259.pdf) and [Three Ways of Using Large Language Models to Evaluate Chat](https://arxiv.org/pdf/2308.06502.pdf).
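
The exact evaluation prompts and scripts live in the komt repository; the snippet below is only a minimal illustration of the ChatGPT-as-judge idea (not the authors' script), assuming an `OPENAI_API_KEY` environment variable and a hypothetical question/answer pair:

```
# Minimal illustration of grading one answer with gpt-3.5-turbo on a 0-5 scale.
# The real rubric, prompts, and scripts are in https://github.com/davidkim205/komt.
QUESTION="인삼은 어떤 효과가 있는가요?"
ANSWER="(model response to be graded)"

curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
        \"model\": \"gpt-3.5-turbo\",
        \"messages\": [{
          \"role\": \"user\",
          \"content\": \"Rate the following Korean answer from 0 to 5 and reply with the number only.\\nQuestion: $QUESTION\\nAnswer: $ANSWER\"
        }]
      }"
```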

| model | score | average (0~5) | percentage |
| --------------------------------------- | ------- | ------------- | ---------- |
| gpt-3.5-turbo (closed) | 147 | 3.97 | 79.45% |
| naver Cue (closed) | 140 | 3.78 | 75.67% |
| clova X (closed) | 136 | 3.67 | 73.51% |
| WizardLM-13B-V1.2 (open) | 96 | 2.59 | 51.89% |
| Llama-2-7b-chat-hf (open) | 67 | 1.81 | 36.21% |
| Llama-2-13b-chat-hf (open) | 73 | 1.91 | 38.37% |
| nlpai-lab/kullm-polyglot-12.8b-v2 (open) | 70 | 1.89 | 37.83% |
| kfkas/Llama-2-ko-7b-Chat (open) | 96 | 2.59 | 51.89% |
| beomi/KoAlpaca-Polyglot-12.8B (open) | 100 | 2.70 | 54.05% |
| **komt-llama2-7b-v1 (open)(ours)** | **117** | **3.16** | **63.24%** |
| **komt-llama2-13b-v1 (open)(ours)** | **129** | **3.48** | **69.72%** |
| **komt-llama-30b-v1 (open)(ours)** | **129** | **3.16** | **63.24%** |
| **komt-mistral-7b-v1 (open)(ours)** | **131** | **3.54** | **70.81%** |