---
library_name: transformers
language:
- ko
base_model:
- meta-llama/Llama-3.1-8B
- NCSOFT/Llama-VARCO-8B-Instruct
- akjindal53244/Llama-3.1-Storm-8B
pipeline_tag: text-generation
---

# 🤖 LLM Evolutionary Merge

🤗 [Model](https://huggingface.co/fiveflow/LLMEvoLLaMA-3.1-8B-v0.1) | 📂 [Github](https://github.com/kwon13/LLM-Evo-Merge) | ✍️ Blog (in progress) | 💡 [Inspired by Sakana AI](https://github.com/SakanaAI/evolutionary-model-merge)

![robot](./assets/robot.jpeg)

This project explores a new way to optimize model merging by integrating LLMs into the evolutionary strategy itself. Instead of the [CMA-ES](https://en.wikipedia.org/wiki/CMA-ES) approach, it [leverages the search capabilities of LLMs](https://arxiv.org/abs/2402.18381) to explore the parameter space more efficiently, narrowing the search around high-performing solutions.

Currently, the project supports optimization only in the parameter space. I plan to extend it to merging and optimization in the data-flow space as well, further improving merged models by optimizing how data flows through the merged layers in addition to the layer weights themselves.
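
A minimal sketch of the idea (not the project's actual code): an evolutionary loop in which the candidate-proposal step is delegated to an LLM. `evaluate_merge` and `llm_propose` below are hypothetical placeholders for a real merge-and-benchmark run and a real LLM call, respectively.

```python
# Illustrative sketch only; all names here are hypothetical stand-ins.
import random
from typing import List

N_GROUPS = 16       # 32 layers merged in groups of 2, as in the recipe below
POP_SIZE = 8
GENERATIONS = 10

def evaluate_merge(weights: List[float]) -> float:
    """Placeholder fitness: in practice, merge the models with these
    per-group weights and score the result on a benchmark.
    Stubbed here with a dummy objective."""
    return -sum((w - 0.5) ** 2 for w in weights)

def llm_propose(elites: List[List[float]]) -> List[float]:
    """Placeholder for the LLM search step: in the real project an LLM would
    be prompted with the best-scoring weight vectors and asked to propose a
    new one. Stubbed here as a small perturbation of a random elite."""
    parent = random.choice(elites)
    return [min(1.0, max(0.0, w + random.gauss(0, 0.1))) for w in parent]

population = [[random.random() for _ in range(N_GROUPS)] for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    scored = sorted(population, key=evaluate_merge, reverse=True)
    elites = scored[: POP_SIZE // 2]                       # keep the best half
    children = [llm_propose(elites) for _ in range(POP_SIZE - len(elites))]
    population = elites + children                         # next generation

best = max(population, key=evaluate_merge)
```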

## Performance
I focused on creating a high-performing Korean model solely through merging, without additional model training.
<details>
<summary>Merging Recipe</summary>

```YAML
base_model: meta-llama/Llama-3.1-8B
dtype: bfloat16
merge_method: task_arithmetic
allow_negative_weights: true
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 2]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 1
  - layer_range: [0, 2]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.3475802891062396
  - layer_range: [0, 2]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [2, 4]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.8971381657317269
  - layer_range: [2, 4]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.45369921781118544
  - layer_range: [2, 4]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [4, 6]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.5430828084884667
  - layer_range: [4, 6]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.2834723715836387
  - layer_range: [4, 6]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [6, 8]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.419043948030593
  - layer_range: [6, 8]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.3705268601566145
  - layer_range: [6, 8]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [8, 10]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.3813333860404775
  - layer_range: [8, 10]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.7634501436288518
  - layer_range: [8, 10]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [10, 12]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.49134830660275863
  - layer_range: [10, 12]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.7211994938499454
  - layer_range: [10, 12]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [12, 14]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.9218963071448836
  - layer_range: [12, 14]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.5117022419864319
  - layer_range: [12, 14]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [14, 16]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.8238938467581831
  - layer_range: [14, 16]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.851712316016478
  - layer_range: [14, 16]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [16, 18]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.3543028846914006
  - layer_range: [16, 18]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.6864368345788241
  - layer_range: [16, 18]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [18, 20]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.9189961100847883
  - layer_range: [18, 20]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.5800251781306379
  - layer_range: [18, 20]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [20, 22]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.9281691677008521
  - layer_range: [20, 22]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.5356892784211416
  - layer_range: [20, 22]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [22, 24]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.839268407952539
  - layer_range: [22, 24]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.5082186376599986
  - layer_range: [22, 24]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [24, 26]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.6241902192095534
  - layer_range: [24, 26]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.2945221540685877
  - layer_range: [24, 26]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [26, 28]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.7030728026501202
  - layer_range: [26, 28]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.2350478509634181
  - layer_range: [26, 28]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [28, 30]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 0.2590342230366074
  - layer_range: [28, 30]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.006083182855312869
  - layer_range: [28, 30]
    model: meta-llama/Llama-3.1-8B

- sources:
  - layer_range: [30, 32]
    model: NCSOFT/Llama-VARCO-8B-Instruct
    parameters:
      weight: 1
  - layer_range: [30, 32]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      weight: 0.234650395825126
  - layer_range: [30, 32]
    model: meta-llama/Llama-3.1-8B
```
</details>
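
The recipe above appears to follow mergekit's `slices`-based configuration format (a `task_arithmetic` merge). Assuming it is saved as `config.yaml`, a merge like this can typically be reproduced with mergekit's CLI, e.g. `mergekit-yaml config.yaml ./LLMEvoLLaMA-3.1-8B-v0.1`; the exact flags may vary by mergekit version.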

The models used for merging are listed below.
```
Base Model: meta-llama/Llama-3.1-8B
Model 1: NCSOFT/Llama-VARCO-8B-Instruct
Model 2: akjindal53244/Llama-3.1-Storm-8B
```
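
Since the card declares `library_name: transformers`, the merged checkpoint loads like any other Llama 3.1 model. A minimal inference sketch (the prompt and generation settings are illustrative, not from the original card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fiveflow/LLMEvoLLaMA-3.1-8B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the dtype used in the merge recipe
    device_map="auto",
)

prompt = "한국의 수도는 어디인가요?"  # "What is the capital of Korea?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```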
### Comparing LLMEvoLLaMA with Source Models on Korean Benchmarks
![korean_performance](./assets/output.png)
- LogicKor: A benchmark that evaluates various linguistic abilities in Korean, including math, writing, coding, comprehension, grammar, and reasoning skills. (https://lk.instruct.kr/)

- KoBest: A benchmark consisting of five natural language understanding tasks designed to test advanced Korean language comprehension. (https://arxiv.org/abs/2204.04541)

### Comparing LLMEvoLLaMA with Source Models on English Benchmarks and Overall Average
| Model           | truthfulqa_mc2 (0-shot acc) | arc_challenge (0-shot acc) | Korean + English Performance (avg) |
|-----------------|-------------------------|------------------------|------------------------------|
| [VARCO](https://huggingface.co/NCSOFT/Llama-VARCO-8B-Instruct)           | 0.53                  | 0.47                 | 0.68                         |
| [Llama-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)  | 0.53                  | 0.52                 | 0.66                         |
| [Llama-Storm](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B)     | 0.59                  | 0.52                 | 0.67                         |
| [LLMEvoLLaMA](https://huggingface.co/fiveflow/LLMEvoLLaMA-3.1-8B-v0.1)     | 0.57                  | 0.50                 | **0.71**                         |