---
license: llama3
library_name: transformers
tags:
- nsfw
- not-for-all-audiences
- llama-3
- text-generation-inference
- moe
- mergekit
- merge
---

# Llama-Salad-8x8B
This MoE merge is meant to compete with Mixtral fine-tunes, specifically [Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO), which I consider the best of them. I've done a number of side-by-side comparisons, and while I can't say it wins in every aspect, it comes very close. Its main shortcomings are multilingualism, storytelling, and roleplay, despite the merge including models that are very good at those tasks.

It won't respond in the language you prompt it in unless that language has already appeared in the conversation, despite Suzume being designed to do exactly that. The model writes very well thanks to Soliloquy and Opus, but it doesn't quite understand the difference between roleplay and storytelling; it treats just about everything like a story and tends to over-respond to whatever you do. For a good experience, you will have to either explain what roleplay is or show it by example, but it performs very well once you do.

I have narrowed the cause of these shortcomings down to one thing: self-attention. The base model is actually the most important part of an MoE merge; you can think of the merge as taking that base model and improving it, rather than combining all of the models' capabilities. If the base model has a particular writing style, behavior, or lack of knowledge for a given task, that will carry over into the MoE merge regardless of the quality of the expert weights used.

Likewise, I have found that censorship does not come from the model's weights but rather from the self-attention: if you take the self-attention from an uncensored model and combine it with the weights of a censored model, the resulting model will be uncensored. The self-attention decides what the model should be doing and how to do it, and the weights predict tokens according to those specifications.
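To illustrate the idea, here is a minimal, hypothetical sketch of swapping attention tensors between two checkpoints at the state-dict level. The `swap_self_attention` helper and the parameter names are illustrative assumptions (they follow Llama-style naming such as `model.layers.N.self_attn.q_proj.weight`), not part of how this merge was actually produced:

```python
def swap_self_attention(censored_sd, uncensored_sd):
    """Return a copy of censored_sd in which every self-attention tensor
    is replaced by the corresponding tensor from uncensored_sd.

    Both arguments map Llama-style parameter names to tensors, e.g.
    'model.layers.0.self_attn.q_proj.weight'. All non-attention
    parameters (MLP, embeddings, norms) are left untouched.
    """
    merged = dict(censored_sd)
    for name, tensor in uncensored_sd.items():
        # Copy only the attention projections shared by both models.
        if ".self_attn." in name and name in merged:
            merged[name] = tensor
    return merged
```

In practice the two state dicts would come from `transformers` checkpoints of the same architecture; the logic above is the same regardless of tensor type.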

I have tried over a dozen different models as the base, and Synthia is by far the best. Aside from swapping in better models, the only way I can see to improve from here is to merge Synthia with other models to reduce these shortcomings, which I will definitely be doing in the future.

# Quantization Formats
**GGUF**
- Static:
    - https://huggingface.co/bartowski/Llama-Salad-8x8B-GGUF
    - https://huggingface.co/mradermacher/Llama-Salad-8x8B-GGUF
- Imatrix:
    - https://huggingface.co/mradermacher/Llama-Salad-8x8B-i1-GGUF

# Details
- **License**: [llama3](https://llama.meta.com/llama3/license/)
- **Instruct Format**: [llama-3](https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/)
- **Context Size**: 8K

## Models Used
- [Meta-Llama-3-8B-Instruct](https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct)
- [Llama-3-8B-Synthia-v3.5](https://huggingface.co/migtissera/Llama-3-8B-Synthia-v3.5)
- [Llama-3-Soliloquy-8B-v2](https://huggingface.co/openlynn/Llama-3-Soliloquy-8B-v2)
- [opus-v1.2-llama-3-8b-instruct-run3.5-epoch2.5](https://huggingface.co/dreamgen-preview/opus-v1.2-llama-3-8b-instruct-run3.5-epoch2.5)
- [Einstein-v6.1-Llama3-8B](https://huggingface.co/Weyaxi/Einstein-v6.1-Llama3-8B)
- [suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual)
- [Llama-3-8B-UltraMedical](https://huggingface.co/TsinghuaC3I/Llama-3-8B-UltraMedical)
- [Llama-3-8B-Instruct-Coder](https://huggingface.co/rombodawg/Llama-3-8B-Instruct-Coder)

## Merge Config
```yaml
base_model: migtissera/Llama-3-8B-Synthia-v3.5
gate_mode: hidden
dtype: bfloat16
experts_per_token: 2
experts:
  - source_model: rombodawg/Llama-3-8B-Instruct-Coder
    positive_prompts:
    - "programming language"
    - "JavaScript"
    - "Python programming language"
    - "Rust programming language"
    - "C++ programming language"
    - "GO programming language"
    - "Ruby programming language"
    - "Haskell programming language"
    - "SQL query language"
    - "CSS markup styling language"
    - "code"
  - source_model: openlynn/Llama-3-Soliloquy-8B-v2
    positive_prompts:
    - "characters"
    - "scene"
    - "roleplay"
    - "erotic roleplay"
    - "sexual fetish"
    - "NSFW"
    negative_prompts:
    - "biology"
  - source_model: dreamgen-preview/opus-v1.2-llama-3-8b-instruct-run3.5-epoch2.5
    positive_prompts:
    - "creative writing"
    - "storytelling"
    - "narration"
    - "narrative setting"
    - "narrative plot"
    - "narrative exposition"
    - "narrative theme"
    - "narrative climax"
  - source_model: Weyaxi/Einstein-v6.1-Llama3-8B
    positive_prompts:
    - "science"
    - "physics"
    - "chemistry"
    - "biology"
    - "math"
    - "step-by-step"
    - "logical reasoning"
    negative_prompts:
    - "programming language"
  - source_model: migtissera/Llama-3-8B-Synthia-v3.5
    positive_prompts:
    - "summarize"
    - "paraphrase"
    - "list"
    - "explain"
    - "define"
    - "analyze"
    - "rephrase"
    - "elaborate"
  - source_model: lightblue/suzume-llama-3-8B-multilingual
    positive_prompts:
    - "multilingual"
    - "language translation"
    - "日本語"
    - "汉语"
    - "Deutsch"
    - "Français"
    - "русский язык"
    negative_prompts:
    - "programming language"
    - "English"
  - source_model: TsinghuaC3I/Llama-3-8B-UltraMedical
    positive_prompts:
    - "anatomy"
    - "medical diagnosis"
    - "symptom"
    - "healthcare"
    - "medicine"
    - "medication"
    negative_prompts:
    - "sexual fetish"
  - source_model: NousResearch/Meta-Llama-3-8B-Instruct
    positive_prompts:
    - "chat"
    - "conversation"
```
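For reference, a merge defined by a config like the one above can be reproduced with mergekit's MoE entry point; the config filename and output directory here are illustrative:

```shell
pip install mergekit
# config.yaml contains the merge config above; the output path is arbitrary.
mergekit-moe config.yaml ./Llama-Salad-8x8B
```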