---
license: llama3
library_name: transformers
tags:
- nsfw
- not-for-all-audiences
- llama-3
- text-generation-inference
- moe
- mergekit
- merge
---

# Llama-Salad-8x8B
This MoE merge is meant to compete with Mixtral fine-tunes, specifically [Nous-Hermes-2-Mixtral-8x7B-DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO), which I consider the best of them. I've done a number of side-by-side comparisons, and while I can't say it wins in every aspect, it comes very close. Its main shortcomings are multilingualism, storytelling, and roleplay, despite the merge including models that are very good at those tasks.

It won't respond in the language you prompt it in unless that language has already appeared in the conversation, despite Suzume being designed to do exactly that. The model writes very well thanks to Soliloquy and Opus, but it doesn't quite understand the difference between roleplay and storytelling; it treats just about everything like a story and tends to over-respond to whatever you do. For a good experience, you will have to either explain what roleplay is or show it by example, but it performs very well once you do.

I have narrowed the cause of these shortcomings down to one thing: self-attention. The base model is actually the most important part of an MoE merge; you can think of the merge as taking that base model and improving it, rather than combining all of the models' capabilities. If the base model has a particular writing style, behavior, or lack of knowledge for a given task, that will carry over into the MoE merge regardless of the quality of the expert weights used.

Likewise, I have found that censorship does not come from the model's weights but rather from the self-attention: if you take the self-attention from an uncensored model and combine it with the weights of a censored model, the resulting model will be uncensored. The self-attention decides what the model should be doing and how to do it, and the weights predict tokens according to those specifications.
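To illustrate the idea, here is a minimal, hypothetical sketch of swapping attention tensors between two checkpoints at the state-dict level. The `swap_self_attention` helper and the parameter names are illustrative assumptions (they follow Llama-style naming such as `model.layers.N.self_attn.q_proj.weight`), not part of how this merge was actually produced:

```python
def swap_self_attention(censored_sd, uncensored_sd):
    """Return a copy of censored_sd in which every self-attention tensor
    is replaced by the corresponding tensor from uncensored_sd.

    Both arguments map Llama-style parameter names to tensors, e.g.
    'model.layers.0.self_attn.q_proj.weight'. All non-attention
    parameters (MLP, embeddings, norms) are left untouched.
    """
    merged = dict(censored_sd)
    for name, tensor in uncensored_sd.items():
        # Copy only the attention projections shared by both models.
        if ".self_attn." in name and name in merged:
            merged[name] = tensor
    return merged
```

In practice the two state dicts would come from `transformers` checkpoints of the same architecture; the logic above is the same regardless of tensor type.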

I have tried over a dozen different models as the base, and Synthia is by far the best. Aside from swapping in better models, the only way I can see to improve from here is to merge Synthia with other models to reduce these shortcomings, which I will definitely be doing in the future.

# Quantization Formats
**GGUF**
- Static:
    - https://huggingface.co/bartowski/Llama-Salad-8x8B-GGUF
    - https://huggingface.co/mradermacher/Llama-Salad-8x8B-GGUF
- Imatrix:
    - https://huggingface.co/mradermacher/Llama-Salad-8x8B-i1-GGUF

# Details
- **License**: [llama3](https://llama.meta.com/llama3/license/)
- **Instruct Format**: [llama-3](https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/)
- **Context Size**: 8K

## Models Used
- [Meta-Llama-3-8B-Instruct](https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct)
- [Llama-3-8B-Synthia-v3.5](https://huggingface.co/migtissera/Llama-3-8B-Synthia-v3.5)
- [Llama-3-Soliloquy-8B-v2](https://huggingface.co/openlynn/Llama-3-Soliloquy-8B-v2)
- [opus-v1.2-llama-3-8b-instruct-run3.5-epoch2.5](https://huggingface.co/dreamgen-preview/opus-v1.2-llama-3-8b-instruct-run3.5-epoch2.5)
- [Einstein-v6.1-Llama3-8B](https://huggingface.co/Weyaxi/Einstein-v6.1-Llama3-8B)
- [suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual)
- [Llama-3-8B-UltraMedical](https://huggingface.co/TsinghuaC3I/Llama-3-8B-UltraMedical)
- [Llama-3-8B-Instruct-Coder](https://huggingface.co/rombodawg/Llama-3-8B-Instruct-Coder)

## Merge Config
```yaml
base_model: migtissera/Llama-3-8B-Synthia-v3.5
gate_mode: hidden
dtype: bfloat16
experts_per_token: 2
experts:
  - source_model: rombodawg/Llama-3-8B-Instruct-Coder
    positive_prompts:
    - "programming language"
    - "JavaScript"
    - "Python programming language"
    - "Rust programming language"
    - "C++ programming language"
    - "GO programming language"
    - "Ruby programming language"
    - "Haskell programming language"
    - "SQL query language"
    - "CSS markup styling language"
    - "code"
  - source_model: openlynn/Llama-3-Soliloquy-8B-v2
    positive_prompts:
    - "characters"
    - "scene"
    - "roleplay"
    - "erotic roleplay"
    - "sexual fetish"
    - "NSFW"
    negative_prompts:
    - "biology"
  - source_model: dreamgen-preview/opus-v1.2-llama-3-8b-instruct-run3.5-epoch2.5
    positive_prompts:
    - "creative writing"
    - "storytelling"
    - "narration"
    - "narrative setting"
    - "narrative plot"
    - "narrative exposition"
    - "narrative theme"
    - "narrative climax"
  - source_model: Weyaxi/Einstein-v6.1-Llama3-8B
    positive_prompts:
    - "science"
    - "physics"
    - "chemistry"
    - "biology"
    - "math"
    - "step-by-step"
    - "logical reasoning"
    negative_prompts:
    - "programming language"
  - source_model: migtissera/Llama-3-8B-Synthia-v3.5
    positive_prompts:
    - "summarize"
    - "paraphrase"
    - "list"
    - "explain"
    - "define"
    - "analyze"
    - "rephrase"
    - "elaborate"
  - source_model: lightblue/suzume-llama-3-8B-multilingual
    positive_prompts:
    - "multilingual"
    - "language translation"
    - "日本語"
    - "汉语"
    - "Deutsch"
    - "Français"
    - "русский язык"
    negative_prompts:
    - "programming language"
    - "English"
  - source_model: TsinghuaC3I/Llama-3-8B-UltraMedical
    positive_prompts:
    - "anatomy"
    - "medical diagnosis"
    - "symptom"
    - "healthcare"
    - "medicine"
    - "medication"
    negative_prompts:
    - "sexual fetish"
  - source_model: NousResearch/Meta-Llama-3-8B-Instruct
    positive_prompts:
    - "chat"
    - "conversation"
```
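For reference, a merge defined by a config like the one above can be reproduced with mergekit's MoE entry point; the config filename and output directory here are illustrative:

```shell
pip install mergekit
# config.yaml contains the merge config above; the output path is arbitrary.
mergekit-moe config.yaml ./Llama-Salad-8x8B
```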