---
license: llama3
library_name: transformers
tags:
- nsfw
- not-for-all-audiences
- llama-3
- text-generation-inference
- moe
- mergekit
- merge
model-index:
- name: Llama-Salad-4x8B-V3
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 66.54
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HiroseKoichi/Llama-Salad-4x8B-V3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 31.93
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HiroseKoichi/Llama-Salad-4x8B-V3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 8.53
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HiroseKoichi/Llama-Salad-4x8B-V3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 7.05
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HiroseKoichi/Llama-Salad-4x8B-V3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.45
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HiroseKoichi/Llama-Salad-4x8B-V3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 27.98
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HiroseKoichi/Llama-Salad-4x8B-V3
      name: Open LLM Leaderboard
---
# Llama-Salad-4x8B-V3
Changes in V3:
- Uses `L3-8B-Stheno-v3.2` as the base model instead of `Meta-Llama-3-8B-Instruct`
- Removed `opus-v1.2-llama-3-8b-instruct-run3.5-epoch2.5` and added `Einstein-v6.1-Llama3-8B`
- Swapped `Llama-3-Soliloquy-8B-v2` for `L3-8B-Stheno-v3.2`

I was clearly wrong when I said V2 would be difficult to improve on, because V3 is significantly better in just about every aspect. Stheno-v3.2 fixed all of the issues present in Stheno-v3.1, making it my favorite roleplay model and the best base model for llama-3 MoE merges.

The one thing I still want to improve is the conversational expert: Meta-Llama-3-8B-Instruct is good for that use case, but I'm sure there's a better model out there. I tried llama-3-cat-8b-instruct-v1, but it absolutely tanked the model's situational awareness and kept making blatantly contradictory statements.
# Quantization Formats
**GGUF**
- Static:
  - https://huggingface.co/mradermacher/Llama-Salad-4x8B-V3-GGUF
- Imatrix:
  - https://huggingface.co/mradermacher/Llama-Salad-4x8B-V3-i1-GGUF
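
To run one of these GGUF quants locally, a minimal llama-cpp-python sketch looks like the following. The quant filename glob is an assumption on my part; check the repo's file list for the quant you actually want.

```python
from llama_cpp import Llama

# Download a static quant from the repo above and open it with an 8K
# context window to match the model card. The filename glob is a guess;
# pick whichever quant level fits your hardware.
llm = Llama.from_pretrained(
    repo_id="mradermacher/Llama-Salad-4x8B-V3-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```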
# Details
- **License**: [llama3](https://llama.meta.com/llama3/license/)
- **Instruct Format**: [llama-3](https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/)
- **Context Size**: 8K
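
Since the card lists the stock llama-3 instruct format, the tokenizer's built-in chat template should render the correct prompt. A tokenizer-only sketch with `transformers` (runs without downloading the full weights):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HiroseKoichi/Llama-Salad-4x8B-V3")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Render the llama-3 instruct prompt, including the assistant header
# that cues the model to respond.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```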
## Models Used
- [L3-8B-Stheno-v3.2](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2)
- [Meta-Llama-3-8B-Instruct](https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct)
- [Llama-3-8B-Synthia-v3.5](https://huggingface.co/migtissera/Llama-3-8B-Synthia-v3.5)
- [Einstein-v6.1-Llama3-8B](https://huggingface.co/Weyaxi/Einstein-v6.1-Llama3-8B)
## Merge Config
```yaml
base_model: Sao10K/L3-8B-Stheno-v3.2
gate_mode: hidden
dtype: bfloat16
experts_per_token: 2
experts:
  - source_model: NousResearch/Meta-Llama-3-8B-Instruct
    positive_prompts:
      - "chat"
      - "conversation"
  - source_model: Weyaxi/Einstein-v6.1-Llama3-8B
    positive_prompts:
      - "science"
      - "physics"
      - "chemistry"
      - "biology"
      - "math"
      - "step-by-step"
      - "logical reasoning"
      - "multilingual"
      - "translation"
      - "language translation"
      - "foreign language"
    negative_prompts:
      - "programming language"
  - source_model: migtissera/Llama-3-8B-Synthia-v3.5
    positive_prompts:
      - "summarize"
      - "paraphrase"
      - "list"
      - "explain"
      - "define"
      - "analyze"
      - "rephrase"
      - "elaborate"
      - "programming language"
      - "JavaScript"
      - "Python programming language"
      - "Rust programming language"
      - "C++ programming language"
      - "GO programming language"
      - "Ruby programming language"
      - "Haskell programming language"
      - "SQL query language"
      - "CSS markup styling language"
      - "code"
  - source_model: Sao10K/L3-8B-Stheno-v3.2
    positive_prompts:
      - "characters"
      - "scene"
      - "roleplay"
      - "erotic roleplay"
      - "sexual fetish"
      - "NSFW"
      - "creative writing"
      - "storytelling"
      - "narration"
      - "narrative setting"
      - "narrative plot"
      - "narrative exposition"
      - "narrative theme"
      - "narrative climax"
```
```
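
For context on what this config does: it's consumed by mergekit's `mergekit-moe` script; `gate_mode: hidden` initializes each layer's router from hidden-state representations of the positive/negative prompts, and `experts_per_token: 2` routes every token through two of the four experts at inference time. A minimal sketch of loading the finished merge with `transformers` (bfloat16 matches the merge `dtype`; `device_map="auto"` is an assumption about available hardware):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HiroseKoichi/Llama-Salad-4x8B-V3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # same dtype the merge was written in
    device_map="auto",           # assumption: adjust to your hardware
)

messages = [{"role": "user", "content": "Describe a rainy harbor town in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```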
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HiroseKoichi__Llama-Salad-4x8B-V3).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 24.75 |
| IFEval (0-Shot)     | 66.54 |
| BBH (3-Shot)        | 31.93 |
| MATH Lvl 5 (4-Shot) |  8.53 |
| GPQA (0-shot)       |  7.05 |
| MuSR (0-shot)       |  6.45 |
| MMLU-PRO (5-shot)   | 27.98 |