MaziyarPanahi committed
Commit 32dec96
1 Parent(s): 150f286

Create README.md (#1)


- Create README.md (18960db34d92bea19077bd656a32774afeaaa9aa)

Files changed (1)
  1. README.md +56 -0
README.md ADDED
@@ -0,0 +1,56 @@
---
license: apache-2.0
language:
- fr
- it
- de
- es
- en
tags:
- moe
- mixtral
- sharegpt
- axolotl
library_name: transformers
base_model: v2ray/Mixtral-8x22B-v0.2
inference: false
model_creator: MaziyarPanahi
model_name: Goku-8x22B-v0.2
pipeline_tag: text-generation
quantized_by: MaziyarPanahi
datasets:
- microsoft/orca-math-word-problems-200k
- teknium/OpenHermes-2.5
---

<img src="./Goku-8x22b-v0.1.webp" alt="Goku 8x22B v0.1 Logo" width="500" style="margin-left: auto; margin-right: auto; display: block;"/>

# Goku-8x22B-v0.2 (Goku 141b-A35b)

A fine-tuned version of the [v2ray/Mixtral-8x22B-v0.2](https://huggingface.co/v2ray/Mixtral-8x22B-v0.2) model, trained on the following datasets:

- teknium/OpenHermes-2.5
- WizardLM/WizardLM_evol_instruct_V2_196k
- microsoft/orca-math-word-problems-200k

This model has a total of 141B parameters, of which only 35B are active per token. The major difference in this version is that the model was trained on more datasets and with a sequence length of `8192`, which lets it generate longer and more coherent responses.
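
As a quick sanity check, the context window and expert layout can be read from the model config without downloading the 141B-parameter weights. A minimal sketch, assuming the repository ships a standard Mixtral-style `config.json` (attribute names follow `MixtralConfig`); note that the fine-tuning sequence length of 8192 may differ from the configured maximum:

```python
from transformers import AutoConfig

# Fetch only the configuration, not the weights.
config = AutoConfig.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")

# Configured context window (may exceed the 8192-token fine-tuning length).
print(config.max_position_embeddings)

# Mixture-of-Experts layout: total experts per layer vs. experts routed per token,
# which is why only ~35B of the 141B parameters are active for any given token.
print(config.num_local_experts, config.num_experts_per_tok)
```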

## How to use it

**Use a pipeline as a high-level helper:**
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="MaziyarPanahi/Goku-8x22B-v0.2")
```
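
A checkpoint this large will not fit on a single GPU with the defaults above. The following sketch adds half precision and automatic device placement; these settings are assumptions chosen to illustrate multi-GPU loading, not repository recommendations:

```python
import torch
from transformers import pipeline

# Sketch only: bfloat16 and device_map="auto" are assumptions chosen to shard
# a 141B-parameter checkpoint across available GPUs; adjust for your hardware.
pipe = pipeline(
    "text-generation",
    model="MaziyarPanahi/Goku-8x22B-v0.2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative prompt and generation parameters.
outputs = pipe(
    "A train travels 60 km in 45 minutes. What is its speed in km/h?",
    max_new_tokens=128,
    do_sample=False,
)
print(outputs[0]["generated_text"])
```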

**Load model directly:**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
```
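
From here, generation follows the standard `transformers` API. A minimal sketch building on the `tokenizer` and `model` loaded above; the prompt and decoding settings are illustrative, and given the conversational fine-tune you may want to apply the tokenizer's chat template if one is provided:

```python
# Illustrative prompt; swap in the tokenizer's chat template for dialogue use.
inputs = tokenizer(
    "If a rectangle is 12 cm by 7 cm, what is its area?",
    return_tensors="pt",
).to(model.device)

# Greedy decoding with a small token budget, purely for demonstration.
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```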