|
---
license: apache-2.0
language:
- fr
- it
- de
- es
- en
tags:
- moe
- mixtral
- sharegpt
- axolotl
library_name: transformers
base_model: v2ray/Mixtral-8x22B-v0.2
inference: false
model_creator: MaziyarPanahi
model_name: Goku-8x22B-v0.2
pipeline_tag: text-generation
quantized_by: MaziyarPanahi
datasets:
- microsoft/orca-math-word-problems-200k
- teknium/OpenHermes-2.5
---
|
|
|
<img src="./Goku-8x22b-v0.1.webp" alt="Goku 8x22B v0.1 Logo" width="500" style="margin-left: auto; margin-right: auto; display: block;"/>
|
|
|
# Goku-8x22B-v0.2 (Goku 141b-A35b) |
|
|
|
A fine-tuned version of the [v2ray/Mixtral-8x22B-v0.2](https://huggingface.co/v2ray/Mixtral-8x22B-v0.2) model, trained on the following datasets:
|
|
|
- teknium/OpenHermes-2.5 |
|
- WizardLM/WizardLM_evol_instruct_V2_196k |
|
- microsoft/orca-math-word-problems-200k |
|
|
|
This model has a total of 141B parameters, of which only 35B are active. The major difference in this version is that the model was trained on more datasets and with a sequence length of `8192`, which allows it to generate longer and more coherent responses.
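As a quick sanity check, the mixture-of-experts settings can be inspected from the published configuration without downloading any weights. This is a minimal sketch assuming the repository exposes a standard Mixtral-style config (field names such as `num_local_experts` are assumptions, not confirmed by the model card); note that the `8192` figure above refers to the fine-tuning sequence length, which may be shorter than the configured maximum context.

```python
from transformers import AutoConfig

# Fetches only the configuration file; no model weights are downloaded.
config = AutoConfig.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")

# Field names assume a Mixtral-style config; adjust if the repository differs.
print("max_position_embeddings:", config.max_position_embeddings)
print("num_local_experts:", config.num_local_experts)
print("num_experts_per_tok:", config.num_experts_per_tok)
```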
|
|
|
|
|
## How to use it |
|
|
|
|
|
**Use a pipeline as a high-level helper:** |
|
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="MaziyarPanahi/Goku-8x22B-v0.2")
```
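A minimal call might look like the following; the prompt and generation settings are illustrative examples, not recommendations from the model card.

```python
# Illustrative prompt and sampling settings; tune them for your use case.
outputs = pipe(
    "Solve step by step: a train travels 120 km in 1.5 hours. What is its average speed?",
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(outputs[0]["generated_text"])
```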
|
|
|
**Load model directly:** |
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
```
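For a model of this size, you will typically want to load the weights in half precision and shard them across the available GPUs. The snippet below is a sketch under those assumptions (a multi-GPU or large-memory host with `accelerate` installed for `device_map="auto"`); the prompt and generation settings are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "MaziyarPanahi/Goku-8x22B-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires `accelerate`; bfloat16 halves the memory footprint.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Write a short summary of the rules of chess."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```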
|
|