|
---
license: apache-2.0
language:
- fr
- it
- de
- es
- en
tags:
- moe
- mixtral
- sharegpt
- axolotl
library_name: transformers
base_model: v2ray/Mixtral-8x22B-v0.2
inference: false
model_creator: MaziyarPanahi
model_name: Goku-8x22B-v0.2
pipeline_tag: text-generation
quantized_by: MaziyarPanahi
datasets:
- microsoft/orca-math-word-problems-200k
- teknium/OpenHermes-2.5
---
|
|
|
<img src="./Goku-8x22b-v0.1.webp" alt="Goku 8x22B v0.1 Logo" width="500" style="margin-left: auto; margin-right: auto; display: block;"/>
|
|
|
# Goku-8x22B-v0.2 (Goku 141b-A35b) |
|
|
|
A fine-tuned version of the [v2ray/Mixtral-8x22B-v0.2](https://huggingface.co/v2ray/Mixtral-8x22B-v0.2) model, trained on the following datasets:
|
|
|
- teknium/OpenHermes-2.5 |
|
- WizardLM/WizardLM_evol_instruct_V2_196k |
|
- microsoft/orca-math-word-problems-200k |
|
|
|
This model has a total of 141B parameters, of which only 35B are active. The major difference in this version is that the model was trained on more datasets and with a sequence length of `8192`, which allows it to generate longer and more coherent responses.
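As a quick sanity check, the mixture-of-experts settings can be inspected from the published configuration without downloading any weights. This is a minimal sketch assuming the repository exposes a standard Mixtral-style config (field names such as `num_local_experts` are assumptions, not confirmed by the model card); note that the `8192` figure above refers to the fine-tuning sequence length, which may be shorter than the configured maximum context.

```python
from transformers import AutoConfig

# Fetches only the configuration file; no model weights are downloaded.
config = AutoConfig.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")

# Field names assume a Mixtral-style config; adjust if the repository differs.
print("max_position_embeddings:", config.max_position_embeddings)
print("num_local_experts:", config.num_local_experts)
print("num_experts_per_tok:", config.num_experts_per_tok)
```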
|
|
|
|
|
## How to use it |
|
|
|
|
|
**Use a pipeline as a high-level helper:** |
|
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="MaziyarPanahi/Goku-8x22B-v0.2")
```
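A minimal call might look like the following; the prompt and generation settings are illustrative examples, not recommendations from the model card.

```python
# Illustrative prompt and sampling settings; tune them for your use case.
outputs = pipe(
    "Solve step by step: a train travels 120 km in 1.5 hours. What is its average speed?",
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(outputs[0]["generated_text"])
```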
|
|
|
**Load model directly:** |
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
```
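For a model of this size, you will typically want to load the weights in half precision and shard them across the available GPUs. The snippet below is a sketch under those assumptions (a multi-GPU or large-memory host with `accelerate` installed for `device_map="auto"`); the prompt and generation settings are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "MaziyarPanahi/Goku-8x22B-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires `accelerate`; bfloat16 halves the memory footprint.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Write a short summary of the rules of chess."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```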
|
|