	---
	pipeline_tag: text-generation
	inference: false
	license: apache-2.0
	library_name: transformers
	model-index:
	- name: ibm/PowerMoE-3b
	  results:
	  - task:
	      type: text-generation
	    dataset:
	      type: lm-eval-harness
	      name: ARC
	    metrics:
	    - name: accuracy-norm
	      type: accuracy-norm
	      value: 58.1
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: lm-eval-harness
	      name: BoolQ
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 65.0
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: lm-eval-harness
	      name: Hellaswag
	    metrics:
	    - name: accuracy-norm
	      type: accuracy-norm
	      value: 71.5
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: lm-eval-harness
	      name: OpenBookQA
	    metrics:
	    - name: accuracy-norm
	      type: accuracy-norm
	      value: 41.0
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: lm-eval-harness
	      name: PIQA
	    metrics:
	    - name: accuracy-norm
	      type: accuracy-norm
	      value: 79.1
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: lm-eval-harness
	      name: Winogrande
	    metrics:
	    - name: accuracy-norm
	      type: accuracy-norm
	      value: 65.0
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: lm-eval-harness
	      name: MMLU (5 shot)
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 42.8
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: lm-eval-harness
	      name: GSM8k (5 shot)
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 25.9
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: lm-eval-harness
	      name: math (4 shot)
	    metrics:
	    - name: accuracy
	      type: accuracy
	      value: 14.8
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode-eval
	      name: humaneval
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 20.1
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: bigcode-eval
	      name: MBPP
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 32.4
	      verified: false
	---

	## Model Summary
	PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token and is trained on a mix of open-source and proprietary datasets. Across various benchmarks, including natural-language multiple-choice, code generation, and math reasoning, PowerMoE-3B has shown promising results compared to dense models with 2x the active parameters.
	Paper: https://arxiv.org/abs/2408.13359
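
To make "sparsely activates" concrete, here is a minimal, framework-free sketch of top-k expert routing, the general idea behind sparse MoE layers. It is an illustration only: the expert count, the value of `k`, the softmax over the selected gate scores, and the names `top_k_route` and `moe_forward` are assumptions made for this example, not details of PowerMoE-3B taken from this card. In a real model the experts are feed-forward networks and the gate scores come from a learned router.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_route(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(
    x: float,
    experts: Sequence[Callable[[float], float]],
    gate_logits: Sequence[float],
    k: int,
) -> float:
    """Run only the k routed experts on x and mix their outputs by gate weight."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_logits, k))


# Four toy "experts" (hypothetical stand-ins for feed-forward blocks).
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for one token

routes = top_k_route(gate_logits, k=2)
print(routes)                                       # [(1, 0.5), (3, 0.5)]
print(moe_forward(3.0, experts, gate_logits, k=2))  # 0.5 * 6.0 + 0.5 * 9.0 = 7.5
```

Only two of the four experts run for this token, which is why the number of active parameters per token can be much smaller than the total parameter count.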

	## Usage
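	Note: This model requires Hugging Face `transformers` installed from source, for example with `pip install git+https://github.com/huggingface/transformers`.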

	### Generation
	This is a simple example of how to use the PowerMoE-3b model.

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer
	device = "cuda" # or "cpu"
	model_path = "ibm/PowerMoE-3b"
	tokenizer = AutoTokenizer.from_pretrained(model_path)
	# drop device_map if running on CPU
	model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
	model.eval()
	# change input text as desired
	prompt = "Write a code to find the maximum value in a list of numbers."
	# tokenize the text
	input_tokens = tokenizer(prompt, return_tensors="pt")
	# transfer tokenized inputs to the device
	for i in input_tokens:
	    input_tokens[i] = input_tokens[i].to(device)
	# generate output tokens
	output = model.generate(**input_tokens, max_new_tokens=100)
	# decode output tokens into text
	output = tokenizer.batch_decode(output)
	# loop over the batch to print, in this example the batch size is 1
	for i in output:
	    print(i)
	```
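
For decoder-only models, `model.generate` returns the prompt tokens followed by the newly generated ones, so the decoded text above repeats the prompt. If only the completion is wanted, the prompt tokens can be sliced off before decoding. The sketch below shows that slicing on plain lists so it runs without downloading the model; `completion_only` is a hypothetical helper name introduced here, and the token ids are made up.

```python
from typing import List, Sequence


def completion_only(sequences: Sequence[Sequence[int]], prompt_length: int) -> List[List[int]]:
    """Drop the first `prompt_length` token ids from every generated sequence."""
    if prompt_length < 0:
        raise ValueError("prompt_length must be non-negative")
    return [list(seq[prompt_length:]) for seq in sequences]


# Toy check with made-up token ids: a 3-token prompt followed by 2 new tokens.
generated = [[101, 7, 8, 42, 43]]
new_tokens = completion_only(generated, prompt_length=3)
print(new_tokens)  # [[42, 43]]
```

In the generation example, this would be applied to the tensor returned by `model.generate` before it is decoded: take `prompt_length = input_tokens["input_ids"].shape[1]`, call `completion_only(output.tolist(), prompt_length)`, and pass the result to `tokenizer.batch_decode(..., skip_special_tokens=True)`. This assumes a batch size of 1, or left-padded prompts, so that every prompt has the same length.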