metadata
pipeline_tag: text-generation
tags:
- qwen
- qwen-2
- quantized
- 2-bit
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit
- 16-bit
- GGUF
inference: false
model_creator: MaziyarPanahi
model_name: Qwen2-72B-Instruct-v0.1-GGUF
quantized_by: MaziyarPanahi
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF
The GGUF and quantized models here are based on the MaziyarPanahi/Qwen2-72B-Instruct-v0.1 model.
How to download
You can download only the quants you need instead of cloning the entire repository as follows:
huggingface-cli download MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF --local-dir . --include '*Q2_K*gguf'
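The same selective download can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; the helper names `quant_pattern` and `download_quant` are illustrative and not part of any library.

```python
def quant_pattern(quant: str) -> str:
    """Build the glob that selects one quantization level, e.g. '*Q2_K*gguf'."""
    return f"*{quant}*gguf"


def download_quant(quant: str, local_dir: str = ".") -> str:
    """Download only the GGUF files matching one quant level.

    Assumes `pip install huggingface_hub`. The import is done lazily so that
    `quant_pattern` can be used without the package being installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF",
        allow_patterns=[quant_pattern(quant)],  # same filter as --include
        local_dir=local_dir,
    )


# Example (the files are very large, so the call is left commented out):
# path = download_quant("Q2_K")
```

Note that a short pattern such as `*Q4_K*gguf` matches every file whose name contains `Q4_K`, so pass the full quant name if you want a single variant.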
Load GGUF models
You MUST follow the ChatML prompt template used by Qwen2:
./llama.cpp/main -m Qwen2-72B-Instruct-v0.1.Q2_K.gguf -p "<|im_start|>user\nJust say 1, 2, 3 hi and NOTHING else\n<|im_end|>\n<|im_start|>assistant\n" -n 1024
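When launching llama.cpp from a script, building the prompt in Python avoids shell-escaping questions around the `\n` sequences. This is a minimal sketch, assuming the `./llama.cpp/main` binary path from the command above and a locally downloaded GGUF file; `build_llama_cpp_command` and `run_llama_cpp` are hypothetical helpers, and only the `-m`, `-p`, and `-n` flags shown above are used.

```python
import subprocess


def build_llama_cpp_command(model_path: str, user_message: str, n_predict: int = 1024) -> list:
    """Return the argv for a single-turn prompt in the layout used above."""
    prompt = (
        "<|im_start|>user\n"
        f"{user_message}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    return ["./llama.cpp/main", "-m", model_path, "-p", prompt, "-n", str(n_predict)]


def run_llama_cpp(model_path: str, user_message: str) -> str:
    """Run llama.cpp and return its standard output (assumes the binary exists)."""
    cmd = build_llama_cpp_command(model_path, user_message)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Example (requires the binary and a GGUF file on disk; the path is a placeholder):
# print(run_llama_cpp("path/to/model.Q2_K.gguf", "Just say 1, 2, 3 hi and NOTHING else"))
```

Passing the arguments as a list means the prompt reaches llama.cpp with real newline characters, with no shell quoting involved.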
Original README
MaziyarPanahi/Qwen2-72B-Instruct-v0.1
This is a fine-tuned version of the Qwen/Qwen2-72B-Instruct model. It aims to improve the base model across all benchmarks.
⚡ Quantized GGUF
All GGUF models are available here: MaziyarPanahi/Qwen2-72B-Instruct-v0.1-GGUF
🏆 Open LLM Leaderboard Evaluation Results
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
truthfulqa_mc2 | 2 | none | 0 | acc | 0.6761 | ± | 0.0148 |
winogrande | 1 | none | 5 | acc | 0.8248 | ± | 0.0107 |
arc_challenge | 1 | none | 25 | acc | 0.6852 | ± | 0.0136 |
arc_challenge | 1 | none | 25 | acc_norm | 0.7184 | ± | 0.0131 |
gsm8k | 3 | strict-match | 5 | exact_match | 0.8582 | ± | 0.0096 |
gsm8k | 3 | flexible-extract | 5 | exact_match | 0.8893 | ± | 0.0086 |
Prompt Template
This model uses the ChatML prompt template:
<|im_start|>system
{System}
<|im_end|>
<|im_start|>user
{User}
<|im_end|>
<|im_start|>assistant
{Assistant}
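For multi-turn conversations, the template above can be filled in programmatically. The following is a minimal sketch that mirrors the layout exactly as printed here; `format_chatml` is a hypothetical helper, and when the tokenizer ships its own chat template, that template is the safer reference for exact whitespace.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ChatML layout shown above."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n"
            f"{message['content']}\n"
            "<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]
)
print(prompt)
```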
How to use
# Use a pipeline as a high-level helper
from transformers import pipeline
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Qwen2-72B-Instruct-v0.1")
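Once the tokenizer and model are loaded directly, generating a reply takes a few more steps. This is a minimal sketch rather than the author's recommended settings; `generate_reply` is a hypothetical helper and `max_new_tokens=256` is an arbitrary choice. It relies on the tokenizer's built-in chat template to produce the prompt.

```python
def generate_reply(tokenizer, model, messages, max_new_tokens=256):
    """Generate one assistant reply for a list of chat messages."""
    # Render the conversation with the tokenizer's own chat template.
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example, using the `tokenizer` and `model` objects loaded above:
# print(generate_reply(tokenizer, model, [{"role": "user", "content": "Who are you?"}]))
```

A 72B model in 16-bit precision needs well over 100 GB of memory, so in practice the weights usually have to be sharded across several GPUs or loaded in a quantized form.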