Neural Magic (verified company organization)

AI & ML interests: LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV

Featured collection: Vision Language Models (VLMs) quantized by Neural Magic

Organization Card
The Future of AI is Open
Neural Magic helps developers accelerate deep learning performance with automated model compression technologies and inference engines. Download our compression-aware inference engines and open-source tools for fast model inference.
- nm-vllm: Enterprise-ready inference system, built on the open-source vLLM library, for operationalizing performant open-source LLMs at scale
- LLM Compressor: Hugging Face-native library for applying quantization and sparsity algorithms to LLMs for optimized deployment with vLLM
- DeepSparse: Inference runtime offering accelerated performance on CPUs, plus APIs for integrating ML into your application
This profile provides accurate model checkpoints compressed with SOTA methods and ready to run in vLLM, including W4A16, W8A16, W8A8 (INT8 and FP8), and many more! If you would like help quantizing a model, or have a request for us to add a checkpoint, please open an issue at https://github.com/vllm-project/llm-compressor.
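To make a scheme name like W8A8 (INT8) concrete, here is a minimal, self-contained sketch of symmetric per-tensor INT8 quantization in plain Python. This is an illustration of the general technique only, not Neural Magic's actual implementation (which lives in LLM Compressor and vLLM's optimized kernels); the function names are made up for this sketch.

```python
# Minimal sketch of symmetric INT8 quantization, the idea behind W8A8-style
# schemes: floats are mapped to the integer range [-127, 127] with a single
# floating-point scale, then dequantized at (or before) compute. Real W8A8
# deployments also quantize activations and run integer matmuls.

def quantize_int8(values):
    """Map floats to INT8 codes with a symmetric per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from INT8 codes."""
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.003, 0.9, -0.55]
codes, scale = quantize_int8(weights)
recovered = dequantize(codes, scale)

# Each code fits in one byte instead of 2-4 bytes per float, and the
# round-trip error is bounded by half a quantization step (scale / 2).
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, recovered))
assert all(-127 <= c <= 127 for c in codes)
assert max_error <= scale / 2 + 1e-12
```

W4A16 and W8A16 follow the same pattern with 4- or 8-bit weight codes but keep activations in 16-bit floats, trading integer compute for reduced memory traffic.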
Collections (11)

Accurate FP8 quantized models by Neural Magic, ready for use with vLLM!
- neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8: Text Generation · 10.2k downloads · 31 likes
- neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8: Text Generation · 108k downloads · 34 likes
- neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8: Text Generation · 53.8k downloads · 32 likes
- neuralmagic/Phi-3-medium-128k-instruct-FP8: Text Generation · 1.54k downloads · 5 likes
Spaces (8)
- 🔥 Quant Llms Text Generation: Quantized vs. Unquantized LLM Text Generation Comparison
- 🏃 Llama 3 8B Chat Deepsparse
- 🏃 Llama 2 Sparse Transfer Chat Deepsparse
- ⚡ DeepSparse Sentiment Analysis
- 🏢 DeepSparse Named Entity Recognition
- 📚 Sparse Llama Gsm8k
Models (246)
- neuralmagic/pixtral-12b-FP8-dynamic: Text Generation · 4.23k downloads · 3 likes
- neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16: Text Generation · 6.05k downloads · 9 likes
- neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16: Text Generation · 10.7k downloads · 21 likes
- neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8: Text Generation · 8.6k downloads · 12 likes
- neuralmagic/Meta-Llama-3.1-8B-quantized.w8a8: Text Generation · 617 downloads · 1 like
- neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8-dynamic: Text Generation · 2.4k downloads · 5 likes
- neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic: Text Generation · 1.65k downloads · 4 likes
- neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic: Text Generation · 214 downloads · 14 likes
- neuralmagic/Llama-3.1-Nemotron-70B-Instruct-HF-FP8-dynamic: Text Generation · 22.6k downloads · 11 likes
- neuralmagic/Llama-3.2-3B-Instruct-FP8: Text Generation · 13.4k downloads · 2 likes
Datasets (13)
- neuralmagic/Inference_performance_Llama_3.1_vllm0.6.1.post2: 5 downloads
- neuralmagic/mmlu_it: Viewer · 14k rows · 434 downloads
- neuralmagic/mmlu_fr: Viewer · 14k rows · 412 downloads
- neuralmagic/mmlu_th: Viewer · 14k rows · 447 downloads
- neuralmagic/mmlu_de: Viewer · 14k rows · 434 downloads
- neuralmagic/mmlu_es: Viewer · 14k rows · 433 downloads
- neuralmagic/mmlu_hi: Viewer · 14k rows · 449 downloads
- neuralmagic/mmlu_pt: Viewer · 14k rows · 451 downloads
- neuralmagic/quantized-llama-3.1-leaderboard-v2-evals: Viewer · 247k rows · 869 downloads
- neuralmagic/quantized-llama-3.1-humaneval-evals: Viewer · 73.8k rows · 133 downloads