Gemma-7B-Instruct-ONNX
Model Summary
This repository contains optimized versions of the gemma-7b-it model, designed to accelerate inference using ONNX Runtime. These optimizations are specifically tailored for CPU and DirectML. DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, offering GPU acceleration across a wide range of supported hardware and drivers, including those from AMD, Intel, NVIDIA, and Qualcomm.
ONNX Models
Here are some of the optimized configurations we have added:
- ONNX model for int4 DirectML: ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using Activation-aware Weight Quantization (AWQ).
- ONNX model for int4 CPU and Mobile: ONNX model for CPU and mobile, quantized to int4 using round-to-nearest (RTN). Two versions are uploaded to balance latency against accuracy: the accuracy level 1 (Acc=1) version targets improved accuracy, while the accuracy level 4 (Acc=4) version targets improved performance. For mobile devices, we recommend the acc-level-4 model.
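To make the RTN idea concrete, here is a minimal sketch of block-wise round-to-nearest quantization to 4-bit codes in plain Python. The block size of 32, the asymmetric min/max scaling, and the function names `quantize_rtn_int4` and `dequantize_int4` are illustrative assumptions, not the scheme actually used to produce the models in this repository.

```python
from typing import List, Tuple

# One quantized block: (scale, minimum value, list of 4-bit codes in 0..15).
Block = Tuple[float, float, List[int]]


def quantize_rtn_int4(weights: List[float], block_size: int = 32) -> List[Block]:
    """Quantize weights block by block, rounding each value to the nearest 4-bit code."""
    blocks: List[Block] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        lo, hi = min(block), max(block)
        # 4 bits give 16 levels (0..15); fall back to 1.0 for a constant block.
        scale = (hi - lo) / 15 or 1.0
        codes = [min(15, max(0, round((w - lo) / scale))) for w in block]
        blocks.append((scale, lo, codes))
    return blocks


def dequantize_int4(blocks: List[Block]) -> List[float]:
    """Map the 4-bit codes back to approximate floating-point weights."""
    restored: List[float] = []
    for scale, lo, codes in blocks:
        restored.extend(lo + scale * c for c in codes)
    return restored
```

In this sketch each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost that int4 storage trades for a smaller, faster model.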
Usage
Installation and Setup
To use the Gemma-7B-Instruct-ONNX model on Windows with DirectML, follow these steps:
- Create and activate a Conda environment:
conda create -n onnx python=3.10
conda activate onnx
- Install Git LFS:
winget install -e --id GitHub.GitLFS
- Install Hugging Face CLI:
pip install huggingface-hub[cli]
- Download the model:
huggingface-cli download EmbeddedLLM/gemma-7b-it-onnx --include="onnx/directml/gemma-7b-it-int4/*" --local-dir .\gemma-7b-it-onnx
- Install necessary Python packages:
pip install numpy==1.26.4
pip install onnxruntime-directml
pip install --pre onnxruntime-genai-directml
- Install Visual Studio 2015 runtime:
conda install conda-forge::vs2015_runtime
- Download the example script:
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
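- Run the example script, pointing -m at the folder that contains the downloaded model files (the download step keeps the repository's onnx/directml/gemma-7b-it-int4 subfolder under --local-dir):
python phi3-qa.py -m .\gemma-7b-it-onnx\onnx\directml\gemma-7b-it-int4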
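If the script cannot find the model, it can help to confirm which folder actually holds the model files. The following is a minimal sketch, assuming the model folder is the one containing a `genai_config.json` file (the configuration file that ONNX Runtime GenAI model folders ship with); the helper name `find_model_dirs` is hypothetical.

```python
from pathlib import Path
from typing import List


def find_model_dirs(root: str, marker: str = "genai_config.json") -> List[Path]:
    """Return every folder under root that directly contains the marker file."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []  # nothing downloaded at this location
    return sorted(p.parent for p in root_path.rglob(marker))


if __name__ == "__main__":
    # Assumes the --local-dir used in the download step above.
    for model_dir in find_model_dirs("gemma-7b-it-onnx"):
        print(model_dir)
```

Any folder printed by this helper is a candidate value for the `-m` argument.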
Hardware Requirements
Minimum Configuration:
- Windows: DirectX 12-capable GPU (AMD/NVIDIA)
- CPU: x86_64 / ARM64
Tested Configurations:
- GPU: AMD Ryzen 8000 Series iGPU (DirectML)
- CPU: AMD Ryzen CPU
Model Page: Gemma
This model card corresponds to the 7B instruct version of the Gemma model. You can also visit the model cards for the 2B base model, 7B base model, and 2B instruct model.
Resources and Technical Documentation:
Terms of Use: Terms