ssyok's picture
Update README.md
cd85391 verified
|
raw
history blame
No virus
2.17 kB
metadata
license: mit
pipeline_tag: text-generation
tags:
  - ONNX
  - DML
  - ONNXRuntime
  - phi3
  - nlp
  - conversational
  - custom_code
inference: false
language:
  - en

EmbeddedLLM/Phi-3-mini-4k-instruct-062024 ONNX

Model Summary

This model is an ONNX-optimized version of microsoft/Phi-3-mini-4k-instruct (June 2024), designed to provide accelerated inference on a variety of hardware using ONNX Runtime(CPU and DirectML). DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, providing GPU acceleration for a wide range of supported hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.

ONNX Models

Here are some of the optimized configurations we have added:

  • ONNX model for int4 DirectML: ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.

Hardware Requirements

Minimum Configuration:

  • Windows: DirectX 12-capable GPU (AMD/Nvidia)
  • CPU: x86_64 / ARM64 Tested Configurations:
  • GPU: AMD Ryzen 8000 Series iGPU (DirectML)
  • CPU: AMD Ryzen CPU

Model Description

  • Developed by: Microsoft
  • Model type: ONNX
  • Language(s) (NLP): Python, C, C++
  • License: Apache License Version 2.0
  • Model Description: This model is a conversion of the Phi-3-mini-4k-instruct-062024 for ONNX Runtime inference, optimized for DirectML.

Performance Metrics

DirectML

We measured the performance of DirectML on AMD Ryzen 9 7940HS /w Radeon 78

Prompt Length Generation Length Average Throughput (tps)
128 128 53.46686
128 256 53.11233
128 512 57.45816
128 1024 33.44713
256 128 76.50182
256 256 66.68873
256 512 70.83862
256 1024 34.64715
512 128 85.10079
512 256 68.64049
512 512 -
512 1024 -
1024 128 -
1024 256 -
1024 512 -
1024 1024 -