Request for Instructions

#1
by Ailelix - opened

First of all, thanks for your impeccable work.
Currently the Ryzen AI platform is quite experimental and limited. I'm glad to have an LLM to play with on my NPU.

However, llama3.1-8b seems to be too heavy for the NPU I'm using (HX370).
I think a lighter model (e.g. llama3.2-1b, gemma2-2b, etc.) would be more suitable for the NPU to run.
So I tried to quantize Llama-3.2-1b on my own.
However, I failed to convert it for the NPU platform with Vitis AI.

So I wonder how you converted the model in this project.
I'd appreciate it if you could reply.

Hello.

I was not able to convert with high accuracy using Vitis AI either.

Here is an overview.
https://www.hackster.io/gharada2013/running-llm-on-amd-npu-hardware-19322f

Yeah, AMD has not developed a rich software environment...
I may stop for now and wait for either llama.cpp support in LM Studio or Ollama support.

Anyway, thanks again for your contribution.

I tried to AWQ-convert and run the 1B and 3B models. I was able to convert them, but unfortunately the output of both was broken.
These models are small enough that it might be better to run them on the CPU rather than on the NPU.
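
For reference, an AWQ conversion along these lines can be done with the open-source AutoAWQ package. This is only an illustrative sketch, not the exact flow used for the NPU builds in this repo; the model path, output directory, and quantization config below are placeholders.

```python
# Illustrative AWQ 4-bit quantization with AutoAWQ (placeholder paths/config).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-3.2-1B-Instruct"  # example input model
quant_path = "llama-3.2-1b-instruct-awq"         # example output directory

# Common AWQ settings: 4-bit weights, group size 128
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibrate, quantize the weights, and save the quantized checkpoint
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```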
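
And if the CPU route is enough, a 1B model runs fine with plain transformers; again just a sketch, with the model ID and prompt as examples.

```python
# Plain CPU inference with transformers (no NPU involved); example model/prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)  # defaults to CPU

inputs = tokenizer("Explain what an NPU is in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```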

Yep, I just wanted to try running an LLM on my NPU without AC power, to see whether a lightweight local AI copilot is feasible for daily productivity.
BTW, I just got verified for GitHub Education to use Copilot, so... :)
I still think that using the NPU can be more power efficient, though.

Anyway, I'm currently using your llama3.1-8b-instruct-npu.
Thanks for the contribution!

Ailelix changed discussion status to closed
