How to convert 4bit model back to fp16 data format?
May I ask how to convert this 4-bit model back to the fp16/fp32 data format?
I tried loading it via from_pretrained(torch_dtype=torch.float16) and then calling save_pretrained(). However, the saved model is still in 4-bit.
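For reference, here is roughly what I tried (the repo id below is just a placeholder for this 4-bit checkpoint):

```python
import torch
from transformers import AutoModelForCausalLM

# "some-user/llama-2-13b-chat-4bit" is a placeholder for this quantized repo
model = AutoModelForCausalLM.from_pretrained(
    "some-user/llama-2-13b-chat-4bit",
    torch_dtype=torch.float16,  # hoping this would force fp16 weights
    device_map="auto",
)

# but the weights written here are still the 4-bit quantized ones
model.save_pretrained("./llama-2-13b-chat-fp16")
```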
Can someone kindly lend me a hand? Thanks!
@tremblingbrain
Why do you want to convert the model back to fp16? It will probably be slightly worse quality than this 4-bit one.
Use the original model if you want an fp16-precision model, since it's going to be higher quality than the 4-bit one.
Here is an unquantization script, but I'm not sure if it works with both GPTQ and bitsandbytes or just bitsandbytes.
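In case that link goes stale, the general idea for bitsandbytes 4-bit layers looks roughly like this. This is only a minimal sketch, not the exact script: it assumes the model was loaded with load_in_4bit, and it swaps each bnb.nn.Linear4bit module for a plain fp16 nn.Linear using bitsandbytes' dequantize_4bit:

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb
from bitsandbytes.functional import dequantize_4bit

def dequantize_to_fp16(model):
    """Replace every bitsandbytes Linear4bit layer with a plain fp16 nn.Linear."""
    for module in list(model.modules()):
        for child_name, child in list(module.named_children()):
            if isinstance(child, bnb.nn.Linear4bit):
                # Unpack the packed 4-bit weight back to fp16 using its stored quant_state
                w = dequantize_4bit(child.weight.data, child.weight.quant_state).to(torch.float16)
                new_linear = nn.Linear(
                    child.in_features,
                    child.out_features,
                    bias=child.bias is not None,
                    dtype=torch.float16,
                    device=w.device,
                )
                new_linear.weight = nn.Parameter(w, requires_grad=False)
                if child.bias is not None:
                    new_linear.bias = nn.Parameter(
                        child.bias.data.to(torch.float16), requires_grad=False
                    )
                # Swap the quantized layer for the dequantized fp16 one
                setattr(module, child_name, new_linear)
    return model
```

After doing that you may also need to drop quantization_config from model.config before calling save_pretrained(), otherwise transformers will still treat the reloaded checkpoint as quantized. GPTQ packs its weights differently, so this sketch only covers bitsandbytes.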
@YaTharThShaRma999
Thanks a lot for the conversion script.
I actually have some existing code for computation and analysis, but it only accepts fp16/fp32 models...
So I'm thinking about unquantizing this 4-bit model to fp16 and running some tests, basically comparing it against the original fp16 model.
This is a quantized version of Llama-2-13b-chat. You can simply download the original model instead of this quantized version.
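For example, something like this should work (assuming the original repo id is meta-llama/Llama-2-13b-chat-hf, which is gated and requires access approval):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-chat-hf"  # original checkpoint (gated repo)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # load the weights directly in fp16
    device_map="auto",
)
```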