HF/bitsandbytes load_in_4bit is now apparently live! (in PEFT)

#24
by 2themaxx - opened

Was looking at various llm quant options and came across: https://github.com/artidoro/qlora

It mentions load_in_4bit in the README.md, and I hadn't heard of that being available. Apparently they built a new data type for it (not sure how performant it is), and after a bit of looking around it seems it's now part of the Hugging Face PEFT library. Tim Dettmers (who works on bitsandbytes) is also a contributor to the qlora repo (likely part of his research), but it looks like you can now load models straight into frozen 4-bit, at least for training, and one would assume it should work in inference as well just by following the instructions in the README.
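
If the README is right, the load itself is roughly a one-liner (a minimal sketch assuming a recent transformers + bitsandbytes install; the model id is just an example, not anything from this repo):

```python
# Minimal sketch of the 4-bit load described in the qlora README.
# Assumes recent transformers + bitsandbytes; the model id is only an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,  # weights stay frozen in 4-bit
    device_map="auto",  # let accelerate place layers across available GPUs
)
```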

I stumbled onto a closed discussion here where TheBloke mentioned something like "when it is released it will look like... load_in_4bit", and thought I'd post this in case it's useful to anyone.

Wonder if this could also be used on the CPU easily and how that might perform🤔

Thanks. It's not fully released yet. The code is in PEFT and transformers (or will be soon). But the actual 4bit bitsandbytes library is not yet released. I'm sure it'll be out very soon.

I wouldn't get your hopes up for good performance. 8-bit bitsandbytes performs much worse than other methods, and early indications suggest the same may be true for 4-bit.

The huge benefit bitsandbytes has is how easy it is to use: you can download any HF model and, with one parameter, load it in 8-bit instead, and soon 4-bit as well. But that convenience comes at a cost to performance. I wouldn't expect it to be usable on CPU at all; for that you want GGML q4 or q5.

Not claiming anything about performance, but it looks like it's at least in alpha release, per the Hugging Face blog post on it today.

https://huggingface.co/blog/4bit-transformers-bitsandbytes

They claim it works in inference.
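
For anyone following along, the setup the blog post walks through looks roughly like this (a sketch based on the post; the model id is just an illustration):

```python
# The NF4 setup the blog post describes: 4-bit storage with the new NF4 data
# type, double quantization, and bf16 compute. Sketch only; model id is an example.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # the new 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",  # example model id
    quantization_config=bnb_config,
    device_map="auto",
)
```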

Yeah, it's out for training but not yet ready for inference: https://twitter.com/Tim_Dettmers/status/1661617478865395712?s=20
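
For context, the training path that is live looks roughly like this: a frozen 4-bit base model with trainable LoRA adapters on top, i.e. QLoRA (a sketch assuming recent transformers/peft; the model id and LoRA hyperparameters are illustrative):

```python
# Sketch of the QLoRA training path: frozen 4-bit base + trainable LoRA adapters.
# Assumes recent transformers + peft + bitsandbytes; all names below that aren't
# library APIs (model id, hyperparameters) are illustrative.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",  # example model id
    load_in_4bit=True,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # cast norms/outputs for stability

lora_config = LoraConfig(
    r=8,                                  # adapter rank (illustrative)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the adapters get gradients
```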

