Can we get a 4-bit quantized version?
It would help a lot if we could get a 4-bit version, since the 4-bit versions out there are either based on the LoRA or don't work as expected.
I quantized the model in Google Colab and tested it with alpaca.cpp. The quality is a bit better than the LoRA-merged version.
I made a magnet link for the quantized version (the file type is .bin).
@chavinlo
Can I share the link on GitHub?
The format is ggml.
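For anyone who wants to reproduce it, the conversion itself was just the standard llama.cpp flow. Here's a rough sketch of what I ran; paths and arguments are from memory, so treat it as an outline and check the llama.cpp README for the exact invocation:

import subprocess

# Assumptions: llama.cpp is cloned and built in ./llama.cpp, and the llama-format
# weights (consolidated.00.pth, params.json) for the alpaca-native model sit in
# ./models/7B, with tokenizer.model placed where the llama.cpp README expects it.

# 1) Convert the .pth checkpoint to an f16 ggml file
#    (the trailing "1" selects f16 output in convert-pth-to-ggml.py).
subprocess.run(
    ["python3", "llama.cpp/convert-pth-to-ggml.py", "models/7B/", "1"],
    check=True,
)

# 2) Quantize the f16 ggml file down to 4 bits
#    (the trailing "2" selects the q4_0 quantization type).
subprocess.run(
    ["./llama.cpp/quantize",
     "models/7B/ggml-model-f16.bin",
     "models/7B/ggml-alpaca-7b-native-q4.bin",
     "2"],
    check=True,
)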
Sure, or you can post it here and I can link it in the README.
Thanks! Here's the link (sorry it's so long, I used an online generator):
magnet:?xt=urn:btih:69fb9b4c1e0888336f5253ae75d3e10a9299ab7d&dn=ggml-alpaca-7b-native-q4.bin&tr=http%3A%2F%2F125.227.35.196%3A6969%2Fannounce&tr=http%3A%2F%2F210.244.71.25%3A6969%2Fannounce&tr=http%3A%2F%2F210.244.71.26%3A6969%2Fannounce&tr=http%3A%2F%2F213.159.215.198%3A6970%2Fannounce&tr=http%3A%2F%2F37.19.5.139%3A6969%2Fannounce&tr=http%3A%2F%2F37.19.5.155%3A6881%2Fannounce&tr=http%3A%2F%2F46.4.109.148%3A6969%2Fannounce&tr=http%3A%2F%2F87.248.186.252%3A8080%2Fannounce&tr=http%3A%2F%2Fasmlocator.ru%3A34000%2F1hfZS1k4jh%2Fannounce&tr=http%3A%2F%2Fbt.evrl.to%2Fannounce&tr=http%3A%2F%2Fbt.rutracker.org%2Fann&tr=https%3A%2F%2Fwww.artikelplanet.nl&tr=http%3A%2F%2Fmgtracker.org%3A6969%2Fannounce&tr=http%3A%2F%2Fpubt.net%3A2710%2Fannounce&tr=http%3A%2F%2Ftracker.baravik.org%3A6970%2Fannounce&tr=http%3A%2F%2Ftracker.dler.org%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.filetracker.pl%3A8089%2Fannounce&tr=http%3A%2F%2Ftracker.grepler.com%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.mg64.net%3A6881%2Fannounce&tr=http%3A%2F%2Ftracker.tiny-vps.com%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.torrentyorg.pl%2Fannounce&tr=https%3A%2F%2Finternet.sitelio.me%2F&tr=https%3A%2F%2Fcomputer1.sitelio.me%2F&tr=udp%3A%2F%2F168.235.67.63%3A6969&tr=udp%3A%2F%2F182.176.139.129%3A6969&tr=udp%3A%2F%2F37.19.5.155%3A2710&tr=udp%3A%2F%2F46.148.18.250%3A2710&tr=udp%3A%2F%2F46.4.109.148%3A6969&tr=udp%3A%2F%2Fcomputerbedrijven.bestelinks.nl%2F&tr=udp%3A%2F%2Fcomputerbedrijven.startsuper.nl%2F&tr=udp%3A%2F%2Fcomputershop.goedbegin.nl%2F&tr=udp%3A%2F%2Fc3t.org&tr=udp%3A%2F%2Fallerhandelenlaag.nl&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.tiny-vps.com%3A6969
Thank you so much for this. I can confirm that the quantized native model from Taiyouillusion's magnet link is legit. Running on alpaca.cpp, it's a big leap forward in response quality compared to the 7B or 13B alpaca-lora models. What a time to be alive!
Can you share how you converted the post-trained HF weights back into the standard llama format for conversion to ggml? Or did you go straight from HF to ggml somehow? I got hung up on a few things. One is that convert-pth-to-ggml.py (from llama.cpp) calls numpy().squeeze() on the data, and numpy doesn't support bfloat16, which alpaca uses. That was a quick fix (not sure if my hack affects anything, but anyway), but the quantize step then fails. From some sleuthing around, it seems like there needs to be a conversion step after the fine-tuning to get the weights back into the standard llama format.
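For reference, my bfloat16 hack was just to upcast before calling numpy(), roughly like this (the helper name is just for illustration; I patched it inline in convert-pth-to-ggml.py, and I'm not sure whether the upcast changes anything downstream):

import torch

def tensor_to_numpy(data: torch.Tensor):
    # numpy can't represent torch.bfloat16, so upcast to float32 first.
    # Quick hack, not a verified fix.
    if data.dtype == torch.bfloat16:
        data = data.to(torch.float32)
    return data.numpy().squeeze()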
I uploaded the script I used in Colab to convert the HF model to GitHub: https://github.com/taiyou2000/alpaca-convert-colab/blob/main/alpaca-convert-colab-fixed.ipynb.
When I try running this script, I first get an error about accelerate missing, and after installing that, I get:
NameError Traceback (most recent call last)
<ipython-input-4-bd7436545f55> in <module>
8 tokenizer = LLaMATokenizer.from_pretrained("decapoda-research/llama-7b-hf")
9
---> 10 base_model = LLaMAForCausalLM.from_pretrained(
11 "chavinlo/alpaca-native",
12 load_in_8bit=False,
/usr/local/lib/python3.9/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
2488 init_contexts = [deepspeed.zero.Init(config_dict_or_path=deepspeed_config())] + init_contexts
2489 elif load_in_8bit or low_cpu_mem_usage:
-> 2490 init_contexts.append(init_empty_weights())
2491
2492 with ContextManagers(init_contexts):
NameError: name 'init_empty_weights' is not defined
Any hints on fixing this?
Because the upstream llama.cpp repository recently changed the quantized ggml format, any old q4.bin files stop working, so I had to requantize this. I did manage to get it working: I had to remove the "accelerate" pip3 package and use a Colab runtime with a lot of RAM. I was constantly on the verge of running out of disk space during the conversion, but I just managed to finish it.
here's the magnet link: magnet:?xt=urn:btih:0e51003c8a5610aa713f675891f0a7f87051be1a&dn=ggml-alpaca-7b-native-q4.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
Sometimes I find that a magnet link won't work unless a few people have downloaded it through the actual torrent file. You can find the torrent at "suricrasia dot online slash stuff slash ggml-alpaca-7b-native-q4 dot bin dot torrent dot txt"; just replace "dot" with "." and "slash" with "/".
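If you're not sure whether a .bin file you already have is the old or the new format, you can peek at the 4-byte magic at the start of the file. Here's a quick check; the magic values are what I remember from the llama.cpp source at the time, so verify them against the repo if it matters:

import struct

def ggml_format(path: str) -> str:
    # Read the leading magic number as a little-endian uint32 and guess
    # whether this is an old unversioned ggml file or a newer versioned one.
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    if magic == 0x67676D6C:   # 'ggml' - old unversioned files, now rejected
        return "old (unversioned)"
    if magic == 0x67676D66:   # 'ggmf' - newer format with a version field
        return "new (versioned)"
    return f"unknown magic: {magic:#010x}"

print(ggml_format("ggml-alpaca-7b-native-q4.bin"))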
Can you post what you changed in Google Colab?
I actually didn't need to change anything; I just had to run it with Google Colab Pro. If you don't, it will ask you to install the "accelerate" package, and that's where the error comes from.
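If you can't use Colab Pro, my guess (I haven't verified this) is that the error goes away if the model is loaded without the accelerate-dependent options, since the traceback above only reaches init_empty_weights() when load_in_8bit or low_cpu_mem_usage is set. Something like this, assuming the runtime has enough RAM for a full fp16 load:

import torch
from transformers import LLaMAForCausalLM, LLaMATokenizer

# Assumption: with load_in_8bit=False and low_cpu_mem_usage=False, transformers
# never takes the accelerate-only init_empty_weights() path from the traceback.
# This loads the whole model into RAM, so it needs a high-RAM runtime.
tokenizer = LLaMATokenizer.from_pretrained("decapoda-research/llama-7b-hf")
base_model = LLaMAForCausalLM.from_pretrained(
    "chavinlo/alpaca-native",
    load_in_8bit=False,
    low_cpu_mem_usage=False,
    torch_dtype=torch.float16,
)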