Can we get a 4-bit quantized version?
It would help a lot if we could get a 4-bit version, since the 4-bit versions out there are either based on the LoRA or don't work as expected.
I quantized the model in Google Colab and tested it with alpaca.cpp. The quality is a bit better than the LoRA-merged version.
I made a magnet link for the quantized version (the file type is .bin).
@chavinlo
Can I share the link on GitHub?
The format is ggml.
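For anyone who wants to reproduce it, the conversion itself was just the standard llama.cpp flow. Here's a rough sketch of what I ran; paths and arguments are from memory, so treat it as an outline and check the llama.cpp README for the exact invocation:

import subprocess

# Assumptions: llama.cpp is cloned and built in ./llama.cpp, and the llama-format
# weights (consolidated.00.pth, params.json) for the alpaca-native model sit in
# ./models/7B, with tokenizer.model placed where the llama.cpp README expects it.

# 1) Convert the .pth checkpoint to an f16 ggml file
#    (the trailing "1" selects f16 output in convert-pth-to-ggml.py).
subprocess.run(
    ["python3", "llama.cpp/convert-pth-to-ggml.py", "models/7B/", "1"],
    check=True,
)

# 2) Quantize the f16 ggml file down to 4 bits
#    (the trailing "2" selects the q4_0 quantization type).
subprocess.run(
    ["./llama.cpp/quantize",
     "models/7B/ggml-model-f16.bin",
     "models/7B/ggml-alpaca-7b-native-q4.bin",
     "2"],
    check=True,
)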
Sure, or you can post it here and I can link it in the README.
Thanks! Here's the link (sorry it's so long, I used an online generator):
magnet:?xt=urn:btih:69fb9b4c1e0888336f5253ae75d3e10a9299ab7d&dn=ggml-alpaca-7b-native-q4.bin&tr=http%3A%2F%2F125.227.35.196%3A6969%2Fannounce&tr=http%3A%2F%2F210.244.71.25%3A6969%2Fannounce&tr=http%3A%2F%2F210.244.71.26%3A6969%2Fannounce&tr=http%3A%2F%2F213.159.215.198%3A6970%2Fannounce&tr=http%3A%2F%2F37.19.5.139%3A6969%2Fannounce&tr=http%3A%2F%2F37.19.5.155%3A6881%2Fannounce&tr=http%3A%2F%2F46.4.109.148%3A6969%2Fannounce&tr=http%3A%2F%2F87.248.186.252%3A8080%2Fannounce&tr=http%3A%2F%2Fasmlocator.ru%3A34000%2F1hfZS1k4jh%2Fannounce&tr=http%3A%2F%2Fbt.evrl.to%2Fannounce&tr=http%3A%2F%2Fbt.rutracker.org%2Fann&tr=https%3A%2F%2Fwww.artikelplanet.nl&tr=http%3A%2F%2Fmgtracker.org%3A6969%2Fannounce&tr=http%3A%2F%2Fpubt.net%3A2710%2Fannounce&tr=http%3A%2F%2Ftracker.baravik.org%3A6970%2Fannounce&tr=http%3A%2F%2Ftracker.dler.org%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.filetracker.pl%3A8089%2Fannounce&tr=http%3A%2F%2Ftracker.grepler.com%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.mg64.net%3A6881%2Fannounce&tr=http%3A%2F%2Ftracker.tiny-vps.com%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.torrentyorg.pl%2Fannounce&tr=https%3A%2F%2Finternet.sitelio.me%2F&tr=https%3A%2F%2Fcomputer1.sitelio.me%2F&tr=udp%3A%2F%2F168.235.67.63%3A6969&tr=udp%3A%2F%2F182.176.139.129%3A6969&tr=udp%3A%2F%2F37.19.5.155%3A2710&tr=udp%3A%2F%2F46.148.18.250%3A2710&tr=udp%3A%2F%2F46.4.109.148%3A6969&tr=udp%3A%2F%2Fcomputerbedrijven.bestelinks.nl%2F&tr=udp%3A%2F%2Fcomputerbedrijven.startsuper.nl%2F&tr=udp%3A%2F%2Fcomputershop.goedbegin.nl%2F&tr=udp%3A%2F%2Fc3t.org&tr=udp%3A%2F%2Fallerhandelenlaag.nl&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.tiny-vps.com%3A6969
Thank you so much for this. I can confirm that the quantized native model from Taiyouillusion's magnet link is legit. Running on alpaca.cpp, it's a big leap forward in response quality compared to the 7B or 13B alpaca-lora models. What a time to be alive!
Can you share how you converted the post-trained HF weights back into the standard llama format for conversion to ggml? Or did you go straight from HF to ggml somehow? I got hung up on a few things. One is that convert-pth-to-ggml.py (from llama.cpp) calls numpy().squeeze() on the data, and numpy doesn't support bfloat16, which alpaca uses. That was a quick fix (not sure if my hack affects anything, but anyway), but the quantize step then fails. From some sleuthing around, it seems like there needs to be a conversion step after the fine-tuning to get the weights back into the standard llama format.
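For reference, my bfloat16 hack was just to upcast before calling numpy(), roughly like this (the helper name is just for illustration; I patched it inline in convert-pth-to-ggml.py, and I'm not sure whether the upcast changes anything downstream):

import torch

def tensor_to_numpy(data: torch.Tensor):
    # numpy can't represent torch.bfloat16, so upcast to float32 first.
    # Quick hack, not a verified fix.
    if data.dtype == torch.bfloat16:
        data = data.to(torch.float32)
    return data.numpy().squeeze()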
I uploaded the script I used in Colab to convert the HF model to GitHub: https://github.com/taiyou2000/alpaca-convert-colab/blob/main/alpaca-convert-colab-fixed.ipynb.
When I try running this script, I first get an error about accelerate missing, and after installing that, I get:
NameError Traceback (most recent call last)
<ipython-input-4-bd7436545f55> in <module>
8 tokenizer = LLaMATokenizer.from_pretrained("decapoda-research/llama-7b-hf")
9
---> 10 base_model = LLaMAForCausalLM.from_pretrained(
11 "chavinlo/alpaca-native",
12 load_in_8bit=False,
/usr/local/lib/python3.9/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
2488 init_contexts = [deepspeed.zero.Init(config_dict_or_path=deepspeed_config())] + init_contexts
2489 elif load_in_8bit or low_cpu_mem_usage:
-> 2490 init_contexts.append(init_empty_weights())
2491
2492 with ContextManagers(init_contexts):
NameError: name 'init_empty_weights' is not defined
Any hints on fixing this?
Because the upstream llama.cpp repository recently changed the quantized ggml format, any old q4.bin files stop working, so I had to requantize this. I did manage to get it working: I had to remove the "accelerate" pip3 package and use a Colab runtime with a lot of RAM. I was constantly on the verge of running out of disk space during the conversion, but I just managed to finish it.
here's the magnet link: magnet:?xt=urn:btih:0e51003c8a5610aa713f675891f0a7f87051be1a&dn=ggml-alpaca-7b-native-q4.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
Sometimes I find that a magnet link won't work unless a few people have downloaded it through the actual torrent file. You can find the torrent at "suricrasia dot online slash stuff slash ggml-alpaca-7b-native-q4 dot bin dot torrent dot txt"; just replace "dot" with "." and "slash" with "/".
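If you're not sure whether a .bin file you already have is the old or the new format, you can peek at the 4-byte magic at the start of the file. Here's a quick check; the magic values are what I remember from the llama.cpp source at the time, so verify them against the repo if it matters:

import struct

def ggml_format(path: str) -> str:
    # Read the leading magic number as a little-endian uint32 and guess
    # whether this is an old unversioned ggml file or a newer versioned one.
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    if magic == 0x67676D6C:   # 'ggml' - old unversioned files, now rejected
        return "old (unversioned)"
    if magic == 0x67676D66:   # 'ggmf' - newer format with a version field
        return "new (versioned)"
    return f"unknown magic: {magic:#010x}"

print(ggml_format("ggml-alpaca-7b-native-q4.bin"))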
Can you post what you changed in Google Colab?
I actually didn't need to change anything; I just had to run it with Google Colab Pro. If you don't, it will ask you to install the "accelerate" package, and that's where the error comes from.
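If you can't use Colab Pro, my guess (I haven't verified this) is that the error goes away if the model is loaded without the accelerate-dependent options, since the traceback above only reaches init_empty_weights() when load_in_8bit or low_cpu_mem_usage is set. Something like this, assuming the runtime has enough RAM for a full fp16 load:

import torch
from transformers import LLaMAForCausalLM, LLaMATokenizer

# Assumption: with load_in_8bit=False and low_cpu_mem_usage=False, transformers
# never takes the accelerate-only init_empty_weights() path from the traceback.
# This loads the whole model into RAM, so it needs a high-RAM runtime.
tokenizer = LLaMATokenizer.from_pretrained("decapoda-research/llama-7b-hf")
base_model = LLaMAForCausalLM.from_pretrained(
    "chavinlo/alpaca-native",
    load_in_8bit=False,
    low_cpu_mem_usage=False,
    torch_dtype=torch.float16,
)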