Expand Repositories section to add links to a couple more versions
I'm a bit confused about what this is for. You've added a link to a Wizard Falcon 40B GGUF? Why would that be linked from llama2 70B Chat Uncensored? Also, the URL is invalid.
My GGUFs for this model, llama2 70B Chat Uncensored, are actually next in the queue, so they will appear quite soon.
I'll be getting around to Falcon GGUFs quite soon.
Sorry, it looks like I garbled the edit: I was doing GGUFs of two models that you hadn't got to yet, and I must have added the wrong crosslink. Anyway, if you're going to do them soon, it's probably less confusing for people if everything comes through you.
Incidentally, is there any logistical issue with doing GGUFs of the new Falcon 180B, or is it just a matter of getting it done?
OK, understood, and thanks.
I did Llama 2 70B Chat Uncensored last night and I'm nearly done with all the Llama 2 models in GGUF. Then I'll start looking at Llama 1 and Falcon models.
There was one challenge with Falcon 180B, which was that the convert script had to be updated. That was done by someone on the llama.cpp GitHub last night, and I'm making the GGUFs right now. Two are uploaded as I write this, and the rest will come soon: https://huggingface.co/TheBloke/Falcon-180B-Chat-GGUF/tree/main
Unfortunately every one of them is bigger than 50GB, so they all have to be split for upload to HF, which means a bit of manual work for the user before they can be used (details will be in the README). But they work!
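The manual step is just concatenating the parts, in order, back into a single file; on Linux/macOS a plain cat of the parts does it. Here's a rough Python equivalent of that step, with illustrative part names only (the README will list the real ones):

```python
# Rough sketch: join split GGUF parts back into one file before loading.
# The filenames below are illustrative only; check the model README for the real part names.
from pathlib import Path

def join_gguf_parts(parts, output_path):
    """Concatenate the split parts, in order, into a single GGUF file."""
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy in chunks so we never hold a 50GB+ file in memory.
                while chunk := src.read(64 * 1024 * 1024):
                    out.write(chunk)

# Example usage (hypothetical names following the -split-a, -split-b pattern):
parts = sorted(Path(".").glob("falcon-180b-chat.Q4_K_M.gguf-split-*"))
join_gguf_parts(parts, "falcon-180b-chat.Q4_K_M.gguf")
```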
By the way, my GGUF quantization for WizardLM-Uncensored-Falcon-40b seems to be unsuccessful -- I've done it twice now, with identical results, and when I try to load it into koboldcpp, it fails with:
error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 8192, 65024, got 8192, 65025, 1, 1
llama_load_model_from_file: failed to load model
gpttype_load_model: error: failed to load model '/Users/.../Documents/GitHub/koboldcpp/models/WizardLM-Uncensored-Falcon-40b-Q8_0.gguf'
Load Model OK: False
Could not load model: /Users/.../Documents/GitHub/koboldcpp/models/WizardLM-Uncensored-Falcon-40b-Q8_0.gguf
Even stranger, the same Python notebook running on the same hardware quantized the base Falcon 40B model successfully, and that result loaded fine in the same installation of koboldcpp, i.e. the problem seems to be specific to this particular Falcon 40B variant rather than to Falcon 40B in general. So I'll download and test your GGUF version of WizardLM-Uncensored-Falcon-40b once it's done, and let you know if I have any problems with it.
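One thing I might check in the meantime: the expected and actual sizes differ by exactly one (65024 vs 65025), which looks like an extra token in the fine-tune's vocab/embedding. Just as a guess at the cause, not a diagnosis, something like this would show whether the two models declare different vocab sizes (model IDs written from memory):

```python
# Diagnostic guess only: compare declared vocab sizes of the base model and the fine-tune,
# since 65024 vs 65025 suggests an off-by-one from an extra added token.
from transformers import AutoConfig, AutoTokenizer

# Model IDs written from memory; adjust if they differ on the Hub.
for model_id in ["tiiuae/falcon-40b", "ehartford/WizardLM-Uncensored-Falcon-40b"]:
    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    print(model_id, "config vocab_size:", config.vocab_size, "tokenizer length:", len(tokenizer))
```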
Sadly my 64GB company Mac laptop doesn't have enough memory to run Falcon 180B locally even in Q2_K :-( And I asked them for a 128GB one when I joined...