Llama 3.1 70B Instruct Lorablated Creative Writer GGUF please
Hi,
Can you make a Q8 GGUF of this one: https://huggingface.co/NobodySpecial/Llama-3.1-70B-Instruct-Lorablated-Creative-Writer/tree/main
The main safetensors file is ~140GB. I know Q8 quants of Llama 3.1 70B Lorablated run around 75GB. Could you make one that is 100GB, assuming bigger = better?
Thank you!
Hi, it's queued, and if all goes well, you get more or less the full list of quants to choose from. I recommend picking an imatrix one (the ...-i1-GGUF variant). Anyway, you can watch this model's progress at http://hf.tst.eu/status.html
There is no quant between Q8_0 (about 70GB) and f16 (more or less the original), but Q8_0 should be pretty much the same as the original.
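For what it's worth, the sizes fall out of simple arithmetic: file size is roughly parameter count times bits per weight. A rough sketch (the bits-per-weight figures are approximate effective rates including per-block scales, not exact values):

```python
# Back-of-the-envelope GGUF size estimate: params * bits-per-weight / 8.
# The bits-per-weight figures are approximate effective rates (quantized
# weights plus per-block scales); real files also carry some metadata.

PARAMS = 70.55e9  # Llama 3.1 70B parameter count (approximate)

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,   # approximate effective rate
    "Q6_K":   6.6,   # approximate effective rate
    "Q8_0":   8.5,   # 8-bit weights + one fp16 scale per 32-weight block
    "f16":    16.0,
}

for quant, bpw in BITS_PER_WEIGHT.items():
    gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant:7s} ~{gb:5.1f} GB")
```

That puts Q8_0 near 75GB and f16 near 141GB, matching the numbers above; a 100GB file would need roughly 11.3 bits per weight, and no GGUF quant type sits there, which is why the ladder jumps from Q8_0 straight to f16.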
Sounds great and thank you!!!
all seems well, and it looks like an interesting model. also interesting that nobody quanted that one before, btw. nice find
Many thanks and yeah I was surprised there were no GGUFs. I was looking for an EXL2 quant of Llama 3.1 70B Lorablated when the search results returned that model. I only have 136GB of VRAM, and I think the full model adds up to ~140GB. If you like, I can come back to this thread and give you my thoughts on this model. Going to test it out now.
Thank you!
Edit: And for some reason the Lorablated/Abliterated models seem to follow instructions a tiny bit better. And they seem to have more enthusiasm and "moxie" (for lack of a better word) when I ask them to role-play as a human being who is helping me do my work. In my experience, these models tend to take initiative in the work they are tasked with... until something happens and Oobabooga crashes (too long a context? I don't know).
I'm using the Q8 quant and it's not that great. Seems like it lost a lot of intelligence. Instruction following abilities are not that great either.
That's not a dig at you—I personally download your quants over everyone else's. I think they're great.
I just think when they fine-tuned this model it lost the ability to follow instructions and some intelligence as well.
Just 136GB? That's just 100GB more than me, but get going :)
"it lost the ability to follow instructions"
That's unfortunately fairly common, and it seems especially bad for our uses (I usually task the AI to be the narrator/writer so I am not limited to a single protagonist, and very few get that right. The most common problem is the AI writing about things it shouldn't yet know about, e.g. "x turns around the corner and sees a gun turret" => "x throws himself on the ground to avoid being shot, then turns around the corner and sees the turret").
I would just be happy if AI for once would listen to me when I say: "Do not write complex sentences. Avoid writing dependent clauses." I have been at this for 2 years now and it's driving me insane. No amount of prompt engineering has helped.
Or if it does work... it only works for a few replies, then the models go right back to their old habits. I'm experimenting with XTC in Ooba (and running EXL2 for the most part) but meh.
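For anyone curious, this is my understanding of what XTC ("Exclude Top Choices") does, as a minimal sketch; the real sampler in text-generation-webui operates on logits during generation, and the threshold/probability parameters here just mirror its settings:

```python
import numpy as np

def xtc_sample(probs, threshold=0.1, probability=0.5, rng=None):
    """Sketch of XTC on a normalized next-token distribution.

    With chance `probability`, every token at or above `threshold` is
    removed except the least likely of them, steering generation away
    from the most predictable continuations.
    """
    rng = rng or np.random.default_rng()
    probs = probs.astype(float).copy()
    if rng.random() < probability:
        qualifying = np.flatnonzero(probs >= threshold)
        if qualifying.size >= 2:
            keep = qualifying[np.argmin(probs[qualifying])]  # least likely survivor
            probs[qualifying[qualifying != keep]] = 0.0      # drop the top choices
            probs /= probs.sum()                             # renormalize
    return int(rng.choice(probs.size, p=probs))

# Toy distribution: with probability=1.0, tokens 0 and 1 can never win.
print(xtc_sample(np.array([0.50, 0.30, 0.15, 0.05]), probability=1.0))
```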
I'm just starting to learn how to fine-tune. I hope that can solve my grammar woes.
If you do solve it with a finetune, I'll be glad to try it out :) But yeah, I experience that too, e.g. when all characters talk the same, regardless of age, education, character etc. It's always just the model talking eventually... So they do have character :)
Thanks. Using FireCrawl right now to scrape data from ~500 websites and will do a LoRA fine-tune in Oobabooga. Will hit you up if a miracle occurs and I manage to make something of value my first time.
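Roughly the data prep I have in mind, as a sketch; the directory name, file layout, and chunk sizes are placeholders for whatever FireCrawl actually writes out:

```python
import json
from pathlib import Path

# Hypothetical layout: one markdown file per scraped page.
SCRAPED_DIR = Path("firecrawl_output")  # placeholder path
OUT_FILE = Path("train.jsonl")
MIN_CHARS = 200      # skip fragments too short to be useful
CHUNK_CHARS = 4000   # naive chunk size; tune to the training context length

with OUT_FILE.open("w", encoding="utf-8") as out:
    for page in sorted(SCRAPED_DIR.glob("*.md")):
        text = page.read_text(encoding="utf-8").strip()
        if len(text) < MIN_CHARS:
            continue
        # Fixed-size chunking for simplicity; paragraph-aware splitting
        # would keep sentences intact and is probably worth the effort.
        for i in range(0, len(text), CHUNK_CHARS):
            chunk = text[i:i + CHUNK_CHARS].strip()
            if len(chunk) >= MIN_CHARS:
                out.write(json.dumps({"text": chunk}) + "\n")
```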
Do you have any suggestions on which 13B (or 7B/8B?) model is best for following instructions? I know I can Google it... just wondering if you've heard anything in your circles?
I never seriously touch anything below 70B, to be honest. But even there, it's hit and miss. 95% miss, really. So maybe there are great 8 or 13b's for instruction following, but I wouldn't know :(
Same. But I don't have enough VRAM to fine tune a 70B model. So, I'm hoping I can make miracles happen with 13B. In fact, one of my 3090s went on the fritz today. So, I'm down to 112GB of VRAM.
48 GiB of GPU memory is enough to QLoRA finetune a 70B model at 4-bit precision, so with your 112 GB you can easily QLoRA finetune 70B at 8 bit, which in my opinion is more than precise enough and a far better option than LoRA finetuning a 13B at 16 bit. I recommend against using Oobabooga for finetuning and instead recommend axolotl.
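In case it helps, this is what that recipe looks like expressed directly with transformers/peft/bitsandbytes (axolotl drives the same stack from a YAML config); the model name, target modules, and LoRA hyperparameters below are just illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL = "meta-llama/Llama-3.1-70B-Instruct"  # illustrative base model

# Load the frozen base model quantized to 4-bit NF4; this is what lets a
# 70B fit into ~48 GiB for training. For 8 bit, use load_in_8bit=True.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Small trainable LoRA adapters on top of the frozen quantized weights;
# only these adapter parameters are updated during training.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # adapters are a tiny fraction of 70B
```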