Problem Model
It's not broken for me; you need the latest llama.cpp.
What he said ^
And if you're still having issues, make sure flash attention is off
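To be concrete about turning it off: in llama.cpp flash attention is opt-in (the `-fa` flag), and if you're testing through the llama-cpp-python binding it's the `flash_attn` constructor argument. A minimal sketch assuming that binding; the model path is a placeholder, not a specific file from this repo:

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# flash_attn defaults to False; it is set explicitly here to rule it out
# while debugging. The model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-2-9b-it-Q8_0.gguf",  # placeholder path
    n_ctx=8192,        # context window used for testing
    flash_attn=False,  # keep flash attention off while debugging
)

out = llm("Hello, how are you?", max_tokens=32)
print(out["choices"][0]["text"])
```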
LM Studio updated llama.cpp yesterday and the problem remains even without flash attention.
Same here. Broken GGUFs.
I'm not seeing any issues either on LM Studio 0.2.27. Can you guys share your hardware and settings?
Tested on Windows with a 3070 and Linux with a 3090.
No issue here, and I even tried it in French for a full 8k-token chat.
Windows 10 with a 3090 Ti, llama.cpp + SillyTavern as a front end; also tried with koboldcpp (Q8 quant).
I tried this https://huggingface.co/legraphista/Gemma-2-9B-It-SPPO-Iter3-IMat-GGUF and it works fine, so I don't know what is wrong with these GGUFs. I am using the latest koboldcpp. For me, it doesn't even load the model; errors occur at load time.
That's very interesting, since that quant from @legraphista (tagged so you can consider updating) was made with a version of llama.cpp that has a broken Gemma 2 implementation, so your experience should only be better with mine. What's the error you get, @AndrewLockhart?
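If it helps with debugging, you could also dump the metadata keys of the working and failing files and diff them. A rough sketch, assuming the `gguf` package from llama.cpp's gguf-py; the filename is a placeholder:

```python
# Sketch using the gguf package from llama.cpp's gguf-py (pip install gguf).
# Prints the metadata keys of a GGUF file so two downloads can be compared.
from gguf import GGUFReader

reader = GGUFReader("gemma-2-9b-it-Q8_0.gguf")  # placeholder path
for name in reader.fields:
    print(name)
```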
Thanks for the tag, @bartowski. I'll hold off on updating until we understand why my broken version works in @AndrewLockhart's setup.
Yeah, good call; we need more details.
I re-downloaded one GGUF from this repo and it works fine; maybe I just had an old version of the file. So everything is fine now.
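In case anyone else hits the same stale-file problem, this is roughly how to force a fresh copy with huggingface_hub instead of trusting the local cache; the repo id and filename below are placeholders, not the actual repo:

```python
# Sketch using huggingface_hub (pip install huggingface_hub).
# force_download=True bypasses any previously cached copy of the file.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="someuser/Gemma-2-9B-It-GGUF",  # placeholder repo id
    filename="gemma-2-9b-it-Q8_0.gguf",     # placeholder filename
    force_download=True,  # re-fetch even if a cached version exists
)
print(path)
```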
Ah good, good, that's the best-case scenario haha
Thanks for letting us know! I'll be re-processing my repo to apply the fixes.