Anyone have gguf quants?
#3 by lemon07r - opened
A big thanks for this farewell gift. It might be one of the best models of this size we'll have for a while, since finetuning at 32b/35b is slow (this is the only good one as far as I can tell). I'm wondering if anyone has GGUF quants for this model.
I think llama.cpp's BPE tokenizer implementation is still incorrect, and this model won't work as expected until someone fixes that. The same applies to cohere-command-r; the only difference here is that I replaced the special tokens with the ones from ChatML.
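For reference, here is a minimal sketch of what a ChatML-formatted prompt looks like with those special tokens (the system and user text are just illustrative):

```python
def chatml_prompt(system: str, user: str) -> str:
    """Build a ChatML-style prompt using the <|im_start|>/<|im_end|> tokens."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"  # generation continues from here
    )

print(chatml_prompt("You are a helpful assistant.", "Hello!"))
```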
I would recommend aphrodite-engine for accelerated inference with f16.
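A minimal sketch of querying it, assuming an aphrodite-engine server is already running locally with its OpenAI-compatible API (the port and served model name below are assumptions; match them to however you launched the server):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local aphrodite-engine endpoint.
client = OpenAI(base_url="http://localhost:2242/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="CausalLM/35b-beta-long",  # hypothetical served model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```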
Just added them. The issue is fixed in llama.cpp.
QuantFactory/CausalLM-35b-beta-long-GGUF
bartowski/35b-beta-long-GGUF is OK with the latest llama.cpp or koboldcpp.
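In case it helps anyone, a minimal sketch for pulling and running one of the quants above via llama-cpp-python (assuming the llama-cpp-python and huggingface-hub packages are installed; the quant filename and context size are illustrative choices, not recommendations):

```python
from llama_cpp import Llama

# Download a GGUF file from the Hugging Face repo and load it.
llm = Llama.from_pretrained(
    repo_id="bartowski/35b-beta-long-GGUF",  # repo mentioned in this thread
    filename="*Q4_K_M.gguf",                 # glob pattern; pick your quant
    n_ctx=8192,                              # context window; adjust to taste
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(out["choices"][0]["message"]["content"])
```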