GPTQ version?
#4 opened by mer0mingian
Hi jphme,
thanks for providing the model! Would it be possible to provide a GPTQ-quantized version, e.g. in collaboration with TheBloke?
For people who want to run the model cost-efficiently and don't have their own hardware to host it, this would remove many barriers...
Cheers
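
(For anyone who wants to experiment in the meantime, a GPTQ quantization can also be run locally with the AutoGPTQ library. A rough sketch only: the base model id below is my assumption, and the single calibration example is purely illustrative; a real run would use a few hundred representative German texts.)

```python
# pip install auto-gptq transformers
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "jphme/Llama-2-13b-chat-german"  # assumed base model id

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# GPTQ needs a calibration set; a single example here for illustration only.
examples = [tokenizer("Berlin ist die Hauptstadt von Deutschland.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# Run the quantization and save the 4-bit weights.
model.quantize(examples)
model.save_quantized("Llama-2-13b-chat-german-GPTQ", use_safetensors=True)
```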
We're planning on doing a GGUF conversion, if that helps? Should probably be ready tomorrow.
Thanks. Haven't worked with that, but happy to try. :)
@mer0mingian here we go :) https://huggingface.co/morgendigital/Llama-2-13b-chat-german-GGUF/tree/main
Either run inference with llama.cpp directly, or use one of the popular tools like text-generation-webui, koboldcpp, etc... A minimal example is sketched below.
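
(For example, via the llama-cpp-python bindings; a minimal sketch, where the GGUF filename is a placeholder and should be replaced with one of the actual files from the repo above:)

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder filename: point this at the quantization level you downloaded.
llm = Llama(model_path="llama-2-13b-chat-german.Q4_K_M.gguf", n_ctx=2048)

# Simple completion-style call; the result dict mirrors the OpenAI format.
output = llm("Was ist die Hauptstadt von Deutschland?", max_tokens=64)
print(output["choices"][0]["text"])
```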