Does the Hugging Face Text Generation Inference engine support this GGUF version?
No, TGI is aimed at serving LLMs on GPU and supports original (unquantized), GPTQ, and AWQ models.
If you want to use this GGUF model, you may want to try llama.cpp instead.
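For reference, a minimal sketch of running a GGUF file with llama.cpp might look like the following. The model path is a placeholder; substitute the actual GGUF file for this model.

```shell
# Sketch only: /path/to/model.gguf is a placeholder for the actual GGUF file.
# Clone and build llama.cpp, then run the model with its CLI.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# Run a short generation from a prompt (-n limits the number of tokens).
./build/bin/llama-cli -m /path/to/model.gguf -p "Hello" -n 64
```

llama.cpp runs GGUF models on CPU out of the box and can offload layers to GPU if built with the appropriate backend.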