Text model
#1 by xentnex, opened
Is it possible to replace Llama with Qwen, Mistral, etc. as the text part of Ultravox?
Is it possible to quantize the text model for faster inference? 4-bit, 5-bit, 8-bit?
While we primarily use Llama for development, Ultravox is designed to work with most LLMs on Hugging Face. Some parts of the code may break with other models, and we welcome comments or, even better, direct contributions to the GitHub repository to address these issues.
As for quantization, it is not supported yet.
Exactly. Just to clarify though, using other models requires retraining the adapter for the downstream model. The code and configs are available at https://ultravox.ai/
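As a rough illustration of what such a swap involves, a training-config fragment might look like the following. The field names here are hypothetical, not the repository's actual schema; check the configs in the linked repository for the real layout.

```yaml
# Hypothetical training-config fragment -- field names are illustrative.
text_model: "Qwen/Qwen2.5-7B-Instruct"   # swapped-in text backbone
audio_model: "openai/whisper-medium"     # speech encoder
# The adapter must be retrained whenever the text model changes, because
# its projection is tied to the downstream model's embedding space.
```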