Will there be an FP8 version later?
#6 by ZealotTt · opened
This is a very good project, thank you for your efforts! Will there be a distilled FP8 version of the model later on?
The model can already be loaded in FP8 or int8 using quanto, as described in the inference section of README.md. There is also a Q5 quantized community model. I personally won't be releasing GGUF versions, but if someone else wants to convert them, I'd welcome it.
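
For anyone landing here, a minimal sketch of what FP8 loading with quanto can look like. The repo id and the `transformer` attribute are placeholders on my part (some pipelines expose a `unet` instead), so defer to the README's inference section for the exact snippet:

```python
# A minimal sketch, assuming a diffusers-style pipeline.
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = DiffusionPipeline.from_pretrained(
    "org/model-name",            # placeholder repo id, not the real one
    torch_dtype=torch.bfloat16,
)

# Quantize the denoiser weights to FP8 (swap qfloat8 for qint8 to use int8).
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)  # materialize the quantized weights

pipe.to("cuda")
```

Quantizing on load this way trades a bit of setup time for roughly halved weight memory, which is usually why people ask for an FP8 checkpoint in the first place.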