arcee-ai/Llama-3.1-SuperNova-Lite · Bad at Following Instructions

Sep 11

Amazing work, loving the model.
My only problem is that it's not the best at following complicated system prompts.
I was using a qkm quant, so that might be it. Overall, great job.

Crystalcareai

Arcee AI org Sep 11

Appreciate it - we'll look at some more sophisticated quants in the future, and will likely update the model over time as we refine the technique.

Joseph717171

Sep 11 • edited Sep 11

I noticed when quantizing the model to GGUF that the resulting quantized model was smaller than an equally quantized version of Llama-3.1-8B-Instruct, even when keeping the Output Tensors in Q8_0 and the Embeddings up-cast to F32, as I usually do. I don’t know why this is. In further experiments with the quantization scheme, I found that up-casting the Output Tensors to F32 as well, while still up-casting the Embeddings to F32, brings the model’s size to what it should be given the quantization settings. It also performs better.
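
For a rough sense of what that up-cast costs in file size, here is a small back-of-the-envelope sketch. It is not taken from any quantization tool. It assumes the Llama-3.1-8B shapes (a 128256-token vocabulary and a hidden size of 4096), assumes the output tensor has the same shape as the token embeddings, and uses the Q8_0 layout of 32 int8 weights plus one fp16 scale per 34-byte block. The names `TYPE_LAYOUT` and `tensor_bytes` are made up for illustration. It does not explain why the original quant came out smaller; it only shows how much one tensor grows when moved from Q8_0 to F32.

```python
# Rough on-disk size of one vocab-by-hidden tensor under a few GGML storage types.
# Assumed shapes for Llama-3.1-8B (not stated in this thread).
VOCAB_SIZE = 128_256
HIDDEN_SIZE = 4_096

# type name -> (bytes per block, weights per block)
# Q8_0 stores 32 int8 weights plus one fp16 scale, i.e. 34 bytes per block.
TYPE_LAYOUT = {
    "F32": (4, 1),
    "F16": (2, 1),
    "Q8_0": (34, 32),
}


def tensor_bytes(n_weights: int, type_name: str) -> int:
    """Return the storage size in bytes of n_weights values of the given type."""
    block_bytes, block_weights = TYPE_LAYOUT[type_name]
    if n_weights % block_weights != 0:
        raise ValueError("weight count must be a multiple of the block size")
    return (n_weights // block_weights) * block_bytes


n_weights = VOCAB_SIZE * HIDDEN_SIZE

for type_name in TYPE_LAYOUT:
    size_gib = tensor_bytes(n_weights, type_name) / 2**30
    print(f"{type_name:>5}: {size_gib:.2f} GiB")

# Extra bytes from storing this one tensor in F32 instead of Q8_0.
extra_bytes = tensor_bytes(n_weights, "F32") - tensor_bytes(n_weights, "Q8_0")
print(f"F32 instead of Q8_0 adds about {extra_bytes / 2**30:.2f} GiB for this tensor")
```

Under those assumptions, the up-cast of a single tensor is large enough to account for a visible jump in the size of the final GGUF file.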

GGUF OF32.EF32.IQuants are available in:
IQ4_K_M, IQ6_K, and IQ8_0

You can find them below:
https://huggingface.co/Joseph717171/Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ4_K-Q8_0-GGUF

Crystalcareai

Arcee AI org Sep 11

Thanks @Joseph717171, appreciate your efforts.