The difference between Q8 and F16 models
Is there any big difference between the Q8 and F16 models? Like will F16 model perform 2 times better than Q8?
No, the difference is so small it can barely be measured. I would say i1-Q5_K_M and larger have no meaningful difference from the unquantized model. Instead of Q8 I recommend using i1-Q6 from https://huggingface.co/mradermacher/Llama-3.2-3B-Instruct-uncensored-i1-GGUF. Here are some plots I created a week ago for some other models. The small "i"-prefix on the plot means that weighted/imatrix quants are used.
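To put the "2 times better" question in perspective: F16 roughly doubles the file size of Q8 while the quality gain is barely measurable. A rough back-of-the-envelope sketch (the bits-per-weight figures are approximate values for llama.cpp quant formats, and the 3.2B parameter count is an assumption for a Llama-3.2-3B-class model):

```python
# Rough file-size comparison for a ~3B-parameter model at different
# GGUF quantization levels. Bits-per-weight values are approximate:
# block-based quants carry per-block scales, so e.g. Q8_0 is ~8.5 bpw.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q5_K_M": 5.5,
}

def approx_size_gb(n_params: float, bpw: float) -> float:
    """Approximate model file size in GB for n_params weights at bpw bits each."""
    return n_params * bpw / 8 / 1e9

n_params = 3.2e9  # assumed parameter count
for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name:7s} ~{approx_size_gb(n_params, bpw):.1f} GB")
```

So F16 costs roughly twice the disk space and memory bandwidth of Q8 for a difference in output quality that is almost unmeasurable, which is why the smaller i1-Q6 quant is usually the better trade-off.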
Thanks for all those graphs. I'm not very familiar with LLMs, but I think I understood.