Dogge/llama-3-70B-instruct-uncensored

#32
by Nekotekina - opened

Hello, would it be possible to make iMat for this?

Sure, it's in the queue; it can take a few days (I will do static quants and then imatrix quants). But next time please give the full URL :)

mradermacher changed discussion status to closed

yea, +1. Back in the day there were rumors about IQ3_XXS quants standing above the other quants... it would be lovely to know if that still holds in the times of llama 3 ...

llama.cpp unfortunately crashes a lot during IQ-quant generation, and that just happened with this model, so there won't be an IQ3_XXS from me for it.

https://github.com/ggerganov/llama.cpp/issues/6018

Doesn't seem as if anybody working on llama.cpp is trying to fix it.

As for IQ3_XXS standing above other quants, I can assure you it's not true :) In general IQ-quants perform better than their non-I versions of very similar size, but an IQ4_XS is still better than IQ3_XXS. What I (personally) found to be true is that IQ3_XXS performs quite well if you are in a pinch for memory (aren't we all), so it's a good choice if you can run it.
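To make the size trade-off concrete, here is a rough back-of-the-envelope sketch. The bits-per-weight figures below are approximate averages for these llama.cpp quant types (actual file sizes vary with model architecture and tensor-type mix), so treat the numbers as ballpark estimates, not exact download sizes:

```python
# Rough file-size estimate for a 70B-parameter model at different
# quantization levels. The bpw values are approximate averages for
# llama.cpp quant types, not exact per-file figures.
PARAMS = 70e9  # parameter count

quants = {
    "Q4_K_M":  4.85,  # approximate bits per weight
    "IQ4_XS":  4.25,
    "IQ3_XXS": 3.06,
}

for name, bpw in quants.items():
    gb = PARAMS * bpw / 8 / 1e9  # bits -> bytes -> GB (decimal)
    print(f"{name}: ~{gb:.1f} GB")
```

The gap this shows is why IQ4_XS and IQ3_XXS both exist: IQ3_XXS saves roughly 10 GB on a 70B model, which can be the difference between fitting in VRAM or not, while IQ4_XS buys back quality when you have the memory.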

I really tried to find the info... but wasn't able to :(
In some discussion of (if I remember correctly) some 70B Miqu model, someone uploaded a graph comparing the perplexity of IQ quants, and IQ3_XXS gave outstanding results far ahead of any other IQ quant. Due to my lack of knowledge in this area, I can only rely on what other people have said, so I'm not able to evaluate whether such information is accurate or just a bad thread. It seemed weird to me, but what can I do... I wanted to provide you with that graph to hear your thoughts on it, but as I said, I wasn't successful. Any chance you know what I'm talking about?

Well, my model pages have a graph by the guy who invented the imatrix, and it clearly shows IQ4_XS being far better. And the data from Artefact2 (also linked) shows the same.

Think about it this way: why would IQ4_XS exist if IQ3_XXS were better and smaller? The only reason, other than mistakes on the part of the people who created it, would be that it is faster than IQ3_XXS (which it often is), but speed alone would be a poor justification for the format's existence.

Also, IQ4_XS and IQ3_XXS both come in imatrix and non-imatrix variants. The imatrix ones are generally better, too.

thx 4 your patience with me

don't worry, the real issue is that llama.cpp is such a crashfest... iq3_xxs would be a great choice...
