Could we combine AWQ and importance matrix calculation to further improve perplexity?
Same as the question in the title: could we do that, or does it not matter at all?
AutoAWQ can calculate AWQ scales for llama.cpp quantization:
https://github.com/casper-hansen/AutoAWQ/pull/285
Thanks
What does AutoAWQ do? I can go and look around in the quoted repo, but it would be much easier if someone explained their approach.
https://github.com/casper-hansen/AutoAWQ
https://github.com/mit-han-lab/llm-awq
https://arxiv.org/abs/2306.00978
Slide: https://www.dropbox.com/scl/fi/dtnp6h6y1mnp7g036axu6/AWQ-slide.pdf?rlkey=ffgh50hxhx8dmsnjiu8kef0ou&dl=0
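For anyone who wants the gist without reading the paper: AWQ derives a per-input-channel scale from activation magnitudes, folds it into the weights before quantization, and folds the inverse into the activations, so channels with large activations lose less precision. Below is a minimal NumPy sketch of that scale search (my own simplification with a plain 4-bit round-to-nearest quantizer; not the AutoAWQ implementation):

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Per-row round-to-nearest quantize + dequantize (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    return np.round(w / scale) * scale

def awq_scale_search(w, x, grid=20):
    """Grid-search per-input-channel scales s built from activation magnitudes:
    quantize w * s, feed x / s, keep the s with the lowest output error."""
    act_mag = np.abs(x).mean(axis=0)                 # per-channel activation magnitude
    ref = x @ w.T                                    # full-precision reference output
    best_err, best_s = np.inf, np.ones(w.shape[1])
    for i in range(grid + 1):
        alpha = i / grid
        s = act_mag ** alpha
        s = s / np.sqrt(s.max() * s.min())           # keep scales centered (similar to llm-awq)
        wq = quantize_rtn(w * s)                     # fold s into weights, then quantize
        err = np.mean((ref - (x / s) @ wq.T) ** 2)   # fold 1/s into activations
        if err < best_err:
            best_err, best_s = err, s
    return best_s, best_err

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 64))                        # [out_features, in_features]
x = rng.normal(size=(32, 64))
x[:, :4] *= 10.0                                     # a few "salient" channels
s, err = awq_scale_search(w, x)
print(s[:8].round(3), err)
```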
If I understand their paper correctly, a scale search of this kind is also part of what I do for these quantized models, so I'm not sure combining the two will help.
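To make the overlap concrete: as I understand it, the importance matrix is also built from per-channel activation statistics, but it enters as a weight on the quantization error being minimized rather than as a scale folded into the weights, so both methods end up steering precision toward the same salient channels. A rough sketch of such a weighted fit (my simplification with a hypothetical quantize_block_weighted helper; not the actual llama.cpp kernels):

```python
import numpy as np

def quantize_block_weighted(w, importance, bits=4, grid=16):
    """Pick a block scale d minimizing sum_i importance_i * (w_i - d * q_i)^2
    over a small grid of candidate scales (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    amax = np.abs(w).max()
    d0 = amax / qmax if amax > 0 else 1.0
    best_err, best = np.inf, (d0, np.zeros_like(w))
    for i in range(1, grid + 1):
        d = d0 * (0.5 + 0.5 * i / grid)              # candidates around the RTN scale
        q = np.clip(np.round(w / d), -(qmax + 1), qmax)
        err = np.sum(importance * (w - d * q) ** 2)
        if err < best_err:
            best_err, best = err, (d, q)
    return best

# "importance" stands in for imatrix entries: average squared activation per
# input channel, collected from a calibration run.
rng = np.random.default_rng(1)
w_block = rng.normal(size=32)
importance = rng.random(32) ** 2
d, q = quantize_block_weighted(w_block, importance)
print(round(float(d), 4), q[:8].astype(int))
```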
But I have now contributed the quantization approach used for these models to llama.cpp.
My guess is that it is easier for the contributors of https://github.com/casper-hansen/AutoAWQ/ to try this than for me to get up to speed with their repo.