What are the differences between yours and meta's offical one?

by c6sneaky - opened Jul 31

Jul 31

mgoin

Neural Magic org Jul 31

Meta skipped all layers' QKV/output matrices and the first and last layers completely. This breaks down to:

With 99.9% recovery and 80GB memory saved for NM

Fertel

Aug 1

Hi guys! Thank you for your work.

Meta used FBGEMM(https://github.com/pytorch/FBGEMM) and you used LLM Compressor (https://github.com/vllm-project/llm-compressor). I haven’t done extensive research, but could you clarify the main differences in their quantization procedures?

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment