The quant doesn't load in vLLM / Aphrodite Engine

#1
by av-codes - opened

Unable to run this in either vLLM or Aphrodite. vLLM fails silently with a missing response from the RPC engine, and Aphrodite gets stuck at aqlm_dequant. I assume both are silent errors in the underlying quantization library.

The 3.1 8B 1x16 loads in the same setup with vLLM, after fixing the tokenizer config (correcting the EOS token and adding the missing chat template).
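
For anyone hitting the same tokenizer issue, here is a minimal sketch of that kind of fix, using only the Python standard library. The helper names (`patch_tokenizer_config`, `patch_file`) are hypothetical, and the actual EOS token string and chat template text are not given in this thread, so both are passed in as parameters rather than hard-coded.

```python
import json
from pathlib import Path


def patch_tokenizer_config(config: dict, eos_token: str, chat_template: str) -> dict:
    """Return a copy of a tokenizer_config.json dict with the EOS token and chat template set.

    Hypothetical helper: the correct values depend on the model and are
    assumptions supplied by the caller, not taken from this thread.
    """
    patched = dict(config)
    # Overwrite the EOS token with the one the model actually emits at end of turn.
    patched["eos_token"] = eos_token
    # Only add a template if the config does not already have a chat_template key.
    patched.setdefault("chat_template", chat_template)
    return patched


def patch_file(path: Path, eos_token: str, chat_template: str) -> None:
    """Apply patch_tokenizer_config to a tokenizer_config.json file in place."""
    config = json.loads(path.read_text(encoding="utf-8"))
    patched = patch_tokenizer_config(config, eos_token, chat_template)
    path.write_text(json.dumps(patched, indent=2, ensure_ascii=False), encoding="utf-8")
```

Usage would look something like `patch_file(Path("path/to/model/tokenizer_config.json"), eos_token, template_text)`, where `eos_token` and `template_text` are copied from the tokenizer config of the original, unquantized model. It is worth backing up the file first, since it is rewritten in place.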

@av-codes Can I use this type of quant on Windows using vLLM?

@gopi87 This report is about the fact that it doesn't load. I didn't do any tests on Windows.
