unable to load quantized 4bit_m

#5
by sunnykusawa - opened

Getting this error:
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 62.6 GiB for an array with shape (131072, 128256) and data type float32

I am using a 4-bit quantized LLM, so why is it expecting 62.6 GiB for an array with shape (131072, 128256) and data type float32?
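For what it's worth, the reported size matches the shape in the traceback exactly: weight quantization does not apply here, because this allocation looks like the float32 logits/scores buffer that llama-cpp-python pre-allocates with shape (n_ctx, n_vocab). The interpretation of 131072 as the full context length and 128256 as the Llama-3 vocabulary size is an inference from the error, not something stated in this repo:

```python
# Check that the array shape from the traceback accounts for 62.6 GiB.
# float32 = 4 bytes per element; GiB = 2**30 bytes.
n_ctx, n_vocab = 131072, 128256   # shape from the error message
bytes_needed = n_ctx * n_vocab * 4
gib = bytes_needed / 2**30        # ~62.6 GiB, matching the error
print(f"{gib:.1f} GiB")
```

So the buffer scales with the context length, not with the quantization level of the weights.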

How are you using it? I just loaded the same model in LM Studio without any issue. Could you please share your whole llama.cpp command?

I have installed the llama-cpp-python package on my system and am trying to load the quantized 4-bit GGUF through it, but it gives a memory error.

It must be the configs you are setting, e.g. the max context length. It loads easily in LM Studio (maybe using Q3?).
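With llama-cpp-python, the usual workaround is to pass a smaller `n_ctx` when constructing the model, which shrinks the per-context float32 buffer accordingly. A minimal sketch — the model path is a placeholder, and the buffer-size arithmetic assumes the same (n_ctx, n_vocab) float32 allocation seen in the error:

```python
# Sketch: load the GGUF with a reduced context window so the
# (n_ctx, n_vocab) float32 buffer stays small. "model.gguf" is a
# placeholder path; n_vocab is taken from the error's array shape.
N_VOCAB = 128256      # Llama-3 vocabulary size (from the traceback)
REDUCED_CTX = 4096    # instead of the model's full 131072

# The buffer now needs under 2 GiB instead of 62.6 GiB:
buffer_gib = REDUCED_CTX * N_VOCAB * 4 / 2**30

def load_model(path: str):
    # Requires: pip install llama-cpp-python
    from llama_cpp import Llama
    return Llama(model_path=path, n_ctx=REDUCED_CTX)
```

LM Studio likely loads the same file fine because it defaults to a much smaller context length than the model's maximum.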

MaziyarPanahi changed discussion status to closed
