Error when loading the model

#3
by StefanStroescu

Hi,

I am trying to load the model using llama.cpp, but I am getting the following error:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 197, got 195

Code:
from llama_cpp import Llama

llm_n_gpu_layers = -1  # -1 offloads all layers to the GPU
llm_split_mode = 0     # 0 = LLAMA_SPLIT_MODE_NONE (single GPU, no splitting)
llm_main_gpu = 0       # index of the GPU to use

llm = Llama(
    model_path="./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf",
    n_gpu_layers=llm_n_gpu_layers,
    n_ctx=3072,
    chat_format="phi-3-chat",
    offload_kqv=True,  # keep the KV cache on the GPU too
    split_mode=llm_split_mode,
    main_gpu=llm_main_gpu,
)

I can load and use Phi-3-mini-4k in GGUF format (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf), but not the 128k version...

Any hint or advice would be very helpful.
Thanks

Quant Factory org

Looking into the issue

Any luck? I am running into the same issue, even when converting on my own.

Quant Factory org

@jkkphys Seems like an issue in llama.cpp itself; I tried recreating the quant but still ran into the same error.
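
If you want to rule out an outdated build on your end, something like this prints the installed llama-cpp-python version (just a sketch; it assumes a recent enough package that exposes __version__):

import llama_cpp

# The 128k Phi-3 GGUF needs fairly recent llama.cpp support,
# so an outdated llama-cpp-python build is a likely culprit.
print(llama_cpp.__version__)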

Lowering GPU layers to zero sorted me out :]
or GPU offloading, however it's called on your end over there XD
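
In llama-cpp-python terms that just means n_gpu_layers=0, something like this (a rough sketch, reusing the path from the first post):

from llama_cpp import Llama

# Workaround: n_gpu_layers=0 keeps every layer on the CPU,
# which is what got the model loading for me.
llm = Llama(
    model_path="./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf",
    n_gpu_layers=0,
    n_ctx=3072,
    chat_format="phi-3-chat",
)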

So you are saying that running on the CPU works fine for you? I’ll give it a shot, but that kind of limits the usefulness.
