Error when loading the model

#3
by StefanStroescu

Hi,

I am trying to load the model using llama.cpp, but I am getting the following error:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 197, got 195

Code:
from llama_cpp import Llama

llm_n_gpu_layers = -1  # -1 offloads all layers to the GPU
llm_split_mode = 0     # 0 = LLAMA_SPLIT_MODE_NONE (single GPU, no splitting)
llm_main_gpu = 0       # index of the GPU to use

llm = Llama(
    model_path="./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf",
    n_gpu_layers=llm_n_gpu_layers,
    n_ctx=3072,
    chat_format="phi-3-chat",
    offload_kqv=True,  # keep the KV cache on the GPU too
    split_mode=llm_split_mode,
    main_gpu=llm_main_gpu,
)

I can load and use Phi-3-mini-4k in GGUF format (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf), but not the 128k version...

Any hint or advice would be very helpful.
Thanks

Quant Factory org

Looking into the issue

Any luck? I am running into the same issue, even when converting on my own.

Quant Factory org

@jkkphys Seems like an issue in llama.cpp itself; I tried recreating the quant but still ran into the same error.
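
If you want to rule out an outdated build on your end, something like this prints the installed llama-cpp-python version (just a sketch; it assumes a recent enough package that exposes __version__):

import llama_cpp

# The 128k Phi-3 GGUF needs fairly recent llama.cpp support,
# so an outdated llama-cpp-python build is a likely culprit.
print(llama_cpp.__version__)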

Lowering GPU layers to zero sorted me out :]
or GPU offloading, however it's called on your end over there XD
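
In llama-cpp-python terms that just means n_gpu_layers=0, something like this (a rough sketch, reusing the path from the first post):

from llama_cpp import Llama

# Workaround: n_gpu_layers=0 keeps every layer on the CPU,
# which is what got the model loading for me.
llm = Llama(
    model_path="./models/phi3-128k/Phi-3-mini-128k-instruct-Q4_K_M.gguf",
    n_gpu_layers=0,
    n_ctx=3072,
    chat_format="phi-3-chat",
)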

So you are saying that running on the CPU works fine for you? I’ll give it a shot, but that kind of limits the usefulness.
