GGML Quantize?
So, I'm able to run the sample code (although, frustratingly, it wants a token even with a fully local copy, and even if I force the config file to hardcode my local path).
I'm trying to use GGML's GPT-NeoX support to convert the model. In the example convert script I swapped the tokenizer for `LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1", additional_special_tokens=['▁▁'])` and swapped the `AutoModelForCausalLM()` call as well, and I'm able to generate a `ggml-model-f16.bin` (roughly as sketched below).
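For reference, a minimal sketch of the two swapped calls (not the full convert script; `model_dir` below is just my local checkpoint directory, and `trust_remote_code` follows the model card's sample code, if I'm remembering it right):

```python
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Local directory holding the downloaded checkpoint.
model_dir = "/models/llm/jp-stablelm/stabilityai_japanese-stablelm-instruct-alpha-7b"

# The model uses the NovelAI nerdstash tokenizer rather than the stock
# GPT-NeoX tokenizer, so the convert script's tokenizer line gets replaced.
tokenizer = LlamaTokenizer.from_pretrained(
    "novelai/nerdstash-tokenizer-v1",
    additional_special_tokens=['▁▁'],
)

# The checkpoint ships a custom modeling file, hence trust_remote_code.
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)
```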
However, if I try to run `gpt-neox` on the model, here's the error I get:
bin/gpt-neox -m /models/llm/jp-stablelm/stabilityai_japanese-stablelm-instruct-alpha-7b/ggml-model-f16.bin
main: seed = 1691765429
gpt_neox_model_load: loading model from '/models/llm/jp-stablelm/stabilityai_japanese-stablelm-instruct-alpha-7b/ggml-model-f16.bin' - please wait ...
gpt_neox_model_load: n_vocab = 65535
gpt_neox_model_load: n_ctx = 1024
gpt_neox_model_load: n_embd = 4096
gpt_neox_model_load: n_head = 32
gpt_neox_model_load: n_layer = 32
gpt_neox_model_load: n_rot = 32
gpt_neox_model_load: par_res = 1
gpt_neox_model_load: ftype = 1
gpt_neox_model_load: qntvr = 0
gpt_neox_model_load: ggml ctx size = 16390.52 MB
gpt_neox_model_load: memory_size = 512.00 MB, n_mem = 32768
gpt_neox_model_load: unknown tensor 'transformer.embed_in.weight' in model file
main: failed to load model from '/models/llm/jp-stablelm/stabilityai_japanese-stablelm-instruct-alpha-7b/ggml-model-f16.bin'
Just checking in to see if anyone has had better luck with GGML quantization support, or knows how different this model is from other GPT-NeoX or StableLM models that have already been quantized?
Me too...
I'm not entirely certain, but I believe the reason is that in the `main.cpp` file they are currently hard-coding the layer names with the prefix `gpt_neox`. This approach works well with `stablelm-base-alpha-7b` (you can check it in its `pytorch_model.bin.index.json` file). However, in the case of `japanese-stablelm-base-alpha-7b`, they are using the prefix `transformer` (according to its `pytorch_model.bin.index.json` file).
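If it helps, here's a quick way to double-check that (assuming both checkpoints are downloaded locally; the two paths below are placeholders):

```python
import json

# Placeholder paths to the shard indexes of the two downloaded checkpoints.
for index_path in (
    "stablelm-base-alpha-7b/pytorch_model.bin.index.json",
    "japanese-stablelm-base-alpha-7b/pytorch_model.bin.index.json",
):
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    # Print the leading component of every tensor name listed in the index.
    prefixes = sorted({name.split(".", 1)[0] for name in weight_map})
    print(index_path, "->", prefixes)
# The first index should list gpt_neox among the prefixes, while the
# japanese-stablelm one shows transformer instead.
```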
I think the simplest solution would be to modify the prefix in the `main.cpp` file to `transformer`.
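An alternative I've been wondering about (purely hypothetical, I haven't tried it): leave `main.cpp` alone and rename the tensors on the conversion side so they carry the `gpt_neox.` prefix the example expects, something like:

```python
def rename_to_gpt_neox_prefix(state_dict):
    """Swap the transformer.* prefix for the gpt_neox.* prefix before the
    convert script writes the tensors out; other names are left untouched."""
    return {
        ("gpt_neox." + name[len("transformer."):] if name.startswith("transformer.") else name): tensor
        for name, tensor in state_dict.items()
    }
```

Whether the remaining tensor names then match what the GPT-NeoX example looks for is something I haven't verified.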