FileNotFoundError: Could not find model in TheBloke/guanaco-65B-GPTQ
I am getting this error on every TheBloke model; I simply copy-pasted the code from the repo.
This is the code:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/guanaco-65B-GPTQ"
model_basename = "Guanaco-65B-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)
And this is the error I am getting:
Downloading (…)okenizer_config.json: 100% 700/700 [00:00<00:00, 44.8kB/s]
Downloading tokenizer.model: 100% 500k/500k [00:00<00:00, 23.9MB/s]
Downloading (…)/main/tokenizer.json: 100% 1.84M/1.84M [00:00<00:00, 10.7MB/s]
Downloading (…)cial_tokens_map.json: 100% 411/411 [00:00<00:00, 29.0kB/s]
Downloading (…)lve/main/config.json: 100% 820/820 [00:00<00:00, 64.9kB/s]
Downloading (…)quantize_config.json: 100% 156/156 [00:00<00:00, 13.0kB/s]
FileNotFoundError Traceback (most recent call last)
Cell In[3], line 11
7 use_triton = False
9 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
---> 11 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
12 model_basename=model_basename,
13 use_safetensors=True,
14 trust_remote_code=True,
15 device="cuda:0",
16 use_triton=use_triton,
17 quantize_config=None)
19 """
20 To download from a specific branch, use the revision parameter, as in this example:
21
(...)
28 quantize_config=None)
29 """
File /usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/auto.py:94, in AutoGPTQForCausalLM.from_quantized(cls, model_name_or_path, save_dir, device_map, max_memory, device, low_cpu_mem_usage, use_triton, inject_fused_attention, inject_fused_mlp, use_cuda_fp16, quantize_config, model_basename, use_safetensors, trust_remote_code, warmup_triton, trainable, **kwargs)
88 quant_func = GPTQ_CAUSAL_LM_MODEL_MAP[model_type].from_quantized
89 keywords = {
90 key: kwargs[key]
91 for key in signature(quant_func).parameters
92 if key in kwargs
93 }
---> 94 return quant_func(
95 model_name_or_path=model_name_or_path,
96 save_dir=save_dir,
97 device_map=device_map,
98 max_memory=max_memory,
99 device=device,
100 low_cpu_mem_usage=low_cpu_mem_usage,
101 use_triton=use_triton,
102 inject_fused_attention=inject_fused_attention,
103 inject_fused_mlp=inject_fused_mlp,
104 use_cuda_fp16=use_cuda_fp16,
105 quantize_config=quantize_config,
106 model_basename=model_basename,
107 use_safetensors=use_safetensors,
108 trust_remote_code=trust_remote_code,
109 warmup_triton=warmup_triton,
110 trainable=trainable,
111 **keywords
112 )
File /usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py:714, in BaseGPTQForCausalLM.from_quantized(cls, model_name_or_path, save_dir, device_map, max_memory, device, low_cpu_mem_usage, use_triton, torch_dtype, inject_fused_attention, inject_fused_mlp, use_cuda_fp16, quantize_config, model_basename, use_safetensors, trust_remote_code, warmup_triton, trainable, **kwargs)
711 break
713 if resolved_archive_file is None: # Could not find a model file to use
--> 714 raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
716 model_save_name = resolved_archive_file
718 if not use_triton and trainable:
FileNotFoundError: Could not find model in TheBloke/guanaco-65B-GPTQ
I recently updated all my GPTQ models for direct Transformers compatibility (coming very soon).
Please check the README again and you'll see that the model_basename line is now: model_basename = "model". This is true for all branches in all GPTQ models.
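For example, a minimal sketch of the call with the updated basename (same arguments as in your code, only model_basename changed):

from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/guanaco-65B-GPTQ"
# The quantized weights file is now named model.safetensors, so the basename is just "model"
model_basename = "model"
use_triton = False

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)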
Or in fact you can simply leave out model_basename now:
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)
Because the model_basename is now also configured in quantize_config.json.
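If you want to check what a repo stores, something like this should show it (recent AutoGPTQ versions record the basename as model_file_base_name in quantize_config.json, though the exact field name may differ between versions):

import json
from huggingface_hub import hf_hub_download

# Download quantize_config.json from the repo and print the stored basename
cfg_path = hf_hub_download("TheBloke/guanaco-65B-GPTQ", "quantize_config.json")
with open(cfg_path) as f:
    print(json.load(f).get("model_file_base_name"))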
In the next 24-48 hours I will be updating all my GPTQ READMEs to explain this in more detail and to provide example code for loading GPTQ models directly from Transformers. I am waiting for the new Transformers release before I do this, which should happen today or tomorrow.
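As a preview, direct loading from Transformers is expected to look roughly like this (a sketch assuming the upcoming Transformers release with GPTQ support; exact requirements and arguments may change):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/guanaco-65B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
# With GPTQ support in Transformers, the quantized weights load through the
# standard from_pretrained path; device_map="auto" places them on the GPU.
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")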