NameError: name 'flash_attn_func' is not defined
I have flash-attn installed (v2.5.2), but I get:
Exception has occurred: NameError
name 'flash_attn_func' is not defined
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 65, in <module>
    _flash_supports_window_size = "window_size" in list(inspect.signature(flash_attn_func).parameters)
  File "/home/neman/PROGRAMMING/PYTHON/DuckDuckGo_Search/AQLM_test1.py", line 3, in <module>
    quantized_model = AutoModelForCausalLM.from_pretrained(
NameError: name 'flash_attn_func' is not defined
toy code:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the 2-bit AQLM-quantized Mixtral from a local path
quantized_model = AutoModelForCausalLM.from_pretrained(
    "/mnt/disk2/LLM_MODELS/models/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf",
    torch_dtype="auto", device_map="auto", low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# Short warm-up generation, then the actual 128-token generation
output = quantized_model.generate(tokenizer("", return_tensors="pt")["input_ids"].cuda(), max_new_tokens=10)
output = quantized_model.generate(tokenizer("I'm AQLM, ", return_tensors="pt")["input_ids"].cuda(), min_new_tokens=128, max_new_tokens=128)
print(tokenizer.decode(output[0]))
Any ideas?
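In case it helps reproduce this, here is a minimal sketch of the same probe that modeling_mixtral_aqlm.py runs at import time (line 65 in the traceback above). If the flash-attn import itself fails in this environment, flash_attn_func is never bound and the NameError follows:

import inspect

# If this import raises, flash-attn is installed but not importable
# (e.g. built against a mismatched torch/CUDA version), which would
# explain the module-level NameError in modeling_mixtral_aqlm.py.
from flash_attn import flash_attn_func

# The same check the model code performs on line 65:
print("window_size" in list(inspect.signature(flash_attn_func).parameters))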
UPDATE:
I checked and saw that flash-attn 2.5.3 was released a few days ago, so I updated. Now I get a different error:
Exception has occurred: RuntimeError
Only Tensors of floating point and complex dtype can require gradients
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 308, in __init__
    self.q_proj = QuantizedLinear(self.hidden_size, self.num_heads * self.head_dim, bias=False, **config.aqlm)
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 889, in __init__
    self.self_attn = MIXTRAL_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 1093, in <listcomp>
    [MixtralDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 1093, in __init__
    [MixtralDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 1277, in __init__
    self.model = MixtralModel(config)
  File "/home/neman/PROGRAMMING/PYTHON/DuckDuckGo_Search/AQLM_test1.py", line 3, in <module>
    quantized_model = AutoModelForCausalLM.from_pretrained(
RuntimeError: Only Tensors of floating point and complex dtype can require gradients
Installing the latest accelerate will fix the second error.
It did. Thank you for the support.
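For anyone else hitting this, a quick sanity check of the package versions involved in both errors (a sketch; the versions noted in the comments are simply the ones that worked in this thread, not confirmed minimums):

# Print the versions of the packages involved in both errors.
import accelerate
import flash_attn
import transformers

print("flash-attn:", flash_attn.__version__)      # 2.5.3 resolved the NameError
print("accelerate:", accelerate.__version__)      # latest release resolved the RuntimeError
print("transformers:", transformers.__version__)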