NameError: name 'flash_attn_func' is not defined
I have flash-attn installed (v2.5.2), but I get:
Exception has occurred: NameError
name 'flash_attn_func' is not defined
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 65, in <module>
    _flash_supports_window_size = "window_size" in list(inspect.signature(flash_attn_func).parameters)
  File "/home/neman/PROGRAMMING/PYTHON/DuckDuckGo_Search/AQLM_test1.py", line 3, in <module>
    quantized_model = AutoModelForCausalLM.from_pretrained(
NameError: name 'flash_attn_func' is not defined
toy code:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the 2-bit AQLM-quantized Mixtral from a local path
quantized_model = AutoModelForCausalLM.from_pretrained(
    "/mnt/disk2/LLM_MODELS/models/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf",
    torch_dtype="auto", device_map="auto", low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# Short warm-up generation, then the actual 128-token generation
output = quantized_model.generate(tokenizer("", return_tensors="pt")["input_ids"].cuda(), max_new_tokens=10)
output = quantized_model.generate(tokenizer("I'm AQLM, ", return_tensors="pt")["input_ids"].cuda(), min_new_tokens=128, max_new_tokens=128)
print(tokenizer.decode(output[0]))
Any ideas?
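In case it helps reproduce this, here is a minimal sketch of the same probe that modeling_mixtral_aqlm.py runs at import time (line 65 in the traceback above). If the flash-attn import itself fails in this environment, flash_attn_func is never bound and the NameError follows:

import inspect

# If this import raises, flash-attn is installed but not importable
# (e.g. built against a mismatched torch/CUDA version), which would
# explain the module-level NameError in modeling_mixtral_aqlm.py.
from flash_attn import flash_attn_func

# The same check the model code performs on line 65:
print("window_size" in list(inspect.signature(flash_attn_func).parameters))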
UPDATE:
I checked and saw that flash-attn 2.5.3 was released a few days ago, so I updated. Now I get a different error:
Exception has occurred: RuntimeError
Only Tensors of floating point and complex dtype can require gradients
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 308, in __init__
    self.q_proj = QuantizedLinear(self.hidden_size, self.num_heads * self.head_dim, bias=False, **config.aqlm)
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 889, in __init__
    self.self_attn = MIXTRAL_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 1093, in <listcomp>
    [MixtralDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 1093, in __init__
    [MixtralDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
  File "/home/neman/.cache/huggingface/modules/transformers_modules/BlackSamorez_Mixtral-8x7b-AQLM-2Bit-1x16-hf/modeling_mixtral_aqlm.py", line 1277, in __init__
    self.model = MixtralModel(config)
  File "/home/neman/PROGRAMMING/PYTHON/DuckDuckGo_Search/AQLM_test1.py", line 3, in <module>
    quantized_model = AutoModelForCausalLM.from_pretrained(
RuntimeError: Only Tensors of floating point and complex dtype can require gradients
Installing the latest accelerate will fix the second error.
It did. Thank you for the support.
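For anyone else hitting this, a quick sanity check of the package versions involved in both errors (a sketch; the versions noted in the comments are simply the ones that worked in this thread, not confirmed minimums):

# Print the versions of the packages involved in both errors.
import accelerate
import flash_attn
import transformers

print("flash-attn:", flash_attn.__version__)      # 2.5.3 resolved the NameError
print("accelerate:", accelerate.__version__)      # latest release resolved the RuntimeError
print("transformers:", transformers.__version__)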