Unable to run inference beyond the sliding window length
#128 · opened by kreas
Using the following config:
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.float16,
    device_map="auto",  # device_map already places the model, so no .to(device) call is needed
    use_flash_attention_2=True,
)
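For context, a minimal sketch of the kind of call that hits this path (the generation length here is an assumption; anything pushing the KV cache past the 4096-token sliding window should reproduce it):

from transformers import AutoTokenizer

# Hypothetical repro: generate far enough that the cached sequence
# exceeds Mistral's 4096-token sliding window.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5000)  # > sliding window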
This leads to the following error:
File "/home/andreas/.local/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 408, in forward
raise ValueError(
ValueError: past key much have a shape of (1, 32, 4095, 128), got torch.Size([1, 8, 4094, 128])
This appears to come from a mismatch between the sliding-window length and the kv_cache length (the check expects 4095 cached positions but the cache holds 4094); note the expected shape also uses the 32 attention heads rather than the 8 key/value heads that Mistral's GQA cache actually stores.
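A possible workaround sketch, assuming that setting sliding_window to None in the config disables the windowed cache-slicing path that performs this shape check (not a confirmed fix, just what the condition in modeling_mistral.py suggests):

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Assumption: sliding_window=None makes use_sliding_windows evaluate to
# False, so the past-key shape check above is never reached.
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
config.sliding_window = None
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    config=config,
    torch_dtype=torch.float16,
    device_map="auto",
    use_flash_attention_2=True,
)

This trades away the memory savings of the sliding-window cache, so it only sidesteps the error rather than fixing the length/head-count mismatch itself.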