Fix regression #34
opened by TimeRobber
Mixtral doesn't use sliding window attention. We force-set `sliding_window` to null, since the default in transformers is 4096 tokens (4k).
I think the regression came from https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/commit/858fdc292793fc3e671bf51fc5586c5cc10fbe3a .
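A minimal sketch of the kind of post-conversion fix being described: explicitly writing `"sliding_window": null` into a converted `config.json` so transformers does not fall back to its 4096-token default. The helper name and the config contents are illustrative, not taken from the actual conversion script in the PR.

```python
import json
import os
import tempfile

def disable_sliding_window(config_path):
    """Force-set "sliding_window" to null in a converted config.json.

    Hypothetical helper: Mixtral uses full attention, so the field must be
    explicitly null rather than omitted (omission lets transformers apply
    its 4096-token default).
    """
    with open(config_path) as f:
        config = json.load(f)
    config["sliding_window"] = None  # serialized as null in JSON
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config

# Usage against a throwaway config file with only the relevant field:
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w") as f:
        json.dump({"model_type": "mixtral", "sliding_window": 4096}, f)
    patched = disable_sliding_window(path)
    print(patched["sliding_window"])  # None
```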
@ybelkada
did you use a specific script to update the PR? If so, I think you need to update that script so that it writes "null" by default.
The configuration file (and the conversion script, if needed) will be adjusted accordingly. Thanks for reporting!
@TimeRobber I already updated the conversion script here, based on your suggestion: https://github.com/huggingface/transformers/pull/28068