What is the maximum length of Mistral-7B-Instruct-v0.2?
#37 · by xcjthu
According to the `config.json` file, the base model Mistral-7B-v0.1 and its corresponding instruction-tuned version, Mistral-7B-Instruct-v0.1, have a maximum length of 32k. However, the Mistral-7B report indicates that these models were trained within an 8k context window. So, what is the maximum length these models can actually handle?
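For reference, the declared values can be read directly from the shipped configs (a minimal sketch; the `mistralai/...` repo IDs are assumed to be the official checkpoints, and `sliding_window`/`rope_theta` are read with a fallback in case a config omits them):

```python
from transformers import AutoConfig

# Compare the declared context length and RoPE base across the Mistral releases.
for model_id in (
    "mistralai/Mistral-7B-v0.1",
    "mistralai/Mistral-7B-Instruct-v0.1",
    "mistralai/Mistral-7B-Instruct-v0.2",
):
    config = AutoConfig.from_pretrained(model_id)
    print(
        f"{model_id}: "
        f"max_position_embeddings={config.max_position_embeddings}, "
        f"sliding_window={getattr(config, 'sliding_window', None)}, "
        f"rope_theta={getattr(config, 'rope_theta', None)}"
    )
```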
Additionally, the `config.json` file reveals that the RoPE base for Mistral-7B-Instruct-v0.2 has changed from 10000.0 to 1000000.0. Does this mean that the model was fine-tuned after an NTK-aware positional encoding transformation? A back-of-envelope check of the implied scale factor is sketched below.
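If the base change is interpreted through the NTK-aware scaling rule θ′ = θ · s^(d/(d−2)) from the original NTK-aware RoPE proposal, one can solve for the implied scale factor s. A minimal sketch, assuming head dimension d = 128 (Mistral-7B's hidden_size 4096 / 32 heads):

```python
# Solve the NTK-aware base-scaling rule theta' = theta * s**(d / (d - 2))
# for the context-extension scale factor s, given the observed base change.
theta_old, theta_new, d = 10_000.0, 1_000_000.0, 128  # d = 128 assumed head dim

s = (theta_new / theta_old) ** ((d - 2) / d)
print(f"implied NTK-aware scale factor: s = {s:.1f}")  # roughly 93x
```

Under that reading, the new base would correspond to roughly a 93x nominal context extension, though whether the v0.2 model was actually fine-tuned this way is exactly the question.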