WHY "max_position_embeddings": 2048
#3
by chenxi118 - opened
Typically, models based on LLaMA-2 have a max_position_embeddings of 4K, so why is it 2K here? Will this lead to a shorter effective context for the model?
This is because we fine-tuned this version of the model with a 2048 max length when grouping the data; we found that almost all of the demonstration data fits within this length. However, the model should work fine with a 4K context or even longer, since RoPE extrapolates well due to its functional form.
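For reference, a minimal sketch of raising the position limit at load time with the transformers library; the repo id below is a placeholder, not the actual checkpoint name:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder repo id

# Load the saved config and raise the position limit. RoPE has no learned
# weights tied to this value, so the model weights are unchanged.
config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 4096

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```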
Thanks!
chenxi118 changed discussion status to closed