WHY "max_position_embeddings": 2048
#3
by chenxi118 - opened
Typically, models based on LLaMA-2 have a max_position_embeddings of 4K, so why is it 2K here? Will this lead to a shorter effective context for the model?
This is because we fine-tuned this version of the model with a 2048 max length when grouping the data; we found that almost all of the demonstration data fits within this length. However, the model should work fine with a 4K context or even longer, since RoPE extrapolates well due to its functional form.
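For reference, a minimal sketch of raising the position limit at load time with the transformers library; the repo id below is a placeholder, not the actual checkpoint name:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder repo id

# Load the saved config and raise the position limit. RoPE has no learned
# weights tied to this value, so the model weights are unchanged.
config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 4096

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```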
Thanks!
chenxi118 changed discussion status to closed