Can we add `use_scaled_rope` to config.json?
This would let us determine whether the patch needs to be turned on. With the current model config.json, there is no way for us to tell when to turn it on or off.
Simply:
`"use_scaled_rope": true,`
config.json already has a `rope_scaling` attribute, which is null by default.
At least vLLM has some logic that can leverage this: https://github.com/vllm-project/vllm/blob/09c2eb85ddd3b2585979f4cd9cc97168d86718b6/vllm/model_executor/layers/rotary_embedding.py#L739
The same is probably true for HF transformers.
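For illustration, something along these lines would already work against today's config.json. This is only a sketch: the helper name `needs_llama3_rope_scaling` is made up, and the `rope_scaling` layout (a dict with `rope_type: "llama3"`) is assumed from the published 3.1 checkpoints rather than from any library's actual code.

```python
import json


def needs_llama3_rope_scaling(config_path: str) -> bool:
    """Return True when a checkpoint's config.json asks for Llama 3.1-style RoPE scaling.

    Plain Llama 3 checkpoints ship with `"rope_scaling": null`, while the 3.1
    checkpoints carry a dict whose `rope_type` is `"llama3"`.
    """
    with open(config_path) as f:
        config = json.load(f)
    rope_scaling = config.get("rope_scaling")
    return rope_scaling is not None and rope_scaling.get("rope_type") == "llama3"
```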
Understood about the logic. But for plain Llama 3, we don't need to apply
`def apply_scaling(freqs: torch.Tensor): ...`
But now it is a must-have for 3.1. I'm just trying to add a toggle at the config.json layer so we know when to turn this on; that would be helpful for building libraries. I think we need to align on a single standard here with the transformers library and vLLM.
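To make the idea concrete, here is a rough sketch of what the toggle could look like at load time. `build_rope_freqs` is a hypothetical caller, and the `apply_scaling` body below only mirrors the published 3.1 scaling values (factor 8, low/high frequency factors 1/4, original context 8192); treat it as illustrative, not as the reference implementation.

```python
import math

import torch


def apply_scaling(freqs: torch.Tensor) -> torch.Tensor:
    # Constants mirroring the published Llama 3.1 rope_scaling values;
    # shown here for illustration only.
    scale_factor = 8.0
    low_freq_factor = 1.0
    high_freq_factor = 4.0
    old_context_len = 8192  # original Llama 3 context length

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    new_freqs = []
    for freq in freqs.tolist():
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            new_freqs.append(freq)                 # high-frequency band: untouched
        elif wavelen > low_freq_wavelen:
            new_freqs.append(freq / scale_factor)  # low-frequency band: fully scaled
        else:
            # Smooth interpolation between the two bands.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            new_freqs.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return torch.tensor(new_freqs, dtype=freqs.dtype, device=freqs.device)


def build_rope_freqs(head_dim: int, theta: float, use_scaled_rope: bool) -> torch.Tensor:
    # Standard RoPE inverse frequencies, optionally passed through the 3.1 patch.
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    if use_scaled_rope:  # the proposed config.json toggle
        freqs = apply_scaling(freqs)
    return freqs
```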
We added the `rope_type` field with the value `llama3` for this.
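For reference, this is the shape of the `rope_scaling` entry the 3.1 checkpoints carry (written as a Python literal here; the keys and values are reproduced from the published HF config, so double-check against your checkpoint):

```python
# Mirror of the "rope_scaling" entry in a Llama 3.1 config.json.
llama31_rope_scaling = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}
```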