Qwen/Qwen2.5-Coder-7B-Instruct · I periodically encounter infinite generations

I periodically get infinite generations (the output never stops) from Qwen2.5-Coder-7B-Instruct with FP8 quantization when the context contains long texts of roughly 20k+ characters.

I'm looking at their configs:
https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/config.json
https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/generation_config.json
https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct/blob/main/tokenizer_config.json

In short, these three configs have completely confused me.

Hmm, I also found this: https://github.com/QwenLM/Qwen2.5-Coder
Important:

We have updated both the special tokens and their corresponding token ids to maintain consistency with Qwen2.5. The new special tokens are as follows:
{
"<|fim_prefix|>": 151659,
"<|fim_middle|>": 151660,
"<|fim_suffix|>": 151661,
"<|fim_pad|>": 151662,
"<|repo_name|>": 151663,
"<|file_sep|>": 151664,
"<|im_start|>": 151644,
"<|im_end|>": 151645
}

How do I properly modify config.json, generation_config.json, and tokenizer_config.json so that they agree on these tokens?
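
For reference, here is a minimal sketch of how the three files could be compared before editing anything. It assumes the runaway generations come from a stop-token mismatch, which is not confirmed. It only reads the `eos_token_id` field in config.json and generation_config.json and the `eos_token` field in tokenizer_config.json. The helper names (`as_id_set`, `eos_ids_per_file`, `files_missing_id`, `load`) are made up for illustration, and the id table is copied from the README excerpt above.

```python
import json
from pathlib import Path

# Ids copied from the Qwen2.5-Coder README excerpt quoted in this post.
SPECIAL_TOKEN_IDS = {
    "<|fim_prefix|>": 151659,
    "<|fim_middle|>": 151660,
    "<|fim_suffix|>": 151661,
    "<|fim_pad|>": 151662,
    "<|repo_name|>": 151663,
    "<|file_sep|>": 151664,
    "<|im_start|>": 151644,
    "<|im_end|>": 151645,
}


def as_id_set(value):
    """Normalize an eos_token_id field (int, list of ints, or missing) to a set."""
    if value is None:
        return set()
    if isinstance(value, int):
        return {value}
    return set(value)


def eos_ids_per_file(config, generation_config, tokenizer_config,
                     token_ids=SPECIAL_TOKEN_IDS):
    """Return the EOS token ids that each of the three config dicts resolves to."""
    eos_token = tokenizer_config.get("eos_token")
    # Assumption: the token is stored either as a plain string or as a dict
    # with a "content" key, depending on how the tokenizer was saved.
    if isinstance(eos_token, dict):
        eos_token = eos_token.get("content")
    tokenizer_ids = {token_ids[eos_token]} if eos_token in token_ids else set()
    return {
        "config.json": as_id_set(config.get("eos_token_id")),
        "generation_config.json": as_id_set(generation_config.get("eos_token_id")),
        "tokenizer_config.json": tokenizer_ids,
    }


def files_missing_id(per_file, token_id):
    """List the files whose EOS set does not include token_id."""
    return [name for name, ids in per_file.items() if token_id not in ids]


def load(model_dir, name):
    """Read one JSON config from a local copy of the model (path is hypothetical)."""
    return json.loads(Path(model_dir, name).read_text(encoding="utf-8"))
```

With a local download, something like `eos_ids_per_file(load(d, "config.json"), load(d, "generation_config.json"), load(d, "tokenizer_config.json"))` followed by `files_missing_id(result, SPECIAL_TOKEN_IDS["<|im_end|>"])` would show which files do not list `<|im_end|>` as a stop token. An empty list would suggest the configs already agree and the cause lies elsewhere.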