Model repeating information and "spitting out" random characters
First of all, congratulations on the launch. Gemma 2 9B is, at least in my tests, the best model for PT-BR. Much better than much larger models.
However, problems are constantly happening, such as:
Repeat information;
"Spit" text infinitely;
Place tags like "</start_of" at the end of your answer.
I am eagerly awaiting a solution.
Once again, I thank the entire Google Gemma team.
Hello! Can you make sure you're on the latest transformers
version, v4.42.3?
We added soft-capping in this version which may result in better results in your tests.
Hello! Can you make sure you're on the latest
transformers
version, v4.42.3?
We added soft-capping in this version which may result in better results in your tests.
Just perfect! Amazing multilingual model!
Hello! Can you make sure you're on the latest
transformers
version, v4.42.3?
We added soft-capping in this version which may result in better results in your tests.
I installed this version, the problem is that when I use flash_attention_2, i get 100% random output in 4bits.
(attn_implementation="flash_attention_2")
But they did made a fix for flash attention 2, which does not work. It is supposed to fix things but this did not work.
I get the same results for eager and spd attention.
Hi, I hope the issue has been resolved. Please let us know if any further assistance is needed. Thanks!