Model repeating information and "spitting out" random characters

#14

by brazilianslib - opened Jun 28

Jun 28

First of all, congratulations on the launch. Gemma 2 9B is, at least in my tests, the best model for PT-BR. Much better than much larger models.
However, problems are constantly happening, such as:

Repeating information;
Generating text endlessly;
Appending tags like "</start_of" to the end of its answer.
I am eagerly awaiting a solution.

Once again, I thank the entire Google Gemma team.

lysandre

Google org Jul 1

Hello! Can you make sure you're on the latest transformers version, v4.42.3?
We added soft-capping in this version which may result in better results in your tests.
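For anyone following along, here is a minimal sketch of how to check the installed version and of what soft-capping does. The helper names (`parse_release`, `at_least`, `transformers_is_recent`, `soft_cap`) are made up for illustration and are not part of transformers. The cap values in the comments (50.0 for attention logits, 30.0 for final logits) are what the Gemma 2 config uses as far as I know, so please verify them against the model's `config.json`.

```python
import math
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Turn '4.42.3' into (4, 42, 3); stops at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def at_least(installed: str, minimum: str = "4.42.3") -> bool:
    """True if the installed release is the same as or newer than minimum."""
    return parse_release(installed) >= parse_release(minimum)


def transformers_is_recent(minimum: str = "4.42.3") -> bool:
    """Check the installed transformers package; False if it is missing."""
    try:
        return at_least(version("transformers"), minimum)
    except PackageNotFoundError:
        return False


def soft_cap(logit: float, cap: float) -> float:
    """Soft-capping: squash a logit smoothly into the open range (-cap, cap).

    Assumed Gemma 2 values: cap=50.0 for attention logits,
    cap=30.0 for final logits.
    """
    return cap * math.tanh(logit / cap)


if __name__ == "__main__":
    print("transformers >= 4.42.3:", transformers_is_recent())
    print("soft_cap(1000.0, 50.0) =", soft_cap(1000.0, 50.0))
```

Small logits pass through almost unchanged, while very large ones are squashed toward the cap. An attention path that skips this step can therefore produce different outputs from one that applies it.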

brazilianslib

Jul 3

> Hello! Can you make sure you're on the latest transformers version, v4.42.3?
> We added soft-capping in this version which may result in better results in your tests.

Just perfect! Amazing multilingual model!

zokica

Jul 8

> Hello! Can you make sure you're on the latest transformers version, v4.42.3?
> We added soft-capping in this version which may result in better results in your tests.

I installed this version. The problem is that when I use flash_attention_2, I get completely random output in 4-bit.
(attn_implementation="flash_attention_2")

GPT007

Jul 12

Same here

Renu11

Google org Jul 24

Hi @zokica, @GPT007, we recommend using eager attention for Gemma 2 models. Please refer to this doc for more details. Thank you.
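As a rough sketch of what that looks like in practice, the snippet below builds the `from_pretrained` arguments with `attn_implementation="eager"` and optional 4-bit quantization. The function names and the model id `google/gemma-2-9b-it` are assumptions for illustration (swap in the checkpoint you are actually using). `device_map="auto"` assumes `accelerate` is installed, and the 4-bit path assumes `bitsandbytes` is installed.

```python
def gemma2_load_kwargs(use_4bit: bool = False) -> dict:
    """Build keyword arguments for AutoModelForCausalLM.from_pretrained."""
    kwargs = {
        "attn_implementation": "eager",  # recommended above for Gemma 2
        "device_map": "auto",            # requires the accelerate package
    }
    if use_4bit:
        # Imported lazily so the helper works without transformers installed.
        from transformers import BitsAndBytesConfig

        kwargs["quantization_config"] = BitsAndBytesConfig(load_in_4bit=True)
    return kwargs


def load_gemma2(model_id: str = "google/gemma-2-9b-it", use_4bit: bool = False):
    """Load tokenizer and model; model_id is an assumed example checkpoint."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, **gemma2_load_kwargs(use_4bit)
    )
    return tokenizer, model
```

Calling `load_gemma2(use_4bit=True)` would then reproduce the 4-bit setup discussed above, but with eager attention instead of `flash_attention_2`.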

rsdfsfas

Jul 24

But they did make a fix for flash attention 2, and it does not work. It was supposed to fix things, but it did not.

I get the same results for eager and sdpa attention.

GPT007

Jul 25

https://github.com/huggingface/transformers/pull/32188

Renu11

Google org Aug 15

Hi, I hope the issue has been resolved. Please let us know if any further assistance is needed. Thanks!
