Bug with number generation?
Could you share an exact snippet?
Here is my code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
new_path = 'google/gemma-7b-it'
model = AutoModelForCausalLM.from_pretrained(new_path, device_map='cuda', torch_dtype=torch.float16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(new_path, trust_remote_code=True)
For the first case:
input_text = "Introducing Einstein"
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**input_ids, max_length=300)
print(tokenizer.decode(outputs[0]))
For the second case:
chat = [
{ "role": "user", "content": "Introducing history of USA" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150, num_beams=3)
print(tokenizer.decode(outputs[0]))
Same issue here! I was using the 2b-it model yesterday because 7b wasn't compatible with my CUDA 11.7 driver version, and it was working fine. Now that they pushed a patch and 7b-it works with my CUDA version, I get a bunch of pads, similar to the original poster. I'm asking the model to return a numbered list:
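Roughly like this -- the prompt text and settings below are a sketch from memory, not my literal snippet, and it reuses the model/tokenizer loaded as in the code above:
# Sketch only: prompt text and generation settings are illustrative.
# Assumes `model` and `tokenizer` are the google/gemma-7b-it objects loaded above.
chat = [
    {"role": "user", "content": "List the 5 largest countries by area as a numbered list."},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
# Expected a numbered list; instead the decoded output is mostly <pad> tokens.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))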
Hi there, Surya from the Gemma team -- sorry for the delay. I saw this issue elsewhere as well. Are you using the right formatter? What are your sampling settings?
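For reference, here is a minimal sketch of the expected usage: apply_chat_template wraps each turn in the <start_of_turn>/<end_of_turn> format the -it checkpoints are trained on, and the sampling settings below are just example values, not tuned recommendations:
# Minimal sketch; sampling values are example settings only.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = 'google/gemma-7b-it'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='cuda', torch_dtype=torch.float16)

chat = [{"role": "user", "content": "Introducing history of USA"}]
# apply_chat_template produces the <start_of_turn>/<end_of_turn> chat format used by the -it models
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))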
Hi @myownskyW7 and @acondor99, could you please confirm whether you are still facing the issue? I tried replicating the given code and got the proper output.
Please have a look at the screenshot below for your reference.
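If you are still seeing pads on your side, it may also be worth confirming the environment first; a quick generic check (nothing specific to this thread):
# Generic environment check
import torch
import transformers

print(transformers.__version__)        # Gemma support requires transformers >= 4.38
print(torch.__version__, torch.version.cuda)  # CUDA build PyTorch was compiled against
# If the checkpoint was cached before the repo patch, a fresh download may help:
# model = AutoModelForCausalLM.from_pretrained('google/gemma-7b-it', force_download=True)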