Update README.md #5
opened by PerRing
The original example code included two BOS tokens during inference. I modified it so that only one BOS token is included.
I only see one BOS token when I run that code. I'm not sure why you are seeing two.
In the latest version of the transformers library, the BOS token is added at the beginning in this part:
```python
prompt = processor.tokenizer.apply_chat_template(
    [{'role': 'user', 'content': "<image>\nWhat's the content of the image?"}],
    tokenize=False,
    add_generation_prompt=True
)
```
and it is also added at the beginning in this part:

```python
inputs = processor(text=prompt, images=image, return_tensors="pt")
```
Therefore, you can see that the final input_ids contain two BOS tokens:

```python
print(inputs.input_ids)
# >> tensor([[     2,      2,    106,   1645,    108, 256000,    108,   1841, 235303,
#             235256,    573,   3381,    576,    573,   2416, 235336,    107,    108,
#                106,   2516,    108]])
```
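One way to guard against this, regardless of which step adds the extra token, is to collapse a repeated leading BOS after tokenization. This is a minimal sketch: `dedupe_leading_bos` is a hypothetical helper (not part of the transformers API), and the ids below are a truncated copy of the tensor printed above, with the Gemma BOS id assumed to be 2.

```python
def dedupe_leading_bos(input_ids, bos_id=2):
    """Collapse a run of identical BOS tokens at the start into a single one."""
    ids = list(input_ids)
    # Drop leading BOS tokens while the next token is also a BOS.
    while len(ids) >= 2 and ids[0] == bos_id and ids[1] == bos_id:
        ids.pop(0)
    return ids

# First few ids from the double-BOS sequence printed above (truncated).
ids = [2, 2, 106, 1645, 108, 256000, 108]
print(dedupe_leading_bos(ids))  # [2, 106, 1645, 108, 256000, 108]
```

Alternatively, if the processor's tokenizer supports it, passing `add_special_tokens=False` when the prompt string already begins with the BOS token (as the chat template output does here) would avoid the duplication at the source.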