The output decodes into the wrong words.

#5
by paradoxian - opened

Snipaste_2024-07-05_23-38-21.png

I get the answer "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!". Does anybody know why?

Which codebase did you use for inference?

I noticed that the LLaVA-NeXT repository explicitly sets pad_token_id to 0 for some reason (see here). I guess this might be why you are getting lots of "!!!".

Not 100% sure though
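To make the guess concrete, here is a minimal sketch of how a pad_token_id of 0 could surface as "!" in decoded text. It assumes the tokenizer maps id 0 to "!" (as GPT-2's vocabulary does); whether that holds for the model in this thread is not confirmed. TOY_VOCAB, decode, and strip_after_eos are hypothetical names made up for illustration, not part of LLaVA-NeXT or transformers.

```python
# Hypothetical toy vocabulary. ASSUMPTION: id 0 maps to "!", as it does in
# GPT-2-style byte-level BPE vocabularies. The real model may differ.
TOY_VOCAB = {0: "!", 1: "Hello", 2: " world", 3: "<eos>"}
PAD_TOKEN_ID = 0  # mirrors the "pad_token_id = 0" setting mentioned above
EOS_TOKEN_ID = 3


def decode(token_ids, skip_ids=()):
    """Join token strings, dropping any ids listed in skip_ids."""
    return "".join(TOY_VOCAB[t] for t in token_ids if t not in skip_ids)


def strip_after_eos(token_ids, eos_id=EOS_TOKEN_ID):
    """Keep only the tokens that come before the first EOS token."""
    if eos_id in token_ids:
        return list(token_ids[: token_ids.index(eos_id)])
    return list(token_ids)


# A generated sequence that was padded with id 0 after it finished.
generated = [1, 2, 3] + [PAD_TOKEN_ID] * 5

# Decoding the padded sequence directly turns every pad position into "!".
raw_text = decode(generated, skip_ids={EOS_TOKEN_ID})

# Cutting the sequence at EOS first avoids decoding the padding.
clean_text = decode(strip_after_eos(generated))

print(raw_text)
print(clean_text)
```

If the real tokenizer does not register id 0 as a special token, decoding with skip_special_tokens=True would not remove it, which would be consistent with long runs of "!".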

You could try using TGI for serving llava-next.
