The output decoded into words was wrong.
#5 opened by paradoxian
I get the answer "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" — does anybody know why?
Which codebase did you use for inference?
I spotted that in the LLaVA-NeXT repository, they explicitly assign pad_token_id
to 0 for some reason. See here — I guess this might be why you are getting lots of "!!!".
Not 100% sure though.
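To illustrate the suspected mechanism: in many BPE vocabularies (e.g. GPT-2-style tokenizers), token id 0 happens to decode to the string "!", so if pad_token_id is set to 0 and padding tokens are not skipped during decoding, every pad position shows up as a "!" in the output. Below is a minimal toy sketch of this (the vocab and decode function are made up for illustration; this is not the actual LLaVA-NeXT code):

```python
# Toy vocabulary mimicking a BPE tokenizer where token id 0 decodes to "!".
TOY_VOCAB = {0: "!", 1: "Hello", 2: " world"}
PAD_TOKEN_ID = 0  # mimics the repository's pad_token_id = 0 assignment

def decode(ids, skip_pad=False):
    """Join token strings; optionally drop pad tokens first."""
    if skip_pad:
        ids = [i for i in ids if i != PAD_TOKEN_ID]
    return "".join(TOY_VOCAB[i] for i in ids)

# Generated ids padded out to a fixed length of 8:
output_ids = [1, 2] + [PAD_TOKEN_ID] * 6

print(decode(output_ids))                 # pads decode as "!"
print(decode(output_ids, skip_pad=True))  # pads dropped before decoding
```

If this is indeed the cause, decoding with the pad/special tokens skipped (e.g. passing `skip_special_tokens=True` to the Hugging Face tokenizer's decode call) should make the "!" runs disappear.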