How many tokens is one image?

#47
by MoritzLaurer HF staff - opened

The model card has this line: "# The default range for the number of visual tokens per image in the model is 4-16384. You can set min_pixels and max_pixels according to your needs, such as a token count range of 256-1280, to balance speed and memory usage."

Does this mean that one image can take anything from 4 to 16384 tokens? If I use Qwen-vl via vLLM, can I decrease the number of required tokens by just reducing the image resolution?

Does this mean that one image can be anything from 4 to 16384?

Yes, depending on the final resolution of the image: each patch of 28 * 28 pixels is one token.
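To make the arithmetic concrete, here is a rough sketch of how a token count could be estimated from an image's resolution. The function name `estimate_image_tokens` is hypothetical, and the resizing logic (snap each side to a multiple of 28, then rescale to fit the pixel budget) is an assumption about how the preprocessing behaves, not a copy of the library's code. The defaults correspond to the 4-16384 token range quoted from the model card.

```python
import math

PATCH = 28  # one token per 28 * 28 pixel patch, as stated in the reply above


def estimate_image_tokens(height, width,
                          min_pixels=4 * PATCH * PATCH,
                          max_pixels=16384 * PATCH * PATCH):
    """Estimate visual tokens for an image (hypothetical helper, approximate)."""
    # Assumption: each side is snapped to the nearest multiple of PATCH,
    # with a floor of one patch per side.
    h = max(PATCH, round(height / PATCH) * PATCH)
    w = max(PATCH, round(width / PATCH) * PATCH)

    if h * w > max_pixels:
        # Too large: shrink both sides by the same factor, rounding down.
        scale = math.sqrt((height * width) / max_pixels)
        h = math.floor(height / scale / PATCH) * PATCH
        w = math.floor(width / scale / PATCH) * PATCH
    elif h * w < min_pixels:
        # Too small: enlarge both sides by the same factor, rounding up.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / PATCH) * PATCH
        w = math.ceil(width * scale / PATCH) * PATCH

    return (h // PATCH) * (w // PATCH)


print(estimate_image_tokens(448, 448))  # 16 * 16 patches
print(estimate_image_tokens(224, 224))  # 8 * 8 patches
```

Under these assumptions, halving both sides of an image cuts the token count to roughly a quarter, as long as the result stays inside the min_pixels/max_pixels range.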

If I use Qwen-vl via vLLM, can I decrease the number of required tokens by just reducing the image resolution?

Yes, and it is recommended to set min_pixels and max_pixels according to your needs.
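As a small illustration of that recommendation, the sketch below converts a desired token range into min_pixels and max_pixels values, using the 28 * 28 pixels-per-token figure from this thread. The helper name `pixel_budget` is hypothetical. How the resulting values are passed on (for example to the processor, or to vLLM) depends on the library and version, so check its documentation rather than relying on this snippet.

```python
PATCH = 28  # pixels per token side, per the reply earlier in this thread


def pixel_budget(min_tokens, max_tokens, patch=PATCH):
    """Convert a token count range into (min_pixels, max_pixels).

    Hypothetical helper: it only does the multiplication implied by
    "one token per 28 * 28 pixel patch".
    """
    if not 0 < min_tokens <= max_tokens:
        raise ValueError("expected 0 < min_tokens <= max_tokens")
    return min_tokens * patch * patch, max_tokens * patch * patch


# The 256-1280 token range mentioned in the model card.
min_pixels, max_pixels = pixel_budget(256, 1280)
print(min_pixels, max_pixels)  # 200704 1003520
```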
