How many tokens is one image?
The model card contains this comment: "The default range for the number of visual tokens per image in the model is 4-16384. You can set min_pixels and max_pixels according to your needs, such as a token count range of 256-1280, to balance speed and memory usage."
Does this mean that one image can take anywhere from 4 to 16384 tokens? If I use Qwen-vl via vLLM, can I decrease the number of tokens required just by reducing the image resolution?
> Does this mean that one image can take anywhere from 4 to 16384 tokens?
Yes, depending on the final resolution of the image: each 28×28-pixel patch becomes one token, so an image resized to H × W pixels uses about (H/28) × (W/28) tokens.
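To make the "28×28 pixels per token" rule concrete, here is a minimal sketch that estimates the token count for a given image size. It assumes the image is first resized so that each side is a multiple of 28 and the total pixel count falls between `min_pixels` and `max_pixels`, and that the 4-16384 token range from the model card corresponds to `min_pixels = 4 * 28 * 28` and `max_pixels = 16384 * 28 * 28`. The function `estimate_visual_tokens` is hypothetical and meant for back-of-the-envelope estimates, not as the library's exact resizing code.

```python
import math

FACTOR = 28  # assumption: one visual token per 28x28-pixel patch


def _round_to(x: float, factor: int = FACTOR) -> int:
    # Nearest multiple of `factor`.
    return round(x / factor) * factor


def _floor_to(x: float, factor: int = FACTOR) -> int:
    # Largest multiple of `factor` that is <= x.
    return math.floor(x / factor) * factor


def _ceil_to(x: float, factor: int = FACTOR) -> int:
    # Smallest multiple of `factor` that is >= x.
    return math.ceil(x / factor) * factor


def estimate_visual_tokens(height: int, width: int,
                           min_pixels: int = 4 * FACTOR * FACTOR,
                           max_pixels: int = 16384 * FACTOR * FACTOR) -> int:
    """Hypothetical helper: estimate visual tokens for a height x width image."""
    # Snap each side to the nearest multiple of 28 (at least one patch).
    h = max(FACTOR, _round_to(height))
    w = max(FACTOR, _round_to(width))
    if h * w > max_pixels:
        # Too large: shrink both sides by the same factor so the area fits.
        beta = math.sqrt(height * width / max_pixels)
        h = max(FACTOR, _floor_to(height / beta))
        w = max(FACTOR, _floor_to(width / beta))
    elif h * w < min_pixels:
        # Too small: enlarge both sides by the same factor to reach the minimum.
        beta = math.sqrt(min_pixels / (height * width))
        h = _ceil_to(height * beta)
        w = _ceil_to(width * beta)
    # One token per 28x28 patch of the resized image.
    return (h // FACTOR) * (w // FACTOR)


print(estimate_visual_tokens(768, 1024))  # 999 (756 x 1036 px after snapping)
print(estimate_visual_tokens(3000, 4000, 256 * 28 * 28, 1280 * 28 * 28))  # 1230
```

With the 256-1280 range suggested in the model card, a large photo is capped near 1280 tokens, while a 1024×768 image fits inside the range unchanged (apart from snapping to multiples of 28).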
> If I use Qwen-vl via vLLM, can I decrease the number of tokens required just by reducing the image resolution?
Yes. It is also recommended to set min_pixels and max_pixels according to your needs, so that every image is resized into the token range you want.
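Because one token covers 28×28 pixels, a token range converts directly into pixel limits, and an image can also be downscaled on the client side before it is sent. The sketch below assumes that conversion (tokens × 28 × 28). The helpers `pixel_limits` and `downscale_for_budget` are hypothetical. How the limits reach the processor depends on your setup: recent vLLM versions appear to forward processor options through the `mm_processor_kwargs` argument of `LLM`, but check the documentation for your version.

```python
import math

PATCH = 28  # assumption: pixels per token side (one token per 28x28 patch)


def pixel_limits(min_tokens: int, max_tokens: int) -> tuple[int, int]:
    """Convert a visual-token range into (min_pixels, max_pixels)."""
    return min_tokens * PATCH * PATCH, max_tokens * PATCH * PATCH


def downscale_for_budget(width: int, height: int, max_tokens: int) -> tuple[int, int]:
    """Hypothetical helper: choose a size (each side a multiple of 28, aspect
    ratio kept up to rounding) whose token count stays within max_tokens.
    Assumes a non-extreme aspect ratio, because each side is kept >= 28 px."""
    # Uniform scale factor that brings the area down to the budget (never upscale).
    scale = min(1.0, math.sqrt(max_tokens * PATCH * PATCH / (width * height)))
    # Floor each scaled side to a whole number of patches.
    w = max(PATCH, int(width * scale) // PATCH * PATCH)
    h = max(PATCH, int(height * scale) // PATCH * PATCH)
    return w, h


min_pixels, max_pixels = pixel_limits(256, 1280)
print(min_pixels, max_pixels)  # 200704 1003520

w, h = downscale_for_budget(4000, 3000, 1280)
print(w, h, (w // PATCH) * (h // PATCH))  # 1148 840 1230
```

Flooring both sides keeps the resized area at or below the budget, so the token count cannot exceed `max_tokens` for ordinary aspect ratios; images already under the budget are returned at their original size (rounded down to multiples of 28).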