How many tokens is one image?

#47

by MoritzLaurer HF staff - opened Oct 21


MoritzLaurer

Oct 21 (edited Oct 21)

The model card contains this comment: "The default range for the number of visual tokens per image in the model is 4-16384. You can set min_pixels and max_pixels according to your needs, such as a token count range of 256-1280, to balance speed and memory usage."

Does this mean that one image can take anywhere from 4 to 16384 tokens? If I use Qwen-VL via vLLM, can I decrease the number of required tokens just by reducing the image resolution?

jklj077

Qwen org about 1 month ago

> Does this mean that one image can be anything from 4 to 16384?

It depends on the final resolution of the image: each 28 × 28 pixel patch corresponds to one token.
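To make this rule concrete, here is a minimal Python sketch that estimates the token count for an image. It assumes the image is first resized so that both sides are multiples of 28 and the total pixel count falls between min_pixels and max_pixels (the defaults below correspond to the 4-16384 token range quoted from the model card). The helper names are hypothetical, the resizing rule is a simplified approximation rather than Qwen's exact preprocessing code, and it ignores any special tokens the prompt template may add around the image.

```python
import math

PATCH = 28  # assumption from the reply: one 28 x 28 pixel patch = one visual token


def resized_dims(height, width,
                 min_pixels=4 * PATCH * PATCH,
                 max_pixels=16384 * PATCH * PATCH):
    """Hypothetical helper: snap (height, width) to multiples of PATCH, then
    rescale so the pixel count stays within [min_pixels, max_pixels] while
    roughly preserving the aspect ratio. Simplified sketch, not Qwen's code."""
    # Round each side to the nearest multiple of PATCH (at least one patch).
    h = max(PATCH, round(height / PATCH) * PATCH)
    w = max(PATCH, round(width / PATCH) * PATCH)
    if h * w > max_pixels:
        # Too large: shrink both sides by the same factor, rounding down.
        scale = math.sqrt(height * width / max_pixels)
        h = max(PATCH, math.floor(height / scale / PATCH) * PATCH)
        w = max(PATCH, math.floor(width / scale / PATCH) * PATCH)
    elif h * w < min_pixels:
        # Too small: enlarge both sides by the same factor, rounding up.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / PATCH) * PATCH
        w = math.ceil(width * scale / PATCH) * PATCH
    return h, w


def visual_tokens(height, width, **limits):
    """Estimated number of visual tokens: one per PATCH x PATCH tile."""
    h, w = resized_dims(height, width, **limits)
    return (h // PATCH) * (w // PATCH)


print(visual_tokens(448, 448))    # 256
print(visual_tokens(1024, 1024))  # 1369
print(visual_tokens(1024, 1024, max_pixels=1280 * PATCH * PATCH))  # 1225
```

Under these assumptions, a 1024 × 1024 image comes out at 1369 tokens with the default range, and capping max_pixels at 1280 tokens' worth of pixels brings it down to 1225, which is the effect the question asks about.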

> If I use Qwen-VL via vLLM, can I decrease the number of required tokens just by reducing the image resolution?

Yes, and it is recommended to set min_pixels and max_pixels according to your needs.
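Since each token covers 28 × 28 = 784 pixels, a desired token range translates directly into pixel bounds. A minimal sketch (the helper name pixel_bounds is hypothetical) that reproduces the 256-1280 example from the model card:

```python
PATCH = 28  # assumption from the reply: one 28 x 28 pixel patch = one visual token


def pixel_bounds(min_tokens, max_tokens):
    """Convert a desired visual-token range into (min_pixels, max_pixels)."""
    if not 0 < min_tokens <= max_tokens:
        raise ValueError("need 0 < min_tokens <= max_tokens")
    area = PATCH * PATCH  # pixels covered by one token
    return min_tokens * area, max_tokens * area


min_pixels, max_pixels = pixel_bounds(256, 1280)
print(min_pixels, max_pixels)  # 200704 1003520
```

These are the values one would pass as min_pixels and max_pixels. Exactly how they are forwarded (for example through the Hugging Face processor or through vLLM's configuration) depends on the library and its version, so check the corresponding documentation.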
