microsoft/Florence-2-large · REGION_TO_DESCRIPTION

I have worked with the Florence-2 model and found that it delivers good results for my use case, particularly on the region-to-description task. This task takes the following input: the REGION_TO_DESCRIPTION task prompt, the bounding box coordinates, and the image. However, I would like to change one thing. Instead of passing a single bounding box along with the image, I would like to pass multiple bounding boxes from the same image in a single inference call, without batch processing. Is there a way to achieve this?
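For context, here is a minimal sketch of how I understand the single-box text prompt to be put together. It assumes the box is given in pixels as (x1, y1, x2, y2) and is quantized into 1000 bins per axis to form `<loc_N>` tokens after the `<REGION_TO_DESCRIPTION>` task token, which matches what I see in the model's sample notebook. The helper names `box_to_loc_tokens` and `build_region_prompt` are my own and are not part of the Florence-2 API, and the exact rounding done inside the processor may differ.

```python
def box_to_loc_tokens(box, image_width, image_height, bins=1000):
    """Turn a pixel-space box (x1, y1, x2, y2) into a string of <loc_N> tokens.

    Assumption: each coordinate is scaled to [0, bins - 1] relative to the
    image size. This is a hypothetical helper, not a Florence-2 library call.
    """
    x1, y1, x2, y2 = box

    def quantize(value, size):
        # Scale to the bin range, then clamp so a coordinate on the image
        # edge still maps to a valid bin index.
        index = int(value / size * bins)
        return max(0, min(bins - 1, index))

    values = (
        quantize(x1, image_width),
        quantize(y1, image_height),
        quantize(x2, image_width),
        quantize(y2, image_height),
    )
    return "".join(f"<loc_{v}>" for v in values)


def build_region_prompt(box, image_width, image_height):
    """Build the text prompt for one region: task token followed by loc tokens."""
    return "<REGION_TO_DESCRIPTION>" + box_to_loc_tokens(box, image_width, image_height)


if __name__ == "__main__":
    # One prompt per box: this is the one-box-per-call pattern described above.
    boxes = [(0, 0, 100, 50), (100, 50, 200, 100)]
    for box in boxes:
        print(build_region_prompt(box, image_width=200, image_height=100))
```

Calling this in a loop gives one prompt per box, so each box still needs its own call to the model. That is the pattern I am hoping to replace with a single call.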
Thank you in advance for your suggestion.
