Clarification on Box Coordinates Scaling
Hi Ferret Team,
Thanks for sharing this checkpoint!
I ran the following example on your demo using the Ferret-UI-Llama8b model with default parameters:
{
"id": 0,
"image": "appstore_reminders.png",
"image_h": 2532,
"image_w": 1170,
"conversations": [
{
"from": "human",
"value": "\nWhere is the Games Tab located?"
}
]
}
The model returned: Games Tab [[0, 906, 256, 965]]. However, this box doesn't seem to align with the "Games Tab" in the image, whether I scale it or not.
Could you clarify the scaling logic applied to the box and how I should interpret it?
Thanks!
Demi
Hi @demitsuki, sorry for the late reply!
You can check the scaling logic here: https://github.com/apple/ml-ferret/blob/main/ferretui/ferretui/eval/model_UI.py
To speed things up, here is the relevant part:
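# ratio: VOCAB_IMAGE_W / VOCAB_IMAGE_H are the model's coordinate-grid size (see model_UI.py)
ratio_w = VOCAB_IMAGE_W * 1.0 / image_width
ratio_h = VOCAB_IMAGE_H * 1.0 / image_height

# Scales a pixel-space box (x1, y1, x2, y2) into the model's coordinate space
def get_bbox_coor(box, ratio_w, ratio_h):
    return box[0] * ratio_w, box[1] * ratio_h, box[2] * ratio_w, box[3] * ratio_h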
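Because ratio_w = VOCAB_IMAGE_W / image_width, multiplying by it converts pixel coordinates into the model's coordinate space. Reading a box the model returns goes the other way: multiply by image_width / VOCAB_IMAGE_W instead. Below is a minimal sketch of that inverse. It assumes a 1000 x 1000 grid (VOCAB_IMAGE_W = VOCAB_IMAGE_H = 1000), that output boxes use the same grid as inputs, and that the response contains integer boxes in [[x1, y1, x2, y2]] form. parse_boxes and grid_to_pixels are hypothetical helper names, not functions from ml-ferret.

```python
import re
from typing import List, Sequence, Tuple

# Assumed size of the model's coordinate grid; verify against model_UI.py.
VOCAB_IMAGE_W = 1000
VOCAB_IMAGE_H = 1000


def parse_boxes(response: str) -> List[Tuple[int, ...]]:
    """Extract every integer [x1, y1, x2, y2] group from a response string."""
    pattern = r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]"
    return [tuple(int(v) for v in match) for match in re.findall(pattern, response)]


def grid_to_pixels(box: Sequence[float], image_w: int, image_h: int) -> Tuple[float, ...]:
    """Invert the pixel-to-grid scaling: multiply by image size / grid size."""
    ratio_w = image_w * 1.0 / VOCAB_IMAGE_W
    ratio_h = image_h * 1.0 / VOCAB_IMAGE_H
    return (box[0] * ratio_w, box[1] * ratio_h, box[2] * ratio_w, box[3] * ratio_h)


if __name__ == "__main__":
    # Values taken from the question above (image_w=1170, image_h=2532).
    boxes = parse_boxes("Games Tab [[0, 906, 256, 965]]")
    print(grid_to_pixels(boxes[0], image_w=1170, image_h=2532))
```

Under the 1000-grid assumption, the box from the question maps to roughly (0, 2294, 299.5, 2443.4) in pixels, i.e. a strip near the bottom of the 1170 x 2532 screenshot. If that still does not line up with the tab, the grid size or the scaling direction assumed here is the first thing to check against model_UI.py.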