Maxi PRO

maxiw

AI & ML interests

Computer Agents | VLMs

Organizations

Posts 3

Exciting to see open-source models thriving in the computer agent space! 🔥
I just built a demo for OS-ATLAS: A Foundation Action Model For Generalist GUI Agents. Check it out here: maxiw/OS-ATLAS

This demo predicts bounding boxes from a screenshot and an instruction as input.

The new Qwen2-VL models seem to perform quite well at object detection. You can prompt them to respond with bounding boxes in a reference frame of 1k x 1k pixels and then scale those boxes to the original image size.

You can try it out with my space maxiw/Qwen2-VL-Detection
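
The scaling step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code used in the Space: the function name `scale_box`, the `(x1, y1, x2, y2)` corner ordering, and the clamping to the image bounds are all assumptions made for the example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def scale_box(
    box: Box,
    image_width: int,
    image_height: int,
    ref_size: int = 1000,
) -> Tuple[int, int, int, int]:
    """Map a box from a ref_size x ref_size frame to image pixel coordinates.

    Assumes the box is given as (x1, y1, x2, y2); this ordering is an
    assumption for illustration and may differ from what a model returns.
    """
    x1, y1, x2, y2 = box

    # Scale each coordinate by the ratio of the real image size to the frame.
    scaled = (
        x1 * image_width / ref_size,
        y1 * image_height / ref_size,
        x2 * image_width / ref_size,
        y2 * image_height / ref_size,
    )

    # Clamp to the image bounds in case the model predicts out-of-frame values.
    limits = (image_width, image_height, image_width, image_height)
    clamped = [min(max(value, 0), limit) for value, limit in zip(scaled, limits)]

    return (round(clamped[0]), round(clamped[1]), round(clamped[2]), round(clamped[3]))


if __name__ == "__main__":
    # A box predicted in the 1k x 1k frame, mapped onto a 1920 x 1080 screenshot.
    print(scale_box((100, 200, 500, 800), image_width=1920, image_height=1080))
```

Because the width and height are scaled independently, a box drawn on a non-square image keeps its position relative to the image rather than to the square reference frame.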

Datasets

None public yet