Trust but Verify: Programmatic VLM Evaluation in the Wild • Paper • 2410.13121 • Published 23 days ago