CLAIR-A: Leveraging Large Language Models to Judge Audio Captions Paper • 2409.12962 • Published Sep 19 • 2
Article Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark! By davidchan • Jul 23 • 3
ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video Paper • 2401.05314 • Published Jan 10 • 9