Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate Paper • 2410.07167 • Published 7 days ago • 35
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning Paper • 2410.06373 • Published 8 days ago • 33
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Paper • 2410.05363 • Published 9 days ago • 42