MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding • Paper 2404.05726 • Published Apr 8
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding • Paper 2406.07471 • Published Jun 11
VISA: Reasoning Video Object Segmentation via Large Language Models • Paper 2407.11325 • Published Jul 16
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models • Paper 2407.15841 • Published Jul 22
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges • Paper 2409.01071 • Published Sep 2
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model • Paper 2409.01199 • Published Sep 2