VideoForMed - a che111 Collection

VideoForMed

updated Sep 5

Distilling Vision-Language Models on Millions of Videos

Paper • 2401.06129 • Published Jan 11 • 14

Koala: Key frame-conditioned long video-LLM

Paper • 2404.04346 • Published Apr 5 • 5

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Paper • 2404.05726 • Published Apr 8 • 20

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

Paper • 2406.07471 • Published Jun 11 • 1

VISA: Reasoning Video Object Segmentation via Large Language Models

Paper • 2407.11325 • Published Jul 16 • 1

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Paper • 2407.15841 • Published Jul 22 • 39

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

Paper • 2409.01071 • Published Sep 2 • 26

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Paper • 2409.01199 • Published Sep 2 • 12