-
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Paper • 2310.19773 • Published • 19 -
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Paper • 2310.05863 • Published • 1 -
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Paper • 2311.06242 • Published • 84 -
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
Paper • 2311.10126 • Published • 7
Collections
Discover the best community collections!
Collections including paper arxiv:2310.19773
-
Matryoshka Diffusion Models
Paper • 2310.15111 • Published • 40 -
De-Diffusion Makes Text a Strong Cross-Modal Interface
Paper • 2311.00618 • Published • 21 -
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Paper • 2310.19773 • Published • 19 -
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Paper • 2310.15308 • Published • 22
-
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 6 -
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 14 -
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Paper • 2201.12086 • Published • 3 -
ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories
Paper • 2305.15028 • Published • 1
-
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Paper • 2309.04662 • Published • 22 -
Neurons in Large Language Models: Dead, N-gram, Positional
Paper • 2309.04827 • Published • 16 -
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Paper • 2309.05516 • Published • 9 -
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
Paper • 2309.03907 • Published • 8