Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published 3 days ago • 15
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published 2 days ago • 17
VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads Paper • 2407.18245 • Published Jul 25 • 8
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published 3 days ago • 25
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published 3 days ago • 41
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models Paper • 2411.07232 • Published 3 days ago • 52
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published 7 days ago • 25
LLM2CLIP Collection LLM2CLIP makes SOTA pretrained CLIP modal more SOTA ever. • 5 items • Updated 3 days ago • 12
GazeGen: Gaze-Driven User Interaction for Visual Content Generation Paper • 2411.04335 • Published 8 days ago • 14
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published 7 days ago • 20
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model Paper • 2411.04496 • Published 7 days ago • 19
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning Paper • 2411.05003 • Published 7 days ago • 64
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published 7 days ago • 96
RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval Paper • 2411.04752 • Published 7 days ago • 15
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published 9 days ago • 23
Distill Visual Chart Reasoning Ability from LLMs to MLLMs Paper • 2410.18798 • Published 21 days ago • 19
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published 9 days ago • 57
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published 10 days ago • 31
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Paper • 2411.02337 • Published 10 days ago • 34