VoladorLuYu's Collections
Generative Multiple Modality
Random Field Augmentations for Self-Supervised Representation Learning
Paper • 2311.03629 • Published • 6
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Paper • 2311.04589 • Published • 18
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Paper • 2311.04901 • Published • 7
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783 • Published • 26
Trusted Source Alignment in Large Language Models
Paper • 2311.06697 • Published • 10
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Paper • 2311.07575 • Published • 13
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
Paper • 2309.07915 • Published • 4
LMDX: Language Model-based Document Information Extraction and Localization
Paper • 2309.10952 • Published • 65
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper • 2309.01131 • Published • 1
Multimodal Graph Learning for Generative Tasks
Paper • 2310.07478 • Published • 1
Language-Informed Visual Concept Learning
Paper • 2312.03587 • Published • 5
OneLLM: One Framework to Align All Modalities with Language
Paper • 2312.03700 • Published • 20
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Paper • 2401.11708 • Published • 29
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 44
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Paper • 2401.11605 • Published • 21
Scalable Diffusion Models with Transformers
Paper • 2212.09748 • Published • 16
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
Paper • 2402.05195 • Published • 18
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Paper • 2402.10294 • Published • 22
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Paper • 2402.15504 • Published • 21
Robust Gaussian Splatting
Paper • 2404.04211 • Published • 8
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Paper • 2404.04478 • Published • 12
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Paper • 2404.08197 • Published • 27
Factorized Diffusion: Perceptual Illusions by Noise Decomposition
Paper • 2404.11615 • Published • 2
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 43
MultiBooth: Towards Generating All Your Concepts in an Image from Text
Paper • 2404.14239 • Published • 8
Adding Conditional Control to Text-to-Image Diffusion Models
Paper • 2302.05543 • Published • 40
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
Paper • 2406.12649 • Published • 15