Generative Multiple Modality - a VoladorLuYu Collection

VoladorLuYu 's Collections

Research on LLM

Generative Multiple Modality

Super Alignment

Foundation Machine Learning

Graph Foundation Multimodal Models

Symbolic LLM Reasoning

Data-efficient LLMs

Understanding LLM

synthetic code generation

Diffusion Models

LLM+Architecture

LLM+Self-Play RL

Generative Multiple Modality

updated Jun 25

Random Field Augmentations for Self-Supervised Representation Learning

Paper • 2311.03629 • Published Nov 7, 2023 • 6
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

Paper • 2311.04589 • Published Nov 8, 2023 • 18
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs

Paper • 2311.04901 • Published Nov 8, 2023 • 7
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 26
Trusted Source Alignment in Large Language Models

Paper • 2311.06697 • Published Nov 12, 2023 • 10
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

Paper • 2311.07575 • Published Nov 13, 2023 • 13
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

Paper • 2309.07915 • Published Sep 14, 2023 • 4
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 65
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

Paper • 2309.01131 • Published Sep 3, 2023 • 1
Multimodal Graph Learning for Generative Tasks

Paper • 2310.07478 • Published Oct 11, 2023 • 1
Language-Informed Visual Concept Learning

Paper • 2312.03587 • Published Dec 6, 2023 • 5
OneLLM: One Framework to Align All Modalities with Language

Paper • 2312.03700 • Published Dec 6, 2023 • 20
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Paper • 2401.11708 • Published Jan 22 • 29
MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24 • 44
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21 • 21
Scalable Diffusion Models with Transformers

Paper • 2212.09748 • Published Dec 19, 2022 • 16
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

Paper • 2402.05195 • Published Feb 7 • 18
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing

Paper • 2402.10294 • Published Feb 15 • 22
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

Paper • 2402.15504 • Published Feb 23 • 21
Robust Gaussian Splatting

Paper • 2404.04211 • Published Apr 5 • 8
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

Paper • 2404.04478 • Published Apr 6 • 12
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12 • 27
Factorized Diffusion: Perceptual Illusions by Noise Decomposition

Paper • 2404.11615 • Published Apr 17 • 2
Dynamic Typography: Bringing Words to Life

Paper • 2404.11614 • Published Apr 17 • 43
MultiBooth: Towards Generating All Your Concepts in an Image from Text

Paper • 2404.14239 • Published Apr 22 • 8
Adding Conditional Control to Text-to-Image Diffusion Models

Paper • 2302.05543 • Published Feb 10, 2023 • 40
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

Paper • 2406.12649 • Published Jun 18 • 15