Collections
Collections including paper arxiv:2311.04589
-
LayoutPrompter: Awaken the Design Ability of Large Language Models
Paper • 2311.06495 • Published • 10
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783 • Published • 26
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 45
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Paper • 2311.04589 • Published • 18
-
Random Field Augmentations for Self-Supervised Representation Learning
Paper • 2311.03629 • Published • 6
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Paper • 2311.04589 • Published • 18
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Paper • 2311.04901 • Published • 7
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783 • Published • 26
-
Towards General Text Embeddings with Multi-stage Contrastive Learning
Paper • 2308.03281 • Published • 1
NEFTune: Noisy Embeddings Improve Instruction Finetuning
Paper • 2310.05914 • Published • 14
EELBERT: Tiny Models through Dynamic Embeddings
Paper • 2310.20144 • Published • 3
Dynamic Word Embeddings for Evolving Semantic Discovery
Paper • 1703.00607 • Published • 1
-
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Paper • 2310.05737 • Published • 4
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Paper • 2308.16692 • Published • 1
Towards General Text Embeddings with Multi-stage Contrastive Learning
Paper • 2308.03281 • Published • 1
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
Paper • 2305.11554 • Published • 2
-
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 14
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566 • Published • 25
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 6
Conditional Diffusion Distillation
Paper • 2310.01407 • Published • 20