-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 31 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 25 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 121 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 21
Collections
Discover the best community collections!
Collections including paper arxiv:2410.09732
-
Differential Transformer
Paper • 2410.05258 • Published • 165 -
Baichuan-Omni Technical Report
Paper • 2410.08565 • Published • 83 -
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Paper • 2410.17243 • Published • 88 -
FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
Paper • 2410.16271 • Published • 80
-
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Paper • 2410.09732 • Published • 54 -
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Paper • 2410.10139 • Published • 50 -
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Paper • 2410.09584 • Published • 45
-
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
Paper • 2408.01050 • Published • 8 -
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 33 -
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper • 2409.02795 • Published • 72 -
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Paper • 2409.04593 • Published • 22
-
VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads
Paper • 2407.18245 • Published • 8 -
AIM 2024 Sparse Neural Rendering Challenge: Dataset and Benchmark
Paper • 2409.15041 • Published • 12 -
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Paper • 2410.09732 • Published • 54 -
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
Paper • 2410.13754 • Published • 74
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 12 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 85 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 31
-
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 26 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 12 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 46 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 28
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 21 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 80 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 144 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25