- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
  Paper • 2312.06134 • Published • 2
- Efficient Monotonic Multihead Attention
  Paper • 2312.04515 • Published • 6
- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 37
- Exploring Format Consistency for Instruction Tuning
  Paper • 2307.15504 • Published • 7
Collections
Collections including paper arxiv:2311.10768

- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 99
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 38
- BitDelta: Your Fine-Tune May Only Be Worth One Bit
  Paper • 2402.10193 • Published • 17
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
  Paper • 2402.09727 • Published • 35

- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118
- stabilityai/stable-video-diffusion-img2vid-xt
  Image-to-Video • Updated • 536k • 2.68k
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
  Paper • 2311.13384 • Published • 50
- HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
  Paper • 2311.12454 • Published • 29

- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 16
- Mixtral of Experts
  Paper • 2401.04088 • Published • 159
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 71
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 43

- Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
  Paper • 2311.08263 • Published • 15
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118
- microsoft/Orca-2-13b
  Text Generation • Updated • 17.6k • 663
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 16

- ChatAnything: Facetime Chat with LLM-Enhanced Personas
  Paper • 2311.06772 • Published • 34
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 16
- MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
  Paper • 2311.12052 • Published • 32
- DiffusionGPT: LLM-Driven Text-to-Image Generation System
  Paper • 2401.10061 • Published • 28

- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 28
- Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
  Paper • 2311.08692 • Published • 12
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 16

- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 16
- TinyGSM: achieving >80% on GSM8k with small language models
  Paper • 2312.09241 • Published • 37
- Time is Encoded in the Weights of Finetuned Language Models
  Paper • 2312.13401 • Published • 19