Collections
Collections including paper arxiv:2309.04354
-
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Paper • 2308.10110 • Published • 2
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
Paper • 2308.06512 • Published • 2
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Paper • 2309.04354 • Published • 13
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Paper • 2212.05055 • Published • 5
-
Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts
Paper • 2009.10622 • Published • 1
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
Paper • 2401.15947 • Published • 48
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 70
MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving
Paper • 2401.14361 • Published • 2
-
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper • 2310.16795 • Published • 26
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
Paper • 2308.12066 • Published • 4
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Paper • 2303.06182 • Published • 1
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
Paper • 2112.14397 • Published • 1
-
One Wide Feedforward is All You Need
Paper • 2309.01826 • Published • 31
Gated recurrent neural networks discover attention
Paper • 2309.01775 • Published • 7
FLM-101B: An Open LLM and How to Train It with $100K Budget
Paper • 2309.03852 • Published • 43
Large Language Models as Optimizers
Paper • 2309.03409 • Published • 75
-
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
Paper • 2309.04354 • Published • 13
Vision Transformers Need Registers
Paper • 2309.16588 • Published • 77
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Paper • 2309.16414 • Published • 19
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Paper • 2309.16534 • Published • 15