matlok's Collections
Papers - MoE - Research
Adaptive sequential Monte Carlo by means of mixture of experts
Paper • arXiv:1108.2836 • Published • 2
Convergence Rates for Mixture-of-Experts
Paper • arXiv:1110.2058 • Published • 2
Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs
Paper • arXiv:2310.12008 • Published • 2
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Paper • arXiv:2308.11793 • Published • 2
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Paper • arXiv:2308.10110 • Published • 2
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion
Paper • arXiv:2308.06512 • Published • 2
Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Paper • arXiv:2308.06093 • Published • 2
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
Paper • arXiv:2403.03432 • Published • 1
Enhancing the "Immunity" of Mixture-of-Experts Networks for Adversarial Defense
Paper • arXiv:2402.18787 • Published • 2
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Paper • arXiv:2402.14800 • Published • 3
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
Paper • arXiv:2402.12550 • Published • 2
Turn Waste into Worth: Rectifying Top-k Router of MoE
Paper • arXiv:2402.12399 • Published • 2
Buffer Overflow in Mixture of Experts
Paper • arXiv:2402.05526 • Published • 8
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Paper • arXiv:2211.15841 • Published • 7
A Machine Learning Perspective on Predictive Coding with PAQ
Paper • arXiv:1108.3298 • Published • 2
DEMix Layers: Disentangling Domains for Modular Language Modeling
Paper • arXiv:2108.05036 • Published • 3
Sparse Backpropagation for MoE Training
Paper • arXiv:2310.00811 • Published • 2
A Review of Sparse Expert Models in Deep Learning
Paper • arXiv:2209.01667 • Published • 3
FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts
Paper • arXiv:2306.08586 • Published • 1
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
Paper • arXiv:2306.04845 • Published • 4
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Paper • arXiv:2403.07508 • Published • 75
Unified Scaling Laws for Routed Language Models
Paper • arXiv:2202.01169 • Published • 2
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper • arXiv:2310.16795 • Published • 26
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Paper • arXiv:2212.05055 • Published • 5