Collections
Collections including paper arxiv:2402.05526
- Turn Waste into Worth: Rectifying Top-k Router of MoE
  Paper • 2402.12399 • Published • 2
- CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
  Paper • 2402.02526 • Published • 3
- Buffer Overflow in Mixture of Experts
  Paper • 2402.05526 • Published • 8
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26

- Robust Mixture-of-Expert Training for Convolutional Neural Networks
  Paper • 2308.10110 • Published • 2
- Experts Weights Averaging: A New General Training Scheme for Vision Transformers
  Paper • 2308.06093 • Published • 2
- ConstitutionalExperts: Training a Mixture of Principle-based Prompts
  Paper • 2403.04894 • Published • 2
- Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
  Paper • 2403.03432 • Published • 1

- Adaptive sequential Monte Carlo by means of mixture of experts
  Paper • 1108.2836 • Published • 2
- Convergence Rates for Mixture-of-Experts
  Paper • 1110.2058 • Published • 2
- Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs
  Paper • 2310.12008 • Published • 2
- Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
  Paper • 2308.11793 • Published • 2

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 18
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- Linear Transformers are Versatile In-Context Learners
  Paper • 2402.14180 • Published • 6

- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
  Paper • 2312.07987 • Published • 40
- Interfacing Foundation Models' Embeddings
  Paper • 2312.07532 • Published • 10
- Point Transformer V3: Simpler, Faster, Stronger
  Paper • 2312.10035 • Published • 17
- TheBloke/quantum-v0.01-GPTQ
  Text Generation • Updated • 16 • 2

- Scalable Extraction of Training Data from (Production) Language Models
  Paper • 2311.17035 • Published • 4
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  Paper • 2401.05566 • Published • 25
- Exploiting Novel GPT-4 APIs
  Paper • 2312.14302 • Published • 12
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
  Paper • 2404.13208 • Published • 38
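
For readers who want to pull this listing programmatically rather than browse it, below is a minimal sketch using the huggingface_hub Collections API. It assumes huggingface_hub >= 0.19, and the "paper/2402.05526" item-filter string is an assumption about how paper items are addressed; it is not an official snippet from this page.

```python
# Minimal sketch: list community collections that contain paper 2402.05526.
# Assumption: "paper/2402.05526" is the item-filter format for papers.
from huggingface_hub import HfApi

api = HfApi()

# list_collections() returns lightweight summaries (items are not populated).
for summary in api.list_collections(item=["paper/2402.05526"], limit=10):
    # get_collection() fetches the full collection, including its items.
    collection = api.get_collection(summary.slug)
    print(f"{collection.title} ({collection.slug})")
    for item in collection.items:
        # item_type is e.g. "paper" or "model"; item_id is the arXiv id or repo id.
        print(f"  {item.item_type}: {item.item_id}")
```

Each group in the list above corresponds to one such collection's preview of items.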