- xLSTM: Extended Long Short-Term Memory
  Paper • 2405.04517 • Published • 11
- You Only Cache Once: Decoder-Decoder Architectures for Language Models
  Paper • 2405.05254 • Published • 9
- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 14
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 126
Collections including paper arxiv:2404.14994

- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2
- Transformers Can Represent n-gram Language Models
  Paper • 2404.14994 • Published • 18
- Are Sixteen Heads Really Better than One?
  Paper • 1905.10650 • Published • 2
- Reasoning in Large Language Models: A Geometric Perspective
  Paper • 2407.02678 • Published • 1

- MLP Can Be A Good Transformer Learner
  Paper • 2404.05657 • Published • 1
- Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
  Paper • 2404.07200 • Published • 1
- An inclusive review on deep learning techniques and their scope in handwriting recognition
  Paper • 2404.08011 • Published • 1
- Long-form music generation with latent diffusion
  Paper • 2404.10301 • Published • 24

- I-Design: Personalized LLM Interior Designer
  Paper • 2404.02838 • Published • 2
- Scaling MLPs: A Tale of Inductive Bias
  Paper • 2306.13575 • Published • 14
- Fast Feedforward Networks
  Paper • 2308.14711 • Published • 2
- How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
  Paper • 2404.14047 • Published • 44

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 124
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 50
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 12
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65

- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 78
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- Sequence Parallelism: Long Sequence Training from System Perspective
  Paper • 2105.13120 • Published • 5