- Self-Rewarding Language Models
Paper • 2401.10020 • Published • 144
- Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 12
- MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 51
- MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 45
Collections
Collections including paper arxiv:2408.15496
- Large Language Model Unlearning via Embedding-Corrupted Prompts
Paper • 2406.07933 • Published • 7
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
Paper • 2406.02657 • Published • 36
- Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
Paper • 2406.12050 • Published • 18
- How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper • 2406.11813 • Published • 30

- StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
Paper • 2311.14495 • Published • 1
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 59
- SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
Paper • 2401.13560 • Published • 1
- Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
Paper • 2402.00789 • Published • 2

- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 59
- VMamba: Visual State Space Model
Paper • 2401.10166 • Published • 38
- SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation
Paper • 2401.13560 • Published • 1
- Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
Paper • 2402.00789 • Published • 2

- Trellis Networks for Sequence Modeling
Paper • 1810.06682 • Published • 1
- Pruning Very Deep Neural Network Channels for Efficient Inference
Paper • 2211.08339 • Published • 1
- LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch
Paper • 2309.14157 • Published • 1
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 138

- TRAMS: Training-free Memory Selection for Long-range Language Modeling
Paper • 2310.15494 • Published • 1
- A Long Way to Go: Investigating Length Correlations in RLHF
Paper • 2310.03716 • Published • 9
- YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 65
- Giraffe: Adventures in Expanding Context Lengths in LLMs
Paper • 2308.10882 • Published • 1