Sequence Parallelism: Long Sequence Training from System Perspective Paper • 2105.13120 • Published May 26, 2021 • 5
Ring Attention with Blockwise Transformers for Near-Infinite Context Paper • 2310.01889 • Published Oct 3, 2023 • 10
Striped Attention: Faster Ring Attention for Causal Transformers Paper • 2311.09431 • Published Nov 15, 2023 • 4
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models Paper • 2309.14509 • Published Sep 25, 2023 • 17
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers Paper • 2310.03294 • Published Oct 5, 2023 • 2
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences Paper • 2403.09347 • Published Mar 14, 2024 • 20
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models Paper • 2402.02244 • Published Feb 3, 2024 • 1
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache Paper • 2401.02669 • Published Jan 5, 2024 • 14
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey Paper • 2311.12351 • Published Nov 21, 2023 • 3
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning Paper • 2401.01325 • Published Jan 2, 2024 • 26
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 111
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10, 2024 • 103
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12, 2024 • 63
A Unified Sequence Parallelism Approach for Long Context Generative AI Paper • 2405.07719 • Published May 13, 2024 • 2
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023 • 65
LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism Paper • 2406.18485 • Published Jun 26, 2024 • 2
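Several of the entries above (Sequence Parallelism, Ring Attention, Striped Attention, LightSeq, BurstAttention, LoongTrain) shard the sequence across devices and rotate key/value blocks around a ring while each rank accumulates its attention output with an online softmax. The following is a minimal single-process NumPy sketch of that shared pattern, not code from any of these papers; the block sizes, names, and the simulated "devices" are illustrative assumptions.

```python
# Single-process NumPy sketch of ring-style sequence-parallel attention.
# Hypothetical illustration only: the simulated "devices", block sizes, and
# names are assumptions, not code from any of the papers in this collection.
import numpy as np

def ring_attention_sim(q, k, v, num_devices):
    """Non-causal attention computed as if the sequence were sharded across
    `num_devices` ranks and K/V blocks were rotated around a ring."""
    seq_len, dim = q.shape
    block = seq_len // num_devices
    # Shard Q, K, V along the sequence dimension, one block per "device".
    q_blocks = [q[i * block:(i + 1) * block] for i in range(num_devices)]
    k_blocks = [k[i * block:(i + 1) * block] for i in range(num_devices)]
    v_blocks = [v[i * block:(i + 1) * block] for i in range(num_devices)]

    outputs = []
    for rank in range(num_devices):
        qi = q_blocks[rank]
        # Online-softmax accumulators: running max, normalizer, weighted sum.
        m = np.full((block, 1), -np.inf)
        l = np.zeros((block, 1))
        acc = np.zeros((block, dim))
        for step in range(num_devices):
            # At each ring step this rank sees the K/V block rotated to it;
            # in a real system this is a peer-to-peer send/recv.
            src = (rank + step) % num_devices
            kj, vj = k_blocks[src], v_blocks[src]
            s = qi @ kj.T / np.sqrt(dim)                 # block of scores
            m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
            p = np.exp(s - m_new)
            correction = np.exp(m - m_new)
            l = l * correction + p.sum(axis=-1, keepdims=True)
            acc = acc * correction + p @ vj
            m = m_new
        outputs.append(acc / l)
    return np.concatenate(outputs, axis=0)

# Check against plain softmax attention on the full sequence.
rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8)); k = rng.normal(size=(16, 8)); v = rng.normal(size=(16, 8))
s = q @ k.T / np.sqrt(8)
ref = np.exp(s - s.max(-1, keepdims=True))
ref = (ref / ref.sum(-1, keepdims=True)) @ v
assert np.allclose(ring_attention_sim(q, k, v, num_devices=4), ref)
```

In an actual distributed implementation the inner loop's `src` indexing is replaced by point-to-point communication of K/V blocks between neighboring ranks, and causal masking (plus the token permutation of Striped Attention) changes which score blocks each rank actually has to compute.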
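The context-extension entries (Self-Extend, YaRN, LongRoPE) all rest on remapping positions so that longer sequences reuse the positional range the model saw during pretraining. The sketch below shows only the simplest form of that idea, plain linear RoPE position interpolation; the grouped positions, NTK-aware ramps, and searched per-dimension scales in those papers are refinements that are not reproduced here.

```python
# Minimal NumPy sketch of RoPE with linear position interpolation.
# Illustrative baseline only; parameter names and the example lengths are
# assumptions, not the exact method of YaRN, LongRoPE, or Self-Extend.
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotation angles for RoPE; `scale` > 1 squeezes new, longer positions
    back into the range used during pretraining."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # one frequency per pair
    return np.outer(positions / scale, inv_freq)       # (seq, dim/2)

def apply_rope(x, angles):
    """Rotate consecutive feature pairs of x by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Pretrained on 4k positions, now running at 16k: scale positions by 4 so the
# largest rotation angle matches what the model was trained on.
train_len, target_len, dim = 4096, 16384, 64
x = np.random.default_rng(0).normal(size=(target_len, dim))
angles = rope_angles(np.arange(target_len), dim, scale=target_len / train_len)
x_rot = apply_rope(x, angles)
print(x_rot.shape)  # (16384, 64)
```

With `scale=1.0` this reduces to standard RoPE; setting `scale = target_len / train_len` is the uniform interpolation baseline that the finer-grained scaling rules in the papers above improve on.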