LLM Reasoning Papers Collection Papers to improve reasoning capabilities of LLMs • 15 items • Updated 4 days ago • 72
Gated Slot Attention for Efficient Linear-Time Sequence Modeling Paper • 2409.07146 • Published Sep 11 • 19
view article Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14 • 48
view article Article Indexify: Bringing HuggingFace Models to Real-Time Pipelines for Production Applications By rishiraj • May 31 • 7
Blackhole Collection A black hole with lots of high-quality dialogue datasets in many fields, and multilingual helps to train LLMs with SFT and DPO methods easier. • 32 items • Updated Aug 18 • 6
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 118
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs Paper • 2403.20041 • Published Mar 29 • 34
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences Paper • 2403.09347 • Published Mar 14 • 20
LocalMamba: Visual State Space Model with Windowed Selective Scan Paper • 2403.09338 • Published Mar 14 • 7
Model Merging Collection Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12 • 217
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Paper • 2312.09390 • Published Dec 14, 2023 • 32
Prompt Cache: Modular Attention Reuse for Low-Latency Inference Paper • 2311.04934 • Published Nov 7, 2023 • 28
Safe RLHF: Safe Reinforcement Learning from Human Feedback Paper • 2310.12773 • Published Oct 19, 2023 • 28
OctoPack: Instruction Tuning Code Large Language Models Paper • 2308.07124 • Published Aug 14, 2023 • 28
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback Paper • 2307.15217 • Published Jul 27, 2023 • 36
On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models Paper • 2307.09793 • Published Jul 19, 2023 • 46
Extending Context Window of Large Language Models via Positional Interpolation Paper • 2306.15595 • Published Jun 27, 2023 • 53