🦅 🐍 FalconMamba 7B Collection: the FalconMamba 7B base model, the instruction-tuned version, their 4-bit and GGUF variants, and the demo • 15 items • 27 upvotes
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? • Paper 2407.16607 • Published Jul 23, 2024 • 21 upvotes
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • Paper 2403.03507 • Published Mar 6, 2024 • 182 upvotes
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper 2402.17764 • Published Feb 27, 2024 • 602 upvotes
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases • Paper 2402.14905 • Published Feb 22, 2024 • 124 upvotes
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis • Paper 2401.17093 • Published Jan 30, 2024 • 18 upvotes
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding • Paper 2401.03003 • Published Jan 5, 2024 • 12 upvotes
PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation • Paper 2312.17276 • Published Dec 27, 2023 • 15 upvotes
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation • Paper 2312.14187 • Published Dec 20, 2023 • 49 upvotes
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention • Paper 2312.07987 • Published Dec 13, 2023 • 40 upvotes