BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices Paper • 2411.10640 • Published 6 days ago • 37
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 10 items • Updated about 7 hours ago • 172
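Since the SmolLM2 checkpoints are plain causal LMs on the Hub, a minimal sketch of running the compact 360M variant with transformers might look like the following; the repo id `HuggingFaceTB/SmolLM2-360M` is an assumption based on the collection's naming, not taken from the entry above.

```python
# Minimal sketch: running a compact SmolLM2 checkpoint with transformers.
# The repo id below is assumed from the collection naming, not confirmed here.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-360M"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```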
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction Paper • 2410.21169 • Published 24 days ago • 29
OPT Collection OPT (Open Pretrained Transformer) is a series of open-source large causal language models that perform comparably to GPT-3. • 12 items • Updated about 6 hours ago • 4
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Sep 18 • 372
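For the instruction-tuned Qwen2.5 variants, the usual chat-template flow applies; a sketch assuming the repo id `Qwen/Qwen2.5-0.5B-Instruct`, inferred from the size list above:

```python
# Sketch: querying an instruction-tuned Qwen2.5 model via the standard
# chat-template API. Repo id inferred from the collection's size list.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

messages = [{"role": "user", "content": "Summarize MoE routing in one sentence."}]
# apply_chat_template inserts the roles/special tokens the model was tuned on
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```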
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published Sep 12 • 66
Switch-Transformers release Collection This release includes various MoE (Mixture of Experts) models based on the T5 architecture. The base models use from 8 to 256 experts. • 9 items • Updated Jul 31 • 15
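Because these models are T5-based, they load through the seq2seq classes; a sketch assuming the 8-expert base checkpoint `google/switch-base-8`:

```python
# Sketch: running a Switch Transformers checkpoint. Being T5-based, it uses
# the seq2seq classes, and the T5 span-corruption sentinels make a natural probe.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "google/switch-base-8"  # assumed 8-expert base repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer("A <extra_id_0> walks into a <extra_id_1>.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))
```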
Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention • Aug 21 • 22
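The technique the article covers can be sketched as follows: transformers ships a `DataCollatorWithFlattening` that concatenates examples without padding and supplies position ids, so Flash Attention 2 keeps attention from crossing example boundaries. The model id and toy dataset below are placeholders, not the article's exact setup.

```python
# Sketch of padding-free packing: DataCollatorWithFlattening concatenates
# samples into one sequence and passes position ids, so Flash Attention 2
# (required here, plus a CUDA GPU) keeps attention within each original example.
# Model id and toy dataset are placeholders, not the article's exact setup.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorWithFlattening,
    Trainer,
    TrainingArguments,
)

model_id = "facebook/opt-125m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",  # needed for this collator
    torch_dtype="bfloat16",
)

texts = ["Packing removes padding waste.", "Flash Attention keeps it exact."]
train_dataset = [{"input_ids": tokenizer(t)["input_ids"]} for t in texts]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2),
    train_dataset=train_dataset,
    data_collator=DataCollatorWithFlattening(),  # also builds shifted labels
)
trainer.train()
```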
Article ∞🧙🏼♂️AnyClassifier - Generating Synthetic Data For Text Classification • By kenhktsui • Aug 19 • 8
Article The case for specialized pre-training: ultra-fast foundation models for dedicated tasks • By Pclanglais • Aug 4 • 26
📈 Scaling Laws with Vocabulary Collection Increase your vocabulary size as you scale up your language model • 5 items • Updated Aug 11 • 4