Qwen2.5-Math Collection Math-specific model series based on Qwen2.5 • 8 items • Updated 1 day ago • 19
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 6 items • Updated 11 minutes ago • 38
Qwen2.5 Collection Qwen2.5 language models, with pretrained and instruction-tuned variants in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 1 day ago • 120
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 2 days ago • 148
Llama3-8B-1.58 Collection A trio of powerful models: fine-tuned from Llama3-8b-Instruct, with BitNet architecture! • 3 items • Updated 5 days ago • 8
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published 6 days ago • 38
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning Paper • 2406.12050 • Published Jun 17 • 16
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated 8 days ago • 53
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published 17 days ago • 72
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Paper • 2409.02889 • Published 16 days ago • 53
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published 16 days ago • 27
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models Paper • 2409.00509 • Published 20 days ago • 38
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published 23 days ago • 81
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published 24 days ago • 137
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper • 2408.12528 • Published 29 days ago • 50
Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published 29 days ago • 61
Robust Speech Recognition via Large-Scale Weak Supervision Paper • 2212.04356 • Published Dec 6, 2022 • 17
Whisper Release Collection Whisper includes both English-only and multilingual checkpoints for ASR and ST, ranging from 38M params for the tiny models to 1.5B params for large. • 12 items • Updated Sep 13, 2023 • 74
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5 • 17
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models Paper • 1910.02054 • Published Oct 4, 2019 • 4
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated 2 days ago • 114
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published about 1 month ago • 40
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering Paper • 2408.09174 • Published Aug 17 • 51
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents Paper • 2408.07199 • Published Aug 13 • 19
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published 29 days ago • 109
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15 • 44
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 96
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning Paper • 2408.07089 • Published Aug 9 • 12
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13 • 65
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12 • 52
VITA: Towards Open-Source Interactive Omni Multimodal LLM Paper • 2408.05211 • Published Aug 9 • 46
CLAIR and APO Collection Data and Models for the paper "Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment" • 8 items • Updated Aug 14 • 3
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment Paper • 2408.06266 • Published Aug 12 • 9
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Paper • 2408.03361 • Published Aug 6 • 85
Jamba-1.5 Collection The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction following foundation models • 2 items • Updated 29 days ago • 71
Article Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging By akjindal53244 • Aug 19 • 72
mEdIT Collection Collection of the publicly available mEdIT dataset and instruction-tuned models for multilingual text revision. • 3 items • Updated May 17 • 2
CoEdIT: Text Editing by Task-Specific Instruction Tuning Paper • 2305.09857 • Published May 17, 2023 • 7
CoEdIT Collection Collection of the publicly available CoEdIT dataset and instruction-tuned models for text editing. • 6 items • Updated Apr 15 • 6
💻 Local SmolLMs Collection SmolLM models in MLC, ONNX and GGUF format for local applications + in-browser demos • 14 items • Updated about 1 month ago • 40
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search Paper • 2408.08152 • Published Aug 15 • 51