MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning Paper • 2310.16049 • Published Oct 24, 2023 • 4
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments Paper • 2405.07960 • Published May 13 • 1
ART: Automatic multi-step reasoning and tool-use for large language models Paper • 2303.09014 • Published Mar 16, 2023 • 1
FuzzCoder: Byte-level Fuzzing Test via Large Language Model Paper • 2409.01944 • Published 16 days ago • 44
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Paper • 2409.01322 • Published 17 days ago • 94
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper • 2409.02095 • Published 16 days ago • 32
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges Paper • 2409.01071 • Published 18 days ago • 26
LongRecipe: Recipe for Efficient Long Context Generalization in Large Languge Models Paper • 2409.00509 • Published 19 days ago • 38
Leaderboards and benchmarks ✨ Collection Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 67 items • Updated Aug 6 • 83
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java Paper • 2408.14354 • Published 24 days ago • 40
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper • 2408.12528 • Published 28 days ago • 50
Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published 28 days ago • 61
Data curation via joint example selection further accelerates multimodal learning Paper • 2406.17711 • Published Jun 25 • 3
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12 • 55
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12 • 52
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 114
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents Paper • 2408.07060 • Published Aug 13 • 39
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13 • 65
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Paper • 2408.03361 • Published Aug 6 • 85
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 152
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation Paper • 2408.00764 • Published Aug 1 • 1
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws Paper • 2404.05405 • Published Apr 8 • 7
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention Paper • 2407.19918 • Published Jul 29 • 47
Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification Paper • 2407.19340 • Published Jul 27 • 55
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning Paper • 2407.20798 • Published Jul 30 • 23
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts Paper • 2407.21770 • Published Jul 31 • 20
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 73
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement Paper • 2408.00653 • Published Aug 1 • 27
WebArena: A Realistic Web Environment for Building Autonomous Agents Paper • 2307.13854 • Published Jul 25, 2023 • 23
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Paper • 2408.00754 • Published Aug 1 • 21
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31 • 3
SHIC: Shape-Image Correspondences with no Keypoint Supervision Paper • 2407.18907 • Published Jul 26 • 38
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26 • 31
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26 • 30
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains Paper • 2407.18961 • Published Jul 18 • 38
Llama 3.1 Collection This collection hosts the transformers and original repos of the Meta Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Aug 2 • 569
Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation Paper • 2406.10970 • Published Jun 16 • 1
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore Paper • 2407.12854 • Published Jul 9 • 29
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18 • 52
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks Paper • 2305.17390 • Published May 27, 2023 • 2
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 169