Papers I want to read - a osanseviero Collection

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23 • 35

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24 • 13

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published May 31 • 63

Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

Paper • 2405.20541 • Published May 30 • 21

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Paper • 2406.04333 • Published Jun 6 • 36

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Paper • 2406.04314 • Published Jun 6 • 27

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Paper • 2406.04151 • Published Jun 6 • 17

Mixture-of-Agents Enhances Large Language Model Capabilities

Paper • 2406.04692 • Published Jun 7 • 55

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Paper • 2406.04770 • Published Jun 7 • 27

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10 • 65

Vript: A Video Is Worth Thousands of Words

Paper • 2406.06040 • Published Jun 10 • 24

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

Paper • 2406.06469 • Published Jun 10 • 24

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Paper • 2406.05981 • Published Jun 10 • 12

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Paper • 2406.05370 • Published Jun 8 • 14

Unified Text-to-Image Generation and Retrieval

Paper • 2406.05814 • Published Jun 9 • 11

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11 • 55

Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

Paper • 2406.06563 • Published Jun 3 • 17

MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering

Paper • 2406.06573 • Published Jun 3 • 9

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

Paper • 2406.06282 • Published Jun 10 • 36

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Paper • 2406.05955 • Published Jun 10 • 22

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12 • 65

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Paper • 2407.01906 • Published Jul 2 • 34

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3 • 92

No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models

Paper • 2407.02687 • Published Jul 2 • 22

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Paper • 2407.01370 • Published Jul 1 • 85

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Paper • 2407.02371 • Published Jul 2 • 49

Agentless: Demystifying LLM-based Software Engineering Agents

Paper • 2407.01489 • Published Jul 1 • 42

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Paper • 2407.02490 • Published Jul 2 • 23

Understanding Alignment in Multimodal LLMs: A Comprehensive Study

Paper • 2407.02477 • Published Jul 2 • 21

Magic Insert: Style-Aware Drag-and-Drop

Paper • 2407.02489 • Published Jul 2 • 20

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1 • 75

MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

Paper • 2407.00468 • Published Jun 29 • 34

ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published Jun 27 • 41

RegMix: Data Mixture as Regression for Language Model Pre-training

Paper • 2407.01492 • Published Jul 1 • 35

Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning

Paper • 2407.00782 • Published Jun 30 • 23

Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP

Paper • 2407.00402 • Published Jun 29 • 22

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28 • 95

HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

Paper • 2406.19280 • Published Jun 27 • 60

Direct Preference Knowledge Distillation for Large Language Models

Paper • 2406.19774 • Published Jun 28 • 21

Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

Paper • 2406.18629 • Published Jun 26 • 41

Can LLMs Learn by Teaching? A Preliminary Study

Paper • 2406.14629 • Published Jun 20 • 18

Adam-mini: Use Fewer Learning Rates To Gain More

Paper • 2406.16793 • Published Jun 24 • 67

Octo-planner: On-device Language Model for Planner-Action Agents

Paper • 2406.18082 • Published Jun 26 • 47

ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation

Paper • 2406.18522 • Published Jun 26 • 40

A Closer Look into Mixture-of-Experts in Large Language Models

Paper • 2406.18219 • Published Jun 26 • 15

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25 • 86

LongIns: A Challenging Long-context Instruction-based Exam for LLMs

Paper • 2406.17588 • Published Jun 25 • 20

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

Paper • 2406.18518 • Published Jun 26 • 23

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24 • 57

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Paper • 2406.15877 • Published Jun 22 • 45

Long Context Transfer from Language to Vision

Paper • 2406.16852 • Published Jun 24 • 32

WARP: On the Benefits of Weight Averaged Rewarded Policies

Paper • 2406.16768 • Published Jun 24 • 22

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Paper • 2407.04842 • Published Jul 5 • 52

LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages

Paper • 2407.05975 • Published Jul 8 • 34

ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

Paper • 2407.06135 • Published Jul 8 • 20

Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction

Paper • 2407.03651 • Published Jul 4 • 15

Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 82

AgentInstruct: Toward Generative Teaching with Agentic Flows

Paper • 2407.03502 • Published Jul 3 • 48

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

Paper • 2407.07061 • Published Jul 9 • 26

BM25S: Orders of magnitude faster lexical search via eager sparse scoring

Paper • 2407.03618 • Published Jul 4 • 11

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 67

On Leakage of Code Generation Evaluation Datasets

Paper • 2407.07565 • Published Jul 10 • 5

Mixture of A Million Experts

Paper • 2407.04153 • Published Jul 4 • 4

H2O-Danube3 Technical Report

Paper • 2407.09276 • Published Jul 12 • 18

Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

Paper • 2407.08348 • Published Jul 11 • 50

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9 • 41

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

Paper • 2407.08296 • Published Jul 11 • 31

Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist

Paper • 2407.08733 • Published Jul 11 • 20

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper • 2407.08083 • Published Jul 10 • 27

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Paper • 2407.08303 • Published Jul 11 • 17

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Paper • 2407.09025 • Published Jul 12 • 128

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Paper • 2407.08770 • Published Jul 11 • 19

Qwen2 Technical Report

Paper • 2407.10671 • Published Jul 15 • 156

The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

Paper • 2407.10457 • Published Jul 15 • 22

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Paper • 2407.11963 • Published Jul 16 • 43

Scaling Diffusion Transformers to 16 Billion Parameters

Paper • 2407.11633 • Published Jul 16 • 25

Qwen2-Audio Technical Report

Paper • 2407.10759 • Published Jul 15 • 55

Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models

Paper • 2407.12327 • Published Jul 17 • 77

GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression

Paper • 2407.12077 • Published Jul 16 • 54

E5-V: Universal Embeddings with Multimodal Large Language Models

Paper • 2407.12580 • Published Jul 17 • 39

BOND: Aligning LLMs with Best-of-N Distillation

Paper • 2407.14622 • Published Jul 19 • 18

Compact Language Models via Pruning and Knowledge Distillation

Paper • 2407.14679 • Published Jul 19 • 38

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Paper • 2407.21770 • Published Jul 31 • 22

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 108

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31 • 73

OmniParser for Pure Vision Based GUI Agent

Paper • 2408.00203 • Published Aug 1 • 23

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

Paper • 2408.00765 • Published Aug 1 • 12

Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning

Paper • 2408.00690 • Published Aug 1 • 22

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 107

TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods

Paper • 2407.21630 • Published Jul 31 • 8

ShieldGemma: Generative AI Content Moderation Based on Gemma

Paper • 2407.21772 • Published Jul 31 • 13

Meltemi: The first open Large Language Model for Greek

Paper • 2407.20743 • Published Jul 30 • 67

A Large Encoder-Decoder Family of Foundation Models For Chemical Language

Paper • 2407.20267 • Published Jul 24 • 31

ThinK: Thinner Key Cache by Query-Driven Pruning

Paper • 2407.21018 • Published Jul 30 • 30

Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework

Paper • 2407.20729 • Published Jul 30 • 25

JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources

Paper • 2407.20750 • Published Jul 30 • 21

SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain

Paper • 2407.19584 • Published Jul 28 • 62

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

Paper • 2407.19672 • Published Jul 29 • 55

Theia: Distilling Diverse Vision Foundation Models for Robot Learning

Paper • 2407.20179 • Published Jul 29 • 46

MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

Paper • 2407.18961 • Published Jul 18 • 39

Mixture of Nested Experts: Adaptive Processing of Visual Tokens

Paper • 2407.19985 • Published Jul 29 • 35

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Paper • 2407.18248 • Published Jul 25 • 31

AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26 • 32

LAMBDA: A Large Model Based Data Agent

Paper • 2407.17535 • Published Jul 24 • 34

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents

Paper • 2407.17490 • Published Jul 3 • 30

Very Large-Scale Multi-Agent Simulation in AgentScope

Paper • 2407.17789 • Published Jul 25 • 31

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

Paper • 2407.16607 • Published Jul 23 • 22

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

Paper • 2407.16741 • Published Jul 23 • 68

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Paper • 2407.16154 • Published Jul 23 • 21

Longhorn: State Space Models are Amortized Online Learners

Paper • 2407.14207 • Published Jul 19 • 17

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

Paper • 2407.17470 • Published Jul 24 • 14

CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

Paper • 2407.13301 • Published Jul 18 • 54

KAN or MLP: A Fairer Comparison

Paper • 2407.16674 • Published Jul 23 • 42

NNsight and NDIF: Democratizing Access to Foundation Model Internals

Paper • 2407.14561 • Published Jul 18 • 34

Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 23

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Paper • 2407.13623 • Published Jul 18 • 52

Scaling Granite Code Models to 128K Context

Paper • 2407.13739 • Published Jul 18 • 19

Understanding Reference Policies in Direct Preference Optimization

Paper • 2407.13709 • Published Jul 18 • 16

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

Paper • 2404.00456 • Published Mar 30 • 4

Accuracy is Not All You Need

Paper • 2407.09141 • Published Jul 12 • 1

Fast Inference from Transformers via Speculative Decoding

Paper • 2211.17192 • Published Nov 30, 2022 • 4

LLM Augmented LLMs: Expanding Capabilities through Composition

Paper • 2401.02412 • Published Jan 4 • 36

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Paper • 2407.19594 • Published Jul 28 • 20

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

Paper • 2407.20183 • Published Jul 29 • 38

Improving Retrieval Augmented Language Model with Self-Reasoning

Paper • 2407.19813 • Published Jul 29 • 6

Adaptive Retrieval-Augmented Generation for Conversational Systems

Paper • 2407.21712 • Published Jul 31 • 2

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

Paper • 2408.02545 • Published Aug 5 • 33

Self-Taught Evaluators

Paper • 2408.02666 • Published Aug 5 • 26

A Survey of Mamba

Paper • 2408.01129 • Published Aug 2 • 1

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59

Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks

Paper • 2408.03615 • Published Aug 7 • 30

EXAONE 3.0 7.8B Instruction Tuned Language Model

Paper • 2408.03541 • Published Aug 7 • 34

WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

Paper • 2408.03837 • Published Aug 7 • 17

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8 • 155

LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

Paper • 2408.04284 • Published Aug 8 • 22

Better Alignment with Instruction Back-and-Forth Translation

Paper • 2408.04614 • Published Aug 8 • 14

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9 • 46

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

Paper • 2408.05147 • Published Aug 9 • 37

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Paper • 2408.04840 • Published Aug 9 • 31

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12 • 115

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

Paper • 2408.06070 • Published Aug 12 • 52

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12 • 35

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Paper • 2408.06327 • Published Aug 12 • 15

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Paper • 2408.07055 • Published Aug 13 • 65

OpenResearcher: Unleashing AI for Accelerated Scientific Research

Paper • 2408.06941 • Published Aug 13 • 30

Layerwise Recurrent Router for Mixture-of-Experts

Paper • 2408.06793 • Published Aug 13 • 30

I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

Paper • 2408.08072 • Published Aug 15 • 32

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Paper • 2408.08152 • Published Aug 15 • 52

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Paper • 2408.07199 • Published Aug 13 • 20

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21 • 55

JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

Paper • 2408.08459 • Published Aug 15 • 44

Automated Design of Agentic Systems

Paper • 2408.08435 • Published Aug 15 • 38

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19 • 51

To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20 • 41

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20 • 56

FocusLLM: Scaling LLM's Context by Parallel Decoding

Paper • 2408.11745 • Published Aug 21 • 23

Hermes 3 Technical Report

Paper • 2408.11857 • Published Aug 15 • 38

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Paper • 2408.12570 • Published Aug 22 • 30

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22 • 62

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 118

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27 • 37

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

Paper • 2408.15664 • Published Aug 28 • 11

A Web-Based Solution for Federated Learning with LLM-Based Automation

Paper • 2408.13010 • Published Aug 23 • 9

SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

Paper • 2408.14176 • Published Aug 26 • 60

LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

Paper • 2408.13467 • Published Aug 24 • 24

MobileQuant: Mobile-friendly Quantization for On-device Language Models

Paper • 2408.13933 • Published Aug 25 • 13

LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!

Paper • 2408.13402 • Published Aug 23 • 18

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27 • 138

BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

Paper • 2408.15079 • Published Aug 27 • 52

Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Paper • 2408.15518 • Published Aug 28 • 42

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28 • 83

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published Aug 29 • 92

CogVLM2: Visual Language Models for Image and Video Understanding

Paper • 2408.16500 • Published Aug 29 • 56

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3 • 77

Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

Paper • 2409.03271 • Published Sep 5 • 2

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Paper • 2409.02897 • Published Sep 4 • 44

MemLong: Memory-Augmented Retrieval for Long Text Modeling

Paper • 2408.16967 • Published Aug 30 • 2

Large Language Model-Based Agents for Software Engineering: A Survey

Paper • 2409.02977 • Published Sep 4

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Paper • 2408.16768 • Published Aug 29 • 26

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Paper • 2408.16293 • Published Aug 29 • 25

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published Aug 28 • 34

InkubaLM: A small language model for low-resource African languages

Paper • 2408.17024 • Published Aug 30 • 12

LongRecipe: Recipe for Efficient Long Context Generalization in Large Languge Models

Paper • 2409.00509 • Published Aug 31 • 38

FLUX that Plays Music

Paper • 2409.00587 • Published Sep 1 • 31

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Paper • 2409.02813 • Published Sep 4 • 28

Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining

Paper • 2409.02326 • Published Sep 3 • 18

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published Sep 5 • 88

FuzzCoder: Byte-level Fuzzing Test via Large Language Model

Paper • 2409.01944 • Published Sep 3 • 44

Building Math Agents with Multi-Turn Iterative Preference Learning

Paper • 2409.02392 • Published Sep 4 • 14

How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data

Paper • 2409.03810 • Published Sep 5 • 30

Configurable Foundation Models: Building LLMs from a Modular Perspective

Paper • 2409.02877 • Published Sep 4 • 27

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published Sep 4 • 72

Agent Workflow Memory

Paper • 2409.07429 • Published Sep 11 • 27

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Paper • 2409.07703 • Published Sep 12 • 66

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Paper • 2409.08239 • Published Sep 12 • 16

Kolmogorov-Arnold Transformer

Paper • 2409.10594 • Published Sep 16 • 38

AudioBERT: Audio Knowledge Augmented Language Model

Paper • 2409.08199 • Published Sep 12 • 4

Policy Filtration in RLHF to Fine-Tune LLM for Code Generation

Paper • 2409.06957 • Published Sep 11 • 5

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17 • 108

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17 • 72

A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B

Paper • 2409.11055 • Published Sep 17 • 16

On the limits of agency in agent-based models

Paper • 2409.10568 • Published Sep 14 • 12

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18 • 136

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18 • 74

A Controlled Study on Long Context Extension and Generalization in LLMs

Paper • 2409.12181 • Published Sep 18 • 43

LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.11901 • Published Sep 18 • 30

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Paper • 2409.12183 • Published Sep 18 • 36

GRIN: GRadient-INformed MoE

Paper • 2409.12136 • Published Sep 18 • 15

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19 • 135

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published Sep 19 • 47

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Paper • 2409.12961 • Published Sep 19 • 24

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

Paper • 2409.12957 • Published Sep 19 • 18

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

Paper • 2409.12959 • Published Sep 19 • 36

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization

Paper • 2409.12903 • Published Sep 19 • 21

Prithvi WxC: Foundation Model for Weather and Climate

Paper • 2409.13598 • Published Sep 20 • 37

RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning

Paper • 2409.14674 • Published Sep 23 • 41

Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking

Paper • 2409.15268 • Published Sep 23 • 12

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Paper • 2409.16191 • Published Sep 24 • 41

OmniBench: Towards The Future of Universal Omni-Language Models

Paper • 2409.15272 • Published Sep 23 • 25

Making Text Embedders Few-Shot Learners

Paper • 2409.15700 • Published Sep 24 • 29

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25 • 103

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness

Paper • 2409.18125 • Published Sep 26 • 33

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Paper • 2409.20566 • Published Sep 30 • 52

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Paper • 2409.16040 • Published Sep 24 • 13

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Paper • 2409.18042 • Published Sep 26 • 36

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27 • 91

A Survey on the Honesty of Large Language Models

Paper • 2409.18786 • Published Sep 27 • 31

Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

Paper • 2409.18943 • Published Sep 27 • 27

TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices

Paper • 2410.00531 • Published Oct 1 • 29

LLaVA-Critic: Learning to Evaluate Multimodal Models

Paper • 2410.02712 • Published Oct 3 • 34

Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published Oct 3 • 37

Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis

Paper • 2409.20059 • Published Sep 30 • 15

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Paper • 2410.01215 • Published Oct 2 • 30

Not All LLM Reasoners Are Created Equal

Paper • 2410.01748 • Published Oct 2 • 27

WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

Paper • 2207.01206 • Published Jul 4, 2022 • 2

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8 • 107

Upcycling Large Language Models into Mixture of Experts

Paper • 2410.07524 • Published Oct 10 • 3

SelfCodeAlign: Self-Alignment for Code Generation

Paper • 2410.24198 • Published 21 days ago • 20

Stealing User Prompts from Mixture of Experts

Paper • 2410.22884 • Published 22 days ago • 13

Scaling Laws for Precision

Paper • 2411.04330 • Published 15 days ago • 6

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published 14 days ago • 48