LLM
Multimodal LLM
Paper • 2309.04662 • Published • 22
Neurons in Large Language Models: Dead, N-gram, Positional
Paper • 2309.04827 • Published • 16
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Paper • 2309.05516 • Published • 9
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
Paper • 2309.03907 • Published • 8
FLM-101B: An Open LLM and How to Train It with $100K Budget
Paper • 2309.03852 • Published • 43
Large Language Models as Optimizers
Paper • 2309.03409 • Published • 75
GPT Can Solve Mathematical Problems Without a Calculator
Paper • 2309.03241 • Published • 17
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Paper • 2309.03883 • Published • 33
XGen-7B Technical Report
Paper • 2309.03450 • Published • 8
Language Modeling Is Compression
Paper • 2309.10668 • Published • 82
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Paper • 2309.10020 • Published • 40
Baichuan 2: Open Large-scale Language Models
Paper • 2309.10305 • Published • 19
SlimPajama-DC: Understanding Data Combinations for LLM Training
Paper • 2309.10818 • Published • 10
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Paper • 2309.10202 • Published • 9
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Paper • 2309.10150 • Published • 24
FoleyGen: Visually-Guided Audio Generation
Paper • 2309.10537 • Published • 8
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Paper • 2309.11197 • Published • 4
DreamLLM: Synergistic Multimodal Comprehension and Creation
Paper • 2309.11499 • Published • 58
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Paper • 2309.16058 • Published • 55
Effective Long-Context Scaling of Foundation Models
Paper • 2309.16039 • Published • 30
Qwen Technical Report
Paper • 2309.16609 • Published • 34
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Paper • 2309.16414 • Published • 19
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Paper • 2309.16534 • Published • 15
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Paper • 2309.16650 • Published • 10
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
Paper • 2309.16583 • Published • 13
Language models in molecular discovery
Paper • 2309.16235 • Published • 10
Toward Joint Language Modeling for Speech Units and Text
Paper • 2310.08715 • Published • 7
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper • 2310.11511 • Published • 74
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Paper • 2310.11954 • Published • 24
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 96
VeRA: Vector-based Random Matrix Adaptation
Paper • 2310.11454 • Published • 28
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper • 2310.11441 • Published • 26
Context-Aware Meta-Learning
Paper • 2310.10971 • Published • 16
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
Paper • 2310.11440 • Published • 15
TEQ: Trainable Equivalent Transformation for Quantization of LLMs
Paper • 2310.10944 • Published • 9
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Paper • 2310.10837 • Published • 10
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation
Paper • 2310.10769 • Published • 8
Video Language Planning
Paper • 2310.10625 • Published • 9
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Paper • 2310.09478 • Published • 19
GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors
Paper • 2310.08529 • Published • 17
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 19
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
Paper • 2310.13119 • Published • 11
H2O Open Ecosystem for State-of-the-art Large Language Models
Paper • 2310.13012 • Published • 7
Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping
Paper • 2310.12474 • Published • 5
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design
Paper • 2310.15144 • Published • 13
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling
Paper • 2310.15169 • Published • 9
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Paper • 2310.13671 • Published • 18
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
Paper • 2310.13127 • Published • 11
Teaching Language Models to Self-Improve through Interactive Demonstrations
Paper • 2310.13522 • Published • 11
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 6
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
Paper • 2310.13227 • Published • 12
A Survey of Large Language Models
Paper • 2303.18223 • Published • 14
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Paper • 2310.05694 • Published • 3
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 14
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
Paper • 2308.12067 • Published • 4
Ghost in the Minecraft: Generally Capable Agents for Open-World Enviroments via Large Language Models with Text-based Knowledge and Memory
Paper • 2305.17144 • Published • 2
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Paper • 2305.10601 • Published • 10
UI Layout Generation with LLMs Guided by UI Grammar
Paper • 2310.15455 • Published • 2
An Early Evaluation of GPT-4V(ision)
Paper • 2310.16534 • Published • 21
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Paper • 2310.15008 • Published • 21
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Paper • 2310.17631 • Published • 33
Controlled Decoding from Language Models
Paper • 2310.17022 • Published • 14
HyperFields: Towards Zero-Shot Generation of NeRFs from Text
Paper • 2310.17075 • Published • 14
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Paper • 2310.17157 • Published • 11
Can Language Models Understand Physical Concepts?
Paper • 2305.14057 • Published • 1
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Paper • 2201.12086 • Published • 3
ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories
Paper • 2305.15028 • Published • 1
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
Paper • 2305.14160 • Published • 1
GPT-4 Doesn't Know It's Wrong: An Analysis of Iterative Prompting for Reasoning Problems
Paper • 2310.12397 • Published • 1
Can Large Language Models Really Improve by Self-critiquing Their Own Plans?
Paper • 2310.08118 • Published • 1
Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper • 2305.16291 • Published • 9
TiC-CLIP: Continual Training of CLIP Models
Paper • 2310.16226 • Published • 8
CLEX: Continuous Length Extrapolation for Large Language Models
Paper • 2310.16450 • Published • 9
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Paper • 2310.17796 • Published • 16
Data-Centric Financial Large Language Models
Paper • 2310.17784 • Published • 14
FP8-LM: Training FP8 Large Language Models
Paper • 2310.18313 • Published • 31
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
Paper • 2310.19061 • Published • 8
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery
Paper • 2310.18356 • Published • 22
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Paper • 2310.19773 • Published • 19
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
Paper • 2310.19019 • Published • 9
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Paper • 2310.19102 • Published • 10
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
Paper • 2310.18628 • Published • 7
Skywork: A More Open Bilingual Foundation Model
Paper • 2310.19341 • Published • 5
The Impact of Depth and Width on Transformer Language Model Generalization
Paper • 2310.19956 • Published • 9
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
Paper • 2310.19909 • Published • 20
Learning From Mistakes Makes LLM Better Reasoner
Paper • 2310.20689 • Published • 28
Does GPT-4 Pass the Turing Test?
Paper • 2310.20216 • Published • 17
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Paper • 2310.20624 • Published • 12
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 16
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
Paper • 2310.20499 • Published • 7
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation
Paper • 2311.00272 • Published • 9
ChipNeMo: Domain-Adapted LLMs for Chip Design
Paper • 2311.00176 • Published • 8
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 40
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Paper • 2311.00430 • Published • 57
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?
Paper • 2311.00047 • Published • 8
The Generative AI Paradox: "What It Can Create, It May Not Understand"
Paper • 2311.00059 • Published • 18
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning
Paper • 2311.00257 • Published • 8
Text Rendering Strategies for Pixel Language Models
Paper • 2311.00522 • Published • 10
FlashDecoding++: Faster Large Language Model Inference on GPUs
Paper • 2311.01282 • Published • 35
FLAP: Fast Language-Audio Pre-training
Paper • 2311.01615 • Published • 16
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion
Paper • 2311.01767 • Published • 18
Contrastive Chain-of-Thought Prompting
Paper • 2311.09277 • Published • 34
Tied-Lora: Enhacing parameter efficiency of LoRA with weight tying
Paper • 2311.09578 • Published • 14
Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives
Paper • 2311.09227 • Published • 6
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Paper • 2312.01552 • Published • 30
Generating Illustrated Instructions
Paper • 2312.04552 • Published • 7
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Paper • 2312.04474 • Published • 29
Beyond Surface: Probing LLaMA Across Scales and Layers
Paper • 2312.04333 • Published • 18
Large Language Models for Mathematicians
Paper • 2312.04556 • Published • 11
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Paper • 2312.03793 • Published • 17
OneLLM: One Framework to Align All Modalities with Language
Paper • 2312.03700 • Published • 20
Generative Multimodal Models are In-Context Learners
Paper • 2312.13286 • Published • 34
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 45
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Paper • 2402.01391 • Published • 41
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models
Paper • 2402.01118 • Published • 29
K-Level Reasoning with Large Language Models
Paper • 2402.01521 • Published • 17
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
Paper • 2402.01622 • Published • 33
MusicRL: Aligning Music Generation to Human Preferences
Paper • 2402.04229 • Published • 16
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 89
Computing Power and the Governance of Artificial Intelligence
Paper • 2402.08797 • Published • 11
Premise Order Matters in Reasoning with Large Language Models
Paper • 2402.08939 • Published • 25
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
Paper • 2402.09052 • Published • 16
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
Paper • 2402.10466 • Published • 16
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper • 2402.10893 • Published • 10
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Paper • 2402.11450 • Published • 20
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper • 2402.12226 • Published • 40
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Paper • 2402.10986 • Published • 76
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 111
OmniPred: Language Models as Universal Regressors
Paper • 2402.14547 • Published • 11
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Paper • 2402.14797 • Published • 19
Watermarking Makes Language Models Radioactive
Paper • 2402.14904 • Published • 23
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Paper • 2402.16153 • Published • 56
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 126
Towards Optimal Learning of Language Models
Paper • 2402.17759 • Published • 16
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 602
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 49
Examining Forgetting in Continual Pre-training of Aligned Large Language Models
Paper • 2401.03129 • Published
Can Large Language Models Be an Alternative to Human Evaluations?
Paper • 2305.01937 • Published • 2
A Closer Look into Automatic Evaluation Using Large Language Models
Paper • 2310.05657 • Published
Large Language Models Understand and Can be Enhanced by Emotional Stimuli
Paper • 2307.11760 • Published • 1
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Paper • 2312.16171 • Published • 34
Re3: Generating Longer Stories With Recursive Reprompting and Revision
Paper • 2210.06774 • Published • 2
Constitutional AI: Harmlessness from AI Feedback
Paper • 2212.08073 • Published • 2
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
Paper • 2402.04253 • Published
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper • 2403.05135 • Published • 42
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Paper • 2305.19118 • Published
CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society
Paper • 2303.17760 • Published • 1
Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization
Paper • 2310.02170 • Published • 2
MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
Paper • 2308.00352 • Published • 2
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 84
Large Language Models for Autonomous Driving: Real-World Experiments
Paper • 2312.09397 • Published
The Rise and Potential of Large Language Model Based Agents: A Survey
Paper • 2309.07864 • Published • 7
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
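Paper • 2406.08973 • Published • 86
Note: XLand-100B is a large-scale, multi-task dataset designed for in-context reinforcement learning (ICL), containing complete learning histories for 30,000 different tasks. It covers 100 billion transitions and 2.5 billion episodes. The dataset aims to remove the data bottleneck in ICL research, provide a challenging benchmark for comparing and evaluating ICL methods, and drive further research in the field.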
Make It Count: Text-to-Image Generation with an Accurate Number of Objects
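Paper • 2406.10210 • Published • 76
Note: CountGen identifies the features in a diffusion model that carry object identity, uses them to separate and count object instances, and corrects cases where too many or too few objects are generated, all without an external layout source. Experiments show that CountGen substantially outperforms existing baselines in counting accuracy, making it a useful resource for further development of text-to-image models.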
Needle In A Multimodal Haystack
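Paper • 2406.07230 • Published • 52
Note: Needle In A Multimodal Haystack (MM-NIAH) is the first benchmark to systematically evaluate how well MLLMs understand long multimodal documents, covering multimodal retrieval, counting, and reasoning tasks. Experiments show that existing models still have considerable room for improvement on these tasks. The benchmark provides a platform for further research and development.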
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
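Paper • 2406.09961 • Published • 54
Note: The ChartMimic benchmark evaluates the chart-to-code generation ability of large multimodal models. It contains 1,000 triplets spanning a variety of chart types and 191 subcategories. In experiments, even an advanced model such as GPT-4V scored only 73.2, showing that cross-modal reasoning still has room to improve; the authors present the benchmark as a contribution toward research on artificial general intelligence.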
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
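Paper • 2406.10149 • Published • 48
Note: The BABILong benchmark tests how well models reason over long documents and includes 20 diverse reasoning tasks. Experiments show that current models degrade on complex reasoning and effectively use only part of their context, exposing the challenges of long-context processing. Recurrent Memory Transformers performed best on extremely long documents.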
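The "reasoning-in-a-haystack" setup named in the BABILong title can be illustrated with a small sketch: the facts needed to answer a question are scattered, in their original order, among unrelated background sentences, so a model has to locate and combine them inside a long context. This is a hedged illustration, not the benchmark's released generator. The function name `build_haystack_sample`, the toy facts, and the background sentences are hypothetical placeholders; as we understand it, the actual benchmark takes its facts from bAbI tasks and its filler from long book text.

```python
import random


def build_haystack_sample(facts, question, background, target_len, seed=0):
    """Hypothetical sketch: hide `facts` inside `target_len` sentences of filler.

    Facts keep their original relative order; their positions are random
    but reproducible for a given seed.
    """
    rng = random.Random(seed)
    n_filler = max(target_len - len(facts), 0)
    # Cycle through the background sentences to reach the requested length.
    filler = [background[i % len(background)] for i in range(n_filler)]

    # Pick which slots of the final context hold facts (no duplicates).
    total = n_filler + len(facts)
    fact_slots = set(rng.sample(range(total), len(facts)))
    fact_iter, filler_iter = iter(facts), iter(filler)
    context = [
        next(fact_iter) if slot in fact_slots else next(filler_iter)
        for slot in range(total)
    ]
    return {
        "context": context,
        "prompt": " ".join(context) + "\nQuestion: " + question,
    }


facts = ["Mary went to the kitchen.", "Mary picked up the apple."]
background = ["The rain fell softly on the roof.", "A carriage rattled past."]
sample = build_haystack_sample(facts, "Where is the apple?", background, target_len=50)
```

Increasing `target_len` while keeping the facts fixed is what turns a short reasoning task into a long-context one; the difficulty of the underlying question does not change.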
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
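Paper • 2406.08418 • Published • 28
Note: OmniCorpus is a large-scale multimodal dataset of 10-billion-level images interleaved with text, drawn from diverse sources. Compared with existing datasets, it is 15 times larger while offering high flexibility and diversity, and the authors validate its quality and effectiveness. OmniCorpus provides a solid data foundation for future multimodal model research and shows notable potential for multimodal in-context learning. The authors hope the dataset will drive further development of multimodal large language models. The dataset and code have been released on GitHub.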
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
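Paper • 2406.10118 • Published • 30
Note: SEACrowd fills a gap in existing AI resources by centralizing and standardizing multimodal data resources for Southeast Asian languages, and its benchmarks offer insight into how AI models perform on these languages. The study also reveals shortcomings in the generation quality of existing LLMs for Southeast Asian languages and proposes strategies for future development, with the aim of advancing AI in the region and making resources more equitable.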
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
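Paper • 2406.10208 • Published • 21
Note: Glyph-ByT5-v2 and Glyph-SDXL-v2 build a rich multilingual dataset and benchmark and apply advanced preference learning, substantially improving both the accuracy and the aesthetic quality of multilingual visual text rendering; the work is a significant advance for the field.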
GEB-1.3B: Open Lightweight Large Language Model
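Paper • 2406.09900 • Published • 20
Note: Released as an open-source model, GEB-1.3B marks an important milestone in the development of lightweight LLMs and provides a solid foundation for further research and innovation. It maintains strong performance while significantly reducing computational requirements and speeding up inference on CPUs, which makes LLM deployment feasible in a wider range of applications.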
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality
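Paper • 2406.08845 • Published • 8
Note: The T2VHE protocol proposed in this paper substantially improves the reliability, reproducibility, and practicality of evaluating text-to-video (T2V) models. The authors commit to open-sourcing the full evaluation pipeline and code to support model evaluation and improvement in the community.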
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
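Paper • 2406.10227 • Published • 9
Note: VideoGUI provides a new multimodal benchmark focused on evaluating visual-centric GUI tasks. The results show that even state-of-the-art models struggle with these tasks, particularly with high-level planning, which underscores the need for further research and improvement.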
Designing a Dashboard for Transparency and Control of Conversational AI
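Paper • 2406.07882 • Published • 9
Note: This study shows how a dashboard interface can improve the transparency and controllability of conversational AI systems. Future work will focus on refining the design and studying in more depth how users react to bias and privacy issues. The project page and a video demo are available on the TalkTuner Project Page.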
MaskLID: Code-Switching Language Identification through Iterative Masking
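Paper • 2406.06263 • Published • 5
Note: MaskLID masks the features of the dominant language, which effectively improves language identification in code-switching (CS) settings, especially for sentences that mix several languages. The method improves identification accuracy, applies broadly, and can process large volumes of web data, making it relevant to future NLP applications.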
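MaskLID's iterative masking can be illustrated with a toy word-level identifier: predict the dominant language, mask the words that strongly support it, then run the identifier again on what remains so the second language can surface. This is only a sketch under stated assumptions. As we understand it, MaskLID operates on top of an existing sentence-level LID model (the paper builds on FastText-based identifiers); the lexicon, its scores, the `threshold`, and the function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy lexicon: how strongly each word supports each language.
LEXICON = {
    "i": {"en": 0.9, "es": 0.1},
    "want": {"en": 0.9, "es": 0.1},
    "to": {"en": 0.8, "es": 0.2},
    "eat": {"en": 0.9, "es": 0.1},
    "tacos": {"es": 0.6, "en": 0.4},
    "con": {"es": 0.9, "en": 0.1},
    "mi": {"es": 0.9, "en": 0.1},
    "familia": {"es": 0.95, "en": 0.05},
}


def predict(words, lexicon):
    """Return the language with the highest summed word score, or None."""
    scores = defaultdict(float)
    for w in words:
        for lang, s in lexicon.get(w, {}).items():
            scores[lang] += s
    if not scores:
        return None
    return max(scores, key=scores.get)


def iterative_mask_lid(sentence, lexicon, max_langs=2, threshold=0.5):
    """Detect up to `max_langs` languages by repeatedly masking the winner."""
    words = sentence.lower().split()
    found = []
    for _ in range(max_langs):
        lang = predict(words, lexicon)
        if lang is None or lang in found:
            break
        found.append(lang)
        # Mask words that strongly support the language just detected.
        words = [w for w in words if lexicon.get(w, {}).get(lang, 0.0) < threshold]
    return found


print(iterative_mask_lid("I want to eat tacos con mi familia", LEXICON))  # ['en', 'es']
```

A single-pass identifier would only report English for the example sentence; masking the English-heavy words is what lets the Spanish span win the second round.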
Decoding the Diversity: A Review of the Indic AI Research Landscape
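Paper • 2406.09559 • Published • 5
Note: This paper gives a comprehensive overview of AI research on Indic languages, systematically categorizes research directions, and highlights current challenges and future directions. Its detailed analysis and taxonomy make it a valuable resource for researchers and practitioners working on Indic-language NLP and support more accurate and efficient use of LLMs for these languages.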
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
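Paper • 2406.10209 • Published • 8
Note: This study proposes a new training objective, the goldfish loss, that effectively reduces memorization in large language models while preserving overall model performance. It offers a practical way to mitigate the privacy and copyright risks that memorization creates.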
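The goldfish loss can be sketched as an ordinary next-token loss that skips a pseudorandom subset of positions: the model never receives a training signal for those tokens, which makes verbatim reproduction of a full training passage less likely. The sketch assumes a hash-based drop rule keyed on the preceding tokens, so a passage that appears many times always drops the same positions. The hash function, `k`, `context_width`, and the function names are illustrative assumptions rather than the paper's exact settings, and per-token log-probabilities are taken as given instead of coming from a real model.

```python
import hashlib


def goldfish_mask(token_ids, k=4, context_width=13):
    """Keep-mask for the loss: False marks a dropped ("forgotten") token.

    Hypothetical hashed rule: hash the `context_width` tokens preceding
    position i and drop token i when the hash lands in bucket 0 of k,
    so about 1/k of positions are excluded, deterministically.
    """
    mask = []
    for i in range(len(token_ids)):
        context = token_ids[max(0, i - context_width):i]
        digest = hashlib.sha256(repr(context).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % k
        mask.append(bucket != 0)
    return mask


def goldfish_loss(token_logprobs, token_ids, k=4, context_width=13):
    """Mean negative log-likelihood over the tokens the mask keeps."""
    mask = goldfish_mask(token_ids, k, context_width)
    kept = [-lp for lp, keep in zip(token_logprobs, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


ids = list(range(100))
logprobs = [-2.0] * len(ids)
print(sum(goldfish_mask(ids)), "of", len(ids), "tokens contribute to the loss")
print(goldfish_loss(logprobs, ids))  # 2.0, since every kept token has log-prob -2.0
```

Keying the mask on local context rather than on absolute position is the design choice that matters for duplicated documents: the same text drops the same tokens wherever it appears, so repeated exposure never fills in the gaps.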
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
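Paper • 2406.07522 • Published • 37
Note: Samba proposes a simple hybrid architecture that combines SSMs with attention, achieving both efficiency and accuracy in sequence modeling with unlimited context length, and it outperforms existing state-of-the-art models on multiple benchmarks. The paper demonstrates Samba's potential for handling long-sequence context, and an implementation is available on GitHub.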
CRAG -- Comprehensive RAG Benchmark
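Paper • 2406.04744 • Published • 42
Note: CRAG provides a rich and diverse benchmark for research on retrieval-augmented generation (RAG) and general question-answering systems. It reveals the challenges current RAG techniques face with the diversity and dynamism of real-world questions and points to future research directions. The benchmark is part of the KDD Cup 2024 challenge and will continue to support the research community.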
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
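Paper • 2406.11768 • Published • 20
Note: GAMA integrates several types of audio representations and is instruction-tuned on synthetic data, which substantially improves its audio understanding and complex reasoning and gives it leading performance on a range of audio understanding tasks. The work shows GAMA's potential for audio-language modeling and lays a foundation for future research and applications.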
TroL: Traversal of Layers for Large Language and Vision Models
Paper • 2406.12246 • Published • 34
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper • 2406.11931 • Published • 57
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Paper • 2406.12624 • Published • 36
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
Paper • 2406.15319 • Published • 61
Towards Retrieval Augmented Generation over Large Video Libraries
Paper • 2406.14938 • Published • 19
Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models
Paper • 2406.14599 • Published • 16
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework
Paper • 2406.14783 • Published • 16
Reward Steering with Evolutionary Heuristics for Decoding-time Alignment
Paper • 2406.15193 • Published • 12
Jailbreaking as a Reward Misspecification Problem
Paper • 2406.14393 • Published • 12
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Paper • 2406.16855 • Published • 54
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Paper • 2406.15877 • Published • 45
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Paper • 2406.16860 • Published • 57
Scaling Laws for Linear Complexity Language Models
Paper • 2406.16690 • Published • 22
Efficient Continual Pre-training by Mitigating the Stability Gap
Paper • 2406.14833 • Published • 19
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
Paper • 2406.16758 • Published • 19
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models
Paper • 2406.15718 • Published • 14
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
Paper • 2406.15927 • Published • 13
Preference Tuning For Toxicity Mitigation Generalizes Across Languages
Paper • 2406.16235 • Published • 11
Confidence Regulation Neurons in Language Models
Paper • 2406.16254 • Published • 10
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics
Paper • 2406.14051 • Published • 9
Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations
Paper • 2406.13632 • Published • 5
IRASim: Learning Interactive Real-Robot Action Simulators
Paper • 2406.14540 • Published • 6
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Paper • 2406.16008 • Published • 6
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
Paper • 2407.01284 • Published • 75
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Paper • 2407.01906 • Published • 34
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Paper • 2407.02869 • Published • 18
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Paper • 2407.04051 • Published • 35
GRUtopia: Dream General Robots in a City at Scale
Paper • 2407.10943 • Published • 23
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Paper • 2404.16130 • Published • 4
Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
Paper • 2406.07057 • Published • 15
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Paper • 2407.14507 • Published • 46
Language Models (Mostly) Know What They Know
Paper • 2207.05221 • Published • 1
VideoGameBunny: Towards vision assistants for video games
Paper • 2407.15295 • Published • 21
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
Paper • 2407.16607 • Published • 22
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Paper • 2407.18121 • Published • 16
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
Paper • 2407.18129 • Published • 11
The FIGNEWS Shared Task on News Media Narratives
Paper • 2407.18147 • Published • 8
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain
Paper • 2407.19584 • Published • 62
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Paper • 2407.18901 • Published • 32
Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models
Paper • 2407.19914 • Published • 12
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains
Paper • 2407.18961 • Published • 39
Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification
Paper • 2407.19340 • Published • 57
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages
Paper • 2407.19672 • Published • 55
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Paper • 2407.20183 • Published • 38
Meltemi: The first open Large Language Model for Greek
Paper • 2407.20743 • Published • 67
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework
Paper • 2407.20729 • Published • 25
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings
Paper • 2407.20581 • Published • 23
A Large Encoder-Decoder Family of Foundation Models For Chemical Language
Paper • 2407.20267 • Published • 31
JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources
Paper • 2407.20750 • Published • 21
The Llama 3 Herd of Models
Paper • 2407.21783 • Published • 107
Finch: Prompt-guided Key-Value Cache Compression
Paper • 2408.00167 • Published • 13
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
Paper • 2408.01337 • Published • 10
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation
Paper • 2408.02545 • Published • 33
Language Model Can Listen While Speaking
Paper • 2408.02622 • Published • 37
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Paper • 2408.04840 • Published • 31
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Paper • 2408.05147 • Published • 37
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 61
Med42-v2: A Suite of Clinical LLMs
Paper • 2408.06142 • Published • 50
Imagen 3
Paper • 2408.07009 • Published • 61
OpenResearcher: Unleashing AI for Accelerated Scientific Research
Paper • 2408.06941 • Published • 30
Falcon2-11B Technical Report
Paper • 2407.14885 • Published
Generative Photomontage
Paper • 2408.07116 • Published • 19
Aquila2 Technical Report
Paper • 2408.07410 • Published • 13
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
Paper • 2408.08459 • Published • 44
Can Large Language Models Understand Symbolic Graphics Programs?
Paper • 2408.08313 • Published • 6
T3M: Text Guided 3D Human Motion Synthesis from Speech
Paper • 2408.12885 • Published • 9
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Paper • 2409.17115 • Published • 59