-
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
Paper • 2405.07990 • Published • 16 -
Large Language Models as Planning Domain Generators
Paper • 2405.06650 • Published • 9 -
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
Paper • 2404.12753 • Published • 41 -
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper • 2404.07972 • Published • 44
Collections
Discover the best community collections!
Collections including paper arxiv:2402.04141
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 143 -
ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 27 -
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 20 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 64
-
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Paper • 2311.11077 • Published • 24 -
Multi-line AI-assisted Code Authoring
Paper • 2402.04141 • Published • 9 -
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Paper • 2402.10524 • Published • 21 -
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 29
-
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
Paper • 2310.18628 • Published • 7 -
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation
Paper • 2311.00272 • Published • 9 -
Magicoder: Source Code Is All You Need
Paper • 2312.02120 • Published • 79 -
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Paper • 2312.04474 • Published • 29