Generating Benchmarks for Factuality Evaluation of Language Models • Paper 2307.06908 • Published Jul 13, 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis • Paper 2307.12856 • Published Jul 24, 2023
WebArena: A Realistic Web Environment for Building Autonomous Agents • Paper 2307.13854 • Published Jul 25, 2023