ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities Paper • 2408.04682 • Published Aug 8 • 14