Papers-Benchmarks - a sugatoray Collection

sugatoray 's Collections

Bookmark::Models

LLMs

AV LLMs

LLM Training Datasets

Papers

Leaderboards 🔥

Papers-Fundamentals

TFM: TimeSeries Foundation Models

Papers-Benchmarks

LLMs-EmbeddingModels

LLM + Datasets : Finance

Papers-Benchmarks

updated Oct 20

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Paper • 2406.08587 • Published Jun 12 • 15
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

Paper • 2406.09170 • Published Jun 13 • 24
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26 • 32
Benchmarking Agentic Workflow Generation

Paper • 2410.07869 • Published Oct 10 • 25