davidberenstein1957's Collections
LLM evals and benchmark datasets
Updated Aug 17
allenai/reward-bench • Viewer • Updated Sep 9 • 8.11k • 7.15k • 75
openai/openai_humaneval • Viewer • Updated Jan 4 • 164 • 155k • 249
google/IFEval • Viewer • Updated Aug 14 • 541 • 6.6k • 35
allenai/ai2_arc • Viewer • Updated Dec 21, 2023 • 7.79k • 131k • 144
allenai/winogrande • Updated Jan 18 • 82.6k • 57
TIGER-Lab/MMLU-Pro • Viewer • Updated Oct 18 • 12.1k • 29.9k • 286
cais/mmlu • Viewer • Updated Mar 8 • 231k • 81.4k • 325
truthfulqa/truthful_qa • Viewer • Updated Jan 4 • 1.63k • 29.9k • 202
openai/gsm8k • Viewer • Updated Jan 4 • 17.6k • 214k • 419
Rowan/hellaswag • Viewer • Updated Sep 28, 2023 • 60k • 103k • 96
tatsu-lab/alpaca_eval • Updated Aug 16 • 24.9k • 50
HuggingFaceH4/mt_bench_prompts • Viewer • Updated Jul 3, 2023 • 80 • 418 • 16
nvidia/ChatRAG-Bench • Viewer • Updated May 24 • 34.6k • 2.17k • 100
rungalileo/ragbench • Viewer • Updated Jun 11 • 95.4k • 1.91k • 16