RLHFlow MATH Process Reward Model
Updated 1 day ago
A collection of datasets and models for process reward modeling.
RLHFlow/Mistral-PRM-Data • Viewer • Updated 2 days ago • 273k • 9
RLHFlow/Mistral-GSM8K-Test • Viewer • Updated 3 days ago • 1.32k • 11
RLHFlow/Mistral-MATH500-Test • Viewer • Updated 3 days ago • 500 • 9
RLHFlow/Llama3.1-8B-PRM • Text Generation • Updated 3 days ago • 10
RLHFlow/Deepseek-MATH500-Test • Viewer • Updated 3 days ago • 500 • 2
RLHFlow/Mistral-ORM-Data • Viewer • Updated 2 days ago • 15k • 4
RLHFlow/Deepseek-ORM-Data • Viewer • Updated 2 days ago • 15k • 3
RLHFlow/Deepseek-GSM8K-Test • Viewer • Updated 1 day ago • 1.32k • 9