- Contrastive Preference Learning: Learning from Human Feedback without RL
  Paper • 2310.13639 • Published • 24
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 47
- Diffusion Model Alignment Using Direct Preference Optimization
  Paper • 2311.12908 • Published • 47
- RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
  Paper • 2312.00849 • Published • 8
Massimiliano Pappa
MaxPappa