Collections including paper arxiv:2401.07382

- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (Paper • 2305.11738 • Published • 5)
- Shepherd: A Critic for Language Model Generation (Paper • 2308.04592 • Published • 29)
- CriticBench: Benchmarking LLMs for Critique-Correct Reasoning (Paper • 2402.14809 • Published • 2)
- DRLC: Reinforcement Learning with Dense Rewards from LLM Critic (Paper • 2401.07382 • Published • 2)

- PERL: Parameter Efficient Reinforcement Learning from Human Feedback (Paper • 2403.10704 • Published • 57)
- WARM: On the Benefits of Weight Averaged Reward Models (Paper • 2401.12187 • Published • 17)
- RewardBench: Evaluating Reward Models for Language Modeling (Paper • 2403.13787 • Published • 21)
- DreamReward: Text-to-3D Generation with Human Preference (Paper • 2403.14613 • Published • 35)

- Measuring the Effects of Data Parallelism on Neural Network Training (Paper • 1811.03600 • Published • 2)
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (Paper • 1804.04235 • Published • 2)
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (Paper • 1905.11946 • Published • 3)
- Yi: Open Foundation Models by 01.AI (Paper • 2403.04652 • Published • 62)

- Moral Foundations of Large Language Models (Paper • 2310.15337 • Published • 1)
- Specific versus General Principles for Constitutional AI (Paper • 2310.13798 • Published • 2)
- Contrastive Preference Learning: Learning from Human Feedback without RL (Paper • 2310.13639 • Published • 24)
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Paper • 2309.00267 • Published • 47)