PWM: Policy Learning with Large World Models
Ignat Georgiev, Varun Giridhar, Nicklas Hansen, Animesh Garg
Project website | Paper | Models & Datasets
Overview
Instead of building world models into algorithms, we propose using large-scale multi-task world models as differentiable simulators for policy learning. When well-regularized, these models enable efficient policy learning with first-order gradient optimization. This allows PWM to learn to solve 80 tasks in under 10 minutes each, without expensive online planning.
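To make "world model as differentiable simulator" concrete, here is a minimal toy sketch of first-order gradient policy optimization. It is not PWM's implementation: the linear model coefficients `A` and `B`, the one-parameter policy `a = k * s`, the quadratic reward, and the helper names `rollout` and `train` are all assumptions made for illustration, and the chain rule is written out by hand where a real implementation would rely on automatic differentiation through neural networks.

```python
# Toy illustration only: a fixed linear "world model" and a one-parameter
# policy stand in for learned neural networks. All names are hypothetical.

A, B = 1.0, 0.5   # assumed model coefficients: s_next = A * s + B * a
HORIZON = 5       # number of imagined steps per rollout


def rollout(k, s0=1.0):
    """Roll the policy a = k * s through the model.

    Returns the summed reward and its exact derivative with respect to k,
    obtained by differentiating through every model step (chain rule).
    """
    s, ds_dk = s0, 0.0              # state and its sensitivity to k
    total, dtotal_dk = 0.0, 0.0
    for _ in range(HORIZON):
        a = k * s                   # policy action
        da_dk = s + k * ds_dk       # derivative of the action w.r.t. k
        # Model step; the right-hand side uses the previous s and ds_dk.
        s, ds_dk = A * s + B * a, A * ds_dk + B * da_dk
        total += -s * s             # reward: keep the state near zero
        dtotal_dk += -2.0 * s * ds_dk
    return total, dtotal_dk


def train(k=0.0, lr=0.05, steps=200):
    """Gradient ascent on the imagined return; no online planning involved."""
    for _ in range(steps):
        _, grad = rollout(k)
        k += lr * grad
    return k
```

Calling `train()` returns a policy gain whose imagined return, `rollout(k)[0]`, is higher than that of the initial gain `k = 0.0`. The point of the sketch is only that the policy is updated from gradients that flow through the model's dynamics, rather than from sampled action sequences evaluated at decision time.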
Structure of repository
pwm
├── dflex
│   ├── data - data used for dflex world model pre-training
│   └── pretrained - already trained world models that can be used in dflex experiments
├── multitask - pre-trained world models for multitask evaluation
├── pedagogical - pre-trained world models for recreating pedagogical examples
└── README.md