[2024-10-06 19:05:31,147][00269] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-10-06 19:05:31,150][00269] Rollout worker 0 uses device cpu
[2024-10-06 19:05:31,152][00269] Rollout worker 1 uses device cpu
[2024-10-06 19:05:31,153][00269] Rollout worker 2 uses device cpu
[2024-10-06 19:05:31,155][00269] Rollout worker 3 uses device cpu
[2024-10-06 19:05:31,156][00269] Rollout worker 4 uses device cpu
[2024-10-06 19:05:31,157][00269] Rollout worker 5 uses device cpu
[2024-10-06 19:05:31,159][00269] Rollout worker 6 uses device cpu
[2024-10-06 19:05:31,160][00269] Rollout worker 7 uses device cpu
[2024-10-06 19:05:31,326][00269] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-06 19:05:31,327][00269] InferenceWorker_p0-w0: min num requests: 2
[2024-10-06 19:05:31,361][00269] Starting all processes...
[2024-10-06 19:05:31,362][00269] Starting process learner_proc0
[2024-10-06 19:05:32,120][00269] Starting all processes...
[2024-10-06 19:05:32,129][00269] Starting process inference_proc0-0
[2024-10-06 19:05:32,129][00269] Starting process rollout_proc0
[2024-10-06 19:05:32,131][00269] Starting process rollout_proc1
[2024-10-06 19:05:32,137][00269] Starting process rollout_proc2
[2024-10-06 19:05:32,138][00269] Starting process rollout_proc3
[2024-10-06 19:05:32,138][00269] Starting process rollout_proc4
[2024-10-06 19:05:32,138][00269] Starting process rollout_proc5
[2024-10-06 19:05:32,138][00269] Starting process rollout_proc6
[2024-10-06 19:05:32,138][00269] Starting process rollout_proc7
[2024-10-06 19:05:48,782][02286] Worker 4 uses CPU cores [0]
[2024-10-06 19:05:48,854][02281] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-06 19:05:48,855][02281] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-10-06 19:05:48,900][02268] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-06 19:05:48,904][02268] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-10-06 19:05:48,977][02282] Worker 0 uses CPU cores [0]
[2024-10-06 19:05:49,002][02281] Num visible devices: 1
[2024-10-06 19:05:49,003][02268] Num visible devices: 1
[2024-10-06 19:05:49,044][02268] Starting seed is not provided
[2024-10-06 19:05:49,045][02268] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-06 19:05:49,046][02268] Initializing actor-critic model on device cuda:0
[2024-10-06 19:05:49,047][02268] RunningMeanStd input shape: (3, 72, 128)
[2024-10-06 19:05:49,050][02268] RunningMeanStd input shape: (1,)
[2024-10-06 19:05:49,166][02268] ConvEncoder: input_channels=3
[2024-10-06 19:05:49,238][02283] Worker 2 uses CPU cores [0]
[2024-10-06 19:05:49,244][02288] Worker 6 uses CPU cores [0]
[2024-10-06 19:05:49,253][02284] Worker 1 uses CPU cores [1]
[2024-10-06 19:05:49,306][02285] Worker 3 uses CPU cores [1]
[2024-10-06 19:05:49,322][02289] Worker 7 uses CPU cores [1]
[2024-10-06 19:05:49,377][02287] Worker 5 uses CPU cores [1]
[2024-10-06 19:05:49,546][02268] Conv encoder output size: 512
[2024-10-06 19:05:49,547][02268] Policy head output size: 512
[2024-10-06 19:05:49,618][02268] Created Actor Critic model with architecture:
[2024-10-06 19:05:49,618][02268] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-10-06 19:05:49,993][02268] Using optimizer
[2024-10-06 19:05:50,789][02268] No checkpoints found
[2024-10-06 19:05:50,789][02268] Did not load from checkpoint, starting from scratch!
[2024-10-06 19:05:50,789][02268] Initialized policy 0 weights for model version 0
[2024-10-06 19:05:50,793][02268] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-06 19:05:50,800][02268] LearnerWorker_p0 finished initialization!
[2024-10-06 19:05:50,886][02281] RunningMeanStd input shape: (3, 72, 128)
[2024-10-06 19:05:50,888][02281] RunningMeanStd input shape: (1,)
[2024-10-06 19:05:50,900][02281] ConvEncoder: input_channels=3
[2024-10-06 19:05:51,002][02281] Conv encoder output size: 512
[2024-10-06 19:05:51,002][02281] Policy head output size: 512
[2024-10-06 19:05:51,054][00269] Inference worker 0-0 is ready!
[2024-10-06 19:05:51,056][00269] All inference workers are ready! Signal rollout workers to start!
[2024-10-06 19:05:51,261][02283] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-06 19:05:51,264][02287] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-06 19:05:51,259][02286] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-06 19:05:51,259][02285] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-06 19:05:51,268][02288] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-06 19:05:51,268][02284] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-06 19:05:51,263][02282] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-06 19:05:51,269][02289] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-06 19:05:51,319][00269] Heartbeat connected on Batcher_0 [2024-10-06 19:05:51,322][00269] Heartbeat connected on LearnerWorker_p0 [2024-10-06 19:05:51,369][00269] Heartbeat connected on InferenceWorker_p0-w0 [2024-10-06 19:05:52,333][02284] Decorrelating experience for 0 frames... [2024-10-06 19:05:52,332][02289] Decorrelating experience for 0 frames... [2024-10-06 19:05:52,736][02289] Decorrelating experience for 32 frames... [2024-10-06 19:05:52,916][02283] Decorrelating experience for 0 frames... [2024-10-06 19:05:52,914][02288] Decorrelating experience for 0 frames... [2024-10-06 19:05:52,926][02282] Decorrelating experience for 0 frames... [2024-10-06 19:05:52,923][02286] Decorrelating experience for 0 frames... [2024-10-06 19:05:53,505][02289] Decorrelating experience for 64 frames... [2024-10-06 19:05:53,672][02284] Decorrelating experience for 32 frames... [2024-10-06 19:05:54,038][02288] Decorrelating experience for 32 frames... [2024-10-06 19:05:54,041][02286] Decorrelating experience for 32 frames... [2024-10-06 19:05:54,049][02282] Decorrelating experience for 32 frames... [2024-10-06 19:05:54,454][02287] Decorrelating experience for 0 frames... [2024-10-06 19:05:54,726][02284] Decorrelating experience for 64 frames... 
[2024-10-06 19:05:55,209][00269] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-10-06 19:05:55,308][02283] Decorrelating experience for 32 frames... [2024-10-06 19:05:55,370][02285] Decorrelating experience for 0 frames... [2024-10-06 19:05:55,374][02287] Decorrelating experience for 32 frames... [2024-10-06 19:05:55,612][02286] Decorrelating experience for 64 frames... [2024-10-06 19:05:55,626][02288] Decorrelating experience for 64 frames... [2024-10-06 19:05:56,704][02282] Decorrelating experience for 64 frames... [2024-10-06 19:05:56,812][02286] Decorrelating experience for 96 frames... [2024-10-06 19:05:56,947][00269] Heartbeat connected on RolloutWorker_w4 [2024-10-06 19:05:57,048][02289] Decorrelating experience for 96 frames... [2024-10-06 19:05:57,109][02285] Decorrelating experience for 32 frames... [2024-10-06 19:05:57,130][02284] Decorrelating experience for 96 frames... [2024-10-06 19:05:57,465][00269] Heartbeat connected on RolloutWorker_w7 [2024-10-06 19:05:57,654][00269] Heartbeat connected on RolloutWorker_w1 [2024-10-06 19:05:57,763][02287] Decorrelating experience for 64 frames... [2024-10-06 19:05:58,658][02282] Decorrelating experience for 96 frames... [2024-10-06 19:05:58,889][00269] Heartbeat connected on RolloutWorker_w0 [2024-10-06 19:05:59,053][02288] Decorrelating experience for 96 frames... [2024-10-06 19:05:59,074][02283] Decorrelating experience for 64 frames... [2024-10-06 19:05:59,264][02285] Decorrelating experience for 64 frames... [2024-10-06 19:05:59,418][00269] Heartbeat connected on RolloutWorker_w6 [2024-10-06 19:05:59,431][02287] Decorrelating experience for 96 frames... [2024-10-06 19:05:59,611][00269] Heartbeat connected on RolloutWorker_w5 [2024-10-06 19:06:00,209][00269] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 120.4. Samples: 602. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-10-06 19:06:00,212][00269] Avg episode reward: [(0, '1.067')] [2024-10-06 19:06:01,921][02285] Decorrelating experience for 96 frames... [2024-10-06 19:06:03,066][00269] Heartbeat connected on RolloutWorker_w3 [2024-10-06 19:06:03,564][02283] Decorrelating experience for 96 frames... [2024-10-06 19:06:03,885][02268] Signal inference workers to stop experience collection... [2024-10-06 19:06:03,935][02281] InferenceWorker_p0-w0: stopping experience collection [2024-10-06 19:06:04,075][00269] Heartbeat connected on RolloutWorker_w2 [2024-10-06 19:06:05,210][00269] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 200.4. Samples: 2004. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-10-06 19:06:05,217][00269] Avg episode reward: [(0, '2.763')] [2024-10-06 19:06:07,165][02268] Signal inference workers to resume experience collection... [2024-10-06 19:06:07,166][02281] InferenceWorker_p0-w0: resuming experience collection [2024-10-06 19:06:10,209][00269] Fps is (10 sec: 1638.4, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 16384. Throughput: 0: 276.3. Samples: 4144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 3.0) [2024-10-06 19:06:10,211][00269] Avg episode reward: [(0, '3.522')] [2024-10-06 19:06:15,128][02281] Updated weights for policy 0, policy_version 10 (0.0043) [2024-10-06 19:06:15,210][00269] Fps is (10 sec: 4096.0, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 40960. Throughput: 0: 534.6. Samples: 10692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:06:15,212][00269] Avg episode reward: [(0, '3.937')] [2024-10-06 19:06:20,210][00269] Fps is (10 sec: 3686.4, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 53248. Throughput: 0: 528.7. Samples: 13218. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:06:20,216][00269] Avg episode reward: [(0, '4.441')] [2024-10-06 19:06:25,210][00269] Fps is (10 sec: 2867.2, 60 sec: 2321.0, 300 sec: 2321.0). Total num frames: 69632. Throughput: 0: 577.1. Samples: 17314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:06:25,217][00269] Avg episode reward: [(0, '4.470')] [2024-10-06 19:06:27,372][02281] Updated weights for policy 0, policy_version 20 (0.0039) [2024-10-06 19:06:30,210][00269] Fps is (10 sec: 4096.0, 60 sec: 2691.6, 300 sec: 2691.6). Total num frames: 94208. Throughput: 0: 691.9. Samples: 24218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:06:30,213][00269] Avg episode reward: [(0, '4.210')] [2024-10-06 19:06:35,215][00269] Fps is (10 sec: 4503.2, 60 sec: 2866.8, 300 sec: 2866.8). Total num frames: 114688. Throughput: 0: 693.3. Samples: 27734. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:06:35,218][00269] Avg episode reward: [(0, '4.311')] [2024-10-06 19:06:35,232][02268] Saving new best policy, reward=4.311! [2024-10-06 19:06:36,976][02281] Updated weights for policy 0, policy_version 30 (0.0019) [2024-10-06 19:06:40,209][00269] Fps is (10 sec: 3686.5, 60 sec: 2912.7, 300 sec: 2912.7). Total num frames: 131072. Throughput: 0: 722.9. Samples: 32532. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-10-06 19:06:40,212][00269] Avg episode reward: [(0, '4.382')] [2024-10-06 19:06:40,216][02268] Saving new best policy, reward=4.382! [2024-10-06 19:06:45,210][00269] Fps is (10 sec: 3688.4, 60 sec: 3031.0, 300 sec: 3031.0). Total num frames: 151552. Throughput: 0: 842.4. Samples: 38508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:06:45,211][00269] Avg episode reward: [(0, '4.549')] [2024-10-06 19:06:45,223][02268] Saving new best policy, reward=4.549! 
[2024-10-06 19:06:47,574][02281] Updated weights for policy 0, policy_version 40 (0.0026) [2024-10-06 19:06:50,210][00269] Fps is (10 sec: 4095.9, 60 sec: 3127.8, 300 sec: 3127.8). Total num frames: 172032. Throughput: 0: 887.9. Samples: 41958. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:06:50,212][00269] Avg episode reward: [(0, '4.460')] [2024-10-06 19:06:55,209][00269] Fps is (10 sec: 3686.5, 60 sec: 3140.3, 300 sec: 3140.3). Total num frames: 188416. Throughput: 0: 967.2. Samples: 47666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:06:55,214][00269] Avg episode reward: [(0, '4.351')] [2024-10-06 19:06:59,316][02281] Updated weights for policy 0, policy_version 50 (0.0023) [2024-10-06 19:07:00,210][00269] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3213.8). Total num frames: 208896. Throughput: 0: 934.7. Samples: 52754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:07:00,219][00269] Avg episode reward: [(0, '4.469')] [2024-10-06 19:07:05,210][00269] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 229376. Throughput: 0: 955.0. Samples: 56194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:07:05,216][00269] Avg episode reward: [(0, '4.607')] [2024-10-06 19:07:05,264][02268] Saving new best policy, reward=4.607! [2024-10-06 19:07:07,927][02281] Updated weights for policy 0, policy_version 60 (0.0035) [2024-10-06 19:07:10,209][00269] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3331.4). Total num frames: 249856. Throughput: 0: 1010.9. Samples: 62806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:07:10,214][00269] Avg episode reward: [(0, '4.647')] [2024-10-06 19:07:10,216][02268] Saving new best policy, reward=4.647! [2024-10-06 19:07:15,210][00269] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 262144. Throughput: 0: 948.8. Samples: 66912. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-10-06 19:07:15,213][00269] Avg episode reward: [(0, '4.681')] [2024-10-06 19:07:15,221][02268] Saving new best policy, reward=4.681! [2024-10-06 19:07:20,117][02281] Updated weights for policy 0, policy_version 70 (0.0040) [2024-10-06 19:07:20,210][00269] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3373.2). Total num frames: 286720. Throughput: 0: 941.2. Samples: 70084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:07:20,213][00269] Avg episode reward: [(0, '4.608')] [2024-10-06 19:07:25,209][00269] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3413.3). Total num frames: 307200. Throughput: 0: 985.7. Samples: 76888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:07:25,212][00269] Avg episode reward: [(0, '4.369')] [2024-10-06 19:07:25,221][02268] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth... [2024-10-06 19:07:30,209][00269] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3406.1). Total num frames: 323584. Throughput: 0: 957.2. Samples: 81580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:07:30,213][00269] Avg episode reward: [(0, '4.450')] [2024-10-06 19:07:31,356][02281] Updated weights for policy 0, policy_version 80 (0.0031) [2024-10-06 19:07:35,210][00269] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3399.7). Total num frames: 339968. Throughput: 0: 932.0. Samples: 83900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:07:35,213][00269] Avg episode reward: [(0, '4.514')] [2024-10-06 19:07:40,211][00269] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3471.8). Total num frames: 364544. Throughput: 0: 957.8. Samples: 90766. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:07:40,217][00269] Avg episode reward: [(0, '4.446')] [2024-10-06 19:07:40,884][02281] Updated weights for policy 0, policy_version 90 (0.0022) [2024-10-06 19:07:45,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3463.0). Total num frames: 380928. Throughput: 0: 973.3. Samples: 96554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:07:45,215][00269] Avg episode reward: [(0, '4.550')] [2024-10-06 19:07:50,210][00269] Fps is (10 sec: 3277.2, 60 sec: 3754.7, 300 sec: 3454.9). Total num frames: 397312. Throughput: 0: 941.0. Samples: 98538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:07:50,217][00269] Avg episode reward: [(0, '4.811')] [2024-10-06 19:07:50,220][02268] Saving new best policy, reward=4.811! [2024-10-06 19:07:52,893][02281] Updated weights for policy 0, policy_version 100 (0.0034) [2024-10-06 19:07:55,209][00269] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3481.6). Total num frames: 417792. Throughput: 0: 925.1. Samples: 104436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:07:55,215][00269] Avg episode reward: [(0, '4.760')] [2024-10-06 19:08:00,212][00269] Fps is (10 sec: 4504.6, 60 sec: 3891.1, 300 sec: 3538.9). Total num frames: 442368. Throughput: 0: 983.1. Samples: 111152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:08:00,223][00269] Avg episode reward: [(0, '4.628')] [2024-10-06 19:08:03,061][02281] Updated weights for policy 0, policy_version 110 (0.0032) [2024-10-06 19:08:05,209][00269] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3497.4). Total num frames: 454656. Throughput: 0: 958.5. Samples: 113216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:08:05,217][00269] Avg episode reward: [(0, '4.636')] [2024-10-06 19:08:10,210][00269] Fps is (10 sec: 3277.5, 60 sec: 3754.7, 300 sec: 3519.5). Total num frames: 475136. Throughput: 0: 925.5. Samples: 118534. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-06 19:08:10,215][00269] Avg episode reward: [(0, '4.508')] [2024-10-06 19:08:13,470][02281] Updated weights for policy 0, policy_version 120 (0.0023) [2024-10-06 19:08:15,210][00269] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3569.4). Total num frames: 499712. Throughput: 0: 974.2. Samples: 125418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:08:15,211][00269] Avg episode reward: [(0, '4.553')] [2024-10-06 19:08:20,209][00269] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3559.3). Total num frames: 516096. Throughput: 0: 984.6. Samples: 128208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:08:20,216][00269] Avg episode reward: [(0, '4.723')] [2024-10-06 19:08:25,127][02281] Updated weights for policy 0, policy_version 130 (0.0037) [2024-10-06 19:08:25,210][00269] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 532480. Throughput: 0: 925.2. Samples: 132398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:08:25,214][00269] Avg episode reward: [(0, '4.625')] [2024-10-06 19:08:30,210][00269] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3567.5). Total num frames: 552960. Throughput: 0: 951.0. Samples: 139350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:08:30,214][00269] Avg episode reward: [(0, '4.692')] [2024-10-06 19:08:34,106][02281] Updated weights for policy 0, policy_version 140 (0.0031) [2024-10-06 19:08:35,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3584.0). Total num frames: 573440. Throughput: 0: 983.5. Samples: 142794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:08:35,214][00269] Avg episode reward: [(0, '4.749')] [2024-10-06 19:08:40,210][00269] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3574.7). Total num frames: 589824. Throughput: 0: 951.9. Samples: 147272. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:08:40,216][00269] Avg episode reward: [(0, '4.549')] [2024-10-06 19:08:45,210][00269] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3590.0). Total num frames: 610304. Throughput: 0: 937.4. Samples: 153332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:08:45,214][00269] Avg episode reward: [(0, '4.342')] [2024-10-06 19:08:45,798][02281] Updated weights for policy 0, policy_version 150 (0.0033) [2024-10-06 19:08:50,210][00269] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3604.5). Total num frames: 630784. Throughput: 0: 967.2. Samples: 156740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:08:50,216][00269] Avg episode reward: [(0, '4.344')] [2024-10-06 19:08:55,218][00269] Fps is (10 sec: 3683.3, 60 sec: 3822.4, 300 sec: 3595.2). Total num frames: 647168. Throughput: 0: 968.3. Samples: 162116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:08:55,222][00269] Avg episode reward: [(0, '4.604')] [2024-10-06 19:08:57,260][02281] Updated weights for policy 0, policy_version 160 (0.0042) [2024-10-06 19:09:00,209][00269] Fps is (10 sec: 3276.9, 60 sec: 3686.5, 300 sec: 3586.8). Total num frames: 663552. Throughput: 0: 927.3. Samples: 167148. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:09:00,216][00269] Avg episode reward: [(0, '4.711')] [2024-10-06 19:09:05,210][00269] Fps is (10 sec: 4099.4, 60 sec: 3891.2, 300 sec: 3621.7). Total num frames: 688128. Throughput: 0: 940.2. Samples: 170518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:09:05,216][00269] Avg episode reward: [(0, '4.605')] [2024-10-06 19:09:06,850][02281] Updated weights for policy 0, policy_version 170 (0.0041) [2024-10-06 19:09:10,216][00269] Fps is (10 sec: 4502.7, 60 sec: 3890.8, 300 sec: 3633.8). Total num frames: 708608. Throughput: 0: 988.3. Samples: 176876. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:09:10,219][00269] Avg episode reward: [(0, '4.520')] [2024-10-06 19:09:15,212][00269] Fps is (10 sec: 3276.0, 60 sec: 3686.3, 300 sec: 3604.4). Total num frames: 720896. Throughput: 0: 924.2. Samples: 180942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:09:15,217][00269] Avg episode reward: [(0, '4.408')] [2024-10-06 19:09:18,551][02281] Updated weights for policy 0, policy_version 180 (0.0023) [2024-10-06 19:09:20,210][00269] Fps is (10 sec: 3278.8, 60 sec: 3754.7, 300 sec: 3616.5). Total num frames: 741376. Throughput: 0: 923.7. Samples: 184362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:09:20,216][00269] Avg episode reward: [(0, '4.662')] [2024-10-06 19:09:25,210][00269] Fps is (10 sec: 4097.0, 60 sec: 3822.9, 300 sec: 3627.9). Total num frames: 761856. Throughput: 0: 970.6. Samples: 190948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:09:25,214][00269] Avg episode reward: [(0, '4.920')] [2024-10-06 19:09:25,229][02268] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth... [2024-10-06 19:09:25,397][02268] Saving new best policy, reward=4.920! [2024-10-06 19:09:29,877][02281] Updated weights for policy 0, policy_version 190 (0.0029) [2024-10-06 19:09:30,214][00269] Fps is (10 sec: 3684.7, 60 sec: 3754.4, 300 sec: 3619.6). Total num frames: 778240. Throughput: 0: 934.0. Samples: 195366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:09:30,217][00269] Avg episode reward: [(0, '5.066')] [2024-10-06 19:09:30,225][02268] Saving new best policy, reward=5.066! [2024-10-06 19:09:35,209][00269] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3611.9). Total num frames: 794624. Throughput: 0: 914.5. Samples: 197894. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:09:35,216][00269] Avg episode reward: [(0, '4.927')] [2024-10-06 19:09:39,800][02281] Updated weights for policy 0, policy_version 200 (0.0027) [2024-10-06 19:09:40,209][00269] Fps is (10 sec: 4098.0, 60 sec: 3822.9, 300 sec: 3640.9). Total num frames: 819200. Throughput: 0: 946.7. Samples: 204710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:09:40,216][00269] Avg episode reward: [(0, '4.995')] [2024-10-06 19:09:45,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3633.0). Total num frames: 835584. Throughput: 0: 956.5. Samples: 210192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:09:45,212][00269] Avg episode reward: [(0, '5.191')] [2024-10-06 19:09:45,225][02268] Saving new best policy, reward=5.191! [2024-10-06 19:09:50,210][00269] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3625.4). Total num frames: 851968. Throughput: 0: 927.0. Samples: 212232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:09:50,215][00269] Avg episode reward: [(0, '4.905')] [2024-10-06 19:09:51,500][02281] Updated weights for policy 0, policy_version 210 (0.0035) [2024-10-06 19:09:55,214][00269] Fps is (10 sec: 4094.0, 60 sec: 3823.2, 300 sec: 3652.2). Total num frames: 876544. Throughput: 0: 926.1. Samples: 218548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:09:55,221][00269] Avg episode reward: [(0, '4.969')] [2024-10-06 19:10:00,209][00269] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3644.6). Total num frames: 892928. Throughput: 0: 975.2. Samples: 224822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:10:00,212][00269] Avg episode reward: [(0, '5.356')] [2024-10-06 19:10:00,266][02268] Saving new best policy, reward=5.356! [2024-10-06 19:10:01,752][02281] Updated weights for policy 0, policy_version 220 (0.0047) [2024-10-06 19:10:05,216][00269] Fps is (10 sec: 3276.5, 60 sec: 3686.0, 300 sec: 3637.2). Total num frames: 909312. 
Throughput: 0: 942.6. Samples: 226786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:10:05,224][00269] Avg episode reward: [(0, '5.398')] [2024-10-06 19:10:05,237][02268] Saving new best policy, reward=5.398! [2024-10-06 19:10:10,210][00269] Fps is (10 sec: 3686.3, 60 sec: 3686.8, 300 sec: 3646.2). Total num frames: 929792. Throughput: 0: 917.4. Samples: 232232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:10:10,215][00269] Avg episode reward: [(0, '5.141')] [2024-10-06 19:10:12,553][02281] Updated weights for policy 0, policy_version 230 (0.0026) [2024-10-06 19:10:15,210][00269] Fps is (10 sec: 4508.3, 60 sec: 3891.4, 300 sec: 3670.6). Total num frames: 954368. Throughput: 0: 972.1. Samples: 239106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:10:15,212][00269] Avg episode reward: [(0, '5.332')] [2024-10-06 19:10:20,209][00269] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3647.8). Total num frames: 966656. Throughput: 0: 971.0. Samples: 241588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:10:20,212][00269] Avg episode reward: [(0, '5.502')] [2024-10-06 19:10:20,218][02268] Saving new best policy, reward=5.502! [2024-10-06 19:10:24,536][02281] Updated weights for policy 0, policy_version 240 (0.0016) [2024-10-06 19:10:25,210][00269] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3640.9). Total num frames: 983040. Throughput: 0: 915.2. Samples: 245894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:10:25,215][00269] Avg episode reward: [(0, '5.099')] [2024-10-06 19:10:30,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3664.1). Total num frames: 1007616. Throughput: 0: 943.2. Samples: 252636. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:10:30,216][00269] Avg episode reward: [(0, '4.865')] [2024-10-06 19:10:33,633][02281] Updated weights for policy 0, policy_version 250 (0.0030) [2024-10-06 19:10:35,212][00269] Fps is (10 sec: 4504.4, 60 sec: 3891.0, 300 sec: 3671.7). Total num frames: 1028096. Throughput: 0: 973.5. Samples: 256040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:10:35,216][00269] Avg episode reward: [(0, '5.007')] [2024-10-06 19:10:40,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3650.5). Total num frames: 1040384. Throughput: 0: 924.6. Samples: 260152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:10:40,212][00269] Avg episode reward: [(0, '4.995')] [2024-10-06 19:10:45,210][00269] Fps is (10 sec: 3277.6, 60 sec: 3754.6, 300 sec: 3658.1). Total num frames: 1060864. Throughput: 0: 924.5. Samples: 266426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-06 19:10:45,213][00269] Avg episode reward: [(0, '5.016')] [2024-10-06 19:10:45,373][02281] Updated weights for policy 0, policy_version 260 (0.0023) [2024-10-06 19:10:50,209][00269] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 956.2. Samples: 269808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:10:50,213][00269] Avg episode reward: [(0, '5.132')] [2024-10-06 19:10:55,210][00269] Fps is (10 sec: 3686.3, 60 sec: 3686.7, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 945.5. Samples: 274778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-06 19:10:55,216][00269] Avg episode reward: [(0, '4.951')] [2024-10-06 19:10:57,372][02281] Updated weights for policy 0, policy_version 270 (0.0039) [2024-10-06 19:11:00,210][00269] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 911.9. Samples: 280140. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:11:00,218][00269] Avg episode reward: [(0, '5.048')] [2024-10-06 19:11:05,210][00269] Fps is (10 sec: 4096.1, 60 sec: 3823.3, 300 sec: 3804.4). Total num frames: 1138688. Throughput: 0: 932.4. Samples: 283548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-06 19:11:05,216][00269] Avg episode reward: [(0, '5.463')] [2024-10-06 19:11:06,277][02281] Updated weights for policy 0, policy_version 280 (0.0023) [2024-10-06 19:11:10,210][00269] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1159168. Throughput: 0: 978.0. Samples: 289906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:11:10,213][00269] Avg episode reward: [(0, '5.440')] [2024-10-06 19:11:15,210][00269] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1175552. Throughput: 0: 925.6. Samples: 294290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-06 19:11:15,212][00269] Avg episode reward: [(0, '5.285')] [2024-10-06 19:11:17,691][02281] Updated weights for policy 0, policy_version 290 (0.0017) [2024-10-06 19:11:20,209][00269] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1196032. Throughput: 0: 927.1. Samples: 297756. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-06 19:11:20,213][00269] Avg episode reward: [(0, '5.544')] [2024-10-06 19:11:20,220][02268] Saving new best policy, reward=5.544! [2024-10-06 19:11:25,211][00269] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3804.4). Total num frames: 1216512. Throughput: 0: 987.1. Samples: 304572. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-06 19:11:25,218][00269] Avg episode reward: [(0, '5.477')] [2024-10-06 19:11:25,251][02268] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000298_1220608.pth... 
[2024-10-06 19:11:25,457][02268] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth [2024-10-06 19:11:28,597][02281] Updated weights for policy 0, policy_version 300 (0.0026) [2024-10-06 19:11:30,214][00269] Fps is (10 sec: 3684.6, 60 sec: 3754.4, 300 sec: 3790.5). Total num frames: 1232896. Throughput: 0: 943.1. Samples: 308870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:11:30,218][00269] Avg episode reward: [(0, '5.341')] [2024-10-06 19:11:35,209][00269] Fps is (10 sec: 3687.1, 60 sec: 3754.8, 300 sec: 3804.4). Total num frames: 1253376. Throughput: 0: 932.5. Samples: 311772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:11:35,213][00269] Avg episode reward: [(0, '5.730')] [2024-10-06 19:11:35,222][02268] Saving new best policy, reward=5.730! [2024-10-06 19:11:38,149][02281] Updated weights for policy 0, policy_version 310 (0.0026) [2024-10-06 19:11:40,210][00269] Fps is (10 sec: 4507.7, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1277952. Throughput: 0: 976.9. Samples: 318738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-06 19:11:40,214][00269] Avg episode reward: [(0, '6.200')] [2024-10-06 19:11:40,217][02268] Saving new best policy, reward=6.200! [2024-10-06 19:11:45,209][00269] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 1290240. Throughput: 0: 973.6. Samples: 323954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:11:45,215][00269] Avg episode reward: [(0, '6.175')] [2024-10-06 19:11:50,016][02281] Updated weights for policy 0, policy_version 320 (0.0024) [2024-10-06 19:11:50,209][00269] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1310720. Throughput: 0: 945.5. Samples: 326094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:11:50,213][00269] Avg episode reward: [(0, '6.234')] [2024-10-06 19:11:50,218][02268] Saving new best policy, reward=6.234! 
[2024-10-06 19:11:55,210][00269] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1331200. Throughput: 0: 951.8. Samples: 332738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:11:55,212][00269] Avg episode reward: [(0, '6.165')] [2024-10-06 19:11:59,739][02281] Updated weights for policy 0, policy_version 330 (0.0027) [2024-10-06 19:12:00,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1351680. Throughput: 0: 988.3. Samples: 338764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:12:00,213][00269] Avg episode reward: [(0, '5.881')] [2024-10-06 19:12:05,213][00269] Fps is (10 sec: 3275.8, 60 sec: 3754.5, 300 sec: 3776.6). Total num frames: 1363968. Throughput: 0: 957.6. Samples: 340852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:12:05,215][00269] Avg episode reward: [(0, '5.891')] [2024-10-06 19:12:10,209][00269] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 1388544. Throughput: 0: 935.5. Samples: 346670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:12:10,217][00269] Avg episode reward: [(0, '6.022')] [2024-10-06 19:12:10,861][02281] Updated weights for policy 0, policy_version 340 (0.0049) [2024-10-06 19:12:15,210][00269] Fps is (10 sec: 4916.7, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1413120. Throughput: 0: 997.5. Samples: 353752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:12:15,212][00269] Avg episode reward: [(0, '6.527')] [2024-10-06 19:12:15,220][02268] Saving new best policy, reward=6.527! [2024-10-06 19:12:20,209][00269] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1425408. Throughput: 0: 978.4. Samples: 355800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-06 19:12:20,217][00269] Avg episode reward: [(0, '6.580')] [2024-10-06 19:12:20,230][02268] Saving new best policy, reward=6.580! 
[2024-10-06 19:12:22,430][02281] Updated weights for policy 0, policy_version 350 (0.0026) [2024-10-06 19:12:25,209][00269] Fps is (10 sec: 3276.9, 60 sec: 3823.1, 300 sec: 3804.4). Total num frames: 1445888. Throughput: 0: 935.6. Samples: 360838. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-06 19:12:25,214][00269] Avg episode reward: [(0, '6.727')] [2024-10-06 19:12:25,225][02268] Saving new best policy, reward=6.727! [2024-10-06 19:12:30,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3891.5, 300 sec: 3818.3). Total num frames: 1466368. Throughput: 0: 970.9. Samples: 367644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:12:30,222][00269] Avg episode reward: [(0, '6.753')] [2024-10-06 19:12:30,223][02268] Saving new best policy, reward=6.753! [2024-10-06 19:12:31,563][02281] Updated weights for policy 0, policy_version 360 (0.0041) [2024-10-06 19:12:35,210][00269] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1482752. Throughput: 0: 990.3. Samples: 370660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:12:35,215][00269] Avg episode reward: [(0, '6.884')] [2024-10-06 19:12:35,231][02268] Saving new best policy, reward=6.884! [2024-10-06 19:12:40,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1499136. Throughput: 0: 936.7. Samples: 374888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:12:40,213][00269] Avg episode reward: [(0, '7.385')] [2024-10-06 19:12:40,217][02268] Saving new best policy, reward=7.385! [2024-10-06 19:12:43,110][02281] Updated weights for policy 0, policy_version 370 (0.0026) [2024-10-06 19:12:45,209][00269] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1523712. Throughput: 0: 952.8. Samples: 381640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:12:45,212][00269] Avg episode reward: [(0, '8.589')] [2024-10-06 19:12:45,222][02268] Saving new best policy, reward=8.589! 
[2024-10-06 19:12:50,209][00269] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1544192. Throughput: 0: 984.6. Samples: 385156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:12:50,219][00269] Avg episode reward: [(0, '9.352')] [2024-10-06 19:12:50,225][02268] Saving new best policy, reward=9.352! [2024-10-06 19:12:53,836][02281] Updated weights for policy 0, policy_version 380 (0.0031) [2024-10-06 19:12:55,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1556480. Throughput: 0: 959.9. Samples: 389866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:12:55,213][00269] Avg episode reward: [(0, '9.910')] [2024-10-06 19:12:55,228][02268] Saving new best policy, reward=9.910! [2024-10-06 19:13:00,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1576960. Throughput: 0: 931.1. Samples: 395650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:13:00,212][00269] Avg episode reward: [(0, '9.870')] [2024-10-06 19:13:03,768][02281] Updated weights for policy 0, policy_version 390 (0.0024) [2024-10-06 19:13:05,210][00269] Fps is (10 sec: 4505.5, 60 sec: 3959.7, 300 sec: 3818.3). Total num frames: 1601536. Throughput: 0: 963.2. Samples: 399142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:13:05,212][00269] Avg episode reward: [(0, '10.550')] [2024-10-06 19:13:05,224][02268] Saving new best policy, reward=10.550! [2024-10-06 19:13:10,212][00269] Fps is (10 sec: 4095.0, 60 sec: 3822.8, 300 sec: 3790.5). Total num frames: 1617920. Throughput: 0: 981.1. Samples: 404990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:13:10,217][00269] Avg episode reward: [(0, '9.475')] [2024-10-06 19:13:15,212][00269] Fps is (10 sec: 3276.0, 60 sec: 3686.3, 300 sec: 3790.5). Total num frames: 1634304. Throughput: 0: 937.5. Samples: 409832. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:13:15,217][00269] Avg episode reward: [(0, '8.852')] [2024-10-06 19:13:15,471][02281] Updated weights for policy 0, policy_version 400 (0.0013) [2024-10-06 19:13:20,210][00269] Fps is (10 sec: 4096.9, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1658880. Throughput: 0: 945.2. Samples: 413194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:13:20,216][00269] Avg episode reward: [(0, '8.569')] [2024-10-06 19:13:24,925][02281] Updated weights for policy 0, policy_version 410 (0.0019) [2024-10-06 19:13:25,211][00269] Fps is (10 sec: 4506.2, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 1679360. Throughput: 0: 998.6. Samples: 419824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:13:25,214][00269] Avg episode reward: [(0, '8.689')] [2024-10-06 19:13:25,223][02268] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000410_1679360.pth... [2024-10-06 19:13:25,385][02268] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth [2024-10-06 19:13:30,209][00269] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1691648. Throughput: 0: 938.7. Samples: 423882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:13:30,212][00269] Avg episode reward: [(0, '8.941')] [2024-10-06 19:13:35,210][00269] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1716224. Throughput: 0: 934.0. Samples: 427188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:13:35,214][00269] Avg episode reward: [(0, '8.527')] [2024-10-06 19:13:36,209][02281] Updated weights for policy 0, policy_version 420 (0.0030) [2024-10-06 19:13:40,209][00269] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1736704. Throughput: 0: 982.3. Samples: 434068. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:13:40,212][00269] Avg episode reward: [(0, '8.902')] [2024-10-06 19:13:45,211][00269] Fps is (10 sec: 3276.3, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 1748992. Throughput: 0: 957.7. Samples: 438746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:13:45,215][00269] Avg episode reward: [(0, '9.033')] [2024-10-06 19:13:47,941][02281] Updated weights for policy 0, policy_version 430 (0.0037) [2024-10-06 19:13:50,210][00269] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3804.5). Total num frames: 1769472. Throughput: 0: 932.4. Samples: 441098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:13:50,212][00269] Avg episode reward: [(0, '9.745')] [2024-10-06 19:13:55,210][00269] Fps is (10 sec: 4506.2, 60 sec: 3959.4, 300 sec: 3832.2). Total num frames: 1794048. Throughput: 0: 957.8. Samples: 448088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:13:55,213][00269] Avg episode reward: [(0, '10.596')] [2024-10-06 19:13:55,223][02268] Saving new best policy, reward=10.596! [2024-10-06 19:13:57,131][02281] Updated weights for policy 0, policy_version 440 (0.0014) [2024-10-06 19:14:00,209][00269] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1810432. Throughput: 0: 971.7. Samples: 453558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:14:00,217][00269] Avg episode reward: [(0, '10.922')] [2024-10-06 19:14:00,222][02268] Saving new best policy, reward=10.922! [2024-10-06 19:14:05,210][00269] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3790.6). Total num frames: 1826816. Throughput: 0: 940.7. Samples: 455528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-06 19:14:05,213][00269] Avg episode reward: [(0, '10.990')] [2024-10-06 19:14:05,226][02268] Saving new best policy, reward=10.990! 
[2024-10-06 19:14:08,639][02281] Updated weights for policy 0, policy_version 450 (0.0020) [2024-10-06 19:14:10,209][00269] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3818.3). Total num frames: 1847296. Throughput: 0: 935.8. Samples: 461936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:14:10,215][00269] Avg episode reward: [(0, '11.380')] [2024-10-06 19:14:10,218][02268] Saving new best policy, reward=11.380! [2024-10-06 19:14:15,213][00269] Fps is (10 sec: 4094.9, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 1867776. Throughput: 0: 988.1. Samples: 468350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:14:15,217][00269] Avg episode reward: [(0, '11.680')] [2024-10-06 19:14:15,234][02268] Saving new best policy, reward=11.680! [2024-10-06 19:14:20,108][02281] Updated weights for policy 0, policy_version 460 (0.0024) [2024-10-06 19:14:20,209][00269] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1884160. Throughput: 0: 958.6. Samples: 470326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:14:20,213][00269] Avg episode reward: [(0, '12.025')] [2024-10-06 19:14:20,226][02268] Saving new best policy, reward=12.025! [2024-10-06 19:14:25,210][00269] Fps is (10 sec: 3277.8, 60 sec: 3686.5, 300 sec: 3804.5). Total num frames: 1900544. Throughput: 0: 911.9. Samples: 475106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:14:25,215][00269] Avg episode reward: [(0, '11.727')] [2024-10-06 19:14:30,210][00269] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1916928. Throughput: 0: 930.6. Samples: 480620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:14:30,212][00269] Avg episode reward: [(0, '11.614')] [2024-10-06 19:14:31,184][02281] Updated weights for policy 0, policy_version 470 (0.0052) [2024-10-06 19:14:35,211][00269] Fps is (10 sec: 3276.4, 60 sec: 3618.1, 300 sec: 3776.6). Total num frames: 1933312. Throughput: 0: 936.5. 
Samples: 483242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:14:35,216][00269] Avg episode reward: [(0, '11.647')] [2024-10-06 19:14:40,209][00269] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 1953792. Throughput: 0: 883.3. Samples: 487836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:14:40,217][00269] Avg episode reward: [(0, '12.575')] [2024-10-06 19:14:40,220][02268] Saving new best policy, reward=12.575! [2024-10-06 19:14:42,885][02281] Updated weights for policy 0, policy_version 480 (0.0029) [2024-10-06 19:14:45,210][00269] Fps is (10 sec: 4096.6, 60 sec: 3754.8, 300 sec: 3804.4). Total num frames: 1974272. Throughput: 0: 912.3. Samples: 494610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:14:45,213][00269] Avg episode reward: [(0, '13.918')] [2024-10-06 19:14:45,230][02268] Saving new best policy, reward=13.918! [2024-10-06 19:14:50,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 1994752. Throughput: 0: 945.2. Samples: 498060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:14:50,214][00269] Avg episode reward: [(0, '14.723')] [2024-10-06 19:14:50,218][02268] Saving new best policy, reward=14.723! [2024-10-06 19:14:54,166][02281] Updated weights for policy 0, policy_version 490 (0.0018) [2024-10-06 19:14:55,210][00269] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3776.6). Total num frames: 2007040. Throughput: 0: 896.7. Samples: 502288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:14:55,214][00269] Avg episode reward: [(0, '15.388')] [2024-10-06 19:14:55,228][02268] Saving new best policy, reward=15.388! [2024-10-06 19:15:00,210][00269] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3790.6). Total num frames: 2027520. Throughput: 0: 885.5. Samples: 508194. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:15:00,217][00269] Avg episode reward: [(0, '15.218')] [2024-10-06 19:15:03,900][02281] Updated weights for policy 0, policy_version 500 (0.0026) [2024-10-06 19:15:05,209][00269] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2052096. Throughput: 0: 917.4. Samples: 511610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:15:05,219][00269] Avg episode reward: [(0, '14.107')] [2024-10-06 19:15:10,212][00269] Fps is (10 sec: 3685.4, 60 sec: 3618.0, 300 sec: 3762.7). Total num frames: 2064384. Throughput: 0: 924.6. Samples: 516714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:15:10,216][00269] Avg episode reward: [(0, '14.342')] [2024-10-06 19:15:15,210][00269] Fps is (10 sec: 3276.7, 60 sec: 3618.3, 300 sec: 3790.5). Total num frames: 2084864. Throughput: 0: 916.4. Samples: 521858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:15:15,217][00269] Avg episode reward: [(0, '14.837')] [2024-10-06 19:15:16,083][02281] Updated weights for policy 0, policy_version 510 (0.0028) [2024-10-06 19:15:20,209][00269] Fps is (10 sec: 4097.1, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2105344. Throughput: 0: 932.0. Samples: 525182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:15:20,221][00269] Avg episode reward: [(0, '16.322')] [2024-10-06 19:15:20,223][02268] Saving new best policy, reward=16.322! [2024-10-06 19:15:25,211][00269] Fps is (10 sec: 3685.8, 60 sec: 3686.3, 300 sec: 3776.6). Total num frames: 2121728. Throughput: 0: 963.1. Samples: 531176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:15:25,217][00269] Avg episode reward: [(0, '17.877')] [2024-10-06 19:15:25,237][02268] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000518_2121728.pth... 
[2024-10-06 19:15:25,408][02268] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000298_1220608.pth [2024-10-06 19:15:25,428][02268] Saving new best policy, reward=17.877! [2024-10-06 19:15:27,248][02281] Updated weights for policy 0, policy_version 520 (0.0021) [2024-10-06 19:15:30,210][00269] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2138112. Throughput: 0: 899.6. Samples: 535090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:15:30,215][00269] Avg episode reward: [(0, '18.250')] [2024-10-06 19:15:30,219][02268] Saving new best policy, reward=18.250! [2024-10-06 19:15:35,210][00269] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2158592. Throughput: 0: 893.8. Samples: 538282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:15:35,212][00269] Avg episode reward: [(0, '18.972')] [2024-10-06 19:15:35,226][02268] Saving new best policy, reward=18.972! [2024-10-06 19:15:37,533][02281] Updated weights for policy 0, policy_version 530 (0.0027) [2024-10-06 19:15:40,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2179072. Throughput: 0: 947.6. Samples: 544928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:15:40,220][00269] Avg episode reward: [(0, '18.035')] [2024-10-06 19:15:45,210][00269] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2195456. Throughput: 0: 912.9. Samples: 549276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-06 19:15:45,213][00269] Avg episode reward: [(0, '18.017')] [2024-10-06 19:15:49,539][02281] Updated weights for policy 0, policy_version 540 (0.0030) [2024-10-06 19:15:50,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 2211840. Throughput: 0: 894.9. Samples: 551880. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:15:50,217][00269] Avg episode reward: [(0, '16.332')] [2024-10-06 19:15:55,209][00269] Fps is (10 sec: 4096.1, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 2236416. Throughput: 0: 930.9. Samples: 558604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:15:55,212][00269] Avg episode reward: [(0, '17.345')] [2024-10-06 19:16:00,214][00269] Fps is (10 sec: 4094.0, 60 sec: 3754.4, 300 sec: 3776.6). Total num frames: 2252800. Throughput: 0: 927.9. Samples: 563616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:16:00,216][00269] Avg episode reward: [(0, '17.750')] [2024-10-06 19:16:00,223][02281] Updated weights for policy 0, policy_version 550 (0.0036) [2024-10-06 19:16:05,209][00269] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3748.9). Total num frames: 2265088. Throughput: 0: 898.1. Samples: 565596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:16:05,212][00269] Avg episode reward: [(0, '18.217')] [2024-10-06 19:16:10,210][00269] Fps is (10 sec: 3278.4, 60 sec: 3686.6, 300 sec: 3762.8). Total num frames: 2285568. Throughput: 0: 903.9. Samples: 571850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:16:10,211][00269] Avg episode reward: [(0, '18.117')] [2024-10-06 19:16:11,102][02281] Updated weights for policy 0, policy_version 560 (0.0038) [2024-10-06 19:16:15,210][00269] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2306048. Throughput: 0: 956.3. Samples: 578124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:16:15,222][00269] Avg episode reward: [(0, '16.425')] [2024-10-06 19:16:20,210][00269] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 2322432. Throughput: 0: 929.0. Samples: 580086. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:16:20,212][00269] Avg episode reward: [(0, '15.959')] [2024-10-06 19:16:23,145][02281] Updated weights for policy 0, policy_version 570 (0.0031) [2024-10-06 19:16:25,210][00269] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3762.8). Total num frames: 2342912. Throughput: 0: 900.4. Samples: 585444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:16:25,211][00269] Avg episode reward: [(0, '15.668')] [2024-10-06 19:16:30,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2363392. Throughput: 0: 950.7. Samples: 592056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:16:30,212][00269] Avg episode reward: [(0, '15.491')] [2024-10-06 19:16:33,083][02281] Updated weights for policy 0, policy_version 580 (0.0013) [2024-10-06 19:16:35,210][00269] Fps is (10 sec: 3686.0, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2379776. Throughput: 0: 946.9. Samples: 594490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:16:35,213][00269] Avg episode reward: [(0, '15.834')] [2024-10-06 19:16:40,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 2396160. Throughput: 0: 899.7. Samples: 599090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:16:40,211][00269] Avg episode reward: [(0, '15.151')] [2024-10-06 19:16:44,105][02281] Updated weights for policy 0, policy_version 590 (0.0029) [2024-10-06 19:16:45,210][00269] Fps is (10 sec: 4096.3, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2420736. Throughput: 0: 937.2. Samples: 605786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:16:45,217][00269] Avg episode reward: [(0, '14.332')] [2024-10-06 19:16:50,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2437120. Throughput: 0: 968.5. Samples: 609178. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:16:50,219][00269] Avg episode reward: [(0, '14.800')] [2024-10-06 19:16:55,209][00269] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 2453504. Throughput: 0: 921.8. Samples: 613332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:16:55,213][00269] Avg episode reward: [(0, '15.249')] [2024-10-06 19:16:55,846][02281] Updated weights for policy 0, policy_version 600 (0.0018) [2024-10-06 19:17:00,210][00269] Fps is (10 sec: 3686.0, 60 sec: 3686.6, 300 sec: 3762.8). Total num frames: 2473984. Throughput: 0: 924.8. Samples: 619740. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:17:00,213][00269] Avg episode reward: [(0, '16.327')] [2024-10-06 19:17:04,768][02281] Updated weights for policy 0, policy_version 610 (0.0025) [2024-10-06 19:17:05,211][00269] Fps is (10 sec: 4504.8, 60 sec: 3891.1, 300 sec: 3762.7). Total num frames: 2498560. Throughput: 0: 956.0. Samples: 623106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:17:05,216][00269] Avg episode reward: [(0, '16.328')] [2024-10-06 19:17:10,209][00269] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2510848. Throughput: 0: 949.6. Samples: 628174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:17:10,217][00269] Avg episode reward: [(0, '17.208')] [2024-10-06 19:17:15,209][00269] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2531328. Throughput: 0: 930.7. Samples: 633936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:17:15,215][00269] Avg episode reward: [(0, '16.989')] [2024-10-06 19:17:16,525][02281] Updated weights for policy 0, policy_version 620 (0.0041) [2024-10-06 19:17:20,210][00269] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2555904. Throughput: 0: 952.9. Samples: 637372. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:17:20,212][00269] Avg episode reward: [(0, '17.910')] [2024-10-06 19:17:25,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2572288. Throughput: 0: 980.0. Samples: 643190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:17:25,220][00269] Avg episode reward: [(0, '17.888')] [2024-10-06 19:17:25,233][02268] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000628_2572288.pth... [2024-10-06 19:17:25,387][02268] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000410_1679360.pth [2024-10-06 19:17:27,954][02281] Updated weights for policy 0, policy_version 630 (0.0031) [2024-10-06 19:17:30,209][00269] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2588672. Throughput: 0: 934.1. Samples: 647818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:17:30,212][00269] Avg episode reward: [(0, '17.346')] [2024-10-06 19:17:35,209][00269] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 2609152. Throughput: 0: 934.9. Samples: 651250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:17:35,212][00269] Avg episode reward: [(0, '16.757')] [2024-10-06 19:17:37,214][02281] Updated weights for policy 0, policy_version 640 (0.0025) [2024-10-06 19:17:40,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2629632. Throughput: 0: 993.3. Samples: 658032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:17:40,212][00269] Avg episode reward: [(0, '16.871')] [2024-10-06 19:17:45,215][00269] Fps is (10 sec: 3684.3, 60 sec: 3754.3, 300 sec: 3734.9). Total num frames: 2646016. Throughput: 0: 946.0. Samples: 662314. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:17:45,217][00269] Avg episode reward: [(0, '16.384')] [2024-10-06 19:17:48,732][02281] Updated weights for policy 0, policy_version 650 (0.0026) [2024-10-06 19:17:50,209][00269] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2666496. Throughput: 0: 941.4. Samples: 665468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:17:50,212][00269] Avg episode reward: [(0, '18.028')] [2024-10-06 19:17:55,209][00269] Fps is (10 sec: 4508.1, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 2691072. Throughput: 0: 985.1. Samples: 672502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:17:55,212][00269] Avg episode reward: [(0, '18.774')] [2024-10-06 19:17:58,694][02281] Updated weights for policy 0, policy_version 660 (0.0031) [2024-10-06 19:18:00,210][00269] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3735.0). Total num frames: 2703360. Throughput: 0: 966.7. Samples: 677438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:18:00,218][00269] Avg episode reward: [(0, '19.726')] [2024-10-06 19:18:00,254][02268] Saving new best policy, reward=19.726! [2024-10-06 19:18:05,210][00269] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3748.9). Total num frames: 2723840. Throughput: 0: 935.7. Samples: 679480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-06 19:18:05,214][00269] Avg episode reward: [(0, '19.415')] [2024-10-06 19:18:09,258][02281] Updated weights for policy 0, policy_version 670 (0.0039) [2024-10-06 19:18:10,210][00269] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2744320. Throughput: 0: 961.4. Samples: 686454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:18:10,212][00269] Avg episode reward: [(0, '17.873')] [2024-10-06 19:18:15,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2764800. Throughput: 0: 996.3. Samples: 692650. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-06 19:18:15,212][00269] Avg episode reward: [(0, '17.022')] [2024-10-06 19:18:20,209][00269] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2781184. Throughput: 0: 966.1. Samples: 694726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:18:20,212][00269] Avg episode reward: [(0, '17.088')] [2024-10-06 19:18:20,698][02281] Updated weights for policy 0, policy_version 680 (0.0033) [2024-10-06 19:18:25,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2805760. Throughput: 0: 953.5. Samples: 700940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:18:25,213][00269] Avg episode reward: [(0, '17.602')] [2024-10-06 19:18:29,372][02281] Updated weights for policy 0, policy_version 690 (0.0016) [2024-10-06 19:18:30,209][00269] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 2826240. Throughput: 0: 1009.9. Samples: 707756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:18:30,212][00269] Avg episode reward: [(0, '17.228')] [2024-10-06 19:18:35,210][00269] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2842624. Throughput: 0: 986.3. Samples: 709850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-06 19:18:35,215][00269] Avg episode reward: [(0, '18.162')] [2024-10-06 19:18:40,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2859008. Throughput: 0: 940.9. Samples: 714844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-06 19:18:40,216][00269] Avg episode reward: [(0, '18.765')] [2024-10-06 19:18:41,317][02281] Updated weights for policy 0, policy_version 700 (0.0028) [2024-10-06 19:18:45,209][00269] Fps is (10 sec: 4096.1, 60 sec: 3959.8, 300 sec: 3776.7). Total num frames: 2883584. Throughput: 0: 984.3. Samples: 721730. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:18:45,212][00269] Avg episode reward: [(0, '18.782')] [2024-10-06 19:18:50,211][00269] Fps is (10 sec: 4095.2, 60 sec: 3891.1, 300 sec: 3748.9). Total num frames: 2899968. Throughput: 0: 1002.7. Samples: 724604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:18:50,216][00269] Avg episode reward: [(0, '20.303')] [2024-10-06 19:18:50,218][02268] Saving new best policy, reward=20.303! [2024-10-06 19:18:52,609][02281] Updated weights for policy 0, policy_version 710 (0.0013) [2024-10-06 19:18:55,210][00269] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2916352. Throughput: 0: 938.3. Samples: 728678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:18:55,214][00269] Avg episode reward: [(0, '21.408')] [2024-10-06 19:18:55,227][02268] Saving new best policy, reward=21.408! [2024-10-06 19:19:00,210][00269] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2936832. Throughput: 0: 949.7. Samples: 735386. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:19:00,214][00269] Avg episode reward: [(0, '22.219')] [2024-10-06 19:19:00,215][02268] Saving new best policy, reward=22.219! [2024-10-06 19:19:02,388][02281] Updated weights for policy 0, policy_version 720 (0.0031) [2024-10-06 19:19:05,211][00269] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3762.7). Total num frames: 2957312. Throughput: 0: 977.5. Samples: 738714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:19:05,216][00269] Avg episode reward: [(0, '20.756')] [2024-10-06 19:19:10,211][00269] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3748.9). Total num frames: 2973696. Throughput: 0: 941.6. Samples: 743312. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:19:10,214][00269] Avg episode reward: [(0, '20.437')] [2024-10-06 19:19:13,977][02281] Updated weights for policy 0, policy_version 730 (0.0021) [2024-10-06 19:19:15,209][00269] Fps is (10 sec: 3687.1, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2994176. Throughput: 0: 925.6. Samples: 749408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:19:15,216][00269] Avg episode reward: [(0, '18.960')] [2024-10-06 19:19:20,210][00269] Fps is (10 sec: 4506.4, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 3018752. Throughput: 0: 958.7. Samples: 752990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:19:20,212][00269] Avg episode reward: [(0, '19.689')] [2024-10-06 19:19:23,669][02281] Updated weights for policy 0, policy_version 740 (0.0040) [2024-10-06 19:19:25,210][00269] Fps is (10 sec: 4095.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3035136. Throughput: 0: 971.2. Samples: 758548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:19:25,214][00269] Avg episode reward: [(0, '19.293')] [2024-10-06 19:19:25,229][02268] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000741_3035136.pth... [2024-10-06 19:19:25,405][02268] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000518_2121728.pth [2024-10-06 19:19:30,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 3051520. Throughput: 0: 929.2. Samples: 763542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:19:30,212][00269] Avg episode reward: [(0, '20.150')] [2024-10-06 19:19:34,672][02281] Updated weights for policy 0, policy_version 750 (0.0030) [2024-10-06 19:19:35,210][00269] Fps is (10 sec: 3686.7, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 3072000. Throughput: 0: 936.1. Samples: 766726. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:19:35,217][00269] Avg episode reward: [(0, '20.613')] [2024-10-06 19:19:40,210][00269] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3092480. Throughput: 0: 985.8. Samples: 773038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:19:40,216][00269] Avg episode reward: [(0, '20.578')] [2024-10-06 19:19:45,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3104768. Throughput: 0: 928.9. Samples: 777188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:19:45,217][00269] Avg episode reward: [(0, '20.354')] [2024-10-06 19:19:46,559][02281] Updated weights for policy 0, policy_version 760 (0.0023) [2024-10-06 19:19:50,209][00269] Fps is (10 sec: 3686.6, 60 sec: 3823.1, 300 sec: 3804.4). Total num frames: 3129344. Throughput: 0: 929.8. Samples: 780552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:19:50,211][00269] Avg episode reward: [(0, '21.039')] [2024-10-06 19:19:55,210][00269] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3149824. Throughput: 0: 982.1. Samples: 787506. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-06 19:19:55,216][00269] Avg episode reward: [(0, '21.719')] [2024-10-06 19:19:55,430][02281] Updated weights for policy 0, policy_version 770 (0.0037) [2024-10-06 19:20:00,210][00269] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 3166208. Throughput: 0: 953.2. Samples: 792304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:20:00,218][00269] Avg episode reward: [(0, '22.532')] [2024-10-06 19:20:00,220][02268] Saving new best policy, reward=22.532! [2024-10-06 19:20:05,209][00269] Fps is (10 sec: 3276.9, 60 sec: 3754.8, 300 sec: 3790.6). Total num frames: 3182592. Throughput: 0: 922.0. Samples: 794482. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:20:05,214][00269] Avg episode reward: [(0, '21.397')] [2024-10-06 19:20:07,337][02281] Updated weights for policy 0, policy_version 780 (0.0034) [2024-10-06 19:20:10,210][00269] Fps is (10 sec: 4096.1, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 3207168. Throughput: 0: 948.4. Samples: 801224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:20:10,215][00269] Avg episode reward: [(0, '20.402')] [2024-10-06 19:20:15,212][00269] Fps is (10 sec: 4095.0, 60 sec: 3822.8, 300 sec: 3790.5). Total num frames: 3223552. Throughput: 0: 959.1. Samples: 806702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:20:15,215][00269] Avg episode reward: [(0, '20.250')] [2024-10-06 19:20:19,184][02281] Updated weights for policy 0, policy_version 790 (0.0049) [2024-10-06 19:20:20,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 3239936. Throughput: 0: 933.4. Samples: 808728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:20:20,217][00269] Avg episode reward: [(0, '20.802')] [2024-10-06 19:20:25,210][00269] Fps is (10 sec: 3687.2, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3260416. Throughput: 0: 932.9. Samples: 815018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:20:25,220][00269] Avg episode reward: [(0, '20.724')] [2024-10-06 19:20:28,480][02281] Updated weights for policy 0, policy_version 800 (0.0024) [2024-10-06 19:20:30,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3280896. Throughput: 0: 982.2. Samples: 821386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:20:30,218][00269] Avg episode reward: [(0, '21.552')] [2024-10-06 19:20:35,210][00269] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 3293184. Throughput: 0: 950.6. Samples: 823328. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:20:35,213][00269] Avg episode reward: [(0, '23.125')] [2024-10-06 19:20:35,229][02268] Saving new best policy, reward=23.125! [2024-10-06 19:20:40,210][00269] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 3313664. Throughput: 0: 907.3. Samples: 828334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:20:40,214][00269] Avg episode reward: [(0, '23.345')] [2024-10-06 19:20:40,233][02268] Saving new best policy, reward=23.345! [2024-10-06 19:20:40,738][02281] Updated weights for policy 0, policy_version 810 (0.0043) [2024-10-06 19:20:45,210][00269] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3334144. Throughput: 0: 945.5. Samples: 834852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:20:45,217][00269] Avg episode reward: [(0, '22.264')] [2024-10-06 19:20:50,213][00269] Fps is (10 sec: 3685.2, 60 sec: 3686.2, 300 sec: 3776.6). Total num frames: 3350528. Throughput: 0: 960.0. Samples: 837686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:20:50,223][00269] Avg episode reward: [(0, '21.990')] [2024-10-06 19:20:51,665][02281] Updated weights for policy 0, policy_version 820 (0.0033) [2024-10-06 19:20:55,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 3366912. Throughput: 0: 900.6. Samples: 841752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:20:55,212][00269] Avg episode reward: [(0, '21.607')] [2024-10-06 19:21:00,210][00269] Fps is (10 sec: 3687.7, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 3387392. Throughput: 0: 921.0. Samples: 848144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:21:00,219][00269] Avg episode reward: [(0, '20.790')] [2024-10-06 19:21:02,301][02281] Updated weights for policy 0, policy_version 830 (0.0015) [2024-10-06 19:21:05,215][00269] Fps is (10 sec: 4093.8, 60 sec: 3754.3, 300 sec: 3804.4). 
Total num frames: 3407872. Throughput: 0: 946.5. Samples: 851326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:21:05,221][00269] Avg episode reward: [(0, '21.137')] [2024-10-06 19:21:10,212][00269] Fps is (10 sec: 3276.0, 60 sec: 3549.7, 300 sec: 3776.6). Total num frames: 3420160. Throughput: 0: 896.7. Samples: 855372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-06 19:21:10,215][00269] Avg episode reward: [(0, '21.419')] [2024-10-06 19:21:14,983][02281] Updated weights for policy 0, policy_version 840 (0.0046) [2024-10-06 19:21:15,210][00269] Fps is (10 sec: 3278.6, 60 sec: 3618.3, 300 sec: 3790.5). Total num frames: 3440640. Throughput: 0: 878.5. Samples: 860918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:21:15,212][00269] Avg episode reward: [(0, '21.754')] [2024-10-06 19:21:20,210][00269] Fps is (10 sec: 4097.0, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 3461120. Throughput: 0: 904.9. Samples: 864048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:21:20,213][00269] Avg episode reward: [(0, '21.584')] [2024-10-06 19:21:25,211][00269] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3776.6). Total num frames: 3477504. Throughput: 0: 909.1. Samples: 869244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:21:25,222][00269] Avg episode reward: [(0, '22.087')] [2024-10-06 19:21:25,235][02268] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000849_3477504.pth... [2024-10-06 19:21:25,411][02268] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000628_2572288.pth [2024-10-06 19:21:26,803][02281] Updated weights for policy 0, policy_version 850 (0.0029) [2024-10-06 19:21:30,213][00269] Fps is (10 sec: 3275.7, 60 sec: 3549.7, 300 sec: 3776.6). Total num frames: 3493888. Throughput: 0: 868.0. Samples: 873914. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:21:30,216][00269] Avg episode reward: [(0, '22.429')] [2024-10-06 19:21:35,209][00269] Fps is (10 sec: 3687.1, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 3514368. Throughput: 0: 874.1. Samples: 877016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:21:35,214][00269] Avg episode reward: [(0, '22.645')] [2024-10-06 19:21:36,750][02281] Updated weights for policy 0, policy_version 860 (0.0021) [2024-10-06 19:21:40,212][00269] Fps is (10 sec: 3686.7, 60 sec: 3618.0, 300 sec: 3762.7). Total num frames: 3530752. Throughput: 0: 924.9. Samples: 883376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:21:40,216][00269] Avg episode reward: [(0, '23.310')] [2024-10-06 19:21:45,210][00269] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 3547136. Throughput: 0: 871.8. Samples: 887376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:21:45,218][00269] Avg episode reward: [(0, '22.941')] [2024-10-06 19:21:48,850][02281] Updated weights for policy 0, policy_version 870 (0.0026) [2024-10-06 19:21:50,209][00269] Fps is (10 sec: 3687.3, 60 sec: 3618.4, 300 sec: 3776.7). Total num frames: 3567616. Throughput: 0: 870.6. Samples: 890496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:21:50,217][00269] Avg episode reward: [(0, '24.885')] [2024-10-06 19:21:50,220][02268] Saving new best policy, reward=24.885! [2024-10-06 19:21:55,210][00269] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3588096. Throughput: 0: 927.3. Samples: 897098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:21:55,218][00269] Avg episode reward: [(0, '23.350')] [2024-10-06 19:22:00,196][02281] Updated weights for policy 0, policy_version 880 (0.0037) [2024-10-06 19:22:00,209][00269] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3604480. Throughput: 0: 904.8. Samples: 901636. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:22:00,219][00269] Avg episode reward: [(0, '23.565')] [2024-10-06 19:22:05,210][00269] Fps is (10 sec: 3276.8, 60 sec: 3550.2, 300 sec: 3762.8). Total num frames: 3620864. Throughput: 0: 884.7. Samples: 903860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:22:05,216][00269] Avg episode reward: [(0, '22.967')] [2024-10-06 19:22:10,003][02281] Updated weights for policy 0, policy_version 890 (0.0017) [2024-10-06 19:22:10,209][00269] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 3645440. Throughput: 0: 924.0. Samples: 910820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:22:10,214][00269] Avg episode reward: [(0, '22.601')] [2024-10-06 19:22:15,210][00269] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3661824. Throughput: 0: 950.5. Samples: 916684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-06 19:22:15,213][00269] Avg episode reward: [(0, '21.929')] [2024-10-06 19:22:20,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3678208. Throughput: 0: 927.6. Samples: 918760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:22:20,213][00269] Avg episode reward: [(0, '21.423')] [2024-10-06 19:22:21,635][02281] Updated weights for policy 0, policy_version 900 (0.0036) [2024-10-06 19:22:25,209][00269] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3762.8). Total num frames: 3698688. Throughput: 0: 922.6. Samples: 924890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:22:25,212][00269] Avg episode reward: [(0, '22.622')] [2024-10-06 19:22:30,210][00269] Fps is (10 sec: 4505.5, 60 sec: 3823.1, 300 sec: 3776.6). Total num frames: 3723264. Throughput: 0: 978.3. Samples: 931402. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:22:30,214][00269] Avg episode reward: [(0, '23.446')] [2024-10-06 19:22:31,450][02281] Updated weights for policy 0, policy_version 910 (0.0024) [2024-10-06 19:22:35,212][00269] Fps is (10 sec: 3685.4, 60 sec: 3686.2, 300 sec: 3748.8). Total num frames: 3735552. Throughput: 0: 955.0. Samples: 933472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:22:35,215][00269] Avg episode reward: [(0, '23.064')] [2024-10-06 19:22:40,209][00269] Fps is (10 sec: 3276.9, 60 sec: 3754.8, 300 sec: 3762.8). Total num frames: 3756032. Throughput: 0: 922.9. Samples: 938628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:22:40,211][00269] Avg episode reward: [(0, '22.793')] [2024-10-06 19:22:42,422][02281] Updated weights for policy 0, policy_version 920 (0.0022) [2024-10-06 19:22:45,210][00269] Fps is (10 sec: 4506.8, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3780608. Throughput: 0: 979.3. Samples: 945706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:22:45,216][00269] Avg episode reward: [(0, '22.947')] [2024-10-06 19:22:50,213][00269] Fps is (10 sec: 4094.6, 60 sec: 3822.7, 300 sec: 3748.8). Total num frames: 3796992. Throughput: 0: 998.5. Samples: 948798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:22:50,216][00269] Avg episode reward: [(0, '23.389')] [2024-10-06 19:22:53,960][02281] Updated weights for policy 0, policy_version 930 (0.0019) [2024-10-06 19:22:55,210][00269] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3813376. Throughput: 0: 939.6. Samples: 953104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-06 19:22:55,214][00269] Avg episode reward: [(0, '22.598')] [2024-10-06 19:23:00,210][00269] Fps is (10 sec: 4097.2, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 3837952. Throughput: 0: 967.9. Samples: 960242. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:23:00,213][00269] Avg episode reward: [(0, '23.386')] [2024-10-06 19:23:02,648][02281] Updated weights for policy 0, policy_version 940 (0.0017) [2024-10-06 19:23:05,209][00269] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 3858432. Throughput: 0: 997.4. Samples: 963644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:23:05,213][00269] Avg episode reward: [(0, '24.970')] [2024-10-06 19:23:05,232][02268] Saving new best policy, reward=24.970! [2024-10-06 19:23:10,209][00269] Fps is (10 sec: 3277.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3870720. Throughput: 0: 959.4. Samples: 968064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-06 19:23:10,214][00269] Avg episode reward: [(0, '26.416')] [2024-10-06 19:23:10,219][02268] Saving new best policy, reward=26.416! [2024-10-06 19:23:14,470][02281] Updated weights for policy 0, policy_version 950 (0.0018) [2024-10-06 19:23:15,209][00269] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3891200. Throughput: 0: 944.4. Samples: 973900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:23:15,217][00269] Avg episode reward: [(0, '26.163')] [2024-10-06 19:23:20,209][00269] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 3915776. Throughput: 0: 975.3. Samples: 977360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-06 19:23:20,214][00269] Avg episode reward: [(0, '25.694')] [2024-10-06 19:23:25,093][02281] Updated weights for policy 0, policy_version 960 (0.0031) [2024-10-06 19:23:25,211][00269] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3748.9). Total num frames: 3932160. Throughput: 0: 982.3. Samples: 982834. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:23:25,216][00269] Avg episode reward: [(0, '24.418')] [2024-10-06 19:23:25,225][02268] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000960_3932160.pth... [2024-10-06 19:23:25,388][02268] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000741_3035136.pth [2024-10-06 19:23:30,210][00269] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3948544. Throughput: 0: 936.6. Samples: 987854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-06 19:23:30,216][00269] Avg episode reward: [(0, '23.750')] [2024-10-06 19:23:35,210][00269] Fps is (10 sec: 3686.9, 60 sec: 3891.4, 300 sec: 3762.8). Total num frames: 3969024. Throughput: 0: 939.7. Samples: 991080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:23:35,212][00269] Avg episode reward: [(0, '22.327')] [2024-10-06 19:23:35,748][02281] Updated weights for policy 0, policy_version 970 (0.0024) [2024-10-06 19:23:40,211][00269] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 3989504. Throughput: 0: 978.6. Samples: 997142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:23:40,219][00269] Avg episode reward: [(0, '21.671')] [2024-10-06 19:23:45,211][00269] Fps is (10 sec: 3276.3, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 4001792. Throughput: 0: 907.0. Samples: 1001060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-06 19:23:45,217][00269] Avg episode reward: [(0, '23.037')] [2024-10-06 19:23:46,065][00269] Component Batcher_0 stopped! [2024-10-06 19:23:46,065][02268] Stopping Batcher_0... [2024-10-06 19:23:46,072][02268] Loop batcher_evt_loop terminating... [2024-10-06 19:23:46,074][02268] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-10-06 19:23:46,129][02281] Weights refcount: 2 0 [2024-10-06 19:23:46,134][02281] Stopping InferenceWorker_p0-w0... 
[2024-10-06 19:23:46,133][00269] Component InferenceWorker_p0-w0 stopped! [2024-10-06 19:23:46,135][02281] Loop inference_proc0-0_evt_loop terminating... [2024-10-06 19:23:46,228][02268] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000849_3477504.pth [2024-10-06 19:23:46,238][02268] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-10-06 19:23:46,397][02268] Stopping LearnerWorker_p0... [2024-10-06 19:23:46,398][02268] Loop learner_proc0_evt_loop terminating... [2024-10-06 19:23:46,396][00269] Component LearnerWorker_p0 stopped! [2024-10-06 19:23:46,437][02284] Stopping RolloutWorker_w1... [2024-10-06 19:23:46,438][02284] Loop rollout_proc1_evt_loop terminating... [2024-10-06 19:23:46,437][00269] Component RolloutWorker_w1 stopped! [2024-10-06 19:23:46,457][02287] Stopping RolloutWorker_w5... [2024-10-06 19:23:46,457][00269] Component RolloutWorker_w5 stopped! [2024-10-06 19:23:46,458][02287] Loop rollout_proc5_evt_loop terminating... [2024-10-06 19:23:46,478][02289] Stopping RolloutWorker_w7... [2024-10-06 19:23:46,478][00269] Component RolloutWorker_w7 stopped! [2024-10-06 19:23:46,490][02289] Loop rollout_proc7_evt_loop terminating... [2024-10-06 19:23:46,506][00269] Component RolloutWorker_w2 stopped! [2024-10-06 19:23:46,511][00269] Component RolloutWorker_w3 stopped! [2024-10-06 19:23:46,510][02285] Stopping RolloutWorker_w3... [2024-10-06 19:23:46,514][02285] Loop rollout_proc3_evt_loop terminating... [2024-10-06 19:23:46,512][02283] Stopping RolloutWorker_w2... [2024-10-06 19:23:46,516][02283] Loop rollout_proc2_evt_loop terminating... [2024-10-06 19:23:46,574][02288] Stopping RolloutWorker_w6... [2024-10-06 19:23:46,574][00269] Component RolloutWorker_w6 stopped! [2024-10-06 19:23:46,593][02286] Stopping RolloutWorker_w4... [2024-10-06 19:23:46,576][02288] Loop rollout_proc6_evt_loop terminating... [2024-10-06 19:23:46,593][00269] Component RolloutWorker_w4 stopped! 
[2024-10-06 19:23:46,594][02286] Loop rollout_proc4_evt_loop terminating... [2024-10-06 19:23:46,631][02282] Stopping RolloutWorker_w0... [2024-10-06 19:23:46,631][00269] Component RolloutWorker_w0 stopped! [2024-10-06 19:23:46,633][00269] Waiting for process learner_proc0 to stop... [2024-10-06 19:23:46,633][02282] Loop rollout_proc0_evt_loop terminating... [2024-10-06 19:23:48,067][00269] Waiting for process inference_proc0-0 to join... [2024-10-06 19:23:48,072][00269] Waiting for process rollout_proc0 to join... [2024-10-06 19:23:50,204][00269] Waiting for process rollout_proc1 to join... [2024-10-06 19:23:50,207][00269] Waiting for process rollout_proc2 to join... [2024-10-06 19:23:50,212][00269] Waiting for process rollout_proc3 to join... [2024-10-06 19:23:50,216][00269] Waiting for process rollout_proc4 to join... [2024-10-06 19:23:50,220][00269] Waiting for process rollout_proc5 to join... [2024-10-06 19:23:50,223][00269] Waiting for process rollout_proc6 to join... [2024-10-06 19:23:50,227][00269] Waiting for process rollout_proc7 to join... 
[2024-10-06 19:23:50,232][00269] Batcher 0 profile tree view:
batching: 26.6070, releasing_batches: 0.0286
[2024-10-06 19:23:50,233][00269] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0055
  wait_policy_total: 421.9832
update_model: 8.9278
  weight_update: 0.0038
one_step: 0.0039
  handle_policy_step: 598.0507
    deserialize: 15.2298, stack: 3.1460, obs_to_device_normalize: 120.5802, forward: 319.5194, send_messages: 29.1319
    prepare_outputs: 81.4118
      to_cpu: 46.1914
[2024-10-06 19:23:50,235][00269] Learner 0 profile tree view:
misc: 0.0057, prepare_batch: 14.3020
train: 74.0747
  epoch_init: 0.0081, minibatch_init: 0.0160, losses_postprocess: 0.7200, kl_divergence: 0.6127, after_optimizer: 34.0121
  calculate_losses: 26.0508
    losses_init: 0.0037, forward_head: 1.3212, bptt_initial: 17.5406, tail: 1.0569, advantages_returns: 0.2691, losses: 3.6856
    bptt: 1.8128
      bptt_forward_core: 1.7110
  update: 11.9161
    clip: 0.8889
[2024-10-06 19:23:50,238][00269] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2794, enqueue_policy_requests: 103.4943, env_step: 834.9213, overhead: 14.5222, complete_rollouts: 6.8642
save_policy_outputs: 20.9446
  split_output_tensors: 8.4927
[2024-10-06 19:23:50,239][00269] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2967, enqueue_policy_requests: 105.2167, env_step: 832.6729, overhead: 14.6079, complete_rollouts: 7.8198
save_policy_outputs: 21.2198
  split_output_tensors: 8.6159
[2024-10-06 19:23:50,241][00269] Loop Runner_EvtLoop terminating...
[2024-10-06 19:23:50,242][00269] Runner profile tree view:
main_loop: 1098.8816
[2024-10-06 19:23:50,246][00269] Collected {0: 4005888}, FPS: 3645.4
[2024-10-06 19:25:29,599][00269] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-06 19:25:29,601][00269] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-06 19:25:29,604][00269] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-06 19:25:29,607][00269] Adding new argument 'save_video'=True that is not in the saved config file! [2024-10-06 19:25:29,608][00269] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-10-06 19:25:29,611][00269] Adding new argument 'video_name'=None that is not in the saved config file! [2024-10-06 19:25:29,612][00269] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-10-06 19:25:29,615][00269] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-10-06 19:25:29,616][00269] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-10-06 19:25:29,618][00269] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-10-06 19:25:29,619][00269] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-10-06 19:25:29,620][00269] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-10-06 19:25:29,621][00269] Adding new argument 'train_script'=None that is not in the saved config file! [2024-10-06 19:25:29,622][00269] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-10-06 19:25:29,623][00269] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-10-06 19:25:29,676][00269] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-06 19:25:29,681][00269] RunningMeanStd input shape: (3, 72, 128) [2024-10-06 19:25:29,684][00269] RunningMeanStd input shape: (1,) [2024-10-06 19:25:29,705][00269] ConvEncoder: input_channels=3 [2024-10-06 19:25:29,833][00269] Conv encoder output size: 512 [2024-10-06 19:25:29,836][00269] Policy head output size: 512 [2024-10-06 19:25:30,020][00269] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-10-06 19:25:30,840][00269] Num frames 100... 
[2024-10-06 19:25:30,966][00269] Num frames 200... [2024-10-06 19:25:31,089][00269] Num frames 300... [2024-10-06 19:25:31,213][00269] Num frames 400... [2024-10-06 19:25:31,337][00269] Num frames 500... [2024-10-06 19:25:31,465][00269] Num frames 600... [2024-10-06 19:25:31,586][00269] Num frames 700... [2024-10-06 19:25:31,716][00269] Num frames 800... [2024-10-06 19:25:31,898][00269] Avg episode rewards: #0: 15.960, true rewards: #0: 8.960 [2024-10-06 19:25:31,900][00269] Avg episode reward: 15.960, avg true_objective: 8.960 [2024-10-06 19:25:31,909][00269] Num frames 900... [2024-10-06 19:25:32,033][00269] Num frames 1000... [2024-10-06 19:25:32,154][00269] Num frames 1100... [2024-10-06 19:25:32,277][00269] Num frames 1200... [2024-10-06 19:25:32,399][00269] Num frames 1300... [2024-10-06 19:25:32,519][00269] Num frames 1400... [2024-10-06 19:25:32,639][00269] Num frames 1500... [2024-10-06 19:25:32,765][00269] Num frames 1600... [2024-10-06 19:25:32,895][00269] Num frames 1700... [2024-10-06 19:25:32,985][00269] Avg episode rewards: #0: 16.640, true rewards: #0: 8.640 [2024-10-06 19:25:32,987][00269] Avg episode reward: 16.640, avg true_objective: 8.640 [2024-10-06 19:25:33,075][00269] Num frames 1800... [2024-10-06 19:25:33,196][00269] Num frames 1900... [2024-10-06 19:25:33,321][00269] Num frames 2000... [2024-10-06 19:25:33,444][00269] Num frames 2100... [2024-10-06 19:25:33,566][00269] Num frames 2200... [2024-10-06 19:25:33,699][00269] Num frames 2300... [2024-10-06 19:25:33,832][00269] Num frames 2400... [2024-10-06 19:25:33,954][00269] Num frames 2500... [2024-10-06 19:25:34,079][00269] Num frames 2600... [2024-10-06 19:25:34,201][00269] Num frames 2700... [2024-10-06 19:25:34,323][00269] Num frames 2800... [2024-10-06 19:25:34,446][00269] Num frames 2900... [2024-10-06 19:25:34,565][00269] Num frames 3000... [2024-10-06 19:25:34,686][00269] Num frames 3100... [2024-10-06 19:25:34,821][00269] Num frames 3200... 
[2024-10-06 19:25:34,960][00269] Num frames 3300... [2024-10-06 19:25:35,127][00269] Num frames 3400... [2024-10-06 19:25:35,254][00269] Num frames 3500... [2024-10-06 19:25:35,379][00269] Num frames 3600... [2024-10-06 19:25:35,501][00269] Num frames 3700... [2024-10-06 19:25:35,632][00269] Num frames 3800... [2024-10-06 19:25:35,722][00269] Avg episode rewards: #0: 30.093, true rewards: #0: 12.760 [2024-10-06 19:25:35,724][00269] Avg episode reward: 30.093, avg true_objective: 12.760 [2024-10-06 19:25:35,829][00269] Num frames 3900... [2024-10-06 19:25:35,949][00269] Num frames 4000... [2024-10-06 19:25:36,069][00269] Num frames 4100... [2024-10-06 19:25:36,189][00269] Num frames 4200... [2024-10-06 19:25:36,313][00269] Num frames 4300... [2024-10-06 19:25:36,429][00269] Num frames 4400... [2024-10-06 19:25:36,547][00269] Num frames 4500... [2024-10-06 19:25:36,665][00269] Num frames 4600... [2024-10-06 19:25:36,800][00269] Num frames 4700... [2024-10-06 19:25:36,920][00269] Num frames 4800... [2024-10-06 19:25:37,037][00269] Num frames 4900... [2024-10-06 19:25:37,160][00269] Num frames 5000... [2024-10-06 19:25:37,284][00269] Num frames 5100... [2024-10-06 19:25:37,405][00269] Num frames 5200... [2024-10-06 19:25:37,527][00269] Num frames 5300... [2024-10-06 19:25:37,673][00269] Avg episode rewards: #0: 32.935, true rewards: #0: 13.435 [2024-10-06 19:25:37,674][00269] Avg episode reward: 32.935, avg true_objective: 13.435 [2024-10-06 19:25:37,709][00269] Num frames 5400... [2024-10-06 19:25:37,842][00269] Num frames 5500... [2024-10-06 19:25:37,963][00269] Num frames 5600... [2024-10-06 19:25:38,089][00269] Num frames 5700... [2024-10-06 19:25:38,245][00269] Num frames 5800... [2024-10-06 19:25:38,415][00269] Num frames 5900... [2024-10-06 19:25:38,553][00269] Avg episode rewards: #0: 29.100, true rewards: #0: 11.900 [2024-10-06 19:25:38,555][00269] Avg episode reward: 29.100, avg true_objective: 11.900 [2024-10-06 19:25:38,640][00269] Num frames 6000... 
[2024-10-06 19:25:38,821][00269] Num frames 6100... [2024-10-06 19:25:38,997][00269] Num frames 6200... [2024-10-06 19:25:39,163][00269] Num frames 6300... [2024-10-06 19:25:39,326][00269] Num frames 6400... [2024-10-06 19:25:39,496][00269] Num frames 6500... [2024-10-06 19:25:39,584][00269] Avg episode rewards: #0: 26.027, true rewards: #0: 10.860 [2024-10-06 19:25:39,586][00269] Avg episode reward: 26.027, avg true_objective: 10.860 [2024-10-06 19:25:39,727][00269] Num frames 6600... [2024-10-06 19:25:39,921][00269] Num frames 6700... [2024-10-06 19:25:40,090][00269] Num frames 6800... [2024-10-06 19:25:40,274][00269] Num frames 6900... [2024-10-06 19:25:40,441][00269] Avg episode rewards: #0: 23.091, true rewards: #0: 9.949 [2024-10-06 19:25:40,443][00269] Avg episode reward: 23.091, avg true_objective: 9.949 [2024-10-06 19:25:40,496][00269] Num frames 7000... [2024-10-06 19:25:40,623][00269] Num frames 7100... [2024-10-06 19:25:40,747][00269] Num frames 7200... [2024-10-06 19:25:40,879][00269] Num frames 7300... [2024-10-06 19:25:41,010][00269] Num frames 7400... [2024-10-06 19:25:41,131][00269] Num frames 7500... [2024-10-06 19:25:41,260][00269] Num frames 7600... [2024-10-06 19:25:41,381][00269] Num frames 7700... [2024-10-06 19:25:41,554][00269] Avg episode rewards: #0: 22.620, true rewards: #0: 9.745 [2024-10-06 19:25:41,556][00269] Avg episode reward: 22.620, avg true_objective: 9.745 [2024-10-06 19:25:41,565][00269] Num frames 7800... [2024-10-06 19:25:41,685][00269] Num frames 7900... [2024-10-06 19:25:41,817][00269] Num frames 8000... [2024-10-06 19:25:41,946][00269] Num frames 8100... [2024-10-06 19:25:42,071][00269] Num frames 8200... [2024-10-06 19:25:42,196][00269] Num frames 8300... [2024-10-06 19:25:42,315][00269] Num frames 8400... [2024-10-06 19:25:42,438][00269] Num frames 8500... [2024-10-06 19:25:42,557][00269] Num frames 8600... 
[2024-10-06 19:25:42,683][00269] Avg episode rewards: #0: 22.400, true rewards: #0: 9.622 [2024-10-06 19:25:42,685][00269] Avg episode reward: 22.400, avg true_objective: 9.622 [2024-10-06 19:25:42,737][00269] Num frames 8700... [2024-10-06 19:25:42,866][00269] Num frames 8800... [2024-10-06 19:25:42,993][00269] Num frames 8900... [2024-10-06 19:25:43,114][00269] Num frames 9000... [2024-10-06 19:25:43,238][00269] Num frames 9100... [2024-10-06 19:25:43,356][00269] Num frames 9200... [2024-10-06 19:25:43,474][00269] Num frames 9300... [2024-10-06 19:25:43,592][00269] Num frames 9400... [2024-10-06 19:25:43,718][00269] Num frames 9500... [2024-10-06 19:25:43,844][00269] Num frames 9600... [2024-10-06 19:25:43,961][00269] Num frames 9700... [2024-10-06 19:25:44,098][00269] Num frames 9800... [2024-10-06 19:25:44,228][00269] Num frames 9900... [2024-10-06 19:25:44,357][00269] Num frames 10000... [2024-10-06 19:25:44,477][00269] Num frames 10100... [2024-10-06 19:25:44,599][00269] Num frames 10200... [2024-10-06 19:25:44,722][00269] Num frames 10300... [2024-10-06 19:25:44,854][00269] Num frames 10400... [2024-10-06 19:25:44,955][00269] Avg episode rewards: #0: 24.736, true rewards: #0: 10.436 [2024-10-06 19:25:44,957][00269] Avg episode reward: 24.736, avg true_objective: 10.436 [2024-10-06 19:26:48,122][00269] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-10-06 19:28:58,346][00269] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-10-06 19:28:58,348][00269] Overriding arg 'num_workers' with value 1 passed from command line [2024-10-06 19:28:58,350][00269] Adding new argument 'no_render'=True that is not in the saved config file! [2024-10-06 19:28:58,352][00269] Adding new argument 'save_video'=True that is not in the saved config file! [2024-10-06 19:28:58,354][00269] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! 
[2024-10-06 19:28:58,356][00269] Adding new argument 'video_name'=None that is not in the saved config file! [2024-10-06 19:28:58,357][00269] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-10-06 19:28:58,358][00269] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-10-06 19:28:58,359][00269] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-10-06 19:28:58,360][00269] Adding new argument 'hf_repository'='grib0ed0v/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-10-06 19:28:58,362][00269] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-10-06 19:28:58,363][00269] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-10-06 19:28:58,364][00269] Adding new argument 'train_script'=None that is not in the saved config file! [2024-10-06 19:28:58,365][00269] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-10-06 19:28:58,366][00269] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-10-06 19:28:58,396][00269] RunningMeanStd input shape: (3, 72, 128) [2024-10-06 19:28:58,398][00269] RunningMeanStd input shape: (1,) [2024-10-06 19:28:58,415][00269] ConvEncoder: input_channels=3 [2024-10-06 19:28:58,453][00269] Conv encoder output size: 512 [2024-10-06 19:28:58,456][00269] Policy head output size: 512 [2024-10-06 19:28:58,474][00269] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-10-06 19:28:58,913][00269] Num frames 100... [2024-10-06 19:28:59,032][00269] Num frames 200... [2024-10-06 19:28:59,151][00269] Num frames 300... [2024-10-06 19:28:59,270][00269] Num frames 400... [2024-10-06 19:28:59,400][00269] Num frames 500... [2024-10-06 19:28:59,526][00269] Num frames 600... [2024-10-06 19:28:59,654][00269] Num frames 700... 
[2024-10-06 19:28:59,782][00269] Num frames 800... [2024-10-06 19:28:59,904][00269] Num frames 900... [2024-10-06 19:29:00,079][00269] Avg episode rewards: #0: 25.990, true rewards: #0: 9.990 [2024-10-06 19:29:00,081][00269] Avg episode reward: 25.990, avg true_objective: 9.990 [2024-10-06 19:29:00,086][00269] Num frames 1000... [2024-10-06 19:29:00,204][00269] Num frames 1100... [2024-10-06 19:29:00,327][00269] Num frames 1200... [2024-10-06 19:29:00,444][00269] Num frames 1300... [2024-10-06 19:29:00,559][00269] Num frames 1400... [2024-10-06 19:29:00,688][00269] Num frames 1500... [2024-10-06 19:29:00,811][00269] Num frames 1600... [2024-10-06 19:29:00,931][00269] Num frames 1700... [2024-10-06 19:29:01,050][00269] Num frames 1800... [2024-10-06 19:29:01,168][00269] Num frames 1900... [2024-10-06 19:29:01,294][00269] Num frames 2000... [2024-10-06 19:29:01,416][00269] Num frames 2100... [2024-10-06 19:29:01,544][00269] Num frames 2200... [2024-10-06 19:29:01,664][00269] Num frames 2300... [2024-10-06 19:29:01,799][00269] Num frames 2400... [2024-10-06 19:29:01,921][00269] Num frames 2500... [2024-10-06 19:29:02,031][00269] Avg episode rewards: #0: 32.220, true rewards: #0: 12.720 [2024-10-06 19:29:02,034][00269] Avg episode reward: 32.220, avg true_objective: 12.720 [2024-10-06 19:29:02,132][00269] Num frames 2600... [2024-10-06 19:29:02,297][00269] Num frames 2700... [2024-10-06 19:29:02,457][00269] Num frames 2800... [2024-10-06 19:29:02,615][00269] Num frames 2900... [2024-10-06 19:29:02,787][00269] Num frames 3000... [2024-10-06 19:29:02,951][00269] Num frames 3100... [2024-10-06 19:29:03,121][00269] Num frames 3200... [2024-10-06 19:29:03,286][00269] Num frames 3300... [2024-10-06 19:29:03,426][00269] Avg episode rewards: #0: 26.493, true rewards: #0: 11.160 [2024-10-06 19:29:03,429][00269] Avg episode reward: 26.493, avg true_objective: 11.160 [2024-10-06 19:29:03,519][00269] Num frames 3400... [2024-10-06 19:29:03,692][00269] Num frames 3500... 
[2024-10-06 19:29:03,881][00269] Num frames 3600... [2024-10-06 19:29:04,065][00269] Num frames 3700... [2024-10-06 19:29:04,237][00269] Num frames 3800... [2024-10-06 19:29:04,409][00269] Num frames 3900... [2024-10-06 19:29:04,534][00269] Num frames 4000... [2024-10-06 19:29:04,660][00269] Num frames 4100... [2024-10-06 19:29:04,819][00269] Avg episode rewards: #0: 24.715, true rewards: #0: 10.465 [2024-10-06 19:29:04,822][00269] Avg episode reward: 24.715, avg true_objective: 10.465 [2024-10-06 19:29:04,841][00269] Num frames 4200... [2024-10-06 19:29:04,958][00269] Num frames 4300... [2024-10-06 19:29:05,079][00269] Num frames 4400... [2024-10-06 19:29:05,200][00269] Num frames 4500... [2024-10-06 19:29:05,322][00269] Num frames 4600... [2024-10-06 19:29:05,460][00269] Avg episode rewards: #0: 22.340, true rewards: #0: 9.340 [2024-10-06 19:29:05,462][00269] Avg episode reward: 22.340, avg true_objective: 9.340 [2024-10-06 19:29:05,500][00269] Num frames 4700... [2024-10-06 19:29:05,618][00269] Num frames 4800... [2024-10-06 19:29:05,737][00269] Num frames 4900... [2024-10-06 19:29:05,876][00269] Num frames 5000... [2024-10-06 19:29:05,964][00269] Avg episode rewards: #0: 19.380, true rewards: #0: 8.380 [2024-10-06 19:29:05,967][00269] Avg episode reward: 19.380, avg true_objective: 8.380 [2024-10-06 19:29:06,056][00269] Num frames 5100... [2024-10-06 19:29:06,176][00269] Num frames 5200... [2024-10-06 19:29:06,298][00269] Num frames 5300... [2024-10-06 19:29:06,417][00269] Num frames 5400... [2024-10-06 19:29:06,567][00269] Num frames 5500... [2024-10-06 19:29:06,690][00269] Num frames 5600... [2024-10-06 19:29:06,823][00269] Num frames 5700... [2024-10-06 19:29:06,907][00269] Avg episode rewards: #0: 18.171, true rewards: #0: 8.171 [2024-10-06 19:29:06,909][00269] Avg episode reward: 18.171, avg true_objective: 8.171 [2024-10-06 19:29:07,006][00269] Num frames 5800... [2024-10-06 19:29:07,125][00269] Num frames 5900... 
[2024-10-06 19:29:07,247][00269] Num frames 6000... [2024-10-06 19:29:07,365][00269] Num frames 6100... [2024-10-06 19:29:07,483][00269] Num frames 6200... [2024-10-06 19:29:07,608][00269] Num frames 6300... [2024-10-06 19:29:07,728][00269] Num frames 6400... [2024-10-06 19:29:07,858][00269] Num frames 6500... [2024-10-06 19:29:07,986][00269] Num frames 6600... [2024-10-06 19:29:08,108][00269] Num frames 6700... [2024-10-06 19:29:08,231][00269] Num frames 6800... [2024-10-06 19:29:08,352][00269] Num frames 6900... [2024-10-06 19:29:08,475][00269] Num frames 7000... [2024-10-06 19:29:08,596][00269] Avg episode rewards: #0: 19.691, true rewards: #0: 8.816 [2024-10-06 19:29:08,598][00269] Avg episode reward: 19.691, avg true_objective: 8.816 [2024-10-06 19:29:08,656][00269] Num frames 7100... [2024-10-06 19:29:08,778][00269] Num frames 7200... [2024-10-06 19:29:08,916][00269] Num frames 7300... [2024-10-06 19:29:09,038][00269] Num frames 7400... [2024-10-06 19:29:09,159][00269] Num frames 7500... [2024-10-06 19:29:09,282][00269] Num frames 7600... [2024-10-06 19:29:09,400][00269] Num frames 7700... [2024-10-06 19:29:09,518][00269] Num frames 7800... [2024-10-06 19:29:09,638][00269] Num frames 7900... [2024-10-06 19:29:09,755][00269] Num frames 8000... [2024-10-06 19:29:09,881][00269] Num frames 8100... [2024-10-06 19:29:10,011][00269] Num frames 8200... [2024-10-06 19:29:10,131][00269] Num frames 8300... [2024-10-06 19:29:10,255][00269] Num frames 8400... [2024-10-06 19:29:10,430][00269] Avg episode rewards: #0: 21.103, true rewards: #0: 9.437 [2024-10-06 19:29:10,432][00269] Avg episode reward: 21.103, avg true_objective: 9.437 [2024-10-06 19:29:10,446][00269] Num frames 8500... [2024-10-06 19:29:10,589][00269] Num frames 8600... [2024-10-06 19:29:10,731][00269] Num frames 8700... [2024-10-06 19:29:10,864][00269] Num frames 8800... [2024-10-06 19:29:10,992][00269] Num frames 8900... [2024-10-06 19:29:11,115][00269] Num frames 9000... 
[2024-10-06 19:29:11,236][00269] Num frames 9100... [2024-10-06 19:29:11,357][00269] Num frames 9200... [2024-10-06 19:29:11,478][00269] Num frames 9300... [2024-10-06 19:29:11,600][00269] Num frames 9400... [2024-10-06 19:29:11,726][00269] Avg episode rewards: #0: 21.453, true rewards: #0: 9.453 [2024-10-06 19:29:11,728][00269] Avg episode reward: 21.453, avg true_objective: 9.453 [2024-10-06 19:30:09,018][00269] Replay video saved to /content/train_dir/default_experiment/replay.mp4!