2021-07-26 00:12:35.575266: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-07-26 00:12:35.575304: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
[00:12:36] - INFO - filelock - Lock 139656499698272 acquired on /home/versae/.cache/huggingface/transformers/27b7e968d2908b27f8c1df265c2dc08aef61be0f25bdc735df4df552829968fd.04a8293889c44bb7f31a5ee6212b8aa0b690121444e9c7ce1616fbe2a461ebba.lock
Downloading: 100%|██████████| 250M/250M [00:06<00:00, 35.8MB/s]
[00:12:43] - INFO - filelock - Lock 139656499698272 released on /home/versae/.cache/huggingface/transformers/27b7e968d2908b27f8c1df265c2dc08aef61be0f25bdc735df4df552829968fd.04a8293889c44bb7f31a5ee6212b8aa0b690121444e9c7ce1616fbe2a461ebba.lock
/var/hf/venv/lib/python3.8/site-packages/jax/lib/xla_bridge.py:386: UserWarning: jax.host_count has been renamed to jax.process_count. This alias will eventually be removed; please update your code.
  warnings.warn(
/var/hf/venv/lib/python3.8/site-packages/jax/lib/xla_bridge.py:373: UserWarning: jax.host_id has been renamed to jax.process_index. This alias will eventually be removed; please update your code.
  warnings.warn(
Training...: 2%|█▊ | 1000/50000 [22:19<17:30:45, 1.29s/it]
Step... (500 | Loss: 1.8920137882232666, Learning Rate: 0.0006000000284984708)
Evaluating ...: 100%|██████████| 130/130 [00:31<00:00, 4.59it/s]
[02:30:54] - INFO - __main__ - Saving checkpoint at 1000 steps
/var/hf/transformers-orig/src/transformers/modeling_flax_pytorch_utils.py:201: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:180.)
  pt_model_dict[flax_key] = torch.from_numpy(flax_tensor)
All Flax model weights were used when initializing RobertaForMaskedLM.
Some weights of RobertaForMaskedLM were not initialized from the Flax model and are newly initialized: ['lm_head.decoder.weight', 'roberta.embeddings.position_ids', 'lm_head.decoder.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Step... (1000/50000 | Loss: 1.7686773538589478, Acc: 0.6487793326377869): 4%|█▏ | 2000/50000 [45:36<16:04:15, 1.21s/it]
Step... (1500 | Loss: 1.8557080030441284, Learning Rate: 0.0005878788069821894)
Evaluating ...: 100%|██████████| 130/130 [00:21<00:00, 4.59it/s]
[02:54:02] - INFO - __main__ - Saving checkpoint at 2000 steps
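Note: the two JAX UserWarnings above are fixed exactly as the messages suggest, by calling jax.process_count() and jax.process_index() instead of the deprecated jax.host_count / jax.host_id aliases. The PyTorch warning from modeling_flax_pytorch_utils.py:201 arises because torch.from_numpy shares memory with its input and therefore cannot honor the non-writeable flag on a frozen Flax parameter. A minimal sketch of the failure mode and the copy-based fix suggested by the warning itself (variable names are illustrative, not the library's internals):

import numpy as np
import torch

# Simulate a frozen Flax parameter: a NumPy array marked read-only.
flax_tensor = np.arange(6, dtype=np.float32).reshape(2, 3)
flax_tensor.flags.writeable = False

# torch.from_numpy shares the underlying buffer, which triggers the
# "given NumPy array is not writeable" warning seen in the log above.
shared = torch.from_numpy(flax_tensor)

# Copying first hands PyTorch a writeable buffer and avoids the warning,
# at the cost of one extra copy per converted parameter.
copied = torch.from_numpy(np.array(flax_tensor))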
Step... (2000/50000 | Loss: 1.778090238571167, Acc: 0.6472830772399902): 6%|█▊ | 3000/50000 [1:08:36<16:30:25, 1.26s/it]
Step... (2500 | Loss: 1.9601893424987793, Learning Rate: 0.000575757585465908)
Evaluating ...: 100%|██████████| 130/130 [00:21<00:00, 4.60it/s]
[03:16:59] - INFO - __main__ - Saving checkpoint at 3000 steps
Step... (3000/50000 | Loss: 1.7852987051010132, Acc: 0.6470173597335815): 8%|██▎ | 4000/50000 [1:31:22<16:35:30, 1.30s/it]
Step... (3500 | Loss: 1.8832361698150635, Learning Rate: 0.0005636363639496267)
Evaluating ...: 100%|██████████| 130/130 [00:21<00:00, 4.60it/s]
[03:39:47] - INFO - __main__ - Saving checkpoint at 4000 steps
Step... (4000/50000 | Loss: 1.776147484779358, Acc: 0.6480115652084351): 10%|███ | 5000/50000 [1:54:07<16:53:11, 1.35s/it]
Step... (4500 | Loss: 1.8291735649108887, Learning Rate: 0.0005515151424333453)
Evaluating ...: 100%|██████████| 130/130 [00:21<00:00, 4.60it/s]
[04:02:30] - INFO - __main__ - Saving checkpoint at 5000 steps
Step... (5000/50000 | Loss: 1.7797870635986328, Acc: 0.647495448589325): 12%|███▌ | 6000/50000 [2:17:21<17:46:48, 1.45s/it]
Step... (5500 | Loss: 1.9027880430221558, Learning Rate: 0.0005393939791247249)
Evaluating ...: 100%|██████████| 130/130 [00:21<00:00, 4.60it/s]
[04:25:46] - INFO - __main__ - Saving checkpoint at 6000 steps
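The logged learning rates fall linearly (0.0006 at step 500, 0.0005878788 at 1500, 0.0005757575 at 2500, ...), which is consistent with a linear warmup to a 6e-4 peak followed by a linear decay to zero over the 50,000 training steps. A sketch that reproduces these values with optax; warmup_steps=500 and the decay-to-zero endpoint are assumptions inferred from the numbers, not read from the actual training config:

import optax

total_steps = 50_000
warmup_steps = 500   # assumed from the log; the real config may differ
peak_lr = 6e-4

warmup = optax.linear_schedule(init_value=0.0, end_value=peak_lr,
                               transition_steps=warmup_steps)
decay = optax.linear_schedule(init_value=peak_lr, end_value=0.0,
                              transition_steps=total_steps - warmup_steps)
schedule = optax.join_schedules([warmup, decay], boundaries=[warmup_steps])

print(schedule(500))   # ~0.000600, matching the step-500 log line
print(schedule(1500))  # ~0.000588, matching the step-1500 log line
print(schedule(2500))  # ~0.000576, matching the step-2500 log line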
Step... (6000/50000 | Loss: 1.7780379056930542, Acc: 0.6486639976501465): 14%|████ | 7000/50000 [2:40:57<15:48:42, 1.32s/it]
Evaluating ...: 0%| | 0/130 [00:00
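The "Some weights of RobertaForMaskedLM were not initialized from the Flax model" block is emitted each time a checkpoint is converted from Flax to PyTorch: 'lm_head.decoder.weight' and 'lm_head.decoder.bias' are tied to the input embeddings, and 'roberta.embeddings.position_ids' is a buffer that PyTorch regenerates on load, so no trained weights are actually missing and the generic "You should probably TRAIN this model" advice is generally safe to ignore for these MLM checkpoints. A hedged sketch of reloading one of the saved checkpoints on the PyTorch side (the path is a placeholder):

from transformers import RobertaForMaskedLM

# Placeholder path for one of the checkpoints saved above.
ckpt = "path/to/checkpoint-6000"

# Loading the Flax weights into the PyTorch class reproduces the warning;
# the tied decoder parameters and the position_ids buffer are rebuilt here.
model = RobertaForMaskedLM.from_pretrained(ckpt, from_flax=True)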