distilbert/distilbert-base-uncased-finetuned-sst-2-english · Cannot reproduce the accuracy result.

Hi, experts.
I am new to Hugging Face. I am trying to reproduce the fine-tuning result for this model, but I cannot reach the reported accuracy.
I am using run_glue.py under transformers/examples/pytorch/text-classification for the fine-tuning. Specifically, I am passing the following JSON config.
The hyperparameters are taken from the model card: https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english

{
       "model_name_or_path": "distilbert-base-uncased",
       "task_name": "sst2",
       "do_train": true,
       "do_eval": true,
       "max_seq_length": 128,
       "per_device_train_batch_size": 32,
       "learning_rate": 1e-5,
       "num_train_epochs": 3,
       "warmup_steps": 600,
       "output_dir": "/scratch/sst2_checkpoints"
}
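
In case the way the config is passed matters, here is a minimal sketch of the launch. It relies on run_glue.py treating a single command-line argument ending in .json as a config file. The file name sst2_config.json and the temp-directory location are placeholders for illustration, not the actual paths from my run, and the sketch only prints the command rather than running it.

```python
import json
import shlex
import tempfile
from pathlib import Path

# Hyperparameters exactly as listed in the JSON above.
config = {
    "model_name_or_path": "distilbert-base-uncased",
    "task_name": "sst2",
    "do_train": True,
    "do_eval": True,
    "max_seq_length": 128,
    "per_device_train_batch_size": 32,
    "learning_rate": 1e-5,
    "num_train_epochs": 3,
    "warmup_steps": 600,
    "output_dir": "/scratch/sst2_checkpoints",
}

# Placeholder file name (assumption); any path ending in ".json" works.
config_path = Path(tempfile.gettempdir()) / "sst2_config.json"
config_path.write_text(json.dumps(config, indent=2))

# A single argument ending in ".json" is parsed as the full argument set.
command = ["python", "run_glue.py", str(config_path)]
print(shlex.join(command))
```

Note that the batch size is per device, so the effective batch size is 32 times the number of GPUs used, and no seed is set here, so the script's default seed applies.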

However, the final accuracy I get after training is 89.68%. That is not bad, but it is lower than the 91.3% reported here: https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english. I am not sure what I am doing wrong. Can someone help me understand why my accuracy does not reach 91.3%?
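
To put the gap in perspective, here is a rough back-of-the-envelope sketch. It assumes both numbers were measured on the GLUE SST-2 validation split (872 examples) and treats each prediction as an independent trial, which is only an approximation.

```python
import math

# Assumption: both accuracies come from the 872-example SST-2 validation split.
N_EVAL = 872
my_accuracy = 0.8968
reported_accuracy = 0.913

# Size of the gap, in accuracy and in number of validation examples.
gap = reported_accuracy - my_accuracy
examples = gap * N_EVAL

# Binomial standard error of an accuracy estimate on N_EVAL examples.
stderr = math.sqrt(reported_accuracy * (1 - reported_accuracy) / N_EVAL)

print(f"gap: {gap * 100:.2f} points, about {examples:.0f} examples")
print(f"standard error: {stderr * 100:.2f} points")
```

Under these assumptions the difference comes down to roughly 14 validation examples, so part of it might be run-to-run variation (seed, number of GPUs), but I do not know whether that explains all of it.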

Also, on the right side of the same page, it says the accuracy on glue is 91.1% and the accuracy on sst2 is 98.9%. I am not sure what these numbers mean (I thought sst2 was part of the GLUE dataset). What are they, and why do they differ from 91.3%?

Any help would be really appreciated.
Thank you.