anezatra committed on Apr 18

Commit

4752c45

•

1 Parent(s): 6f32aed

Upload folder using huggingface_hub


Files changed (33)

README.md +204 -0
adapter_config.json +27 -0
adapter_model.safetensors +3 -0
checkpoint-200/README.md +204 -0
checkpoint-200/adapter_config.json +27 -0
checkpoint-200/adapter_model.safetensors +3 -0
checkpoint-200/optimizer.pt +3 -0
checkpoint-200/rng_state.pth +3 -0
checkpoint-200/scheduler.pt +3 -0
checkpoint-200/trainer_state.json +169 -0
checkpoint-200/training_args.bin +3 -0
checkpoint-400/README.md +204 -0
checkpoint-400/adapter_config.json +27 -0
checkpoint-400/adapter_model.safetensors +3 -0
checkpoint-400/optimizer.pt +3 -0
checkpoint-400/rng_state.pth +3 -0
checkpoint-400/scheduler.pt +3 -0
checkpoint-400/trainer_state.json +317 -0
checkpoint-400/training_args.bin +3 -0
checkpoint-600/README.md +204 -0
checkpoint-600/adapter_config.json +27 -0
checkpoint-600/adapter_model.safetensors +3 -0
checkpoint-600/optimizer.pt +3 -0
checkpoint-600/rng_state.pth +3 -0
checkpoint-600/scheduler.pt +3 -0
checkpoint-600/trainer_state.json +465 -0
checkpoint-600/training_args.bin +3 -0
merges.txt +0 -0
special_tokens_map.json +24 -0
tokenizer.json +0 -0
tokenizer_config.json +22 -0
training_args.bin +3 -0
vocab.json +0 -0

README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: cerebras/Cerebras-GPT-2.7B
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.9.0

adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "cerebras/Cerebras-GPT-2.7B",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
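To make the configuration above concrete, the following minimal sketch computes the LoRA scaling factor (`lora_alpha / r`) and the number of adapter parameters it implies. The hidden size (2560), layer count (32), and the fused `c_attn` output width (3 × hidden size) are assumptions about the Cerebras-GPT-2.7B base model and do not appear in this commit; `lora_param_count` is a hypothetical helper.

```python
# Values copied from adapter_config.json in this commit.
LORA_R = 8
LORA_ALPHA = 16

# Assumptions about the base model (not stated anywhere in this commit).
HIDDEN_SIZE = 2560
NUM_LAYERS = 32
C_ATTN_OUT = 3 * HIDDEN_SIZE  # fused query/key/value projection


def lora_param_count(r: int, in_features: int, out_features: int, layers: int) -> int:
    """Count parameters in the A (r x in) and B (out x r) matrices over all layers."""
    return layers * (r * in_features + out_features * r)


scaling = LORA_ALPHA / LORA_R
n_params = lora_param_count(LORA_R, HIDDEN_SIZE, C_ATTN_OUT, NUM_LAYERS)

print(f"scaling = {scaling}")
print(f"LoRA parameters = {n_params}")
print(f"approx. fp32 size = {n_params * 4} bytes")
```

Under these assumptions the fp32 estimate is 10,485,760 bytes, close to the 10,493,960 bytes recorded for `adapter_model.safetensors` in this commit. The small remainder would be consistent with file header metadata, though that is not verified here.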

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d46fd5345c9e67b7731b338f70b4e5ab9254601ac05229c273420a983777a015
+size 10493960
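The three added lines above are a Git LFS pointer rather than the weights themselves: a spec version, a content hash, and the size in bytes. As an illustrative sketch, the pointer text (with the diff's leading `+` removed) can be parsed like this; `parse_lfs_pointer` is a hypothetical helper, not part of any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from this commit (diff '+' markers removed).
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d46fd5345c9e67b7731b338f70b4e5ab9254601ac05229c273420a983777a015
size 10493960
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The same helper applies to the other pointer files in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `training_args.bin`), since they share the three-line layout.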

checkpoint-200/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: cerebras/Cerebras-GPT-2.7B
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.9.0

checkpoint-200/adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "cerebras/Cerebras-GPT-2.7B",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-200/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f037f89303aade6338800576a44cbd08fae793e06371272695595f9d1ded7fa3
+size 10493960

checkpoint-200/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:99c81f4034459886e4c84b44bf227845ce2febbacf3ea07333bb75b841e94ba2
+size 21025594

checkpoint-200/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ebedcbc5db8f6b98602bd526bb2b868a228710387ef7eed2035b0e2fccc861d2
+size 14244

checkpoint-200/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f9a19c22e0aa9795b757448bfc919a5c8f4c422e226f842b6b85ec64dd7e343c
+size 1064

checkpoint-200/trainer_state.json ADDED

	@@ -0,0 +1,169 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.6187161639597835,
+  "eval_steps": 200,
+  "global_step": 200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.5968947410583496,
+      "learning_rate": 9.896800825593395e-05,
+      "loss": 2.3935,
+      "step": 10
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 0.6443502306938171,
+      "learning_rate": 9.793601651186791e-05,
+      "loss": 2.1763,
+      "step": 20
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.4056578576564789,
+      "learning_rate": 9.690402476780186e-05,
+      "loss": 1.9601,
+      "step": 30
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 0.24697279930114746,
+      "learning_rate": 9.587203302373582e-05,
+      "loss": 1.9009,
+      "step": 40
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 0.24417832493782043,
+      "learning_rate": 9.484004127966977e-05,
+      "loss": 1.8698,
+      "step": 50
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 0.19971579313278198,
+      "learning_rate": 9.380804953560372e-05,
+      "loss": 1.8502,
+      "step": 60
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 0.21794500946998596,
+      "learning_rate": 9.277605779153768e-05,
+      "loss": 1.8139,
+      "step": 70
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 0.20081552863121033,
+      "learning_rate": 9.174406604747162e-05,
+      "loss": 1.8246,
+      "step": 80
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 0.2045777291059494,
+      "learning_rate": 9.071207430340559e-05,
+      "loss": 1.8009,
+      "step": 90
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 0.21513579785823822,
+      "learning_rate": 8.968008255933953e-05,
+      "loss": 1.7745,
+      "step": 100
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 0.2838321924209595,
+      "learning_rate": 8.864809081527348e-05,
+      "loss": 1.7831,
+      "step": 110
+    },
+    {
+      "epoch": 0.37,
+      "grad_norm": 0.19812746345996857,
+      "learning_rate": 8.761609907120744e-05,
+      "loss": 1.7554,
+      "step": 120
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 0.30143260955810547,
+      "learning_rate": 8.658410732714138e-05,
+      "loss": 1.7707,
+      "step": 130
+    },
+    {
+      "epoch": 0.43,
+      "grad_norm": 0.21341949701309204,
+      "learning_rate": 8.555211558307535e-05,
+      "loss": 1.7476,
+      "step": 140
+    },
+    {
+      "epoch": 0.46,
+      "grad_norm": 0.24006041884422302,
+      "learning_rate": 8.452012383900929e-05,
+      "loss": 1.7653,
+      "step": 150
+    },
+    {
+      "epoch": 0.49,
+      "grad_norm": 0.25095027685165405,
+      "learning_rate": 8.348813209494324e-05,
+      "loss": 1.7438,
+      "step": 160
+    },
+    {
+      "epoch": 0.53,
+      "grad_norm": 0.2602318525314331,
+      "learning_rate": 8.24561403508772e-05,
+      "loss": 1.7728,
+      "step": 170
+    },
+    {
+      "epoch": 0.56,
+      "grad_norm": 0.26738253235816956,
+      "learning_rate": 8.142414860681114e-05,
+      "loss": 1.7433,
+      "step": 180
+    },
+    {
+      "epoch": 0.59,
+      "grad_norm": 0.25230053067207336,
+      "learning_rate": 8.039215686274511e-05,
+      "loss": 1.7304,
+      "step": 190
+    },
+    {
+      "epoch": 0.62,
+      "grad_norm": 0.2360549122095108,
+      "learning_rate": 7.936016511867905e-05,
+      "loss": 1.7297,
+      "step": 200
+    },
+    {
+      "epoch": 0.62,
+      "eval_loss": 1.744746446609497,
+      "eval_runtime": 294.0669,
+      "eval_samples_per_second": 35.172,
+      "eval_steps_per_second": 4.397,
+      "step": 200
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 969,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 200,
+  "total_flos": 8.880783974203392e+16,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
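The `learning_rate` values logged in this file are consistent with a linear decay to zero over `max_steps` = 969. The sketch below reproduces three of them. The peak learning rate of 1e-4 and the absence of warmup are inferred from the logged numbers and are assumptions; the actual arguments live in the binary `training_args.bin` and are not visible in this diff. `linear_decay_lr` is a hypothetical helper.

```python
import math

MAX_STEPS = 969     # from trainer_state.json in checkpoint-200
PEAK_LR = 1e-4      # assumption: inferred from the logged values
WARMUP_STEPS = 0    # assumption: no warmup


def linear_decay_lr(step, peak_lr=PEAK_LR, max_steps=MAX_STEPS, warmup=WARMUP_STEPS):
    """Linear warmup to peak_lr, then linear decay to zero at max_steps."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup))


# (step -> learning_rate) pairs copied from log_history above.
LOGGED = {
    10: 9.896800825593395e-05,
    100: 8.968008255933953e-05,
    200: 7.936016511867905e-05,
}

matches = {
    step: math.isclose(lr, linear_decay_lr(step), rel_tol=1e-6)
    for step, lr in LOGGED.items()
}
print(matches)
```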

checkpoint-200/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:54e8ba036bac7650d5b90e1b646acd2671a3ceddc07c04cd916a559021925335
+size 4920

checkpoint-400/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: cerebras/Cerebras-GPT-2.7B
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.9.0

checkpoint-400/adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "cerebras/Cerebras-GPT-2.7B",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-400/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:31641c277aa64510c759c00b315604d2330242b9fc4db83769ab81bac39f47f8
+size 10493960

checkpoint-400/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:11ad17467719fd1e5ac9347db6dbbf1d0a0fdac89fa332b9fbf25f2872d52d90
+size 21025594

checkpoint-400/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cecbcc950d89153fea7521c9f8c48ce5ffe218a5f340197e50aae03650285f71
+size 14244

checkpoint-400/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2867435bc6d882c6b6965fe010819ce417dcda7314e5401902ae972573a42136
+size 1064

checkpoint-400/trainer_state.json ADDED

	@@ -0,0 +1,317 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.237432327919567,
+  "eval_steps": 200,
+  "global_step": 400,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.5968947410583496,
+      "learning_rate": 9.896800825593395e-05,
+      "loss": 2.3935,
+      "step": 10
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 0.6443502306938171,
+      "learning_rate": 9.793601651186791e-05,
+      "loss": 2.1763,
+      "step": 20
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.4056578576564789,
+      "learning_rate": 9.690402476780186e-05,
+      "loss": 1.9601,
+      "step": 30
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 0.24697279930114746,
+      "learning_rate": 9.587203302373582e-05,
+      "loss": 1.9009,
+      "step": 40
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 0.24417832493782043,
+      "learning_rate": 9.484004127966977e-05,
+      "loss": 1.8698,
+      "step": 50
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 0.19971579313278198,
+      "learning_rate": 9.380804953560372e-05,
+      "loss": 1.8502,
+      "step": 60
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 0.21794500946998596,
+      "learning_rate": 9.277605779153768e-05,
+      "loss": 1.8139,
+      "step": 70
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 0.20081552863121033,
+      "learning_rate": 9.174406604747162e-05,
+      "loss": 1.8246,
+      "step": 80
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 0.2045777291059494,
+      "learning_rate": 9.071207430340559e-05,
+      "loss": 1.8009,
+      "step": 90
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 0.21513579785823822,
+      "learning_rate": 8.968008255933953e-05,
+      "loss": 1.7745,
+      "step": 100
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 0.2838321924209595,
+      "learning_rate": 8.864809081527348e-05,
+      "loss": 1.7831,
+      "step": 110
+    },
+    {
+      "epoch": 0.37,
+      "grad_norm": 0.19812746345996857,
+      "learning_rate": 8.761609907120744e-05,
+      "loss": 1.7554,
+      "step": 120
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 0.30143260955810547,
+      "learning_rate": 8.658410732714138e-05,
+      "loss": 1.7707,
+      "step": 130
+    },
+    {
+      "epoch": 0.43,
+      "grad_norm": 0.21341949701309204,
+      "learning_rate": 8.555211558307535e-05,
+      "loss": 1.7476,
+      "step": 140
+    },
+    {
+      "epoch": 0.46,
+      "grad_norm": 0.24006041884422302,
+      "learning_rate": 8.452012383900929e-05,
+      "loss": 1.7653,
+      "step": 150
+    },
+    {
+      "epoch": 0.49,
+      "grad_norm": 0.25095027685165405,
+      "learning_rate": 8.348813209494324e-05,
+      "loss": 1.7438,
+      "step": 160
+    },
+    {
+      "epoch": 0.53,
+      "grad_norm": 0.2602318525314331,
+      "learning_rate": 8.24561403508772e-05,
+      "loss": 1.7728,
+      "step": 170
+    },
+    {
+      "epoch": 0.56,
+      "grad_norm": 0.26738253235816956,
+      "learning_rate": 8.142414860681114e-05,
+      "loss": 1.7433,
+      "step": 180
+    },
+    {
+      "epoch": 0.59,
+      "grad_norm": 0.25230053067207336,
+      "learning_rate": 8.039215686274511e-05,
+      "loss": 1.7304,
+      "step": 190
+    },
+    {
+      "epoch": 0.62,
+      "grad_norm": 0.2360549122095108,
+      "learning_rate": 7.936016511867905e-05,
+      "loss": 1.7297,
+      "step": 200
+    },
+    {
+      "epoch": 0.62,
+      "eval_loss": 1.744746446609497,
+      "eval_runtime": 294.0669,
+      "eval_samples_per_second": 35.172,
+      "eval_steps_per_second": 4.397,
+      "step": 200
+    },
+    {
+      "epoch": 0.65,
+      "grad_norm": 0.2905316948890686,
+      "learning_rate": 6.74922600619195e-05,
+      "loss": 1.727,
+      "step": 210
+    },
+    {
+      "epoch": 0.68,
+      "grad_norm": 0.25649935007095337,
+      "learning_rate": 6.594427244582044e-05,
+      "loss": 1.7224,
+      "step": 220
+    },
+    {
+      "epoch": 0.71,
+      "grad_norm": 0.23987528681755066,
+      "learning_rate": 6.439628482972137e-05,
+      "loss": 1.7389,
+      "step": 230
+    },
+    {
+      "epoch": 0.74,
+      "grad_norm": 0.2479698807001114,
+      "learning_rate": 6.28482972136223e-05,
+      "loss": 1.7255,
+      "step": 240
+    },
+    {
+      "epoch": 0.77,
+      "grad_norm": 0.25272852182388306,
+      "learning_rate": 6.130030959752322e-05,
+      "loss": 1.7354,
+      "step": 250
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 0.2447136789560318,
+      "learning_rate": 5.9752321981424155e-05,
+      "loss": 1.7443,
+      "step": 260
+    },
+    {
+      "epoch": 0.84,
+      "grad_norm": 0.26579201221466064,
+      "learning_rate": 5.8204334365325074e-05,
+      "loss": 1.7462,
+      "step": 270
+    },
+    {
+      "epoch": 0.87,
+      "grad_norm": 0.31005680561065674,
+      "learning_rate": 5.6656346749226006e-05,
+      "loss": 1.7144,
+      "step": 280
+    },
+    {
+      "epoch": 0.9,
+      "grad_norm": 0.2663085460662842,
+      "learning_rate": 5.510835913312694e-05,
+      "loss": 1.7094,
+      "step": 290
+    },
+    {
+      "epoch": 0.93,
+      "grad_norm": 0.28601768612861633,
+      "learning_rate": 5.3560371517027864e-05,
+      "loss": 1.6838,
+      "step": 300
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 0.2900325059890747,
+      "learning_rate": 5.20123839009288e-05,
+      "loss": 1.7042,
+      "step": 310
+    },
+    {
+      "epoch": 0.99,
+      "grad_norm": 0.28358617424964905,
+      "learning_rate": 5.046439628482973e-05,
+      "loss": 1.7057,
+      "step": 320
+    },
+    {
+      "epoch": 1.02,
+      "grad_norm": 0.3409838378429413,
+      "learning_rate": 4.891640866873065e-05,
+      "loss": 1.7228,
+      "step": 330
+    },
+    {
+      "epoch": 1.05,
+      "grad_norm": 0.28400272130966187,
+      "learning_rate": 4.736842105263158e-05,
+      "loss": 1.7157,
+      "step": 340
+    },
+    {
+      "epoch": 1.08,
+      "grad_norm": 0.33671486377716064,
+      "learning_rate": 4.582043343653251e-05,
+      "loss": 1.709,
+      "step": 350
+    },
+    {
+      "epoch": 1.11,
+      "grad_norm": 0.29089200496673584,
+      "learning_rate": 4.427244582043344e-05,
+      "loss": 1.7018,
+      "step": 360
+    },
+    {
+      "epoch": 1.14,
+      "grad_norm": 0.2736106514930725,
+      "learning_rate": 4.2724458204334365e-05,
+      "loss": 1.6888,
+      "step": 370
+    },
+    {
+      "epoch": 1.18,
+      "grad_norm": 0.272792786359787,
+      "learning_rate": 4.11764705882353e-05,
+      "loss": 1.7191,
+      "step": 380
+    },
+    {
+      "epoch": 1.21,
+      "grad_norm": 0.282695472240448,
+      "learning_rate": 3.962848297213623e-05,
+      "loss": 1.688,
+      "step": 390
+    },
+    {
+      "epoch": 1.24,
+      "grad_norm": 0.29477787017822266,
+      "learning_rate": 3.8080495356037155e-05,
+      "loss": 1.7292,
+      "step": 400
+    },
+    {
+      "epoch": 1.24,
+      "eval_loss": 1.7159068584442139,
+      "eval_runtime": 296.4339,
+      "eval_samples_per_second": 34.891,
+      "eval_steps_per_second": 4.362,
+      "step": 400
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 646,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 200,
+  "total_flos": 1.7617367955800064e+17,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
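
The evaluation entry logged at step 400 can be turned into more readable figures. The sketch below is a minimal illustration, not part of the commit: it assumes `eval_loss` is the mean per-token cross-entropy in nats (the usual causal-LM setup, not confirmed by this diff), so perplexity is its exponential, and it estimates the evaluation set size from the rounded throughput numbers. The names `eval_entry` and `summarize_eval` are hypothetical.

```python
import math

# Values copied from the step-400 evaluation entry in trainer_state.json.
eval_entry = {
    "eval_loss": 1.7159068584442139,
    "eval_runtime": 296.4339,
    "eval_samples_per_second": 34.891,
    "eval_steps_per_second": 4.362,
}


def summarize_eval(entry):
    """Derive perplexity and rough dataset/batch sizes from one eval record."""
    runtime = entry["eval_runtime"]
    samples = runtime * entry["eval_samples_per_second"]
    steps = runtime * entry["eval_steps_per_second"]
    return {
        # ASSUMPTION: eval_loss is mean token-level cross-entropy in nats.
        "perplexity": math.exp(entry["eval_loss"]),
        "approx_eval_samples": samples,
        "approx_eval_steps": steps,
        "approx_eval_batch_size": samples / steps,
    }


summary = summarize_eval(eval_entry)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The size estimates are only as precise as the rounded throughput values in the log. The state file records `train_batch_size` but not the evaluation batch size, so the last figure is an inference rather than a logged setting.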

checkpoint-400/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eb9b0d97c5f97557ba963e8ca5220e952b9320dd0c477f827ed78f1709904c03
+size 4920

checkpoint-600/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: cerebras/Cerebras-GPT-2.7B
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.9.0

checkpoint-600/adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "cerebras/Cerebras-GPT-2.7B",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_attn"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
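
As a rough check on what this LoRA configuration implies, the sketch below counts the adapter parameters for `r = 8` applied to `c_attn` only. The hidden size (2560) and layer count (32) for Cerebras-GPT-2.7B are assumptions that do not appear in this diff, as is the GPT-2 style fused `c_attn` projection from `hidden` to `3 * hidden`. The function names are hypothetical.

```python
# Values taken from adapter_config.json above.
R = 8
LORA_ALPHA = 16

# ASSUMPTION (not stated in this diff): Cerebras-GPT-2.7B has hidden size 2560
# and 32 transformer layers, each with a fused c_attn mapping hidden -> 3 * hidden.
HIDDEN = 2560
N_LAYERS = 32


def lora_param_count(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) per wrapped module."""
    return r * (in_features + out_features)


def adapter_param_count(hidden=HIDDEN, n_layers=N_LAYERS, r=R):
    """Total LoRA parameters when only c_attn is targeted in every layer."""
    return n_layers * lora_param_count(hidden, 3 * hidden, r)


# Standard LoRA scaling factor alpha / r (use_rslora is false in the config).
scaling = LORA_ALPHA / R

total_params = adapter_param_count()
fp32_bytes = total_params * 4  # assumes the adapter is stored as float32

print(f"scaling={scaling}, params={total_params}, fp32_bytes={fp32_bytes}")
```

Under these assumptions the weights alone come to 10,485,760 bytes, which is close to the 10,493,960-byte size recorded for `adapter_model.safetensors` in this commit. The small remainder would be consistent with file header overhead.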

checkpoint-600/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:835eec50dd15f7f22c6f69f65f071a91d30c39ab5da267bac9f070a9f689c4ce
+size 10493960

checkpoint-600/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:28108ee44d3f82f7a627a5784d694ac712f951caa310b7b806d8d2188c295f63
+size 21025594

checkpoint-600/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4e5afebb981d9898ead207291e05865a584b1401add6a28e65759f2b6ee9b65a
+size 14244

checkpoint-600/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:49d28e739d5ca6d44ea1a995d84261970a971c91694c31c3fc689f4a8e1aedb7
+size 1064

checkpoint-600/trainer_state.json ADDED

	@@ -0,0 +1,465 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.8561484918793503,
+  "eval_steps": 200,
+  "global_step": 600,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.5968947410583496,
+      "learning_rate": 9.896800825593395e-05,
+      "loss": 2.3935,
+      "step": 10
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 0.6443502306938171,
+      "learning_rate": 9.793601651186791e-05,
+      "loss": 2.1763,
+      "step": 20
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.4056578576564789,
+      "learning_rate": 9.690402476780186e-05,
+      "loss": 1.9601,
+      "step": 30
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 0.24697279930114746,
+      "learning_rate": 9.587203302373582e-05,
+      "loss": 1.9009,
+      "step": 40
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 0.24417832493782043,
+      "learning_rate": 9.484004127966977e-05,
+      "loss": 1.8698,
+      "step": 50
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 0.19971579313278198,
+      "learning_rate": 9.380804953560372e-05,
+      "loss": 1.8502,
+      "step": 60
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 0.21794500946998596,
+      "learning_rate": 9.277605779153768e-05,
+      "loss": 1.8139,
+      "step": 70
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 0.20081552863121033,
+      "learning_rate": 9.174406604747162e-05,
+      "loss": 1.8246,
+      "step": 80
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 0.2045777291059494,
+      "learning_rate": 9.071207430340559e-05,
+      "loss": 1.8009,
+      "step": 90
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 0.21513579785823822,
+      "learning_rate": 8.968008255933953e-05,
+      "loss": 1.7745,
+      "step": 100
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 0.2838321924209595,
+      "learning_rate": 8.864809081527348e-05,
+      "loss": 1.7831,
+      "step": 110
+    },
+    {
+      "epoch": 0.37,
+      "grad_norm": 0.19812746345996857,
+      "learning_rate": 8.761609907120744e-05,
+      "loss": 1.7554,
+      "step": 120
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 0.30143260955810547,
+      "learning_rate": 8.658410732714138e-05,
+      "loss": 1.7707,
+      "step": 130
+    },
+    {
+      "epoch": 0.43,
+      "grad_norm": 0.21341949701309204,
+      "learning_rate": 8.555211558307535e-05,
+      "loss": 1.7476,
+      "step": 140
+    },
+    {
+      "epoch": 0.46,
+      "grad_norm": 0.24006041884422302,
+      "learning_rate": 8.452012383900929e-05,
+      "loss": 1.7653,
+      "step": 150
+    },
+    {
+      "epoch": 0.49,
+      "grad_norm": 0.25095027685165405,
+      "learning_rate": 8.348813209494324e-05,
+      "loss": 1.7438,
+      "step": 160
+    },
+    {
+      "epoch": 0.53,
+      "grad_norm": 0.2602318525314331,
+      "learning_rate": 8.24561403508772e-05,
+      "loss": 1.7728,
+      "step": 170
+    },
+    {
+      "epoch": 0.56,
+      "grad_norm": 0.26738253235816956,
+      "learning_rate": 8.142414860681114e-05,
+      "loss": 1.7433,
+      "step": 180
+    },
+    {
+      "epoch": 0.59,
+      "grad_norm": 0.25230053067207336,
+      "learning_rate": 8.039215686274511e-05,
+      "loss": 1.7304,
+      "step": 190
+    },
+    {
+      "epoch": 0.62,
+      "grad_norm": 0.2360549122095108,
+      "learning_rate": 7.936016511867905e-05,
+      "loss": 1.7297,
+      "step": 200
+    },
+    {
+      "epoch": 0.62,
+      "eval_loss": 1.744746446609497,
+      "eval_runtime": 294.0669,
+      "eval_samples_per_second": 35.172,
+      "eval_steps_per_second": 4.397,
+      "step": 200
+    },
+    {
+      "epoch": 0.65,
+      "grad_norm": 0.2905316948890686,
+      "learning_rate": 6.74922600619195e-05,
+      "loss": 1.727,
+      "step": 210
+    },
+    {
+      "epoch": 0.68,
+      "grad_norm": 0.25649935007095337,
+      "learning_rate": 6.594427244582044e-05,
+      "loss": 1.7224,
+      "step": 220
+    },
+    {
+      "epoch": 0.71,
+      "grad_norm": 0.23987528681755066,
+      "learning_rate": 6.439628482972137e-05,
+      "loss": 1.7389,
+      "step": 230
+    },
+    {
+      "epoch": 0.74,
+      "grad_norm": 0.2479698807001114,
+      "learning_rate": 6.28482972136223e-05,
+      "loss": 1.7255,
+      "step": 240
+    },
+    {
+      "epoch": 0.77,
+      "grad_norm": 0.25272852182388306,
+      "learning_rate": 6.130030959752322e-05,
+      "loss": 1.7354,
+      "step": 250
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 0.2447136789560318,
+      "learning_rate": 5.9752321981424155e-05,
+      "loss": 1.7443,
+      "step": 260
+    },
+    {
+      "epoch": 0.84,
+      "grad_norm": 0.26579201221466064,
+      "learning_rate": 5.8204334365325074e-05,
+      "loss": 1.7462,
+      "step": 270
+    },
+    {
+      "epoch": 0.87,
+      "grad_norm": 0.31005680561065674,
+      "learning_rate": 5.6656346749226006e-05,
+      "loss": 1.7144,
+      "step": 280
+    },
+    {
+      "epoch": 0.9,
+      "grad_norm": 0.2663085460662842,
+      "learning_rate": 5.510835913312694e-05,
+      "loss": 1.7094,
+      "step": 290
+    },
+    {
+      "epoch": 0.93,
+      "grad_norm": 0.28601768612861633,
+      "learning_rate": 5.3560371517027864e-05,
+      "loss": 1.6838,
+      "step": 300
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 0.2900325059890747,
+      "learning_rate": 5.20123839009288e-05,
+      "loss": 1.7042,
+      "step": 310
+    },
+    {
+      "epoch": 0.99,
+      "grad_norm": 0.28358617424964905,
+      "learning_rate": 5.046439628482973e-05,
+      "loss": 1.7057,
+      "step": 320
+    },
+    {
+      "epoch": 1.02,
+      "grad_norm": 0.3409838378429413,
+      "learning_rate": 4.891640866873065e-05,
+      "loss": 1.7228,
+      "step": 330
+    },
+    {
+      "epoch": 1.05,
+      "grad_norm": 0.28400272130966187,
+      "learning_rate": 4.736842105263158e-05,
+      "loss": 1.7157,
+      "step": 340
+    },
+    {
+      "epoch": 1.08,
+      "grad_norm": 0.33671486377716064,
+      "learning_rate": 4.582043343653251e-05,
+      "loss": 1.709,
+      "step": 350
+    },
+    {
+      "epoch": 1.11,
+      "grad_norm": 0.29089200496673584,
+      "learning_rate": 4.427244582043344e-05,
+      "loss": 1.7018,
+      "step": 360
+    },
+    {
+      "epoch": 1.14,
+      "grad_norm": 0.2736106514930725,
+      "learning_rate": 4.2724458204334365e-05,
+      "loss": 1.6888,
+      "step": 370
+    },
+    {
+      "epoch": 1.18,
+      "grad_norm": 0.272792786359787,
+      "learning_rate": 4.11764705882353e-05,
+      "loss": 1.7191,
+      "step": 380
+    },
+    {
+      "epoch": 1.21,
+      "grad_norm": 0.282695472240448,
+      "learning_rate": 3.962848297213623e-05,
+      "loss": 1.688,
+      "step": 390
+    },
+    {
+      "epoch": 1.24,
+      "grad_norm": 0.29477787017822266,
+      "learning_rate": 3.8080495356037155e-05,
+      "loss": 1.7292,
+      "step": 400
+    },
+    {
+      "epoch": 1.24,
+      "eval_loss": 1.7159068584442139,
+      "eval_runtime": 296.4339,
+      "eval_samples_per_second": 34.891,
+      "eval_steps_per_second": 4.362,
+      "step": 400
+    },
+    {
+      "epoch": 1.27,
+      "grad_norm": 0.29601866006851196,
+      "learning_rate": 3.653250773993808e-05,
+      "loss": 1.7035,
+      "step": 410
+    },
+    {
+      "epoch": 1.3,
+      "grad_norm": 0.23885053396224976,
+      "learning_rate": 3.498452012383901e-05,
+      "loss": 1.6846,
+      "step": 420
+    },
+    {
+      "epoch": 1.33,
+      "grad_norm": 0.2934260666370392,
+      "learning_rate": 3.343653250773994e-05,
+      "loss": 1.6728,
+      "step": 430
+    },
+    {
+      "epoch": 1.36,
+      "grad_norm": 0.2752627730369568,
+      "learning_rate": 3.188854489164087e-05,
+      "loss": 1.7108,
+      "step": 440
+    },
+    {
+      "epoch": 1.39,
+      "grad_norm": 0.26525211334228516,
+      "learning_rate": 3.0340557275541798e-05,
+      "loss": 1.7035,
+      "step": 450
+    },
+    {
+      "epoch": 1.42,
+      "grad_norm": 0.3212621510028839,
+      "learning_rate": 2.8792569659442727e-05,
+      "loss": 1.7212,
+      "step": 460
+    },
+    {
+      "epoch": 1.45,
+      "grad_norm": 0.2639882564544678,
+      "learning_rate": 2.7244582043343652e-05,
+      "loss": 1.7058,
+      "step": 470
+    },
+    {
+      "epoch": 1.48,
+      "grad_norm": 0.3098365068435669,
+      "learning_rate": 2.5696594427244585e-05,
+      "loss": 1.6856,
+      "step": 480
+    },
+    {
+      "epoch": 1.52,
+      "grad_norm": 0.26433393359184265,
+      "learning_rate": 2.4148606811145514e-05,
+      "loss": 1.6728,
+      "step": 490
+    },
+    {
+      "epoch": 1.55,
+      "grad_norm": 0.30506235361099243,
+      "learning_rate": 2.260061919504644e-05,
+      "loss": 1.7096,
+      "step": 500
+    },
+    {
+      "epoch": 1.58,
+      "grad_norm": 0.28906792402267456,
+      "learning_rate": 2.105263157894737e-05,
+      "loss": 1.6829,
+      "step": 510
+    },
+    {
+      "epoch": 1.61,
+      "grad_norm": 0.3350558876991272,
+      "learning_rate": 1.9504643962848298e-05,
+      "loss": 1.7045,
+      "step": 520
+    },
+    {
+      "epoch": 1.64,
+      "grad_norm": 0.29045575857162476,
+      "learning_rate": 1.7956656346749227e-05,
+      "loss": 1.6696,
+      "step": 530
+    },
+    {
+      "epoch": 1.67,
+      "grad_norm": 0.2855425775051117,
+      "learning_rate": 1.6408668730650156e-05,
+      "loss": 1.6791,
+      "step": 540
+    },
+    {
+      "epoch": 1.7,
+      "grad_norm": 0.2906227111816406,
+      "learning_rate": 1.4860681114551084e-05,
+      "loss": 1.6755,
+      "step": 550
+    },
+    {
+      "epoch": 1.73,
+      "grad_norm": 0.293618768453598,
+      "learning_rate": 1.3312693498452014e-05,
+      "loss": 1.6993,
+      "step": 560
+    },
+    {
+      "epoch": 1.76,
+      "grad_norm": 0.303643137216568,
+      "learning_rate": 1.1764705882352942e-05,
+      "loss": 1.6825,
+      "step": 570
+    },
+    {
+      "epoch": 1.79,
+      "grad_norm": 0.29694506525993347,
+      "learning_rate": 1.0216718266253871e-05,
+      "loss": 1.6921,
+      "step": 580
+    },
+    {
+      "epoch": 1.83,
+      "grad_norm": 0.28675130009651184,
+      "learning_rate": 8.6687306501548e-06,
+      "loss": 1.6908,
+      "step": 590
+    },
+    {
+      "epoch": 1.86,
+      "grad_norm": 0.2960526943206787,
+      "learning_rate": 7.120743034055728e-06,
+      "loss": 1.6915,
+      "step": 600
+    },
+    {
+      "epoch": 1.86,
+      "eval_loss": 1.707315444946289,
+      "eval_runtime": 297.2939,
+      "eval_samples_per_second": 34.79,
+      "eval_steps_per_second": 4.349,
+      "step": 600
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 646,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 200,
+  "total_flos": 2.6435633007034368e+17,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
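
The `learning_rate` values in `log_history` look like a linear decay to zero. The sketch below is one way to check that reading. It assumes a peak learning rate of 1e-4 and no warmup, neither of which is stated in this file. Under those assumptions the entries from step 210 onward fit `max_steps = 646`, while the entries through step 200 fit a total of 969 steps. That would be consistent with the run having been resumed from step 200 with a shorter schedule, though this is an inference from the numbers, not something the diff confirms.

```python
import math


def linear_decay_lr(step, base_lr, total_steps, warmup_steps=0):
    """Linear warmup followed by linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 1e-4  # ASSUMPTION: peak learning rate, inferred from the logged values

# (step, logged learning_rate, total steps that fits the logged value)
logged = [
    (10, 9.896800825593395e-05, 969),
    (200, 7.936016511867905e-05, 969),
    (210, 6.74922600619195e-05, 646),
    (400, 3.8080495356037155e-05, 646),
    (600, 7.120743034055728e-06, 646),
]

for step, lr, total in logged:
    predicted = linear_decay_lr(step, BASE_LR, total)
    match = math.isclose(predicted, lr, rel_tol=1e-9)
    print(f"step {step}: logged={lr:.6e} predicted={predicted:.6e} match={match}")
```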

checkpoint-600/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eb9b0d97c5f97557ba963e8ca5220e952b9320dd0c477f827ed78f1709904c03
+size 4920

merges.txt ADDED

The diff for this file is too large to render. See raw diff

special_tokens_map.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "!",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "50256": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "!",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
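
The `model_max_length` value above is not a real context limit. It equals `int(1e30)`, which Transformers uses as a placeholder when a tokenizer has no configured maximum length. The sketch below shows one way to guard against that when choosing a truncation length. The helper `effective_max_length` and the fallback of 2048 tokens are hypothetical and should be replaced with the base model's actual context size.

```python
# Value copied from tokenizer_config.json above.
MODEL_MAX_LENGTH = 1000000000000000019884624838656

# int(1e30) is the "no limit configured" placeholder; the odd trailing digits
# come from converting the float 1e30 to an exact integer.
VERY_LARGE_INTEGER = int(1e30)


def effective_max_length(model_max_length, fallback):
    """Return a usable truncation length, ignoring the 'unset' sentinel."""
    if model_max_length >= VERY_LARGE_INTEGER:
        return fallback
    return model_max_length


# ASSUMPTION: 2048 is only an illustrative fallback, not read from this commit.
max_length = effective_max_length(MODEL_MAX_LENGTH, fallback=2048)
print(max_length)
```

Passing `max_length` explicitly when tokenizing avoids relying on the sentinel, which would otherwise disable truncation in practice.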

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eb9b0d97c5f97557ba963e8ca5220e952b9320dd0c477f827ed78f1709904c03
+size 4920
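
The binary files in this commit appear only as Git LFS pointers: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. A small parser, sketched below with the hypothetical name `parse_lfs_pointer`, makes it easy to compare them. It handles only the three fields shown in this diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from the training_args.bin entry above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:eb9b0d97c5f97557ba963e8ca5220e952b9320dd0c477f827ed78f1709904c03
size 4920
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["digest"][:12], pointer["size"])
```

The `training_args.bin` pointers under `checkpoint-400/`, `checkpoint-600/`, and the repository root all carry the same `oid` and size, which indicates that the three files have identical contents.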

vocab.json ADDED

The diff for this file is too large to render. See raw diff