sallywww committed on Feb 23

Commit

2a6a7a6

1 Parent(s): 0c027ee

llama

Files changed (32)

README.md +204 -0
adapter_config.json +27 -0
adapter_model.safetensors +3 -0
checkpoint-1000/README.md +204 -0
checkpoint-1000/adapter_config.json +27 -0
checkpoint-1000/adapter_model.bin +3 -0
checkpoint-1000/optimizer.pt +3 -0
checkpoint-1000/rng_state.pth +3 -0
checkpoint-1000/scheduler.pt +3 -0
checkpoint-1000/trainer_state.json +316 -0
checkpoint-1000/training_args.bin +3 -0
checkpoint-1500/README.md +204 -0
checkpoint-1500/adapter_config.json +27 -0
checkpoint-1500/adapter_model.bin +3 -0
checkpoint-1500/optimizer.pt +3 -0
checkpoint-1500/rng_state.pth +3 -0
checkpoint-1500/scheduler.pt +3 -0
checkpoint-1500/trainer_state.json +466 -0
checkpoint-1500/training_args.bin +3 -0
checkpoint-500/README.md +204 -0
checkpoint-500/adapter_config.json +27 -0
checkpoint-500/adapter_model.bin +3 -0
checkpoint-500/optimizer.pt +3 -0
checkpoint-500/rng_state.pth +3 -0
checkpoint-500/scheduler.pt +3 -0
checkpoint-500/trainer_state.json +166 -0
checkpoint-500/training_args.bin +3 -0
runs/Feb18_09-02-46_n6p6x80stw/events.out.tfevents.1708246969.n6p6x80stw.255.0 +3 -0
runs/Feb18_09-58-43_n6p6x80stw/events.out.tfevents.1708250324.n6p6x80stw.483.0 +3 -0
runs/Feb18_10-01-41_n6p6x80stw/events.out.tfevents.1708250502.n6p6x80stw.616.0 +3 -0
runs/Feb19_00-06-34_n6p6x80stw/events.out.tfevents.1708301195.n6p6x80stw.1619.0 +3 -0
runs/Feb19_00-20-12_n6p6x80stw/events.out.tfevents.1708302014.n6p6x80stw.1845.0 +3 -0
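
The list above is sorted lexicographically, so `checkpoint-500` lands after `checkpoint-1500`. The sketch below picks checkpoints by numeric step and splits the TensorBoard event-file names under `runs/`. It assumes event files are named `events.out.tfevents.<unix_time>.<hostname>.<pid>.<counter>`, which matches what PyTorch's TensorBoard writer produces. `transformers.trainer_utils.get_last_checkpoint` covers the checkpoint case if transformers is installed; the helper names here are hypothetical. With these assumptions, `1708246969` decodes to 2024-02-18 09:02:49 UTC, a few seconds after the `Feb18_09-02-46` folder name, which suggests the training machine's clock was on UTC.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")
# Assumed layout: events.out.tfevents.<unix_time>.<hostname>.<pid>.<counter>
_TFEVENTS_RE = re.compile(r"^events\.out\.tfevents\.(\d+)\.(.+)\.(\d+)\.(\d+)$")


def checkpoint_steps(names):
    """Return checkpoint step numbers sorted numerically, ignoring other names."""
    steps = []
    for name in names:
        match = _CKPT_RE.match(name)
        if match:
            steps.append(int(match.group(1)))
    return sorted(steps)


def latest_checkpoint(run_dir):
    """Return the checkpoint-* subdirectory with the highest step, or None."""
    run_dir = Path(run_dir)
    names = [p.name for p in run_dir.iterdir() if p.is_dir()]
    steps = checkpoint_steps(names)
    return run_dir / f"checkpoint-{steps[-1]}" if steps else None


def parse_tfevents_name(name):
    """Split a TensorBoard event-file name into (UTC datetime, hostname, pid)."""
    match = _TFEVENTS_RE.match(name)
    if match is None:
        raise ValueError(f"not a tfevents file name: {name!r}")
    ts, host, pid, _counter = match.groups()
    return datetime.fromtimestamp(int(ts), tz=timezone.utc), host, int(pid)


names = ["checkpoint-1000", "checkpoint-1500", "checkpoint-500", "runs"]
print(sorted(names))            # lexicographic: checkpoint-500 comes after checkpoint-1500
print(checkpoint_steps(names))  # numeric: [500, 1000, 1500]
print(parse_tfevents_name("events.out.tfevents.1708246969.n6p6x80stw.255.0"))
```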

README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: baffo32/decapoda-research-llama-7B-hf
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.8.2

adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "baffo32/decapoda-research-llama-7B-hf",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}
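
The config above (LoRA with `r` = 8, `lora_alpha` = 32, applied to `q_proj` and `v_proj`) is enough to estimate the adapter's size. The sketch below assumes the standard LLaMA-7B shapes (32 decoder layers, hidden size 4096, square `q_proj`/`v_proj`), fp32 adapter weights, and an Adam-style optimizer that keeps two moment tensors per parameter. None of these are stated in this commit, but the estimates land just under the on-disk sizes of `adapter_model.safetensors` (16794200 bytes) and `optimizer.pt` (33630330 bytes), with the remainder plausibly being file headers and metadata.

```python
import json

# Fields copied from adapter_config.json in this commit (abridged).
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 32,
  "r": 8,
  "target_modules": ["v_proj", "q_proj"],
  "use_rslora": false
}
""")

# Assumed LLaMA-7B shapes (not stated in this commit): 32 decoder layers,
# hidden size 4096, and q_proj / v_proj both map 4096 -> 4096.
NUM_LAYERS = 32
HIDDEN = 4096
BYTES_PER_PARAM = 4  # assumes the adapter weights are stored in fp32


def lora_scaling(cfg):
    """Scale applied to B @ A: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    r, alpha = cfg["r"], cfg["lora_alpha"]
    return alpha / (r ** 0.5) if cfg["use_rslora"] else alpha / r


def lora_param_count(cfg, num_layers=NUM_LAYERS, d_in=HIDDEN, d_out=HIDDEN):
    """Each adapted Linear(d_in, d_out) gains A: (r, d_in) and B: (d_out, r)."""
    per_module = cfg["r"] * d_in + d_out * cfg["r"]
    return per_module * len(cfg["target_modules"]) * num_layers


params = lora_param_count(ADAPTER_CONFIG)
weight_bytes = params * BYTES_PER_PARAM
adam_bytes = 2 * weight_bytes  # assumed Adam-style exp_avg + exp_avg_sq per parameter

print("scaling:", lora_scaling(ADAPTER_CONFIG))  # 4.0
print("trainable params:", params)               # 4194304
print("fp32 weights:", weight_bytes, "bytes")    # 16777216 vs 16794200 on disk
print("optimizer moments:", adam_bytes, "bytes") # 33554432 vs 33630330 on disk
```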

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:01b0612303363ecac288f42ce8b30c4ca68f4159ea1536295861f5a3ecc487a3
+size 16794200
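
Binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the SHA-256 of the real content, and its size in bytes. A minimal sketch for parsing a pointer and checking a downloaded file against it follows; the function names are hypothetical and only the `sha256` oid type is handled.

```python
import hashlib
from pathlib import Path

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:01b0612303363ecac288f42ce8b30c4ca68f4159ea1536295861f5a3ecc487a3
size 16794200
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unexpected oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a local file's size and SHA-256 against a parsed pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so large checkpoint files are not read into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


ptr = parse_lfs_pointer(POINTER)
print(ptr["size"], ptr["sha256"][:12])
# After downloading the real file: matches_pointer("adapter_model.safetensors", ptr)
```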

checkpoint-1000/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: baffo32/decapoda-research-llama-7B-hf
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.8.2

checkpoint-1000/adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "baffo32/decapoda-research-llama-7B-hf",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}

checkpoint-1000/adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:237e0f664143c974a292ac389f6f70316eb4136eb2e4539dc8edf2249abfda06
+size 16823434

checkpoint-1000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:13d1062a88778c89a93dd8b88a7040dba1bd161ae4fa2c15dfc18251f89fccb2
+size 33630330

checkpoint-1000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1c8abec63f0410d7e59a8f94e42cb6b45f0af57f45edec4b95b36ca0984febc5
+size 14244

checkpoint-1000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fba2e914c810419e4a798c364c902032bd211d89d65e959ae170a42f7b3f0ea3
+size 1064

checkpoint-1000/trainer_state.json ADDED

	@@ -0,0 +1,316 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 16.0,
+  "global_step": 1000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.32,
+      "learning_rate": 1.978494623655914e-05,
+      "loss": 2.3806,
+      "step": 20
+    },
+    {
+      "epoch": 0.64,
+      "learning_rate": 1.956989247311828e-05,
+      "loss": 2.2445,
+      "step": 40
+    },
+    {
+      "epoch": 0.96,
+      "learning_rate": 1.935483870967742e-05,
+      "loss": 1.9394,
+      "step": 60
+    },
+    {
+      "epoch": 1.28,
+      "learning_rate": 1.9150537634408603e-05,
+      "loss": 1.7953,
+      "step": 80
+    },
+    {
+      "epoch": 1.6,
+      "learning_rate": 1.8935483870967742e-05,
+      "loss": 1.6449,
+      "step": 100
+    },
+    {
+      "epoch": 1.92,
+      "learning_rate": 1.8720430107526882e-05,
+      "loss": 1.4618,
+      "step": 120
+    },
+    {
+      "epoch": 2.24,
+      "learning_rate": 1.8505376344086025e-05,
+      "loss": 1.2538,
+      "step": 140
+    },
+    {
+      "epoch": 2.56,
+      "learning_rate": 1.8290322580645165e-05,
+      "loss": 1.0844,
+      "step": 160
+    },
+    {
+      "epoch": 2.88,
+      "learning_rate": 1.8075268817204305e-05,
+      "loss": 0.9909,
+      "step": 180
+    },
+    {
+      "epoch": 3.2,
+      "learning_rate": 1.7860215053763444e-05,
+      "loss": 0.9419,
+      "step": 200
+    },
+    {
+      "epoch": 3.52,
+      "learning_rate": 1.764516129032258e-05,
+      "loss": 0.8202,
+      "step": 220
+    },
+    {
+      "epoch": 3.84,
+      "learning_rate": 1.743010752688172e-05,
+      "loss": 0.7706,
+      "step": 240
+    },
+    {
+      "epoch": 4.16,
+      "learning_rate": 1.721505376344086e-05,
+      "loss": 0.7102,
+      "step": 260
+    },
+    {
+      "epoch": 4.48,
+      "learning_rate": 1.7e-05,
+      "loss": 0.6723,
+      "step": 280
+    },
+    {
+      "epoch": 4.8,
+      "learning_rate": 1.678494623655914e-05,
+      "loss": 0.6262,
+      "step": 300
+    },
+    {
+      "epoch": 5.12,
+      "learning_rate": 1.656989247311828e-05,
+      "loss": 0.5895,
+      "step": 320
+    },
+    {
+      "epoch": 5.44,
+      "learning_rate": 1.6354838709677422e-05,
+      "loss": 0.5636,
+      "step": 340
+    },
+    {
+      "epoch": 5.76,
+      "learning_rate": 1.6139784946236562e-05,
+      "loss": 0.5095,
+      "step": 360
+    },
+    {
+      "epoch": 6.08,
+      "learning_rate": 1.5924731182795702e-05,
+      "loss": 0.5093,
+      "step": 380
+    },
+    {
+      "epoch": 6.4,
+      "learning_rate": 1.570967741935484e-05,
+      "loss": 0.4729,
+      "step": 400
+    },
+    {
+      "epoch": 6.72,
+      "learning_rate": 1.549462365591398e-05,
+      "loss": 0.4805,
+      "step": 420
+    },
+    {
+      "epoch": 7.04,
+      "learning_rate": 1.527956989247312e-05,
+      "loss": 0.4508,
+      "step": 440
+    },
+    {
+      "epoch": 7.36,
+      "learning_rate": 1.5064516129032259e-05,
+      "loss": 0.4254,
+      "step": 460
+    },
+    {
+      "epoch": 7.68,
+      "learning_rate": 1.4849462365591399e-05,
+      "loss": 0.4104,
+      "step": 480
+    },
+    {
+      "epoch": 8.0,
+      "learning_rate": 1.4634408602150539e-05,
+      "loss": 0.4135,
+      "step": 500
+    },
+    {
+      "epoch": 8.32,
+      "learning_rate": 1.4419354838709678e-05,
+      "loss": 0.3828,
+      "step": 520
+    },
+    {
+      "epoch": 8.64,
+      "learning_rate": 1.4204301075268818e-05,
+      "loss": 0.3924,
+      "step": 540
+    },
+    {
+      "epoch": 8.96,
+      "learning_rate": 1.3989247311827958e-05,
+      "loss": 0.3782,
+      "step": 560
+    },
+    {
+      "epoch": 9.28,
+      "learning_rate": 1.3774193548387098e-05,
+      "loss": 0.3569,
+      "step": 580
+    },
+    {
+      "epoch": 9.6,
+      "learning_rate": 1.3559139784946237e-05,
+      "loss": 0.3609,
+      "step": 600
+    },
+    {
+      "epoch": 9.92,
+      "learning_rate": 1.3344086021505379e-05,
+      "loss": 0.3385,
+      "step": 620
+    },
+    {
+      "epoch": 10.24,
+      "learning_rate": 1.3129032258064518e-05,
+      "loss": 0.3246,
+      "step": 640
+    },
+    {
+      "epoch": 10.56,
+      "learning_rate": 1.2913978494623658e-05,
+      "loss": 0.3229,
+      "step": 660
+    },
+    {
+      "epoch": 10.88,
+      "learning_rate": 1.2698924731182796e-05,
+      "loss": 0.3051,
+      "step": 680
+    },
+    {
+      "epoch": 11.2,
+      "learning_rate": 1.2483870967741936e-05,
+      "loss": 0.3159,
+      "step": 700
+    },
+    {
+      "epoch": 11.52,
+      "learning_rate": 1.2268817204301076e-05,
+      "loss": 0.2884,
+      "step": 720
+    },
+    {
+      "epoch": 11.84,
+      "learning_rate": 1.2053763440860215e-05,
+      "loss": 0.2976,
+      "step": 740
+    },
+    {
+      "epoch": 12.16,
+      "learning_rate": 1.1838709677419355e-05,
+      "loss": 0.2934,
+      "step": 760
+    },
+    {
+      "epoch": 12.48,
+      "learning_rate": 1.1623655913978495e-05,
+      "loss": 0.2739,
+      "step": 780
+    },
+    {
+      "epoch": 12.8,
+      "learning_rate": 1.1408602150537636e-05,
+      "loss": 0.2807,
+      "step": 800
+    },
+    {
+      "epoch": 13.12,
+      "learning_rate": 1.1193548387096776e-05,
+      "loss": 0.2593,
+      "step": 820
+    },
+    {
+      "epoch": 13.44,
+      "learning_rate": 1.0978494623655916e-05,
+      "loss": 0.2589,
+      "step": 840
+    },
+    {
+      "epoch": 13.76,
+      "learning_rate": 1.0763440860215055e-05,
+      "loss": 0.2679,
+      "step": 860
+    },
+    {
+      "epoch": 14.08,
+      "learning_rate": 1.0548387096774195e-05,
+      "loss": 0.2463,
+      "step": 880
+    },
+    {
+      "epoch": 14.4,
+      "learning_rate": 1.0333333333333335e-05,
+      "loss": 0.2435,
+      "step": 900
+    },
+    {
+      "epoch": 14.72,
+      "learning_rate": 1.0118279569892473e-05,
+      "loss": 0.2469,
+      "step": 920
+    },
+    {
+      "epoch": 15.04,
+      "learning_rate": 9.903225806451614e-06,
+      "loss": 0.2406,
+      "step": 940
+    },
+    {
+      "epoch": 15.36,
+      "learning_rate": 9.688172043010754e-06,
+      "loss": 0.2231,
+      "step": 960
+    },
+    {
+      "epoch": 15.68,
+      "learning_rate": 9.473118279569892e-06,
+      "loss": 0.2317,
+      "step": 980
+    },
+    {
+      "epoch": 16.0,
+      "learning_rate": 9.258064516129034e-06,
+      "loss": 0.2305,
+      "step": 1000
+    }
+  ],
+  "max_steps": 1860,
+  "num_train_epochs": 30,
+  "total_flos": 5.1982555742208e+18,
+  "trial_name": null,
+  "trial_params": null
+}
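
The logged learning rates fit a linear decay from 2e-5 to zero over `max_steps` = 1860 with no warmup, which is the shape `get_linear_schedule_with_warmup` produces with zero warmup steps. That is an inference from the numbers: the peak rate and scheduler type are assumptions here, since `training_args.bin` is not readable in this view. The sketch inverts the schedule to recover how many scheduler steps each logged rate implies. Through step 60 this equals the global step; from step 80 onward it is one less, so one scheduler step appears to have been skipped between steps 60 and 80 (one possible cause is a skipped optimizer step under fp16 loss scaling, which nothing in this commit confirms).

```python
import json

# A few log_history entries copied from checkpoint-1000/trainer_state.json.
TRAINER_STATE = json.loads("""
{
  "max_steps": 1860,
  "log_history": [
    {"step": 20,   "learning_rate": 1.978494623655914e-05,  "loss": 2.3806},
    {"step": 60,   "learning_rate": 1.935483870967742e-05,  "loss": 1.9394},
    {"step": 80,   "learning_rate": 1.9150537634408603e-05, "loss": 1.7953},
    {"step": 500,  "learning_rate": 1.4634408602150539e-05, "loss": 0.4135},
    {"step": 1000, "learning_rate": 9.258064516129034e-06,  "loss": 0.2305}
  ]
}
""")

BASE_LR = 2e-5  # assumed peak learning rate, inferred from the logged values


def linear_lr(scheduler_steps, max_steps, base_lr=BASE_LR):
    """Linear decay from base_lr to zero over max_steps, with no warmup."""
    return base_lr * max(0, max_steps - scheduler_steps) / max_steps


def implied_scheduler_steps(lr, max_steps, base_lr=BASE_LR):
    """Invert linear_lr: how many scheduler steps produce this learning rate?"""
    return round(max_steps * (1 - lr / base_lr))


max_steps = TRAINER_STATE["max_steps"]
for entry in TRAINER_STATE["log_history"]:
    implied = implied_scheduler_steps(entry["learning_rate"], max_steps)
    print(f"step {entry['step']:>4}  loss {entry['loss']:.4f}  scheduler steps {implied}")
```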

checkpoint-1000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:13cb1e7e170180c3d0a314ab74f1969a16dc260ec9bc66380facf0d7a2dcb84e
+size 4408

checkpoint-1500/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: baffo32/decapoda-research-llama-7B-hf
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.8.2

checkpoint-1500/adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "baffo32/decapoda-research-llama-7B-hf",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}

checkpoint-1500/adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:144175c0c00e688a80b135e7efd4efafe04e72d233be3063a1a34f106d46a367
+size 16823434

checkpoint-1500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f53b868b4a5be27c8c61a13253e5cea4ba6394b0adca68bf007e0f9f5eb8ba89
+size 33630330

checkpoint-1500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c0d2740f3802eb4255ad2d5509d187ea8da94cce4d379b58136676dc4708a9c3
+size 14244

checkpoint-1500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1808e326cf773e3c5f5bcce2054a5a49f1d7a9e4e3714a44189c3558163bccf7
+size 1064

checkpoint-1500/trainer_state.json ADDED

	@@ -0,0 +1,466 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 24.0,
+  "global_step": 1500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.32,
+      "learning_rate": 1.978494623655914e-05,
+      "loss": 2.3806,
+      "step": 20
+    },
+    {
+      "epoch": 0.64,
+      "learning_rate": 1.956989247311828e-05,
+      "loss": 2.2445,
+      "step": 40
+    },
+    {
+      "epoch": 0.96,
+      "learning_rate": 1.935483870967742e-05,
+      "loss": 1.9394,
+      "step": 60
+    },
+    {
+      "epoch": 1.28,
+      "learning_rate": 1.9150537634408603e-05,
+      "loss": 1.7953,
+      "step": 80
+    },
+    {
+      "epoch": 1.6,
+      "learning_rate": 1.8935483870967742e-05,
+      "loss": 1.6449,
+      "step": 100
+    },
+    {
+      "epoch": 1.92,
+      "learning_rate": 1.8720430107526882e-05,
+      "loss": 1.4618,
+      "step": 120
+    },
+    {
+      "epoch": 2.24,
+      "learning_rate": 1.8505376344086025e-05,
+      "loss": 1.2538,
+      "step": 140
+    },
+    {
+      "epoch": 2.56,
+      "learning_rate": 1.8290322580645165e-05,
+      "loss": 1.0844,
+      "step": 160
+    },
+    {
+      "epoch": 2.88,
+      "learning_rate": 1.8075268817204305e-05,
+      "loss": 0.9909,
+      "step": 180
+    },
+    {
+      "epoch": 3.2,
+      "learning_rate": 1.7860215053763444e-05,
+      "loss": 0.9419,
+      "step": 200
+    },
+    {
+      "epoch": 3.52,
+      "learning_rate": 1.764516129032258e-05,
+      "loss": 0.8202,
+      "step": 220
+    },
+    {
+      "epoch": 3.84,
+      "learning_rate": 1.743010752688172e-05,
+      "loss": 0.7706,
+      "step": 240
+    },
+    {
+      "epoch": 4.16,
+      "learning_rate": 1.721505376344086e-05,
+      "loss": 0.7102,
+      "step": 260
+    },
+    {
+      "epoch": 4.48,
+      "learning_rate": 1.7e-05,
+      "loss": 0.6723,
+      "step": 280
+    },
+    {
+      "epoch": 4.8,
+      "learning_rate": 1.678494623655914e-05,
+      "loss": 0.6262,
+      "step": 300
+    },
+    {
+      "epoch": 5.12,
+      "learning_rate": 1.656989247311828e-05,
+      "loss": 0.5895,
+      "step": 320
+    },
+    {
+      "epoch": 5.44,
+      "learning_rate": 1.6354838709677422e-05,
+      "loss": 0.5636,
+      "step": 340
+    },
+    {
+      "epoch": 5.76,
+      "learning_rate": 1.6139784946236562e-05,
+      "loss": 0.5095,
+      "step": 360
+    },
+    {
+      "epoch": 6.08,
+      "learning_rate": 1.5924731182795702e-05,
+      "loss": 0.5093,
+      "step": 380
+    },
+    {
+      "epoch": 6.4,
+      "learning_rate": 1.570967741935484e-05,
+      "loss": 0.4729,
+      "step": 400
+    },
+    {
+      "epoch": 6.72,
+      "learning_rate": 1.549462365591398e-05,
+      "loss": 0.4805,
+      "step": 420
+    },
+    {
+      "epoch": 7.04,
+      "learning_rate": 1.527956989247312e-05,
+      "loss": 0.4508,
+      "step": 440
+    },
+    {
+      "epoch": 7.36,
+      "learning_rate": 1.5064516129032259e-05,
+      "loss": 0.4254,
+      "step": 460
+    },
+    {
+      "epoch": 7.68,
+      "learning_rate": 1.4849462365591399e-05,
+      "loss": 0.4104,
+      "step": 480
+    },
+    {
+      "epoch": 8.0,
+      "learning_rate": 1.4634408602150539e-05,
+      "loss": 0.4135,
+      "step": 500
+    },
+    {
+      "epoch": 8.32,
+      "learning_rate": 1.4419354838709678e-05,
+      "loss": 0.3828,
+      "step": 520
+    },
+    {
+      "epoch": 8.64,
+      "learning_rate": 1.4204301075268818e-05,
+      "loss": 0.3924,
+      "step": 540
+    },
+    {
+      "epoch": 8.96,
+      "learning_rate": 1.3989247311827958e-05,
+      "loss": 0.3782,
+      "step": 560
+    },
+    {
+      "epoch": 9.28,
+      "learning_rate": 1.3774193548387098e-05,
+      "loss": 0.3569,
+      "step": 580
+    },
+    {
+      "epoch": 9.6,
+      "learning_rate": 1.3559139784946237e-05,
+      "loss": 0.3609,
+      "step": 600
+    },
+    {
+      "epoch": 9.92,
+      "learning_rate": 1.3344086021505379e-05,
+      "loss": 0.3385,
+      "step": 620
+    },
+    {
+      "epoch": 10.24,
+      "learning_rate": 1.3129032258064518e-05,
+      "loss": 0.3246,
+      "step": 640
+    },
+    {
+      "epoch": 10.56,
+      "learning_rate": 1.2913978494623658e-05,
+      "loss": 0.3229,
+      "step": 660
+    },
+    {
+      "epoch": 10.88,
+      "learning_rate": 1.2698924731182796e-05,
+      "loss": 0.3051,
+      "step": 680
+    },
+    {
+      "epoch": 11.2,
+      "learning_rate": 1.2483870967741936e-05,
+      "loss": 0.3159,
+      "step": 700
+    },
+    {
+      "epoch": 11.52,
+      "learning_rate": 1.2268817204301076e-05,
+      "loss": 0.2884,
+      "step": 720
+    },
+    {
+      "epoch": 11.84,
+      "learning_rate": 1.2053763440860215e-05,
+      "loss": 0.2976,
+      "step": 740
+    },
+    {
+      "epoch": 12.16,
+      "learning_rate": 1.1838709677419355e-05,
+      "loss": 0.2934,
+      "step": 760
+    },
+    {
+      "epoch": 12.48,
+      "learning_rate": 1.1623655913978495e-05,
+      "loss": 0.2739,
+      "step": 780
+    },
+    {
+      "epoch": 12.8,
+      "learning_rate": 1.1408602150537636e-05,
+      "loss": 0.2807,
+      "step": 800
+    },
+    {
+      "epoch": 13.12,
+      "learning_rate": 1.1193548387096776e-05,
+      "loss": 0.2593,
+      "step": 820
+    },
+    {
+      "epoch": 13.44,
+      "learning_rate": 1.0978494623655916e-05,
+      "loss": 0.2589,
+      "step": 840
+    },
+    {
+      "epoch": 13.76,
+      "learning_rate": 1.0763440860215055e-05,
+      "loss": 0.2679,
+      "step": 860
+    },
+    {
+      "epoch": 14.08,
+      "learning_rate": 1.0548387096774195e-05,
+      "loss": 0.2463,
+      "step": 880
+    },
+    {
+      "epoch": 14.4,
+      "learning_rate": 1.0333333333333335e-05,
+      "loss": 0.2435,
+      "step": 900
+    },
+    {
+      "epoch": 14.72,
+      "learning_rate": 1.0118279569892473e-05,
+      "loss": 0.2469,
+      "step": 920
+    },
+    {
+      "epoch": 15.04,
+      "learning_rate": 9.903225806451614e-06,
+      "loss": 0.2406,
+      "step": 940
+    },
+    {
+      "epoch": 15.36,
+      "learning_rate": 9.688172043010754e-06,
+      "loss": 0.2231,
+      "step": 960
+    },
+    {
+      "epoch": 15.68,
+      "learning_rate": 9.473118279569892e-06,
+      "loss": 0.2317,
+      "step": 980
+    },
+    {
+      "epoch": 16.0,
+      "learning_rate": 9.258064516129034e-06,
+      "loss": 0.2305,
+      "step": 1000
+    },
+    {
+      "epoch": 16.32,
+      "learning_rate": 9.043010752688173e-06,
+      "loss": 0.2227,
+      "step": 1020
+    },
+    {
+      "epoch": 16.64,
+      "learning_rate": 8.827956989247313e-06,
+      "loss": 0.2253,
+      "step": 1040
+    },
+    {
+      "epoch": 16.96,
+      "learning_rate": 8.612903225806453e-06,
+      "loss": 0.2174,
+      "step": 1060
+    },
+    {
+      "epoch": 17.28,
+      "learning_rate": 8.397849462365592e-06,
+      "loss": 0.2054,
+      "step": 1080
+    },
+    {
+      "epoch": 17.6,
+      "learning_rate": 8.182795698924732e-06,
+      "loss": 0.2171,
+      "step": 1100
+    },
+    {
+      "epoch": 17.92,
+      "learning_rate": 7.967741935483872e-06,
+      "loss": 0.2104,
+      "step": 1120
+    },
+    {
+      "epoch": 18.24,
+      "learning_rate": 7.752688172043012e-06,
+      "loss": 0.2194,
+      "step": 1140
+    },
+    {
+      "epoch": 18.56,
+      "learning_rate": 7.537634408602151e-06,
+      "loss": 0.1948,
+      "step": 1160
+    },
+    {
+      "epoch": 18.88,
+      "learning_rate": 7.322580645161291e-06,
+      "loss": 0.2013,
+      "step": 1180
+    },
+    {
+      "epoch": 19.2,
+      "learning_rate": 7.10752688172043e-06,
+      "loss": 0.2051,
+      "step": 1200
+    },
+    {
+      "epoch": 19.52,
+      "learning_rate": 6.89247311827957e-06,
+      "loss": 0.1946,
+      "step": 1220
+    },
+    {
+      "epoch": 19.84,
+      "learning_rate": 6.67741935483871e-06,
+      "loss": 0.1979,
+      "step": 1240
+    },
+    {
+      "epoch": 20.16,
+      "learning_rate": 6.46236559139785e-06,
+      "loss": 0.1954,
+      "step": 1260
+    },
+    {
+      "epoch": 20.48,
+      "learning_rate": 6.24731182795699e-06,
+      "loss": 0.1846,
+      "step": 1280
+    },
+    {
+      "epoch": 20.8,
+      "learning_rate": 6.0322580645161295e-06,
+      "loss": 0.194,
+      "step": 1300
+    },
+    {
+      "epoch": 21.12,
+      "learning_rate": 5.817204301075268e-06,
+      "loss": 0.19,
+      "step": 1320
+    },
+    {
+      "epoch": 21.44,
+      "learning_rate": 5.602150537634409e-06,
+      "loss": 0.1869,
+      "step": 1340
+    },
+    {
+      "epoch": 21.76,
+      "learning_rate": 5.387096774193549e-06,
+      "loss": 0.1948,
+      "step": 1360
+    },
+    {
+      "epoch": 22.08,
+      "learning_rate": 5.1720430107526885e-06,
+      "loss": 0.1773,
+      "step": 1380
+    },
+    {
+      "epoch": 22.4,
+      "learning_rate": 4.956989247311829e-06,
+      "loss": 0.1742,
+      "step": 1400
+    },
+    {
+      "epoch": 22.72,
+      "learning_rate": 4.741935483870968e-06,
+      "loss": 0.1834,
+      "step": 1420
+    },
+    {
+      "epoch": 23.04,
+      "learning_rate": 4.526881720430108e-06,
+      "loss": 0.1927,
+      "step": 1440
+    },
+    {
+      "epoch": 23.36,
+      "learning_rate": 4.311827956989247e-06,
+      "loss": 0.1817,
+      "step": 1460
+    },
+    {
+      "epoch": 23.68,
+      "learning_rate": 4.096774193548387e-06,
+      "loss": 0.1723,
+      "step": 1480
+    },
+    {
+      "epoch": 24.0,
+      "learning_rate": 3.881720430107528e-06,
+      "loss": 0.1869,
+      "step": 1500
+    }
+  ],
+  "max_steps": 1860,
+  "num_train_epochs": 30,
+  "total_flos": 7.7973833613312e+18,
+  "trial_name": null,
+  "trial_params": null
+}
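The `learning_rate` values in these logs fit a linear decay to zero over `max_steps` = 1860, with no warmup, starting from 2e-5. That base rate is inferred from the logged numbers. It is not stored in this JSON; it would live in `training_args.bin`. The sketch below follows the shape of the linear schedule in `transformers` (`get_linear_schedule_with_warmup` with zero warmup steps) and checks a few logged points against it. The helper names are hypothetical.

```python
BASE_LR = 2e-5    # assumption: inferred from the logged values, not read from training_args.bin
MAX_STEPS = 1860  # "max_steps" from trainer_state.json
TOL = 1e-12       # absolute tolerance for float comparison


def linear_decay_lr(step: int, base_lr: float = BASE_LR, max_steps: int = MAX_STEPS) -> float:
    """Learning rate after `step` scheduler steps: linear decay to 0, no warmup."""
    return base_lr * max(0.0, (max_steps - step) / max_steps)


def classify(step: int, logged_lr: float) -> str:
    """Compare a logged learning rate with the schedule at `step` and at `step - 1`."""
    if abs(linear_decay_lr(step) - logged_lr) < TOL:
        return "exact"
    if abs(linear_decay_lr(step - 1) - logged_lr) < TOL:
        return "one step behind"
    return "no match"


# (step, learning_rate) pairs copied from log_history entries in this diff
LOGGED = [
    (20, 1.978494623655914e-05),
    (60, 1.935483870967742e-05),
    (80, 1.9150537634408603e-05),
    (1500, 3.881720430107528e-06),
]

for step, lr in LOGGED:
    print(step, classify(step, lr))
```

Steps 20 and 60 match the formula exactly. Steps 80 and 1500 match it one step earlier. A skipped optimizer step would produce this kind of offset, because `Trainer` then also skips the scheduler step; an fp16 loss-scale overflow is one example. The log itself does not record the cause.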

checkpoint-1500/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:13cb1e7e170180c3d0a314ab74f1969a16dc260ec9bc66380facf0d7a2dcb84e
+size 4408
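Each `+` block with `version`/`oid`/`size` lines is a Git LFS pointer, not the binary itself. The diff records only the SHA-256 digest and byte size of the stored object. The sketch below parses such a pointer and checks downloaded bytes against it, relying only on the three keys shown here. The function names are illustrative.

```python
import hashlib

LFS_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer file into its oid algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "<key> <value>"
        fields[key] = value
    if fields.get("version") != LFS_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "oid": digest, "size": int(fields["size"])}


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check downloaded bytes against the pointer's sha256 digest and byte size."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["oid"]


# The checkpoint-1500/training_args.bin pointer from this diff
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:13cb1e7e170180c3d0a314ab74f1969a16dc260ec9bc66380facf0d7a2dcb84e
size 4408
"""

info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])  # sha256 4408
```

Both `training_args.bin` pointers in this commit carry the same oid and size. The two checkpoints were therefore saved with byte-identical training arguments.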

checkpoint-500/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: baffo32/decapoda-research-llama-7B-hf
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.8.2

checkpoint-500/adapter_config.json ADDED

	@@ -0,0 +1,27 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "baffo32/decapoda-research-llama-7B-hf",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}

checkpoint-500/adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:12b3a5fd9ae8b5260f6d61aaa50cebc37d4d58e526cbe8c9aae642fbd34afab5
+size 16823434

checkpoint-500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fedebc1ecd2e11bf4031d563d1bf81d4dc41afacc5839d237c12a1972156dd87
+size 33630330
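The `adapter_config.json` above (`r` = 8, `lora_alpha` = 32, targets `q_proj` and `v_proj`) determines how many parameters the adapter trains. That count can be cross-checked against the LFS sizes of `adapter_model.bin` and `optimizer.pt`. The sketch below makes three assumptions that this commit does not state:

- The base model has standard LLaMA-7B shapes: 32 layers, hidden size 4096, square `q_proj`/`v_proj`.
- The weights are stored as fp32.
- The optimizer is AdamW, keeping two moment buffers per parameter.

```python
# From adapter_config.json in this commit
R = 8
LORA_ALPHA = 32
TARGET_MODULES = ["v_proj", "q_proj"]

# Assumption: standard LLaMA-7B shapes (not stated in this commit)
HIDDEN_SIZE = 4096  # q_proj and v_proj map HIDDEN_SIZE -> HIDDEN_SIZE
NUM_LAYERS = 32
BYTES_PER_PARAM = 4  # assumption: adapter weights stored as fp32


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


scaling = LORA_ALPHA / R  # update B @ A is scaled by alpha / r because use_rslora is false
per_module = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, R)
total_params = NUM_LAYERS * len(TARGET_MODULES) * per_module
adapter_bytes = total_params * BYTES_PER_PARAM
adam_state_bytes = 2 * adapter_bytes  # assumption: exp_avg and exp_avg_sq per parameter

print(f"scaling={scaling}, params={total_params:,}")
print(f"adapter ~{adapter_bytes:,} B (LFS pointer: 16,823,434 B)")
print(f"optimizer ~{adam_state_bytes:,} B (LFS pointer: 33,630,330 B)")
```

The sketch gives a scaling factor of 4.0 and about 4.19M trainable parameters. The estimates fall roughly 46 KB short of `adapter_model.bin` and 76 KB short of `optimizer.pt`. That gap is plausibly serialization overhead such as tensor names and `torch.save` metadata. Under these assumptions, the pointer sizes are consistent with an fp32 LoRA adapter on `q_proj`/`v_proj`.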

checkpoint-500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:451b4440b0fa834a930b318cc44ecb5cae111ecc6f340a6b74b7583d2e84cb85
+size 14244

checkpoint-500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:53e03d1928bcfc31ad26ae28eacba5e38a94ef245814c8f427299825182f711f
+size 1064

checkpoint-500/trainer_state.json ADDED

	@@ -0,0 +1,166 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 8.0,
+  "global_step": 500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.32,
+      "learning_rate": 1.978494623655914e-05,
+      "loss": 2.3806,
+      "step": 20
+    },
+    {
+      "epoch": 0.64,
+      "learning_rate": 1.956989247311828e-05,
+      "loss": 2.2445,
+      "step": 40
+    },
+    {
+      "epoch": 0.96,
+      "learning_rate": 1.935483870967742e-05,
+      "loss": 1.9394,
+      "step": 60
+    },
+    {
+      "epoch": 1.28,
+      "learning_rate": 1.9150537634408603e-05,
+      "loss": 1.7953,
+      "step": 80
+    },
+    {
+      "epoch": 1.6,
+      "learning_rate": 1.8935483870967742e-05,
+      "loss": 1.6449,
+      "step": 100
+    },
+    {
+      "epoch": 1.92,
+      "learning_rate": 1.8720430107526882e-05,
+      "loss": 1.4618,
+      "step": 120
+    },
+    {
+      "epoch": 2.24,
+      "learning_rate": 1.8505376344086025e-05,
+      "loss": 1.2538,
+      "step": 140
+    },
+    {
+      "epoch": 2.56,
+      "learning_rate": 1.8290322580645165e-05,
+      "loss": 1.0844,
+      "step": 160
+    },
+    {
+      "epoch": 2.88,
+      "learning_rate": 1.8075268817204305e-05,
+      "loss": 0.9909,
+      "step": 180
+    },
+    {
+      "epoch": 3.2,
+      "learning_rate": 1.7860215053763444e-05,
+      "loss": 0.9419,
+      "step": 200
+    },
+    {
+      "epoch": 3.52,
+      "learning_rate": 1.764516129032258e-05,
+      "loss": 0.8202,
+      "step": 220
+    },
+    {
+      "epoch": 3.84,
+      "learning_rate": 1.743010752688172e-05,
+      "loss": 0.7706,
+      "step": 240
+    },
+    {
+      "epoch": 4.16,
+      "learning_rate": 1.721505376344086e-05,
+      "loss": 0.7102,
+      "step": 260
+    },
+    {
+      "epoch": 4.48,
+      "learning_rate": 1.7e-05,
+      "loss": 0.6723,
+      "step": 280
+    },
+    {
+      "epoch": 4.8,
+      "learning_rate": 1.678494623655914e-05,
+      "loss": 0.6262,
+      "step": 300
+    },
+    {
+      "epoch": 5.12,
+      "learning_rate": 1.656989247311828e-05,
+      "loss": 0.5895,
+      "step": 320
+    },
+    {
+      "epoch": 5.44,
+      "learning_rate": 1.6354838709677422e-05,
+      "loss": 0.5636,
+      "step": 340
+    },
+    {
+      "epoch": 5.76,
+      "learning_rate": 1.6139784946236562e-05,
+      "loss": 0.5095,
+      "step": 360
+    },
+    {
+      "epoch": 6.08,
+      "learning_rate": 1.5924731182795702e-05,
+      "loss": 0.5093,
+      "step": 380
+    },
+    {
+      "epoch": 6.4,
+      "learning_rate": 1.570967741935484e-05,
+      "loss": 0.4729,
+      "step": 400
+    },
+    {
+      "epoch": 6.72,
+      "learning_rate": 1.549462365591398e-05,
+      "loss": 0.4805,
+      "step": 420
+    },
+    {
+      "epoch": 7.04,
+      "learning_rate": 1.527956989247312e-05,
+      "loss": 0.4508,
+      "step": 440
+    },
+    {
+      "epoch": 7.36,
+      "learning_rate": 1.5064516129032259e-05,
+      "loss": 0.4254,
+      "step": 460
+    },
+    {
+      "epoch": 7.68,
+      "learning_rate": 1.4849462365591399e-05,
+      "loss": 0.4104,
+      "step": 480
+    },
+    {
+      "epoch": 8.0,
+      "learning_rate": 1.4634408602150539e-05,
+      "loss": 0.4135,
+      "step": 500
+    }
+  ],
+  "max_steps": 1860,
+  "num_train_epochs": 30,
+  "total_flos": 2.5991277871104e+18,
+  "trial_name": null,
+  "trial_params": null
+}
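The `log_history` in `trainer_state.json` is plain JSON, so it can be summarized without any ML dependencies. The sketch below assumes only the keys visible in this file: `log_history` entries with `step` and `loss`, plus top-level `global_step` and `max_steps`. The helper names are hypothetical.

```python
import json
from typing import Any


def load_trainer_state(path: str) -> dict[str, Any]:
    """Read a trainer_state.json file written by the Hugging Face Trainer."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(state: dict[str, Any]) -> dict[str, Any]:
    """Summarize the training-loss entries in `log_history`."""
    # Keep only entries with a "loss" key; evaluation and end-of-training entries
    # use other keys (e.g. "eval_loss", "train_loss")
    entries = [e for e in state["log_history"] if "loss" in e]
    best = min(entries, key=lambda e: e["loss"])
    return {
        "first_loss": entries[0]["loss"],
        "last_loss": entries[-1]["loss"],
        "best_step": best["step"],
        "best_loss": best["loss"],
        "progress": state["global_step"] / state["max_steps"],
    }


# Excerpt of checkpoint-500/trainer_state.json (most log entries omitted)
EXCERPT = """
{
  "global_step": 500,
  "max_steps": 1860,
  "log_history": [
    {"epoch": 0.32, "learning_rate": 1.978494623655914e-05, "loss": 2.3806, "step": 20},
    {"epoch": 7.68, "learning_rate": 1.4849462365591399e-05, "loss": 0.4104, "step": 480},
    {"epoch": 8.0, "learning_rate": 1.4634408602150539e-05, "loss": 0.4135, "step": 500}
  ]
}
"""

summary = summarize(json.loads(EXCERPT))
print(summary)
```

Run on the full checkpoint-500 file, this reports the lowest logged training loss as 0.4104 at step 480. No evaluation entries are logged here, so this is training loss only, and the run is about 27% of the way through its 1860 steps.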

checkpoint-500/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:13cb1e7e170180c3d0a314ab74f1969a16dc260ec9bc66380facf0d7a2dcb84e
+size 4408

runs/Feb18_09-02-46_n6p6x80stw/events.out.tfevents.1708246969.n6p6x80stw.255.0 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6359171b576a1a1365d09f6397f596bdc359269c2a5efb034cad1a059e29844c
+size 5065

runs/Feb18_09-58-43_n6p6x80stw/events.out.tfevents.1708250324.n6p6x80stw.483.0 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f0125753cbd29257b1801bcb9c55064ddda12a690b94dfb3b51189f3ec3d06e8
+size 4263

runs/Feb18_10-01-41_n6p6x80stw/events.out.tfevents.1708250502.n6p6x80stw.616.0 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3aabd6a09c1c7e88280997ac45adf30a2852a7269897bd779c3690ab9639058f
+size 4263

runs/Feb19_00-06-34_n6p6x80stw/events.out.tfevents.1708301195.n6p6x80stw.1619.0 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bb0a084cfe58e657ae6788e2ab9d97c7c2a3043f51709d26837f45bea6034c80
+size 4263

runs/Feb19_00-20-12_n6p6x80stw/events.out.tfevents.1708302014.n6p6x80stw.1845.0 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6f3f19af512cdcf03311f77ef83580ab8eb3df0e654e1b4033e3f3355e8b74d9
+size 19200