Training in progress, step 57

Files changed (5)

README.md +25 -89
adapter_config.json +4 -4
adapter_model.bin +1 -1
adapter_model.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -1,12 +1,12 @@
 ---
-base_model: meta-llama/Meta-Llama-3-8B
 library_name: peft
-license: llama3
 tags:
 - axolotl
 - generated_from_trainer
 model-index:
-- name: llama-3-8b-ocr-correction
   results: []
 ---
@@ -18,10 +18,9 @@ should probably proofread and complete it, then remove this comment. -->
 axolotl version: `0.4.1`
 ```yaml
-base_model: meta-llama/Meta-Llama-3-8B
 model_type: AutoModelForCausalLM
 tokenizer_type: AutoTokenizer
-is_mistral_derived_model: true
 load_in_8bit: false
 load_in_4bit: true
@@ -35,14 +34,14 @@ datasets:
   - path: ft_data/alpaca_data.jsonl
     type: alpaca
 dataset_prepared_path: last_run_prepared
-val_set_size: 0.1
 output_dir: ./qlora-alpaca-out
-hub_model_id: pbevan11/llama-3-8b-ocr-correction
 adapter: qlora
 lora_model_dir:
-sequence_len: 4096
 sample_packing: true
 pad_to_sequence_len: true
@@ -62,7 +61,7 @@ lora_target_modules:
 wandb_project: ocr-ft
 wandb_entity: sncds
-wandb_name: test
 gradient_accumulation_steps: 4
 micro_batch_size: 2 # was 16
@@ -104,86 +103,24 @@ special_tokens:
 </details><br>
-# llama-3-8b-ocr-correction
-This model is a qlora fine-tuned adapter for [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the [pbevan11/synthetic-ocr-correction-gpt4o](https://huggingface.co/datasets/pbevan11/synthetic-ocr-correction-gpt4o) dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.1742
-## Usage
-First, download the model
-```python
-from peft import AutoPeftModelForCausalLM
-from transformers import AutoTokenizer
-model_id='pbevan11/llama-3-8b-ocr-correction'
-model = AutoPeftModelForCausalLM.from_pretrained(model_id).cuda()
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-tokenizer.pad_token = tokenizer.eos_token
-```
-Then, construct the prompt template like so:
-```python
-def prompt(instruction, inp):
-    return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
-### Instruction:
-{instruction}
-### Input:
-{inp}
-### Response:
-"""
-def prompt_tok(instruction, inp, return_ids=False):
-    _p = prompt(instruction, inp)
-    input_ids = tokenizer(_p, return_tensors="pt", truncation=True).input_ids.cuda()
-    out_ids = model.generate(input_ids=input_ids, max_new_tokens=5000,
-                          do_sample=False)
-    ids = out_ids.detach().cpu().numpy()
-    if return_ids: return out_ids
-    full_output = tokenizer.batch_decode(ids, skip_special_tokens=True)[0]
-    response_start = full_output.find("### Response:")
-    if response_start != -1:
-        return full_output[response_start + len("### Response:"):]
-    else:
-        return full_output[len(_p):]
-```
-Finally, you can get predictions like this:
-```python
-# model inputs
-instruction = "You are an assistant that takes a piece of text that has been corrupted during OCR digitisation, and produce a corrected version of the same text."
-inp = "Do Not Kule Oi't hy.er-l'rieed AjijqIi: imac - Analyst (fteuiers) Hcuiers - A | ) | ilf, <;/) in |) nter |iic . conic! deeiilf. l.o sell n lower-|)rieofl wersinn oi its Macintosh cornutor to nttinct ronsnnu-rs already euami'red ot its iPod music jiayo-r untl annoyoil. by sccnrit.y problems ivitJi Willtlows PCs , Piper.iaffray analyst. (Jcne Muster <aid on Tlinrtiday."
-# print prediction
-out = prompt_tok(instruction, inp)
-print(out.replace('\\', ' '))
-```
-This will give you a prediction that looks like this:
-  ```md
-"Do Not Rule Out Lower-Priced Mac - Analyst (Reuters) Reuters - Apple Inc.  may be considering a lower-priced version of its Macintosh computer to attract consumers already enamored of its iPod music player and annoyed by security problems with Windows PCs, PiperJaffray analyst Gene Munster said on Thursday."
-  ```
-Alternatively, you can play with this model on Replicate: [tbc](tbc)
 ## Intended uses & limitations
-Reconstructions should not be taken as the truth; the model is likely to make some things up to fill in the gaps, so some details may not be perfectly historically accurate.
-This model was intended to be used to restore historical documents that have been imperfectly digitised using OCR.
 ## Training and evaluation data
-TBC: evaluating on the test set from [pbevan11/synthetic-ocr-correction-gpt4o](https://huggingface.co/pbevan11/synthetic-ocr-correction-gpt4o)
 ## Training procedure
@@ -205,21 +142,20 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 0.6611        | 0.0165 | 1    | 0.6229          |
-| 0.3149        | 0.2469 | 15   | 0.2870          |
-| 0.2074        | 0.4938 | 30   | 0.2166          |
-| 0.2211        | 0.7407 | 45   | 0.1937          |
-| 0.195         | 0.9877 | 60   | 0.1825          |
-| 0.1411        | 1.2140 | 75   | 0.1787          |
-| 0.1348        | 1.4609 | 90   | 0.1760          |
-| 0.1479        | 1.7078 | 105  | 0.1743          |
-| 0.1413        | 1.9547 | 120  | 0.1742          |
 ### Framework versions
 - PEFT 0.11.1
-- Transformers 4.42.3
 - Pytorch 2.1.2+cu118
 - Datasets 2.19.1
 - Tokenizers 0.19.1

 ---
+base_model: meta-llama/Meta-Llama-3.1-8B
 library_name: peft
+license: llama3.1
 tags:
 - axolotl
 - generated_from_trainer
 model-index:
+- name: llama-3.1-8b-ocr-correction
   results: []
 ---
 axolotl version: `0.4.1`
 ```yaml
+base_model: meta-llama/Meta-Llama-3.1-8B
 model_type: AutoModelForCausalLM
 tokenizer_type: AutoTokenizer
 load_in_8bit: false
 load_in_4bit: true
   - path: ft_data/alpaca_data.jsonl
     type: alpaca
 dataset_prepared_path: last_run_prepared
+val_set_size: 0.05
 output_dir: ./qlora-alpaca-out
+hub_model_id: pbevan11/llama-3.1-8b-ocr-correction
 adapter: qlora
 lora_model_dir:
+sequence_len: 8192
 sample_packing: true
 pad_to_sequence_len: true
 wandb_project: ocr-ft
 wandb_entity: sncds
+wandb_name: llama31
 gradient_accumulation_steps: 4
 micro_batch_size: 2 # was 16
 </details><br>
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/rotjhntf)
+# llama-3.1-8b-ocr-correction
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.1901
+## Model description
+More information needed
 ## Intended uses & limitations
+More information needed
 ## Training and evaluation data
+More information needed
 ## Training procedure
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
+| 0.61          | 0.0331 | 1    | 0.6018          |
+| 0.4379        | 0.2645 | 8    | 0.4256          |
+| 0.2531        | 0.5289 | 16   | 0.2714          |
+| 0.2366        | 0.7934 | 24   | 0.2247          |
+| 0.1839        | 1.0331 | 32   | 0.2053          |
+| 0.1752        | 1.2975 | 40   | 0.1961          |
+| 0.1629        | 1.5620 | 48   | 0.1909          |
+| 0.163         | 1.8264 | 56   | 0.1901          |
 ### Framework versions
 - PEFT 0.11.1
+- Transformers 4.43.2
 - Pytorch 2.1.2+cu118
 - Datasets 2.19.1
 - Tokenizers 0.19.1
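Comparing the two "Training results" tables gives a rough picture of how the run changed. The sketch below estimates optimizer steps per epoch from the first-epoch `(epoch, step)` rows of each table. It also computes packed sequences per optimizer step per device from `micro_batch_size` and `gradient_accumulation_steps`; the GPU count is not stated in the card. The roughly 2x drop in steps per epoch is plausibly related to `sequence_len` doubling from 4096 to 8192 under `sample_packing: true`, though `val_set_size` also changed (0.1 to 0.05). The card does not state the cause, so treat this as an estimate only.

```python
# (epoch, step) pairs copied from the first-epoch rows of the two training tables.
old_run = [(0.2469, 15), (0.4938, 30), (0.7407, 45), (0.9877, 60)]  # Meta-Llama-3-8B run
new_run = [(0.2645, 8), (0.5289, 16), (0.7934, 24)]                 # Meta-Llama-3.1-8B run


def steps_per_epoch(pairs):
    """Estimate optimizer steps per epoch as the mean of step / epoch."""
    return sum(step / epoch for epoch, step in pairs) / len(pairs)


# Values from the axolotl config in the diff.
micro_batch_size = 2
gradient_accumulation_steps = 4
# Packed sequences consumed per optimizer step on each device (GPU count unknown).
per_device_batch = micro_batch_size * gradient_accumulation_steps

old_spe = steps_per_epoch(old_run)
new_spe = steps_per_epoch(new_run)
print(f"sequences per optimizer step per device: {per_device_batch}")
print(f"steps per epoch: old ~{old_spe:.1f}, new ~{new_spe:.1f}")
print(f"ratio old/new: {old_spe / new_spe:.2f}")
```

Only first-epoch rows are used because the logged epoch values after the epoch boundary (e.g. 1.0331 at step 32) are rounded and skew the estimate.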

adapter_config.json CHANGED

@@ -20,12 +20,12 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "up_proj",
-    "down_proj",
-    "gate_proj",
-    "k_proj",
     "v_proj",
     "q_proj",
     "o_proj"
   ],
   "task_type": "CAUSAL_LM",

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "v_proj",
     "q_proj",
+    "k_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
     "o_proj"
   ],
   "task_type": "CAUSAL_LM",
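The `target_modules` hunk only reorders the list. Comparing the two lists as sets confirms that the same seven projection modules are adapted before and after. The reordering is likely because recent PEFT versions store a list-valued `target_modules` as a set internally, so the serialized order can vary between runs. This is an assumption about the cause, not something the commit states.

```python
# target_modules before and after this commit, copied from adapter_config.json.
before = ["up_proj", "down_proj", "gate_proj", "k_proj", "v_proj", "q_proj", "o_proj"]
after = ["v_proj", "q_proj", "k_proj", "down_proj", "up_proj", "gate_proj", "o_proj"]

# Compare as sets: list order does not change which modules receive LoRA adapters.
added = sorted(set(after) - set(before))
removed = sorted(set(before) - set(after))
same_modules = set(before) == set(after)

print("added:", added)
print("removed:", removed)
print("same modules:", same_modules)
```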

adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c64465bb2211b47808dc809512a591f6ada32a06c95e2e5ae6b3bef6b9622301
 size 167934026

 version https://git-lfs.github.com/spec/v1
+oid sha256:befe7ee91cb8ab62450880c1dabf645b053b56d4e5b4cf5a4776e29329224eeb
 size 167934026

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:01ec29cca5cd60a3f5e350beff5a81a570434ae6b0102782e0b5ac9b40ebc71c
 size 167832688

 version https://git-lfs.github.com/spec/v1
+oid sha256:094a56bdc5ddb4b0283610f269f8a14fe9b93e86c16ad75b348c378b9c7405f6
 size 167832688

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c519c403422bf69950c967ebaefca549327caa74bc80ee375ab0687e7ae81986
 size 6072

 version https://git-lfs.github.com/spec/v1
+oid sha256:823c026c21ead0a0fcfbdb2b1d26d1596e5af7ebb2cff85f40a3fbb177930914
 size 6072
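The three binary files are stored as Git LFS pointers. Each pointer records a spec `version`, an `oid` (the SHA-256 of the file contents), and the `size` in bytes. In every hunk the `size` is unchanged while the `oid` differs. This suggests the weights were retrained into tensors of the same shapes, which fits the identical rank and target modules. The sketch below parses a pointer and builds one from raw bytes. The helper names `parse_lfs_pointer` and `lfs_pointer_for` are hypothetical and chosen for illustration.

```python
import hashlib

LFS_VERSION = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def lfs_pointer_for(data: bytes) -> str:
    """Build the pointer text Git LFS would store for these file contents."""
    oid = hashlib.sha256(data).hexdigest()
    return f"version {LFS_VERSION}\noid sha256:{oid}\nsize {len(data)}\n"


# adapter_model.bin pointers before and after this commit, copied from the diff.
old = parse_lfs_pointer(f"""version {LFS_VERSION}
oid sha256:c64465bb2211b47808dc809512a591f6ada32a06c95e2e5ae6b3bef6b9622301
size 167934026""")
new = parse_lfs_pointer(f"""version {LFS_VERSION}
oid sha256:befe7ee91cb8ab62450880c1dabf645b053b56d4e5b4cf5a4776e29329224eeb
size 167934026""")

print("size unchanged:", old["size"] == new["size"])
print("content changed:", old["oid"] != new["oid"])
```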