calcpy committed
Commit 45e9a9c
1 Parent(s): 7277829

Upload model trained with Unsloth


Upload model trained with Unsloth 2x faster

Files changed (3)
  1. README.md +2 -98
  2. adapter_config.json +37 -0
  3. adapter_model.safetensors +3 -0
README.md CHANGED
@@ -8,21 +8,10 @@ tags:
 - trl
 license: apache-2.0
 language:
- - sw
- library_name: peft
- datasets:
- - Mollel/alpaca-swahili
- - Mollel/swahili_pretrain_data
- - wikimedia/wikipedia
+ - en
 ---

- # Model Details
-
-
- This model has been pre-trained and fine-tuned specifically for Swahili language tasks.
- The training includes 4-bit quantization to optimize performance on lower-resource hardware.
-
- This is a development version and is not recommended for general use.
+ # Uploaded model

 - **Developed by:** calcpy
 - **License:** apache-2.0
@@ -31,88 +20,3 @@ This is a development version and it's not recommended for general use.
 This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
-
-
- ### Out-of-Scope Use
-
- The model is not designed for tasks outside the Swahili language or for tasks requiring high factual precision in domains not covered by the training datasets.
-
-
- ## Bias, Risks, and Limitations
-
- The model inherits any potential biases present in the Swahili Wikipedia and Mollel's datasets. Users should be cautious when applying this model to sensitive applications.
-
-
- ### Recommendations
-
- Users should perform bias evaluations specific to their use case and ensure that any downstream applications consider potential ethical implications.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- # Load the model and tokenizer; device_map="auto" places the weights on the available GPU
- model = AutoModelForCausalLM.from_pretrained("path_to_your_model", device_map="auto")
- tokenizer = AutoTokenizer.from_pretrained("path_to_your_model")
-
- # Example inference: continue the Fibonacci sequence, prompted in Swahili
- instruction = "Endelea mlolongo wa fibonacci:"
- input_data = "1, 1, 2, 3, 5, 8,"
- prompt = f"Chini ni maagizo ambayo yanaelezea kazi. Andika jibu ambalo linakamilisha ombi ipasavyo.\n### Maagizo:\n{instruction}\n\n{input_data}\n### Jibu:\n"
-
- # Tokenize on the model's device and generate
- inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
- outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
- print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
- ```
-
- In this example, the model generates the continuation of the Fibonacci sequence in Swahili.
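Because this commit actually uploads a LoRA adapter (see `adapter_config.json` and `adapter_model.safetensors` below) rather than merged weights, loading it through PEFT on top of its base model is likely the more direct route. A minimal, hedged sketch; the adapter repo id is a placeholder and the tokenizer location is an assumption:

```python
# Hedged sketch: load the LoRA adapter on top of its 4-bit base model via PEFT.
# "calcpy/<adapter-repo>" is a placeholder; the actual repo id is not stated in the card.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_repo = "calcpy/<adapter-repo>"
model = AutoPeftModelForCausalLM.from_pretrained(adapter_repo, device_map="auto")
# If the adapter repo ships no tokenizer files, load the tokenizer from the base model instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_repo)

inputs = tokenizer("Endelea mlolongo wa fibonacci:\n1, 1, 2, 3, 5, 8,", return_tensors="pt").to(model.device)
print(tokenizer.batch_decode(model.generate(**inputs, max_new_tokens=32), skip_special_tokens=True))
```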
-
-
- ## Training Details
-
- ### Training Data
-
- The model was pre-trained using a combination of [Swahili Wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia)
- and [Mollel's Swahili pretraining dataset](https://huggingface.co/datasets/Mollel/swahili_pretrain_data).
- Both datasets were processed to include End-of-Sequence (EOS) tokens and formatted for pretraining tasks.
-
- Fine-tuning was performed on [Mollel's Alpaca dataset](https://huggingface.co/datasets/Mollel/alpaca-swahili).
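The EOS processing mentioned above is not shown in the card; a minimal sketch of how it is commonly done with the `datasets` library, assuming the corpus exposes a plain `text` column and a `train` split:

```python
# Hedged sketch: append the tokenizer's EOS token to every pretraining example
# so documents are clearly delimited when samples are packed together.
from datasets import load_dataset
from transformers import AutoTokenizer

# Base model taken from adapter_config.json; "text"/"train" are assumptions about the dataset layout.
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.2-3b-instruct-bnb-4bit")
dataset = load_dataset("Mollel/swahili_pretrain_data", split="train")

def add_eos(example):
    return {"text": example["text"] + tokenizer.eos_token}

dataset = dataset.map(add_eos)
```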
-
- ### Training Procedure
-
- #### Training Hyperparameters
-
- - **Training regime:** Mixed precision (fp16/bf16)
- - **Batch size:** 2 per device
- - **Max steps:** 24,000 for pretraining, 1,200 for fine-tuning
- - **Learning rate:** 5e-5 (1e-5 for embeddings)
- - **Warmup steps:** 100 for pretraining, 10 for fine-tuning
- - **Weight decay:** 0.01 (pretraining), 0.00 (fine-tuning)
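For orientation, the pretraining values above map onto a `transformers.TrainingArguments` roughly as in the sketch below. This is not the card's actual training script; the output path and precision flags are assumptions, and the separate 1e-5 embedding learning rate is only indicated in a comment.

```python
# Hedged sketch: TrainingArguments mirroring the listed pretraining hyperparameters.
import torch
from transformers import TrainingArguments

bf16_ok = torch.cuda.is_bf16_supported()
args = TrainingArguments(
    output_dir="outputs",            # assumed path
    per_device_train_batch_size=2,   # "Batch size: 2 per device"
    max_steps=24_000,                # 24,000 pretraining steps (1,200 for fine-tuning)
    learning_rate=5e-5,
    warmup_steps=100,                # 10 for fine-tuning
    weight_decay=0.01,               # 0.00 for fine-tuning
    bf16=bf16_ok,                    # mixed precision: bf16 when supported, else fp16
    fp16=not bf16_ok,
)
# The 1e-5 embedding learning rate implies a separate optimizer parameter group for
# embed_tokens/lm_head (e.g., Unsloth's embedding learning-rate option); not shown here.
```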
-
- ## Evaluation
-
- The model was evaluated only manually, on the Alpaca Swahili dataset, for instruction-following capabilities.
-
- #### Metrics
-
- Evaluation metrics for language-generation quality and instruction-following precision are still required.
-
- #### Summary
-
- This is a purely technical release of a small test model, intended to exercise the pre-training and fine-tuning code on a single GPU.
-
-
- ## Environmental Impact
-
- - **Hardware Type:** NVIDIA GeForce RTX 4090 (24 GiB)
- - **Hours used:** ~12
-
-
- ### Compute Infrastructure
-
- Ubuntu 22.04.5 LTS with multiple NVIDIA GeForce RTX 4090 cards.
-
- Only a single GPU was used for training.
 
adapter_config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "unsloth/llama-3.2-3b-instruct-bnb-4bit",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_dropout": 0,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": [
+     "lm_head",
+     "embed_tokens"
+   ],
+   "peft_type": "LORA",
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "down_proj",
+     "q_proj",
+     "up_proj",
+     "v_proj",
+     "gate_proj",
+     "k_proj",
+     "o_proj"
+   ],
+   "task_type": "CAUSAL_LM",
+   "use_dora": false,
+   "use_rslora": true
+ }
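For readers more comfortable with Python than raw JSON, a `peft.LoraConfig` that would serialize to approximately the file above might look like the following sketch; only the fields visible in the JSON are set, everything else is left at its default.

```python
# Hedged sketch: a peft LoraConfig approximating the adapter_config.json above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    use_rslora=True,  # rank-stabilized LoRA, per "use_rslora": true
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    modules_to_save=["lm_head", "embed_tokens"],  # trained in full, not as LoRA factors
    task_type="CAUSAL_LM",
)
```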
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bae76e94fd63943fefe6582f3fc247723f052f40a61c1c72a1789bdd836fd496
+ size 1673317496