---
base_model: unsloth/llama-3.2-3b-instruct-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
- sw
datasets:
- wikimedia/wikipedia
- Mollel/alpaca-swahili
- Mollel/swahili_pretrain_data
library_name: peft
---

# Model description

The model can be used for Swahili text generation, translation, and other Swahili NLP tasks, with a focus on the domains covered by its pretraining and fine-tuning data. It was pre-trained and fine-tuned specifically for Swahili using the Unsloth framework. This is a development version and is not recommended for general use.

- **Developed by:** calcpy
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3.2-3b-instruct-bnb-4bit

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

## Out-of-Scope Use

The model is not designed for tasks outside the Swahili language, or for tasks requiring high factual precision in domains not covered by the training datasets.

## Bias, Risks, and Limitations

The model inherits any biases present in the Swahili Wikipedia and Mollel datasets. Users should be cautious when applying this model to sensitive applications.

### Recommendations

Users should perform bias evaluations specific to their use case and ensure that any downstream applications consider the potential ethical implications.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("path_to_your_model", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("path_to_your_model")

# Example inference: Alpaca-style Swahili prompt
instruction = "Endelea mlolongo wa fibonacci:"  # "Continue the Fibonacci sequence:"
input_data = "1, 1, 2, 3, 5, 8,"
prompt = (
    "Chini ni maagizo ambayo yanaelezea kazi. "         # "Below is an instruction that describes a task."
    "Andika jibu ambalo linakamilisha ombi ipasavyo.\n"  # "Write a response that appropriately completes the request."
    f"### Maagizo:\n{instruction}\n\n{input_data}\n### Jibu:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

In this example, the model generates the continuation of the Fibonacci sequence from a Swahili prompt.

## Training Hyperparameters

- **Training regime:** mixed precision (fp16/bf16)
- **Batch size:** 2 per device
- **Max steps:** 24,000 for pretraining, 1,200 for fine-tuning
- **Learning rate:** 5e-5 (1e-5 for embeddings)
- **Warmup steps:** 100 for pretraining, 10 for fine-tuning
- **Weight decay:** 0.01 (pretraining), 0.00 (fine-tuning)

A hedged sketch of a matching training configuration is included at the end of this card.

## Evaluation

The model has only been evaluated manually, on the Alpaca Swahili dataset, for instruction-following capability.

## Metrics

Evaluation metrics for language generation quality and instruction-following precision still need to be established.

## Summary

This is a technical release of a small test model, intended to exercise pre-training and fine-tuning on a single GPU.

## Compute Infrastructure

- **OS:** Ubuntu 22.04.5 LTS
- **Hardware Type:** NVIDIA GeForce RTX 4090 (24 GiB)
- **Hours used:** ~12 hours
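
## Training Configuration Sketch

For reference, the hyperparameters listed above correspond roughly to an Unsloth + TRL fine-tuning setup like the one sketched below. This is a minimal, hedged reconstruction, not the actual training script: the sequence length, LoRA rank and target modules, dataset split and text column, and output directory are assumptions, and the `SFTTrainer` keyword arguments follow the older TRL releases used in Unsloth notebooks (newer TRL versions move most of them into `SFTConfig`).

```python
# Hedged sketch of the fine-tuning stage; values marked "assumption" are not stated on the card.
import torch
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

max_seq_length = 2048  # assumption: sequence length not stated on the card

# Load the 4-bit base model and attach LoRA adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.2-3b-instruct-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,            # assumption: LoRA rank not stated on the card
    lora_alpha=16,   # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Instruction fine-tuning data from the card's metadata
dataset = load_dataset("Mollel/alpaca-swahili", split="train")  # assumption: split name

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",          # assumption: adjust to the dataset's actual column
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,  # from the card
        max_steps=1_200,                # fine-tuning stage; 24,000 for pretraining
        learning_rate=5e-5,             # the card lists 1e-5 separately for embeddings
        warmup_steps=10,                # 100 for the pretraining stage
        weight_decay=0.0,               # 0.01 for the pretraining stage
        fp16=not torch.cuda.is_bf16_supported(),  # mixed precision per the card
        bf16=torch.cuda.is_bf16_supported(),
        output_dir="outputs",           # assumption
    ),
)
trainer.train()
```

The pretraining stage would use the same structure with `Mollel/swahili_pretrain_data` and `wikimedia/wikipedia` as the corpus, `max_steps=24_000`, `warmup_steps=100`, and `weight_decay=0.01`, per the hyperparameters listed above.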