
Model description

The model can be used for Swahili text generation, translation, and other NLP tasks, with a focus on the domains covered by its pretraining and fine-tuning data. It was pre-trained and fine-tuned specifically for Swahili language tasks with the Unsloth framework.

This is a development version and it's not recommended for general use.

  • Developed by: calcpy
  • License: apache-2.0
  • Finetuned from model: unsloth/llama-3.2-3b-instruct-bnb-4bit

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
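For local experiments, the checkpoint can also be loaded through Unsloth's FastLanguageModel interface, which handles the 4-bit quantized weights directly. This is a minimal sketch: the repository id is taken from this page, and max_seq_length is an assumed value to adjust for your setup.

from unsloth import FastLanguageModel

# Sketch: load the checkpoint in 4-bit via Unsloth (values below are assumptions)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="calcpy/mboni_small_test1",  # repository id shown on this page
    max_seq_length=2048,                    # assumed context length; adjust as needed
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth into its faster inference mode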

Out-of-Scope Use

The model is not designed for tasks outside the Swahili language, or for tasks requiring high factual precision in domains not covered by the training datasets.

Bias, Risks, and Limitations

The model inherits any potential biases present in the Swahili Wikipedia and Mollel's dataset. Users should be cautious when applying this model to sensitive applications.

Recommendations

Users should perform bias evaluations specific to their use case and ensure that any downstream applications consider potential ethical implications.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (device_map="auto" places the model on a GPU if one is available)
model = AutoModelForCausalLM.from_pretrained("calcpy/mboni_small_test1", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("calcpy/mboni_small_test1")

# Example inference with the Swahili Alpaca-style prompt template
# (the template reads: "Below is an instruction that describes a task. Write a response that completes the request appropriately.")
instruction = "Endelea mlolongo wa fibonacci:"  # "Continue the Fibonacci sequence:"
input_data = "1, 1, 2, 3, 5, 8,"
prompt = f"Chini ni maagizo ambayo yanaelezea kazi. Andika jibu ambalo linakamilisha ombi ipasavyo.\n### Maagizo:\n{instruction}\n\n{input_data}\n### Jibu:\n"

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

In this example, the model continues the Fibonacci sequence in response to a Swahili-language instruction prompt.
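Because generate() returns the prompt together with the completion, the answer alone can be recovered by splitting the decoded text on the "### Jibu:" ("### Response:") marker from the prompt template. A small illustrative helper:

# Keep only the text generated after the answer marker
decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
answer = decoded.split("### Jibu:")[-1].strip()
print(answer)  # e.g. a continuation such as "13, 21, 34, ..."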

Training Hyperparameters

  • Training regime: Mixed precision (fp16/bf16)
  • Batch size: 2 per device
  • Max steps: 24,000 for pretraining, 1,200 for fine-tuning
  • Learning rate: 5e-5 (1e-5 for embeddings)
  • Warmup steps: 100 for pretraining, 10 for fine-tuning
  • Weight decay: 0.01 (pretraining), 0.00 (fine-tuning)
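These settings correspond roughly to a TRL SFTTrainer configuration like the sketch below. It is an illustrative reconstruction rather than the actual training script: the dataset, any LoRA or embedding-specific parameter groups, and the separate pretraining stage are omitted, and the keyword arguments assume the SFTTrainer interface used in Unsloth's example notebooks.

import torch
from transformers import TrainingArguments
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,                        # model/tokenizer loaded as shown above
    tokenizer=tokenizer,
    train_dataset=dataset,              # placeholder: an instruction-formatted dataset
    dataset_text_field="text",
    max_seq_length=2048,                # assumed context length
    args=TrainingArguments(
        per_device_train_batch_size=2,  # batch size listed above
        max_steps=1200,                 # fine-tuning steps (24,000 for pretraining)
        learning_rate=5e-5,             # the 1e-5 embedding rate needs a separate param group (not shown)
        warmup_steps=10,                # 100 for pretraining
        weight_decay=0.00,              # 0.01 for pretraining
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,
        output_dir="outputs",
    ),
)
trainer.train()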

Evaluation

The model has so far been evaluated only manually, on the Alpaca Swahili dataset, for instruction-following capability.

Metrics

Quantitative metrics for language generation quality and instruction-following accuracy have not yet been established.
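Until such metrics are in place, a simple held-out perplexity check gives a rough signal of generation quality. The snippet below is a generic illustration; the evaluation sentence is a hypothetical placeholder, not part of any dataset used here.

import torch

# Hypothetical held-out Swahili sentence, used only to illustrate the computation
eval_text = "Jua huchomoza mashariki na kutua magharibi."
enc = tokenizer(eval_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # With labels equal to input_ids, the model returns the mean cross-entropy loss
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"Perplexity: {torch.exp(loss).item():.2f}")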

Summary

This is a technical release of a small model intended to test the pre-training and fine-tuning pipeline on a single GPU.

Compute Infrastructure

  • OS: Ubuntu 22.04.5 LTS
  • Hardware Type: NVIDIA GeForce RTX 4090 (24 GiB)
  • Hours used: ~12
