ubaada/original-transformer

This is a custom Hugging Face port of a PyTorch implementation of the original Transformer model, introduced in the 2017 paper "Attention Is All You Need". It is the 65M-parameter base model, trained for English-to-German translation.

Usage:

from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ubaada/original-transformer", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ubaada/original-transformer")
text = 'This is my cat'
# Tokenize the English source sentence, truncating to at most 100 tokens
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=True, truncation=True, max_length=100)
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
# Output: ' Das ist meine Katze.'

(Remember to pass trust_remote_code=True: the model is loaded from a custom modeling file.)
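To translate several sentences, the calls from the usage snippet can be wrapped in a small helper. The sketch below is only an illustration: `translate` is a hypothetical name that is not part of this repository, and it processes one sentence at a time because it is not confirmed here whether the custom `generate` supports padded batches.

```python
def translate(sentences, model, tokenizer, max_length=100):
    """Translate sentences one by one (hypothetical helper, not part of the repo).

    Uses the same tokenizer and generate arguments as the usage snippet.
    """
    translations = []
    for sentence in sentences:
        inputs = tokenizer(
            sentence,
            return_tensors="pt",
            add_special_tokens=True,
            truncation=True,
            max_length=max_length,
        )
        output = model.generate(**inputs)
        decoded = tokenizer.decode(
            output[0],
            skip_special_tokens=True,
            clean_up_tokenization_spaces=True,
        )
        # The decoded text may start with a space, so strip it.
        translations.append(decoded.strip())
    return translations

# Example call, with model and tokenizer loaded as in the usage snippet:
# translate(["This is my cat", "This is my dog"], model, tokenizer)
```

Looping is slower than batching, but it relies only on the calls already shown in the usage section.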

Training:

Parameter	Value
Dataset	WMT14-de-en
Translation Pairs	4.5M (135M tokens total)
Epochs	24
Batch Size	16
Gradient Accumulation Steps	8
Effective Batch Size	128 (16 * 8)
Training Script	train.py
Optimiser	Adam (learning rate = 0.0001)
Loss Type	Cross Entropy
Final Test Loss	1.87
GPU	RTX 4070 (12GB)
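The figures in the table can be combined into a rough estimate of the training run. The sketch below makes several assumptions: 4.5M is a rounded count, every effective batch yields one optimiser update and the final partial batch is kept, and the test loss is a mean per-token cross entropy in nats (the PyTorch default). The exact numbers depend on what train.py actually does.

```python
import math

# Values copied from the training table.
translation_pairs = 4_500_000  # rounded ("4.5M")
total_tokens = 135_000_000     # rounded ("135M tokens total")
batch_size = 16
accumulation_steps = 8
epochs = 24
final_test_loss = 1.87

# Gradients from 8 micro-batches of 16 are summed before each update.
effective_batch_size = batch_size * accumulation_steps

# Assumption: one optimiser update per effective batch, last partial batch kept.
updates_per_epoch = math.ceil(translation_pairs / effective_batch_size)
total_updates = updates_per_epoch * epochs

# Average tokens per translation pair, according to the table.
tokens_per_pair = total_tokens / translation_pairs

# Assumption: the loss is a mean per-token cross entropy in nats.
perplexity = math.exp(final_test_loss)

print(effective_batch_size)   # 128
print(updates_per_epoch)      # 35157
print(total_updates)          # 843768
print(tokens_per_pair)        # 30.0
print(round(perplexity, 2))   # 6.49
```

Under these assumptions, the run comes to roughly 844k optimiser updates and a test perplexity of about 6.5.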
