---
license: mit
---

# Transformer Model for Language Translation

## Overview

This project implements a Transformer model for language translation between English and Italian. Built from scratch, it aims to provide a deeper understanding of the Transformer architecture, which has become a cornerstone in natural language processing tasks. The project explores key elements of the architecture, such as the attention mechanism, and demonstrates hands-on experience with data preprocessing, model training, and evaluation.

## Learning Objectives

- Understand and implement the Transformer model architecture.
- Explore the attention mechanism and its application in language translation.
- Gain practical experience with data preprocessing, model training, and evaluation in NLP.

## Model Card on Hugging Face

You can find and use the pre-trained model on Hugging Face here: [Model on Hugging Face](https://huggingface.co/amc-madalin/amc-en-it/tree/main)

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-huggingface-model-url")
model = AutoModelForSeq2SeqLM.from_pretrained("your-huggingface-model-url")

# Translation example
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated_text)
```

## Project Structure

- **Attention Visualization** (`attention_visual.ipynb`): A notebook for visualizing attention maps to understand how the model focuses on different sentence parts during translation.
- **Configuration Settings** (`config.py`): Includes hyperparameters and other modifiable settings.
- **Dataset Processing** (`dataset.py`): Handles loading and preprocessing of the English and Italian datasets.
- **Model Architecture** (`model.py`): Defines the Transformer model architecture.
- **Project Documentation** (`README.md`): This file, which provides a complete overview of the project.
- **Experiment Logs** (`runs/`): Logs and outputs from model training sessions.
- **Tokenizers** (`tokenizer_en.json`, `tokenizer_it.json`): Tokenizers for English and Italian text preprocessing.
- **Training Script** (`train.py`): The script that encapsulates the training process.
- **Saved Model Weights** (`weights/`): Stores the trained model weights for future use.

## Installation

To set up and run the project locally, follow these steps:

1. **Clone the Repository:**

   ```bash
   git clone https://github.com/amc-madalin/transformer-for-language-translation.git
   ```

2. **Create a Python Environment:**

   Create a Conda environment:

   ```bash
   conda create --name transformer python=3.x
   ```

   Replace `3.x` with your preferred Python version.

3. **Activate the Environment:**

   ```bash
   conda activate transformer
   ```

4. **Install Dependencies:**

   Install the required packages from `requirements.txt`:

   ```bash
   pip install -r requirements.txt
   ```

5. **Prepare Data:**

   The dataset is downloaded automatically. Modify the source (`lang_src`) and target (`lang_tgt`) languages in `config.py` if necessary. The default is English (`en`) to Italian (`it`):

   ```python
   "lang_src": "en",
   "lang_tgt": "it",
   ```

6. **Train the Model:**

   Start the training process with:

   ```bash
   python train.py
   ```

7. **Use the Model:**

   The trained model weights are saved in the `weights/` directory. Use these weights for inference, evaluation, or further applications; see the sketch after this list.
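As an illustration of step 7, below is a minimal loading sketch using PyTorch and the Hugging Face `tokenizers` library. The helper names (`get_config`, `build_transformer`), the config keys (`seq_len`, `d_model`), the checkpoint filename, and the checkpoint layout are assumptions about this repository's `config.py`, `model.py`, and `train.py`; adjust them to match the actual code.

```python
import torch
from tokenizers import Tokenizer

from config import get_config        # assumed helper in config.py; adjust if named differently
from model import build_transformer  # assumed factory in model.py; adjust if named differently

config = get_config()

# Load the tokenizers produced during training (Hugging Face `tokenizers` JSON files)
tokenizer_src = Tokenizer.from_file("tokenizer_en.json")
tokenizer_tgt = Tokenizer.from_file("tokenizer_it.json")

# Rebuild the model with the same hyperparameters used for training
model = build_transformer(
    tokenizer_src.get_vocab_size(),
    tokenizer_tgt.get_vocab_size(),
    config["seq_len"],
    config["seq_len"],
    d_model=config["d_model"],
)

# Restore trained weights from weights/ (filename and checkpoint keys are assumptions)
checkpoint = torch.load("weights/tmodel_29.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```

Once loaded, the model can be driven by whatever decoding loop the project uses for validation (for example, greedy decoding over the encoder–decoder outputs), mirroring the inference code in `train.py` or `attention_visual.ipynb`.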
## Using the Model with Hugging Face

Once trained, the model can be uploaded to Hugging Face for easy access and reuse.

### Uploading the Model to Hugging Face

Use the following steps to upload your trained model to Hugging Face:

```bash
huggingface-cli login
transformers-cli upload ./weights/ --organization your-organization
```

### Loading the Model from Hugging Face for Inference

You can load the model for translation tasks directly from Hugging Face:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-huggingface-model-url")
model = AutoModelForSeq2SeqLM.from_pretrained("your-huggingface-model-url")

# Translate text
text = "How are you?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
```

## Learning Resources

- [YouTube - Coding a Transformer from Scratch on PyTorch](https://youtube.com/your-video-link): A detailed walkthrough of coding a Transformer model from scratch using PyTorch, including training and inference.

## Acknowledgements

Special thanks to **Umar Jamil** for his guidance and contributions that supported the completion of this project.