---
license: mit
datasets:
- allenai/c4
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model:
- anto18671/lumenspark
---

# Linformer-based Language Model

Efficient language modeling for long sequences using the Linformer architecture. The model reduces memory and computational overhead, making it well suited to a range of text generation tasks.

## Table of Contents

- [Introduction](#introduction)
- [Architecture](#architecture)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Inference Parameters](#inference-parameters)
- [Hyperparameters](#hyperparameters)
- [Training Progress](#training-progress)
- [Acknowledgements](#acknowledgements)
- [Sponsorship](#sponsorship)
- [License](#license)

## Introduction

The **Linformer-based Language Model** leverages the Linformer architecture to efficiently handle long sequences in text generation and other language tasks. By optimizing the self-attention mechanism, the model maintains high performance while reducing resource consumption, making it suitable for applications such as text completion and generation.

## Architecture

Built on the **Linformer Transformer**, the model incorporates several key components:

1. **Efficient Attention**: Reduces self-attention complexity from quadratic to linear in sequence length by projecting the attention matrix into a lower-dimensional space.
2. **Low-Rank Linear Projections**: Uses `LowRankLinear` layers to decrease dimensionality without compromising expressiveness (see the sketch below).
3. **Self-Attention Mechanism**: Implements multi-head self-attention with full expressivity by avoiding low-rank projections in this module.
4. **Factorized Feed-Forward Layers**: Uses factorized `LowRankLinear` layers in the feed-forward network to maintain performance with fewer parameters.
5. **PreNorm with LayerNorm and LayerScale**: Applies layer normalization before the attention and feed-forward blocks, combined with LayerScale for better gradient flow and stability.
6. **Dropout & Residual Connections**: Incorporates dropout for regularization and residual connections to aid gradient flow and prevent vanishing gradients.
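To make the low-rank and sequence-projection ideas concrete, here is a minimal, self-contained PyTorch sketch. It is illustrative only and does not reproduce the exact `lumenspark` implementation: the class names (`LowRankLinear`, `LinformerSelfAttention`) and the default sizes (`k=384`, `rank=256`, mirroring the hyperparameters listed later) are assumptions made for the example.

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Factorized linear layer: W is approximated as U @ V with rank r << min(d_in, d_out).

    Illustrative sketch only -- not the exact lumenspark implementation.
    """
    def __init__(self, d_in, d_out, rank=256):
        super().__init__()
        self.u = nn.Linear(d_in, rank, bias=False)  # d_in -> rank
        self.v = nn.Linear(rank, d_out, bias=True)  # rank -> d_out

    def forward(self, x):
        return self.v(self.u(x))


class LinformerSelfAttention(nn.Module):
    """Multi-head attention with keys/values compressed from length n down to k."""
    def __init__(self, embed_dim=768, heads=8, seq_length=768, k=384):
        super().__init__()
        self.heads = heads
        self.head_dim = embed_dim // heads
        # Full-rank Q/K/V projection (the attention module itself avoids low-rank weights)
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim, bias=False)
        # Learned projections that compress the sequence axis: n -> k
        self.proj_k = nn.Parameter(torch.randn(seq_length, k) / seq_length ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_length, k) / seq_length ** 0.5)
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        b, n, d = x.shape
        q, kx, vx = self.qkv(x).chunk(3, dim=-1)
        # Compress keys/values along the sequence axis: (b, n, d) -> (b, k, d)
        kx = torch.einsum("bnd,nk->bkd", kx, self.proj_k[:n])
        vx = torch.einsum("bnd,nk->bkd", vx, self.proj_v[:n])

        def split(t):  # (b, len, d) -> (b, heads, len, head_dim)
            return t.view(b, -1, self.heads, self.head_dim).transpose(1, 2)

        q, kx, vx = split(q), split(kx), split(vx)
        # Attention scores are (n x k) instead of (n x n), so cost is linear in n
        attn = torch.softmax(q @ kx.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ vx).transpose(1, 2).reshape(b, n, d)
        return self.out(out)


# Example usage: one compressed-attention block over a batch of token embeddings
x = torch.randn(2, 768, 768)        # (batch, seq_length, embed_dim)
y = LinformerSelfAttention()(x)     # -> (2, 768, 768)
ffn = LowRankLinear(768, 768)       # factorized projection with rank 256
print(y.shape, ffn(x).shape)
```

The key point is that the attention score matrix has shape (n × k) rather than (n × n), so compute and memory grow linearly with sequence length, while `LowRankLinear` replaces one dense weight matrix with two smaller factors.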
## Installation

Install the `lumenspark` package via pip:

```bash
pip install lumenspark
```

This command installs the Linformer-based language model along with all required dependencies.

## Quick Start

Load the pre-trained model from Hugging Face and generate text:

```python
from lumenspark import LumensparkModel
import torch

# 1. Set up the device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# 2. Load the model and move it to the device
model = LumensparkModel.from_pretrained("anto18671/lumenspark").to(device)

# 3. Example input text
input_text = "Once upon a time"

# 4. Generate text
output_text = model.generate(
    input_text,
    max_length=100,          # Maximum length of the generated sequence
    temperature=0.7,         # Controls randomness in predictions
    top_k=50,                # Top-k sampling to filter high-probability tokens
    top_p=0.9,               # Nucleus sampling to control diversity
    repetition_penalty=1.2   # Penalize repetition
)

# 5. Print the generated text
print(output_text)
```

## Inference Parameters

Customize text generation using the following parameters:

- **`max_length`**: Maximum length of the generated sequence.
- **`temperature`**: Controls randomness (lower values are more deterministic).
- **`top_k`**: Limits sampling to the `k` highest-probability tokens.
- **`top_p`**: Nucleus sampling based on cumulative probability `p`.
- **`repetition_penalty`**: Penalizes repeated tokens or phrases.
- **`no_repeat_ngram_size`**: Prevents repeated n-grams of the specified size.

## Hyperparameters

The following hyperparameters were chosen to balance performance and efficiency:

- **`vocab_size`**: 50,257
- **`embed_dim`**: 768
- **`depth`**: 8 layers
- **`heads`**: 8 attention heads
- **`seq_length`**: 768 tokens
- **`dropout`**: 1/17
- **`k`**: 384 (attention projection)
- **`rank`**: 256 (low-rank projections)

## Training Progress

The plot below shows the training loss recorded during model training:

![Training Loss Plot](assets/training_loss_plot.png)

## Acknowledgements

We would like to extend our gratitude to [RunPod](https://www.runpod.io) for their generous sponsorship, which supported the training and development of Lumenspark. Their contribution has been instrumental in pushing the project forward.

![RunPod Logo](assets/RunPod.webp)

## Sponsorship

Support the ongoing development of Lumenspark!

### How to Sponsor

Visit [GitHub Sponsors](https://github.com/sponsors/anto18671) and choose a sponsorship tier that suits you. Thank you for your support!

## License

This project is licensed under the [MIT License](LICENSE).