license: mit
datasets:
- allenai/c4
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model:
- anto18671/lumenspark
Linformer-based Language Model
Efficient language modeling optimized for long sequences using the Linformer architecture. This model reduces memory and computational overhead, making it well suited to text generation tasks.
Table of Contents
- Introduction
- Architecture
- Installation
- Quick Start
- Inference Parameters
- Hyperparameters
- Training Progress
- Sponsorship
- License
Introduction
The Linformer-based Language Model leverages the Linformer architecture to efficiently handle long sequences in text generation and other language tasks. By optimizing the self-attention mechanism, this model maintains high performance while reducing resource consumption, making it suitable for applications like text completion and generation.
Architecture
Built upon the Linformer Transformer, the model incorporates several key innovations:
- Efficient Attention: Reduces self-attention cost from quadratic to linear in sequence length by projecting the keys and values along the sequence dimension down to a fixed length k.
- Low-Rank Linear Projections: Utilizes LowRankLinear layers to decrease dimensionality without compromising expressiveness.
- Self-Attention Mechanism: Implements multi-head self-attention with full expressivity by avoiding low-rank projections in this module.
- Factorized Feed-Forward Layers: Uses factorized LowRankLinear layers in the Feed-Forward Neural Network to maintain performance with fewer parameters.
- PreNorm with LayerNorm and LayerScale: Applies Layer Normalization before attention and feed-forward layers, enhanced with LayerScale for better gradient flow and stability.
- Dropout & Residual Connections: Incorporates dropout for regularization and residual connections to aid in gradient flow and prevent vanishing gradients.
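As a rough illustration of the efficient attention idea, the sketch below implements single-head Linformer-style attention in plain Python. It is not the lumenspark implementation: the function names, the random projection matrices `e_proj` and `f_proj` (learned parameters in a real model), and the toy sizes are all assumptions, and multi-head logic and causal masking are left out.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(value - peak) for value in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def linformer_attention(q, k, v, e_proj, f_proj):
    """Single-head attention with keys/values shortened from n to k_dim rows.

    q, k, v: n x d matrices. e_proj, f_proj: k_dim x n projections
    (hypothetical names; random here, learned in a real model).
    """
    d = len(q[0])
    k_short = matmul(e_proj, k)             # k_dim x d
    v_short = matmul(f_proj, v)             # k_dim x d
    scores = matmul(q, transpose(k_short))  # n x k_dim instead of n x n
    scale = math.sqrt(d)
    weights = softmax_rows([[s / scale for s in row] for row in scores])
    return matmul(weights, v_short), weights


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d, k_dim = 6, 4, 3  # toy sizes, not the model's real dimensions
q, k, v = (random_matrix(n, d, rng) for _ in range(3))
e_proj = random_matrix(k_dim, n, rng)
f_proj = random_matrix(k_dim, n, rng)

output, weights = linformer_attention(q, k, v, e_proj, f_proj)
print("output shape:", len(output), "x", len(output[0]))
print("attention weights shape:", len(weights), "x", len(weights[0]))
```

The attention weights have shape n x k_dim rather than n x n, so for a fixed k_dim the cost of the score matrix grows linearly with sequence length.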
Installation
Install the lumenspark package via pip:
pip install lumenspark
This command installs the Linformer-based language model along with all necessary dependencies.
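To confirm the install worked, a small check like the one below can help. It uses only the Python standard library; the helper name `describe_install` is hypothetical, and it assumes the import name matches the pip package name `lumenspark`.

```python
from importlib import metadata, util


def describe_install(package: str = "lumenspark") -> str:
    """Report whether a package is importable and, if known, its version."""
    if util.find_spec(package) is None:
        return f"{package} is not installed"
    try:
        return f"{package} {metadata.version(package)} is installed"
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this exact name.
        return f"{package} is importable, but no distribution metadata was found"


print(describe_install())
```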
Training Progress
Below is the training loss plot that shows the progress made during the model training process:
Quick Start
Load the pre-trained model and tokenizer from Hugging Face to perform text generation:
from lumenspark import LumensparkModel
import torch
# 1. Set up the device (GPU if available, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
# 2. Load the model and move it to the device
model = LumensparkModel.from_pretrained("anto18671/lumenspark").to(device)
# 3. Example input text
input_text = "Once upon a time"
# 4. Generate text
output_text = model.generate(
    input_text,
    max_length=100,          # Maximum length of the generated sequence
    temperature=0.7,         # Controls randomness in predictions
    top_k=50,                # Top-k sampling to filter high-probability tokens
    top_p=0.9,               # Nucleus sampling to control diversity
    repetition_penalty=1.2   # Penalize repetition
)
# 5. Print the generated text
print(output_text)
Inference Parameters
Customize text generation using the following parameters:
- max_length: Maximum length of the generated sequence.
- temperature: Controls randomness (lower = more deterministic).
- top_k: Limits sampling to the top k tokens.
- top_p: Nucleus sampling based on cumulative probability p.
- repetition_penalty: Penalizes repeated tokens or phrases.
- no_repeat_ngram_size: Prevents repeated n-grams of the specified size.
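To make the effect of temperature, top_k, and top_p more concrete, here is a minimal sketch of how these filters are commonly applied to a next-token distribution. It is a generic illustration on made-up logits, not the code inside `model.generate`; the helper names and the order of operations (temperature, then top-k, then top-p) are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p, and renormalize what is left."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
sharp = softmax(logits, temperature=0.7)
flat = softmax(logits, temperature=1.5)
candidates = filter_top_k_top_p(sharp, top_k=3, top_p=0.9)

print("top token probability at T=0.7:", round(sharp[0], 3))
print("top token probability at T=1.5:", round(flat[0], 3))
print("token ids still eligible for sampling:", sorted(candidates))
```

Lower temperatures concentrate probability on the most likely token, while top_k and top_p shrink the set of tokens that can be sampled at all.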
Hyperparameters
Optimized for performance and efficiency:
- vocab_size: 50,257
- embed_dim: 768
- depth: 8 layers
- heads: 8 attention heads
- seq_length: 768 tokens
- dropout: 1/17
- k: 384 (attention projection)
- rank: 256 (low-rank projections)
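For a sense of scale, the arithmetic below estimates what these values could save. It assumes that a low-rank layer replaces a d_in x d_out weight matrix with two factors of shapes d_in x rank and rank x d_out (biases ignored), and that the attention projection shortens keys and values from seq_length to k positions. Neither detail is spelled out here, so treat the numbers as a back-of-the-envelope sketch rather than the model's actual parameter count.

```python
embed_dim = 768
seq_length = 768
k = 384      # attention projection length
rank = 256   # low-rank projection rank


def dense_weights(d_in, d_out):
    """Weights in a standard linear layer (bias ignored)."""
    return d_in * d_out


def low_rank_weights(d_in, d_out, rank):
    """Weights in an assumed two-factor low-rank layer (bias ignored)."""
    return d_in * rank + rank * d_out


dense = dense_weights(embed_dim, embed_dim)
factorized = low_rank_weights(embed_dim, embed_dim, rank)
full_scores = seq_length * seq_length   # n x n attention scores per head
projected_scores = seq_length * k       # n x k attention scores per head

print(f"dense 768x768 layer:     {dense:,} weights")
print(f"rank-256 factorization:  {factorized:,} weights")
print(f"full attention scores:   {full_scores:,} entries per head")
print(f"projected scores (k=384): {projected_scores:,} entries per head")
```

Under these assumptions, a 768-to-768 projection uses two thirds of the dense weight count, and the per-head score matrix is half the size of full attention at the maximum sequence length.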
Acknowledgements
We would like to extend our gratitude to RunPod for their generous sponsorship, supporting the training and development of Lumenspark. Their contribution has been instrumental in pushing the project forward.
Sponsorship
Support the ongoing development of Lumenspark!
How to Sponsor
Visit GitHub Sponsors and choose a sponsorship tier that suits you. Thank you for your support!
License
This project is licensed under the MIT License.