# Text2Face-LoRa

![Python version](https://img.shields.io/badge/python-3.8+-blue.svg)
![License](https://img.shields.io/badge/license-MIT-green)

This repository provides the code for a LoRA-finetuned version of the Stable Diffusion 2.1 model, optimized for generating face images. The package includes both training and inference capabilities, along with a pretrained model and the synthetic annotations used for finetuning.

## Features

- **Finetuning Script:** `finetune.py` applies LoRA adapters to both the UNet denoiser and the text encoder of Stable Diffusion.
- **Inference Script:** `generate.py` is a ready-to-use script for generating images with the pretrained model.
- **Pretrained Model:** `download.py` downloads our pretrained model from Hugging Face.

# Environment Setup

Set up a conda environment to run the model using the following commands:

```bash
conda create -n text2face
conda activate text2face
# Install requirements
pip install -r requirements.txt
```

# Checkpoints

You can download the pretrained LoRA weights for the diffusion model and text encoder using our provided Python script `download.py`:

```python
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="michaeltrs/text2face",
    filename="checkpoints/lora30k/pytorch_lora_weights.safetensors",
    local_dir="./test",
)
```

# Inference

Generate images using the `generate.py` script, which loads the SD2.1 foundation model from Hugging Face and applies the LoRA weights. Generation is driven by a prompt and, optionally, a negative prompt.

# Finetuning

Use `finetune.py` to finetune a Stable Diffusion model using LoRA adapters for the UNet denoiser and the text encoder.
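
To give a feel for what the `--rank` and `--text_encoder_rank` flags in the example training command control, here is a minimal sketch of the parameter count of one LoRA adapter. It relies only on the standard LoRA formulation (a frozen weight plus the product of two low-rank matrices). The function name `lora_param_counts` and the 1024 x 1024 layer size are illustrative assumptions and are not taken from this repository.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) parameter counts for one linear layer.

    A LoRA adapter keeps the d_out x d_in weight W frozen and learns two
    small matrices, B (d_out x rank) and A (rank x d_in), whose product
    B @ A is added to W.
    """
    if rank <= 0:
        raise ValueError("rank must be positive")
    full = d_in * d_out            # frozen base weights
    lora = rank * (d_in + d_out)   # trainable adapter weights (A and B)
    return full, lora


# Hypothetical 1024 x 1024 projection layer with rank 8, as in the command.
full, lora = lora_param_counts(1024, 1024, rank=8)
print(f"frozen: {full}, trainable LoRA: {lora}, ratio: {lora / full:.4f}")
```

A higher rank gives the adapter more capacity at the cost of more trainable parameters. The adapter size grows linearly with the rank.
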

Example command for training:

```bash
# Configure accelerate interactively, or write a default configuration
accelerate config
accelerate config default

export MODEL_NAME="stabilityai/stable-diffusion-2-1"
# Set this to the directory containing your training data
export TRAIN_DIR=""

accelerate launch finetune_lora.py --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR \
  --train_text_encoder \
  --checkpointing_steps 5000 \
  --resolution=768 \
  --center_crop \
  --train_batch_size=4 \
  --num_train_epochs 20 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --num_validation_images 5 \
  --learning_rate=1e-05 \
  --learning_rate_text_encoder=1e-05 \
  --max_grad_norm=1 \
  --rank 8 \
  --text_encoder_rank 8 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="" \
  --resume_from_checkpoint "latest" \
  --validation_prompts \
    "A young Latina woman, around 27 years old, with long hair and pale skin, expressing a mix of happiness and neutral emotions. She has fully open eyes and arched eyebrows." \
    "The person is a 44-year-old Asian male with gray hair and a receding hairline. He has a big nose, closed mouth and is feeling a mix of anger and sadness." \
    "A Latino Hispanic male, 22 years old, with straight hair, an oval face, and eyes fully open. His emotion is sad and partly neutral." \
    "A white male, 28 years old, with a neutral emotion, sideburns, pale skin, little hair, an attractive appearance, a 5 o'clock shadow, and pointy nose." \
    "A young, black, female individual with an oval face and big eyes, with a happy and partly surprised expression."
```

# Datasets

Details on the dataset format and preparation will be available soon.
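
For reference, the flow described in the Inference section (load the SD2.1 base model, apply the LoRA weights, then generate from a prompt and an optional negative prompt) might look roughly like the sketch below. This is not the contents of `generate.py`. It assumes the weights were saved under `./test` as in the Checkpoints snippet and that the file can be loaded by the `diffusers` LoRA loader. The helper names `build_call_kwargs` and `generate`, and the default step count and guidance scale, are hypothetical.

```python
from pathlib import Path

BASE_MODEL = "stabilityai/stable-diffusion-2-1"   # base model from the training command
LORA_DIR = Path("./test/checkpoints/lora30k")     # assumed download location
LORA_FILE = "pytorch_lora_weights.safetensors"


def build_call_kwargs(prompt, negative_prompt=None, num_images=1,
                      steps=30, guidance_scale=7.5):
    """Assemble keyword arguments for the pipeline call (hypothetical helper)."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    kwargs = {
        "prompt": prompt,
        "num_images_per_prompt": num_images,
        "num_inference_steps": steps,      # illustrative default
        "guidance_scale": guidance_scale,  # illustrative default
    }
    if negative_prompt:
        # The negative prompt is optional, so it is only passed when given.
        kwargs["negative_prompt"] = negative_prompt
    return kwargs


def generate(prompt, negative_prompt=None, device="cuda", **options):
    """Load SD2.1, apply the LoRA weights, and return a list of PIL images."""
    # Heavy imports are deferred so the helper above works without a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(BASE_MODEL, torch_dtype=dtype)
    # Assumes the file holds LoRA layers in a format diffusers can load.
    pipe.load_lora_weights(str(LORA_DIR), weight_name=LORA_FILE)
    pipe = pipe.to(device)
    return pipe(**build_call_kwargs(prompt, negative_prompt, **options)).images
```

Under these assumptions, a call such as `generate("A young woman with long hair and a happy expression.", negative_prompt="blurry, low quality")` would return the generated images, which can then be saved with `image.save(...)`.
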