Text2Face-LoRa

This repository provides the code for a LoRA-finetuned version of the Stable Diffusion 2.1 model, optimized specifically for generating face images. The package includes both training and inference code, along with a pretrained model and the synthetic annotations used for finetuning.

Features

Finetuning Script: finetune.py applies LoRA adapters to both the UNet denoiser and the text encoder of Stable Diffusion.
Inference Script: generate.py is a ready-to-use script for generating images with the pretrained model.
Pretrained Model: download.py downloads our pretrained model from Hugging Face.
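
As a rough illustration of what a LoRA adapter adds to a layer: instead of updating a full d_out x d_in weight matrix, LoRA trains two small matrices of shapes d_out x r and r x d_in. The sketch below counts trainable parameters for one layer at rank 8, the rank used in the training command in this README. The 1024 x 1024 layer size and the function names are illustrative assumptions, not values taken from this repository.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Number of entries in a dense d_out x d_in weight matrix."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Entries in the two low-rank factors: (d_out x rank) and (rank x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Hypothetical layer size, chosen only to make the arithmetic concrete.
    D_OUT, D_IN, RANK = 1024, 1024, 8
    full = full_param_count(D_OUT, D_IN)
    lora = lora_param_count(D_OUT, D_IN, RANK)
    print(f"full matrix: {full} parameters")
    print(f"rank-{RANK} LoRA: {lora} parameters ({full // lora}x fewer)")
```

For this assumed layer the adapter holds 16384 parameters against 1048576 in the full matrix, which is why the LoRA weights file is small compared with the base model.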

Environment Setup

Set up a conda environment to run the model using the following commands:

# Include Python (and with it pip) in the new environment
conda create -n text2face python
conda activate text2face

# Install requirements
pip install -r requirements.txt

Checkpoints

You can download the pretrained LoRA weights for the diffusion model and text encoder with the provided Python script download.py:

from huggingface_hub import hf_hub_download

hf_hub_download(repo_id="michaeltrs/text2face", filename="checkpoints/lora30k/pytorch_lora_weights.safetensors", local_dir="./test")
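
hf_hub_download returns the local path of the downloaded file. As a small sketch, the location implied by local_dir="./test" can be computed and checked before running inference. The helper name expected_weights_path is hypothetical and not part of this repository, and the sketch assumes the file keeps its repo-relative path under local_dir.

```python
from pathlib import Path

# Values copied from the download call above.
LOCAL_DIR = "./test"
FILENAME = "checkpoints/lora30k/pytorch_lora_weights.safetensors"


def expected_weights_path(local_dir: str = LOCAL_DIR, filename: str = FILENAME) -> Path:
    """Return where the weights should land, assuming the repo-relative
    path is preserved under local_dir."""
    return Path(local_dir) / filename


if __name__ == "__main__":
    path = expected_weights_path()
    status = "found" if path.is_file() else "missing, run download.py first"
    print(f"{path}: {status}")
```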

Inference

Generate images with the generate.py script, which loads the SD2.1 foundation model from Hugging Face and applies the LoRA weights. Generation is controlled by a prompt and, optionally, a negative prompt.
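
The flow described above can be sketched with the diffusers library as follows. This is not the contents of generate.py, only a minimal illustration under stated assumptions: the weights are assumed to sit where the download call above places them, and the step count, guidance scale, negative prompt, and output filename are hypothetical defaults.

```python
from typing import Optional

BASE_MODEL = "stabilityai/stable-diffusion-2-1"  # base model named in this README
LORA_DIR = "./test/checkpoints/lora30k"          # assumed location after running download.py
LORA_FILE = "pytorch_lora_weights.safetensors"


def build_call_kwargs(prompt: str, negative_prompt: Optional[str] = None,
                      steps: int = 30, guidance_scale: float = 7.5) -> dict:
    """Collect the pipeline call arguments; the negative prompt is optional."""
    kwargs = {"prompt": prompt, "num_inference_steps": steps,
              "guidance_scale": guidance_scale}
    if negative_prompt:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs


def generate(prompt: str, negative_prompt: Optional[str] = None,
             out_path: str = "face.png") -> str:
    # Heavy imports stay local so the helper above works without torch installed.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(BASE_MODEL, torch_dtype=dtype).to(device)
    # Apply the LoRA weights on top of the base model.
    pipe.load_lora_weights(LORA_DIR, weight_name=LORA_FILE)
    image = pipe(**build_call_kwargs(prompt, negative_prompt)).images[0]
    image.save(out_path)
    return out_path


if __name__ == "__main__":
    generate(
        "A young, black, female individual with an oval face and big eyes, "
        "with a happy and partly surprised expression.",
        negative_prompt="blurry, low quality",  # hypothetical negative prompt
    )
```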

Finetuning

Use finetune.py to finetune a Stable Diffusion model with LoRA adapters for the UNet denoiser and the text encoder. Example command for training:

# Configure Accelerate interactively
accelerate config
# or write a default configuration without answering prompts
accelerate config default

export MODEL_NAME="stabilityai/stable-diffusion-2-1"
export TRAIN_DIR="<root dir for training data>"

accelerate launch finetune_lora.py --pretrained_model_name_or_path="$MODEL_NAME" \
--train_data_dir="$TRAIN_DIR" \
--train_text_encoder   \
--checkpointing_steps 5000   \
--resolution=768   \
--center_crop   \
--train_batch_size=4  \
--num_train_epochs 20   \
--gradient_accumulation_steps=1  \
--gradient_checkpointing  \
--num_validation_images 5  \
--learning_rate=1e-05  \
--learning_rate_text_encoder=1e-05 \
--max_grad_norm=1  \
--rank 8  \
--text_encoder_rank 8 \
--lr_scheduler="constant" \
--lr_warmup_steps=0  \
--output_dir="<output directory for trained model>"  \
--resume_from_checkpoint "latest" \
--validation_prompts "A young Latina woman, around 27 years old, with long hair and pale skin, expressing a mix of happiness and neutral emotions. She has fully open eyes and arched eyebrows." "The person is a 44-year-old Asian male with gray hair and a receding hairline. He has a big nose, closed mouth and is feeling a mix of anger and sadness." "A Latino Hispanic male, 22 years old, with straight hair, an oval face, and eyes fully open. His emotion is sad and partly neutral." "A white male, 28 years old, with a neutral emotion, sideburns, pale skin, little hair, an attractive appearance, a 5 o'clock shadow, and pointy nose." "A young, black, female individual with an oval face and big eyes, with a happy and partly surprised expression."
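
To estimate how long such a run is and how many checkpoints it writes, the flags above can be combined as in the following sketch. The dataset size is a hypothetical placeholder, the function name is illustrative, and the formula is an approximation of how step counts are usually derived from batch size, gradient accumulation, and process count, not a copy of the script's own logic.

```python
import math


def optimizer_steps(num_images: int, batch_size: int, grad_accum: int,
                    epochs: int, num_processes: int = 1) -> int:
    """Approximate total optimizer steps for a training run."""
    batches_per_epoch = math.ceil(num_images / (batch_size * num_processes))
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    NUM_IMAGES = 10_000  # hypothetical dataset size, not taken from this repository
    # batch size 4, accumulation 1, 20 epochs, as in the command above
    total = optimizer_steps(NUM_IMAGES, batch_size=4, grad_accum=1, epochs=20)
    print("total optimizer steps:", total)
    # --checkpointing_steps 5000 saves one checkpoint every 5000 steps
    print("checkpoints written:", total // 5000)
```

With the assumed 10,000 images on a single process this gives 50,000 optimizer steps and 10 checkpoints; --resume_from_checkpoint "latest" would then pick up the most recent of those.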

Datasets

Details on the dataset format and preparation will be available soon.