It's raining diffusion personalization techniques☔️🎭🖼️
Recently, generating high-quality portraits from reference photos became possible with as little as a single reference image & without any optimization⚡️
figure taken from InstantID: Zero-shot Identity-Preserving Generation in Seconds
Using these new zero-shot methods, one can easily generate a self-portrait in their choice of style, composition, and background👩🏻‍🎨
Here are 3 zero-shot pipelines to know and try🚀
- 📗 IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
  - Code 👩‍💻
  - Demo 🤗
IP-Adapters consist of 2 core components:
- An image encoder to extract image features from the reference image(s)
- Decoupled cross-attention layers for text features and image features: a new cross-attention layer is added alongside each cross-attention layer in the original UNet model to inject the image features

💡To improve face fidelity, IP-Adapter FaceID introduced face ID embeddings, used instead of CLIP image embeddings (or in addition to them, in IP-Adapter FaceID Plus).
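
To make the decoupled cross-attention idea concrete, here is a minimal NumPy sketch, not the official implementation. It assumes a single attention head, random weights, and illustrative names (`decoupled_cross_attention`, `k_image`, `v_image`, `scale`) that are made up for this example. The point it shows: the same query attends to the text features and to the image features through separate key/value projections, and the two results are summed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention for a single head
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def decoupled_cross_attention(latent, text_feats, image_feats, w, scale=1.0):
    # One shared query projection, as in the original cross-attention layer
    q = latent @ w["q"]
    # Original branch: keys/values come from the text features
    text_out = attention(q, text_feats @ w["k_text"], text_feats @ w["v_text"])
    # Added branch: separate key/value projections for the image features
    image_out = attention(q, image_feats @ w["k_image"], image_feats @ w["v_image"])
    # `scale` (hypothetical name) weights how strongly the image prompt contributes
    return text_out + scale * image_out

rng = np.random.default_rng(0)
d = 8  # toy feature size; real models use much larger dimensions
w = {name: rng.normal(size=(d, d))
     for name in ("q", "k_text", "v_text", "k_image", "v_image")}
latent = rng.normal(size=(16, d))      # 16 latent (query) tokens
text_feats = rng.normal(size=(77, d))  # text tokens
image_feats = rng.normal(size=(4, d))  # a few image tokens

out = decoupled_cross_attention(latent, text_feats, image_feats, w, scale=0.6)
print(out.shape)
```

Setting `scale=0.0` in this sketch reduces the layer to plain text cross-attention, which is why the image branch can be added without changing how the base model handles text.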
Similar to IP-Adapter, InstantID also makes use of ID embeddings and decoupled cross-attention, but adds a new component: IdentityNet
💡IdentityNet is an adapted ControlNet that encodes the detailed features of the reference face image with additional spatial control. It makes 2 main modifications to ControlNet:
❶ Instead of fine-grained OpenPose facial keypoints, only five facial keypoints (two for the eyes, one for the nose, and two for the mouth) are used as the conditional input.
❷ Text prompts are eliminated, and the ID embedding is used as the condition for the cross-attention layers in the ControlNet.
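
As a rough illustration of the first modification, the sketch below turns five facial keypoints into a small conditioning image. It is a simplified stand-in under stated assumptions: the function name `draw_keypoints`, the colors, the dot radius, and the example coordinates are all made up here, and the actual InstantID code may render the keypoints differently.

```python
import numpy as np

def draw_keypoints(kps, height, width, radius=3):
    # Start from a black canvas; each keypoint becomes a colored dot
    canvas = np.zeros((height, width, 3), dtype=np.uint8)
    # One color per keypoint so the network can tell them apart (arbitrary choice)
    colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (255, 0, 255)]
    yy, xx = np.mgrid[0:height, 0:width]
    for (x, y), color in zip(kps, colors):
        # Fill every pixel within `radius` of the keypoint
        mask = (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
        canvas[mask] = color
    return canvas

# Hypothetical (x, y) positions: two eyes, nose, two mouth corners
kps = [(24, 26), (40, 26), (32, 36), (26, 46), (38, 46)]
cond = draw_keypoints(kps, height=64, width=64)
print(cond.shape)
```

Compared with a dense landmark map, five points pin down only the rough position and pose of the face, which leaves the rest of the image free to follow the prompt and style.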
A diffusers 🧨 workflow inspired by @fofr's Face-to-Many ComfyUI workflow🔥
This workflow extends the original InstantID pipeline & combines it with any SDXL LoRA:
- adding the option to stylize with any SDXL style LoRA - especially useful for styles that the base diffusion model doesn't know (browse the LoRA Studio for inspo ✨)
- improving structure preservation - maintaining the composition of the reference image.