Introduction
λ-ECLIPSE is a lightweight model for multi-concept personalization. It is a tiny T2I prior model designed for the Kandinsky v2.2 diffusion image generator.
λ-ECLIPSE extends the ECLIPSE-Prior by incorporating image-text interleaved data.
λ-ECLIPSE shows that Personalized T2I (P-T2I) models do not need to be trained with large amounts of compute. For instance, λ-ECLIPSE is trained in a mere 74 GPU hours (A100), compared to its counterparts BLIP-Diffusion (2304 GPU hours) and Kosmos-G (12300 GPU hours).
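To put the quoted training budgets side by side, the short sketch below computes how many times more GPU hours each counterpart uses. It relies only on the three figures stated above; the dictionary and variable names are illustrative, and the comparison assumes the GPU hours are roughly comparable across hardware, which the text does not confirm.

```python
# GPU-hour figures as quoted in the paragraph above.
gpu_hours = {
    "lambda-ECLIPSE": 74,
    "BLIP-Diffusion": 2304,
    "Kosmos-G": 12300,
}

# Express every budget as a multiple of the lambda-ECLIPSE budget.
baseline = gpu_hours["lambda-ECLIPSE"]
ratios = {name: hours / baseline for name, hours in gpu_hours.items()}

for name, ratio in ratios.items():
    print(f"{name}: {gpu_hours[name]} GPU hours ({ratio:.1f}x lambda-ECLIPSE)")
```

Under these figures, BLIP-Diffusion uses roughly 31x and Kosmos-G roughly 166x the training compute of λ-ECLIPSE.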
- Project Page: https://eclipse-t2i.github.io/Lambda-ECLIPSE/
- GitHub: https://github.com/Maitreyapatel/lambda-eclipse-inference
- Paper (arXiv): https://arxiv.org/abs/2402.05195
Importantly, λ-ECLIPSE works purely in the CLIP latent space without any additional information. Hence, its performance can be easily improved via test-time adaptation to increase concept alignment while retaining solid composition alignment.
More examples at: Gallery
Installation
git clone https://github.com/eclipse-t2i/lambda-eclipse-inference.git
cd lambda-eclipse-inference
conda create -p ./venv python=3.9
conda activate ./venv
pip install -r requirements.txt
Run Inference
import os
import torch
from transformers import (
    CLIPTextModelWithProjection,
    CLIPTokenizer,
)
from src.pipelines.pipeline_kandinsky_subject_prior import KandinskyPriorPipeline
from src.priors.lambda_prior_transformer import PriorTransformer
from diffusers import DiffusionPipeline

# CLIP text encoder and tokenizer used by the prior.
text_encoder = CLIPTextModelWithProjection.from_pretrained(
    "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k",
    projection_dim=1280,
    torch_dtype=torch.float32,
)
tokenizer = CLIPTokenizer.from_pretrained("laion/CLIP-ViT-bigG-14-laion2B-39B-b160k")

# Load the λ-ECLIPSE prior and plug it into the Kandinsky v2.2 prior pipeline.
prior = PriorTransformer.from_pretrained("ECLIPSE-Community/Lambda-ECLIPSE-Prior-v1.0")
pipe_prior = KandinskyPriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior",
    prior=prior,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
).to("cuda")

# Kandinsky v2.2 decoder that turns image embeddings into images.
pipe = DiffusionPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder"
).to("cuda")

# `args` is not defined in this snippet; it is expected to provide the prompt,
# the subject image paths, and the subject keywords (e.g. parsed with argparse).
raw_data = {
    "prompt": args.prompt,
    "subject_images": [args.subject1_path, args.subject2_path],
    "subject_keywords": [args.subject1_name, args.subject2_name],
}

# Predict the image embedding (and its negative counterpart) from the prompt and subjects.
image_emb, negative_image_emb = pipe_prior(
    raw_data=raw_data,
).to_tuple()

# Decode the embeddings into images.
image = pipe(
    image_embeds=image_emb,
    negative_image_embeds=negative_image_emb,
    num_inference_steps=50,
    guidance_scale=7.5,
).images

# First generated image (displayed automatically in a notebook).
image[0]
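The snippet above reads its inputs from an `args` object that it does not define. One way to supply it is a small `argparse` wrapper, sketched below. The attribute names (`prompt`, `subject1_path`, `subject1_name`, `subject2_path`, `subject2_name`) are taken from the snippet; the flag spellings, the help strings, the `parse_args` helper, and the example values are assumptions made for illustration and may differ from the repository's own scripts.

```python
import argparse


def parse_args(argv=None):
    """Parse the inputs that the inference snippet reads from `args`."""
    parser = argparse.ArgumentParser(
        description="Inputs for two-subject lambda-ECLIPSE inference (illustrative sketch)."
    )
    parser.add_argument("--prompt", required=True, help="Text prompt for generation.")
    parser.add_argument("--subject1_path", required=True, help="Image path of the first subject.")
    parser.add_argument("--subject1_name", required=True, help="Keyword naming the first subject.")
    parser.add_argument("--subject2_path", required=True, help="Image path of the second subject.")
    parser.add_argument("--subject2_name", required=True, help="Keyword naming the second subject.")
    return parser.parse_args(argv)


# Example call with hypothetical values; pass argv=None to read from the command line.
args = parse_args([
    "--prompt", "a cat wearing glasses",
    "--subject1_path", "./assets/cat.png",
    "--subject1_name", "cat",
    "--subject2_path", "./assets/glasses.png",
    "--subject2_name", "glasses",
])
print(args.prompt, args.subject1_name, args.subject2_name)
```

Defining `args` this way before building `raw_data` lets the inference snippet run as a standalone script rather than only inside the repository's own entry point.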
Important Notes (and limitations):
- λ-ECLIPSE is trained to support up to four unique concepts; however, this version is trained on biased datasets that focus heavily on one and two subjects. Therefore, it may not perform as expected as the number of subjects increases.
- As this model is trained specifically for P-T2I, it might not perform well on traditional T2I tasks.
- λ-ECLIPSE achieves SOTA composition alignment while maintaining concept alignment. However, there is still a big gap compared to finetuning-based methodologies.
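Given the four-concept limit noted above, it can help to validate inputs before calling the prior pipeline. The sketch below builds the `raw_data` dictionary with the same keys used in the inference snippet and rejects inputs that exceed the stated limit. The `build_raw_data` helper and the `MAX_SUBJECTS` constant are hypothetical and not part of the repository; the pipeline itself may or may not perform similar checks.

```python
MAX_SUBJECTS = 4  # upper bound stated in the notes above


def build_raw_data(prompt, subject_images, subject_keywords):
    """Build the `raw_data` dict for the prior pipeline after basic sanity checks."""
    if len(subject_images) != len(subject_keywords):
        raise ValueError("Each subject image needs exactly one subject keyword.")
    if not 1 <= len(subject_images) <= MAX_SUBJECTS:
        raise ValueError(f"Expected between 1 and {MAX_SUBJECTS} subjects.")
    return {
        "prompt": prompt,
        "subject_images": list(subject_images),
        "subject_keywords": list(subject_keywords),
    }


# Example with hypothetical image paths and keywords.
raw_data = build_raw_data(
    "a cat wearing glasses",
    ["./assets/cat.png", "./assets/glasses.png"],
    ["cat", "glasses"],
)
print(raw_data["subject_keywords"])
```

Since this version is trained mostly on one and two subjects, staying at or below two subjects is likely the safer choice even though up to four are accepted.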