base_model: stabilityai/stable-diffusion-xl-base-1.0
library_name: diffusers
license: creativeml-openrail-m
tags:
  - stable-diffusion
  - stable-diffusion-diffusers
  - text-to-image
  - diffusers
  - controlnet
  - control-lora-v3
  - diffusers-training
inference: true

sdxl-control-lora-v3-canny

These are control-lora-v3 weights trained on stabilityai/stable-diffusion-xl-base-1.0 with a new type of conditioning. Some example images are shown below.

prompt: portrait of a beautiful winged goddess with horns, long wavy black hair, long black dress with silver jewels by tom bagshaw
(example image: images_0)

prompt: an emo portrait painting. short dark brown messy pixie haircut, large black eyes, antichrist eyes, slightly rounded face, pointed chin, thin lips, small nose, black tank top, black leather jacket, black knee - length skirt, black choker, gold earring, by peter mohrbacher, by rebecca guay, by ron spencer
(example image: images_1)

prompt: a photograph of a futuristic street scene, brutalist style, straight edges, finely detailed oil painting, impasto brush strokes, soft light, 8 k, dramatic composition, dramatic lighting, sharp focus, octane render, masterpiece, by adrian ghenie and jenny saville and zhang jingna
(example image: images_2)

prompt: portrait of a dancing eagle woman, beautiful blonde haired lakota sioux goddess, intricate, highly detailed art by james jean, ray tracing, digital painting, artstation, concept art, smooth, sharp focus, illustration, artgerm and greg rutkowski and alphonse mucha, vladimir kush, giger, roger dean, 8 k
(example image: images_3)

Intended uses & limitations

How to use

First, clone the control-lora-v3 repository and cd into its directory:

git clone https://github.com/HighCWu/control-lora-v3
cd control-lora-v3

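Then run the following Python code from inside that directory, so that the local model and pipeline_sdxl modules can be imported: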

# !pip install opencv-python transformers accelerate
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from model import UNet2DConditionModelEx
from pipeline_sdxl import StableDiffusionXLControlLoraV3Pipeline
import numpy as np
import torch

import cv2
from PIL import Image

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"

# download an image
image = load_image(
    "https://hf.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)

# initialize the models and pipeline
unet: UNet2DConditionModelEx = UNet2DConditionModelEx.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", torch_dtype=torch.float16
)
unet = unet.add_extra_conditions(["canny"])
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlLoraV3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, vae=vae, torch_dtype=torch.float16
)
# load the control-lora-v3 LoRA weights
pipe.load_lora_weights("HighCWu/sdxl-control-lora-v3-canny")
pipe.enable_model_cpu_offload()

# get canny image
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

# generate image (negative_prompt was defined above but never passed in)
image = pipe(
    prompt, negative_prompt=negative_prompt, image=canny_image
).images[0]
image.show()

Limitations and bias

[TODO: provide examples of latent issues and potential remediations]

Training details

[TODO: describe the data used to train the model]