license: creativeml-openrail-m
library_name: diffusers
tags:
  - text-to-image
  - dreambooth
  - diffusers-training
  - stable-diffusion
  - stable-diffusion-diffusers
base_model: runwayml/stable-diffusion-v1-5
inference: true
instance_prompt: disney style

Cartoonify

This is a DreamBooth model derived from runwayml/stable-diffusion-v1-5, with additional fine-tuning of the text encoder. It was trained on images from a popular animation studio. Include the tokens "disney style" in your prompt to trigger the effect.

You can find some example images below:

Intended uses & limitations

How to use

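import torch
from diffusers import StableDiffusionPipeline

# basic usage
repo_id = "lavaman131/cartoonify"
# Pick the best available device and fall back to CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
# Half precision on GPU backends, full precision on CPU.
torch_dtype = torch.float16 if device.type in ["mps", "cuda"] else torch.float32
pipeline = StableDiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch_dtype).to(device)
# Include the trigger tokens "disney style" in the prompt.
image = pipeline("PROMPT GOES HERE").images[0]
image.save("output.png")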

Full source code

The full source code used for training can be found here.

Limitations and bias

As with any diffusion model, you will likely need to experiment with the prompt and the classifier-free guidance scale until you get the results you want. Zoomed-out subjects seem to lose clarity in the face. For additional safety in image generation, we use the Stable Diffusion safety checker.
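One way to explore the prompt and guidance settings described above is to hold the random seed fixed and vary only the guidance scale, so that differences between images come from the scale alone. The following is a minimal sketch, not the author's workflow: the helper names (`build_prompt`, `sweep_settings`, `run_sweep`), the scale values, the seed, and the output file names are all hypothetical. It relies on the `guidance_scale` and `generator` arguments of the `StableDiffusionPipeline` call.

```python
from typing import List, Sequence, Tuple

TRIGGER = "disney style"  # trigger tokens named in this model card


def build_prompt(subject: str, trigger: str = TRIGGER) -> str:
    """Append the trigger tokens unless the subject already contains them."""
    if trigger.lower() in subject.lower():
        return subject
    return f"{subject}, {trigger}"


def sweep_settings(subject: str, scales: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair one prompt with each guidance scale to try."""
    prompt = build_prompt(subject)
    return [(prompt, float(scale)) for scale in scales]


def run_sweep(subject: str, scales: Sequence[float] = (5.0, 7.5, 10.0), seed: int = 0) -> None:
    """Generate one image per guidance scale, reusing the same seed each time."""
    # Imported lazily so the helpers above work without torch or diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(
        "lavaman131/cartoonify", torch_dtype=dtype
    ).to(device)

    for prompt, scale in sweep_settings(subject, scales):
        # Re-seed for every image so only the guidance scale changes.
        generator = torch.Generator(device=device).manual_seed(seed)
        image = pipe(prompt, guidance_scale=scale, generator=generator).images[0]
        image.save(f"output_cfg_{scale}.png")


if __name__ == "__main__":
    run_sweep("a close-up portrait of a fox")
```

A close-up subject is used in the example call because, as noted above, faces on zoomed-out subjects tend to lose clarity.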

Training details

The model was fine-tuned for 3500 steps on an RTX A5000 GPU (24 GB VRAM) using around 200 images of modern Disney characters, backgrounds, and animals, in proportions of roughly 70%, 20%, and 10% respectively.
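To make these figures concrete, the short sketch below works out the approximate number of images per category and the average number of training steps per image. It treats the dataset as exactly 200 images and assumes a batch size of 1 with no gradient accumulation. Neither assumption is stated in this card, so the results are rough estimates only, and the function names are hypothetical.

```python
from typing import Dict


def split_counts(total_images: int, percentages: Dict[str, int]) -> Dict[str, int]:
    """Approximate image count per category from whole-number percentages."""
    return {name: total_images * pct // 100 for name, pct in percentages.items()}


def steps_per_image(total_steps: int, total_images: int, batch_size: int = 1) -> float:
    """Average number of times each image is seen during training."""
    return total_steps * batch_size / total_images


# Figures from the training details; 200 is an approximation ("around 200").
counts = split_counts(200, {"characters": 70, "backgrounds": 20, "animals": 10})
print(counts)

# Assumes batch size 1, which the model card does not confirm.
print(steps_per_image(3500, 200))
```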

The training code used can be found here. The regularization images used for training can be found here.