---
license: creativeml-openrail-m
library_name: diffusers
tags:
- text-to-image
- dreambooth
- diffusers-training
- stable-diffusion
- stable-diffusion-diffusers
base_model: runwayml/stable-diffusion-v1-5
inference: true
instance_prompt: disney style
---

<!-- This model card has been generated automatically according to the information the training script had access to. You
should probably proofread and complete it, then remove this comment. -->


# Cartoonify

This is a DreamBooth model derived from `runwayml/stable-diffusion-v1-5`, with additional fine-tuning of the text encoder. The weights were trained on images from a popular animation studio using [DreamBooth](https://dreambooth.github.io/). Include the tokens **_disney style_** in your prompts to apply the effect.

You can find some example images below:

<p float="left">
    <img width=256 height=256 src="./images/king.png">
    <img width=256 height=256 src="./images/legend_of_zelda.png">
    <img width=256 height=256 src="./images/pony.png">
    <img width=256 height=256 src="./images/princess.png">
    <img width=256 height=256 src="./images/red_ferrari.png">
</p>

## Intended uses & limitations

#### How to use

```python
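import torch
from diffusers import StableDiffusionPipeline

# basic usage
repo_id = "lavaman131/cartoonify"
# pick the best available device: CUDA GPU, Apple Silicon (MPS), or CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
# half precision on GPU backends, full precision on CPU
torch_dtype = torch.float16 if device.type in ["mps", "cuda"] else torch.float32
pipeline = StableDiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch_dtype).to(device)
# include the "disney style" tokens in the prompt to trigger the effect
image = pipeline("PROMPT GOES HERE").images[0]
image.save("output.png")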
```

#### Full source code

The full source code used for training can be found [here](https://github.com/lavaman131/cartoonify).

#### Limitations and bias

As with any diffusion model, you will need to experiment with the prompt and the classifier-free guidance parameter until you get the results you want. Zoomed-out subjects tend to lose clarity in the face. For additional safety in image generation, we use the Stable Diffusion safety checker.
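One way to explore the classifier-free guidance parameter is to render the same prompt at several `guidance_scale` values and compare the results. The sketch below is illustrative only: `with_style_token` and `sweep_guidance` are hypothetical helper names, the scale values are arbitrary starting points, and `make_generator` is an assumed hook for supplying a seeded `torch.Generator` so that each run starts from the same noise.

```python
from typing import Callable, Dict, Iterable, Optional


def with_style_token(subject: str, token: str = "disney style") -> str:
    """Append the instance token unless the prompt already contains it."""
    subject = subject.strip()
    if token.lower() in subject.lower():
        return subject
    return f"{subject}, {token}"


def sweep_guidance(
    pipeline: Callable,
    prompt: str,
    scales: Iterable[float] = (5.0, 7.5, 10.0),  # arbitrary example values
    make_generator: Optional[Callable[[], object]] = None,
) -> Dict[float, object]:
    """Run the pipeline once per guidance scale and collect the first image."""
    images = {}
    for scale in scales:
        kwargs = {"guidance_scale": scale}
        if make_generator is not None:
            # a fresh, identically seeded generator keeps the runs comparable
            kwargs["generator"] = make_generator()
        images[scale] = pipeline(prompt, **kwargs).images[0]
    return images
```

With the `pipeline` object from the usage example, a call might look like `sweep_guidance(pipeline, with_style_token("a red ferrari"), make_generator=lambda: torch.Generator(device.type).manual_seed(0))`, after which each returned image can be saved and inspected side by side.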

## Training details

The model was fine-tuned for 3500 steps on an RTX A5000 GPU (24 GB VRAM) using around 200 images of modern Disney characters, backgrounds, and animals, in proportions of roughly 70%, 20%, and 10%, respectively.

The training code used can be found [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py). The regularization images used for training can be found [here](https://github.com/aitrepreneur/SD-Regularization-Images-Style-Dreambooth/tree/main/style_ddim).
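As a rough sketch of how the linked script might be invoked with the settings this card states (base model, instance prompt, text-encoder fine-tuning, 3500 steps), the snippet below assembles a command line. It is not the exact command used for this model: the directory paths and class prompt are hypothetical placeholders, `--with_prior_preservation` is an assumption based on the mention of regularization images, and hyperparameters the card does not state (learning rate, batch size, resolution) are left out.

```python
import shlex


def dreambooth_command(instance_dir, class_dir, output_dir, class_prompt):
    """Build an argument list for diffusers' train_dreambooth.py."""
    return [
        "accelerate", "launch", "train_dreambooth.py",
        "--pretrained_model_name_or_path", "runwayml/stable-diffusion-v1-5",
        "--instance_data_dir", instance_dir,   # the ~200 training images
        "--class_data_dir", class_dir,         # regularization images
        "--output_dir", output_dir,
        "--instance_prompt", "disney style",
        "--class_prompt", class_prompt,
        "--with_prior_preservation",           # assumed, not stated in the card
        "--train_text_encoder",
        "--max_train_steps", "3500",
    ]


# All four arguments below are hypothetical placeholders.
cmd = dreambooth_command(
    "./data/instance", "./data/style_ddim", "./cartoonify-out", "artwork style"
)
print(shlex.join(cmd))
```

Building the command as a list keeps each flag and value separate, so it can be passed directly to `subprocess.run(cmd)` or printed for review before launching a long training run.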