Alex Birch committed
Commit 5e22649
1 Parent(s): b8b279c

initial commit

README.md CHANGED
@@ -1,3 +1,131 @@
---
license: other
---

# WD 1.5 Beta 3 (Diffusers-compatible)

<img height="256px" src="https://birchlabs.co.uk/share/reimu-radiance.smol.jpg" title="Reimu in 'radiance' aesthetic"> <img height="256px" src="https://birchlabs.co.uk/share/sanae-radiance.smol.jpg" title="Sanae in 'radiance' aesthetic"> <img height="256px" src="https://birchlabs.co.uk/share/flandre-radiance.smol.jpg" title="Flandre in 'radiance' aesthetic">

This unofficial repository hosts diffusers-compatible float16 checkpoints of WD 1.5 beta 3.
Float16 is [all you need](https://twitter.com/Birchlabs/status/1599903883278663681) for inference.

## Usage (via diffusers)

```python
# make sure you're logged in with `huggingface-cli login`
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.pipelines.stable_diffusion import StableDiffusionPipelineOutput
import torch
from torch import Generator, compile
from PIL import Image
from typing import List

# variant=None
# variant='ink'
# variant='mofu'
variant='radiance'
# variant='illusion'
pipe: StableDiffusionPipeline = StableDiffusionPipeline.from_pretrained(
    'Birchlabs/wd-1-5-beta3-unofficial',
    torch_dtype=torch.float16,
    variant=variant,
)
pipe.to('cuda')
# torch.compile returns the compiled module rather than patching it in-place,
# so assign the result back onto the pipeline
pipe.unet = compile(pipe.unet, mode='reduce-overhead')

# scheduler args documented here:
# https://github.com/huggingface/diffusers/blob/0392eceba8d42b24fcecc56b2cc1f4582dbefcc4/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L83
scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    # sde-dpmsolver++ is very new. if your diffusers version doesn't have it: use 'dpmsolver++' instead.
    algorithm_type='sde-dpmsolver++',
    solver_order=2,
    # solver_type='heun' may give a sharper image. Cheng Lu reckons midpoint is better.
    solver_type='midpoint',
    use_karras_sigmas=True,
)
pipe.scheduler = scheduler

# WD1.5 was trained on area=896**2 and no side longer than 1152
sqrt_area = 896
# >1 = portrait
aspect_ratio = 1.143
height = int(sqrt_area*aspect_ratio)
width = sqrt_area**2//height

prompt = 'artoria pendragon (fate), reddizen, 1girl, best aesthetic, best quality, blue dress, full body, white shirt, blonde hair, looking at viewer, hair between eyes, floating hair, green eyes, blue ribbon, long sleeves, juliet sleeves, light smile, hair ribbon, outdoors, painting (medium), traditional media'
negative_prompt = 'lowres, bad anatomy, bad hands, missing fingers, extra fingers, blurry, mutation, deformed face, ugly, bad proportions, monster, cropped, worst quality, jpeg, bad posture, long body, long neck, jpeg artifacts, deleted, bad aesthetic, realistic, real life, instagram'

out: StableDiffusionPipelineOutput = pipe.__call__(
    prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_inference_steps=22,
    generator=Generator().manual_seed(1234),
)
images: List[Image.Image] = out.images
img, *_ = images

img.save('out_pipe/saber.png')
```
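
The height/width arithmetic above fixes the pixel area at 896² and scales one side by the desired ratio. If you want to try other aspect ratios, here's a minimal sketch of a helper (`pick_resolution` is illustrative, not part of this repo) that does the same arithmetic, snaps both sides to the multiples of 8 the pipeline requires, and enforces the "no side longer than 1152" training constraint:

```python
# illustrative helper (not part of this repo): choose height/width for a given
# aspect ratio, keeping area ~896**2, sides a multiple of 8, and no side > 1152
def pick_resolution(aspect_ratio: float, sqrt_area: int = 896, max_side: int = 1152) -> tuple[int, int]:
    height = int(sqrt_area * aspect_ratio)
    width = sqrt_area**2 // height
    # snap down to multiples of 8 (StableDiffusionPipeline requires this)
    height -= height % 8
    width -= width % 8
    assert max(height, width) <= max_side, 'aspect ratio too extreme: a side would exceed 1152'
    return height, width

height, width = pick_resolution(1.143)  # (1024, 784)
```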

The example above should output the following image:

<img height="256px" src="https://birchlabs.co.uk/share/saber-radiance.smol.jpg" title="Saber in 'radiance' aesthetic">

## How the WD1.5b3 CompVis checkpoint was converted

I converted the official [CompVis-style checkpoints](https://huggingface.co/waifu-diffusion/wd-1-5-beta3) using [kohya's converter script](https://github.com/bmaltais/kohya_ss/blob/master/tools/convert_diffusers20_original_sd.py).

To convert the five aesthetics, I added [converter support](https://github.com/Birch-san/diffusers-play/commit/b8b3cd31081e18a898d888efa7e13dc2a08908be) for [checkpoint variants](https://huggingface.co/docs/diffusers/using-diffusers/loading#checkpoint-variants).

I [commented out](https://github.com/Birch-san/diffusers-play/blob/b8b3cd31081e18a898d888efa7e13dc2a08908be/src/kohya/library/model_util.py#L869-L874) VAE conversion, because WD 1.5 b3 does not distribute a VAE; instead it re-uses WD 1.4's VAE (checkpoints: [CompVis](https://huggingface.co/hakurei/waifu-diffusion-v1-4), [diffusers](https://huggingface.co/hakurei/waifu-diffusion/tree/main/vae)).

I told the converter to [load WD 1.4's VAE](https://github.com/Birch-san/diffusers-play/blob/b8b3cd31081e18a898d888efa7e13dc2a08908be/src/kohya/library/model_util.py#L1065-L1066).

I invoked my modified [`scripts/convert_diffusers20_original_sd.py`](https://github.com/Birch-san/diffusers-play/blob/b8b3cd31081e18a898d888efa7e13dc2a08908be/scripts/convert_diffusers20_original_sd.py) like so:

```bash
python scripts/convert_diffusers20_original_sd.py \
  --fp16 \
  --v2 \
  --unet_use_linear_projection \
  --use_safetensors \
  --reference_model stabilityai/stable-diffusion-2-1 \
  --variant illusion \
  in/wd-1-5-beta3/wd-beta3-base-fp16.safetensors \
  out/wd1-5-b3
```

Except the "base" aesthetic was a special case, where I didn't pass any `--variant <whatever>` option.

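So converting all five aesthetics amounts to repeating that invocation once per variant. A sketch of the loop (via Python's `subprocess`; the per-aesthetic input checkpoint filenames below are my assumptions, not the official names):

```python
# sketch: run the converter once per aesthetic.
# NOTE: input filenames other than the base one are assumed;
# consult the official waifu-diffusion/wd-1-5-beta3 repo for the real names.
import subprocess

checkpoints = {
    None: 'wd-beta3-base-fp16.safetensors',  # "base": the special case with no --variant
    'radiance': 'wd-radiance-fp16.safetensors',  # assumed filename
    'ink': 'wd-ink-fp16.safetensors',            # assumed filename
    'mofu': 'wd-mofu-fp16.safetensors',          # assumed filename
    'illusion': 'wd-illusion-fp16.safetensors',  # assumed filename
}
for variant, ckpt in checkpoints.items():
    args = [
        'python', 'scripts/convert_diffusers20_original_sd.py',
        '--fp16', '--v2', '--unet_use_linear_projection', '--use_safetensors',
        '--reference_model', 'stabilityai/stable-diffusion-2-1',
    ]
    if variant is not None:
        args += ['--variant', variant]
    args += [f'in/wd-1-5-beta3/{ckpt}', 'out/wd1-5-b3']
    subprocess.run(args, check=True)
```
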
### Why is there a `vae` folder?

The `vae` folder contains copies of WD 1.4's VAE, to make it easier to load stable-diffusion via a diffusers [pipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines#readme).
There is nothing special about the per-aesthetic VAE variants I provide; they're all copies of WD 1.4's VAE.

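Equivalently, you can ignore this repo's `vae` folder and load WD 1.4's VAE yourself, then hand it to the pipeline. A sketch:

```python
# sketch: load WD 1.4's VAE explicitly, rather than relying on the copies
# that this repo's vae folder provides for each aesthetic
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

vae: AutoencoderKL = AutoencoderKL.from_pretrained(
    'hakurei/waifu-diffusion',
    subfolder='vae',
    torch_dtype=torch.float16,
)
pipe: StableDiffusionPipeline = StableDiffusionPipeline.from_pretrained(
    'Birchlabs/wd-1-5-beta3-unofficial',
    vae=vae,
    torch_dtype=torch.float16,
    variant='radiance',
)
```
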
## Original model card

![WD 1.5 Radiance](https://i.ibb.co/hYjgvGZ/00160-2195473148.png)

For this release, we release five versions of the model:

- WD 1.5 Beta3 Base
- WD 1.5 Radiance
- WD 1.5 Ink
- WD 1.5 Mofu
- WD 1.5 Illusion

The WD 1.5 Base model is only intended for training use. For generation, it is recommended to create your own finetunes and LoRAs on top of WD 1.5 Base, or to use one of the aesthetic models. More information and sample generations for the aesthetic models are in the release notes.

### Release Notes

https://saltacc.notion.site/WD-1-5-Beta-3-Release-Notes-1e35a0ed1bb24c5b93ec79c45c217f63

### VAE

WD 1.5 uses the same VAE as WD 1.4, which can be found here: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/vae/kl-f8-anime2.ckpt

### License

WD 1.5 is released under the Fair AI Public License 1.0-SD (https://freedevproject.org/faipl-1.0-sd/). If any derivative of this model is made, please share your changes accordingly. Special thanks to ronsor/undeleted (https://undeleted.ronsor.com/) for help with the license.

model_index.json ADDED
@@ -0,0 +1,33 @@
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.17.0.dev0",
  "feature_extractor": [
    null,
    null
  ],
  "requires_safety_checker": null,
  "safety_checker": [
    null,
    null
  ],
  "scheduler": [
    "diffusers",
    "DDIMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
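
`model_index.json` is what tells diffusers which class to construct for each pipeline component. Since `safety_checker` and `feature_extractor` are `null` here, the pipeline loads without them; a quick check (sketch):

```python
# sketch: the pipeline assembles itself from model_index.json;
# safety_checker/feature_extractor are null, so they load as None
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained('Birchlabs/wd-1-5-beta3-unofficial')
assert pipe.safety_checker is None
assert pipe.feature_extractor is None
print(type(pipe.scheduler).__name__)  # DDIMScheduler, as declared above
```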
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,18 @@
{
  "_class_name": "DDIMScheduler",
  "_diffusers_version": "0.17.0.dev0",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "clip_sample": false,
  "clip_sample_range": 1.0,
  "dynamic_thresholding_ratio": 0.995,
  "num_train_timesteps": 1000,
  "prediction_type": "v_prediction",
  "sample_max_value": 1.0,
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "thresholding": false,
  "trained_betas": null
}
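
Note `"prediction_type": "v_prediction"`: this scheduler config declares WD 1.5 a v-prediction model. When the usage example above swaps in `DPMSolverMultistepScheduler` via `from_config`, that setting carries over automatically; a quick check (sketch):

```python
# sketch: scheduler swaps via from_config inherit prediction_type from this file
from diffusers import DDIMScheduler, DPMSolverMultistepScheduler

ddim = DDIMScheduler.from_pretrained('Birchlabs/wd-1-5-beta3-unofficial', subfolder='scheduler')
dpm = DPMSolverMultistepScheduler.from_config(ddim.config)
assert dpm.config.prediction_type == 'v_prediction'
```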
text_encoder/config.json ADDED
@@ -0,0 +1,24 @@
{
  "architectures": [
    "CLIPTextModel"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "dropout": 0.0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_size": 1024,
  "initializer_factor": 1.0,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 77,
  "model_type": "clip_text_model",
  "num_attention_heads": 16,
  "num_hidden_layers": 23,
  "pad_token_id": 1,
  "projection_dim": 512,
  "torch_dtype": "float32",
  "transformers_version": "4.28.1",
  "vocab_size": 49408
}
text_encoder/model.illusion.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7cb83640524c92b3f7e9052804d79bd2d3c66fa0a2a054b2d320a58cebaaf007
size 1361597016
text_encoder/model.ink.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a49f0414941adb09c6e5eac92a07f793efe961dc2e774cf7b77099509f7e86b8
size 1361597016
text_encoder/model.mofu.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5656178376163495522baac1561c33762f732838c789905f8fb21f1dbb5dcaaf
size 1361597016
text_encoder/model.radiance.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:322e025e4af0ec96cc376bdcf346ae74323ae0da64919592486e26de26f86669
size 1361597016
text_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e80386019ae1101aa0f841c2f0e62d0353fb583e4398691ccaec518cfd748240
size 1361597016
tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<|startoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "!",
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,33 @@
{
  "add_prefix_space": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<|startoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": true,
  "do_lower_case": true,
  "eos_token": {
    "__type": "AddedToken",
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "errors": "replace",
  "model_max_length": 77,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "CLIPTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
unet/config.json ADDED
@@ -0,0 +1,65 @@
{
  "_class_name": "UNet2DConditionModel",
  "_diffusers_version": "0.17.0.dev0",
  "act_fn": "silu",
  "addition_embed_type": null,
  "addition_embed_type_num_heads": 64,
  "attention_head_dim": [
    5,
    10,
    20,
    20
  ],
  "block_out_channels": [
    320,
    640,
    1280,
    1280
  ],
  "center_input_sample": false,
  "class_embed_type": null,
  "class_embeddings_concat": false,
  "conv_in_kernel": 3,
  "conv_out_kernel": 3,
  "cross_attention_dim": 1024,
  "cross_attention_norm": null,
  "down_block_types": [
    "CrossAttnDownBlock2D",
    "CrossAttnDownBlock2D",
    "CrossAttnDownBlock2D",
    "DownBlock2D"
  ],
  "downsample_padding": 1,
  "dual_cross_attention": false,
  "encoder_hid_dim": null,
  "flip_sin_to_cos": true,
  "freq_shift": 0,
  "in_channels": 4,
  "layers_per_block": 2,
  "mid_block_only_cross_attention": null,
  "mid_block_scale_factor": 1,
  "mid_block_type": "UNetMidBlock2DCrossAttn",
  "norm_eps": 1e-05,
  "norm_num_groups": 32,
  "num_class_embeds": null,
  "only_cross_attention": false,
  "out_channels": 4,
  "projection_class_embeddings_input_dim": null,
  "resnet_out_scale_factor": 1.0,
  "resnet_skip_time_act": false,
  "resnet_time_scale_shift": "default",
  "sample_size": 64,
  "time_cond_proj_dim": null,
  "time_embedding_act_fn": null,
  "time_embedding_dim": null,
  "time_embedding_type": "positional",
  "timestep_post_act": null,
  "up_block_types": [
    "UpBlock2D",
    "CrossAttnUpBlock2D",
    "CrossAttnUpBlock2D",
    "CrossAttnUpBlock2D"
  ],
  "upcast_attention": false,
  "use_linear_projection": true
}
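
Components can also be loaded individually; the `variant` kwarg picks which per-aesthetic weights file to read (e.g. `diffusion_pytorch_model.mofu.safetensors` below). A sketch:

```python
# sketch: load just the unet, selecting an aesthetic via `variant`
import torch
from diffusers import UNet2DConditionModel

unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
    'Birchlabs/wd-1-5-beta3-unofficial',
    subfolder='unet',
    variant='mofu',
    torch_dtype=torch.float16,
)
```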
unet/diffusion_pytorch_model.illusion.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:395b558e74d9b60f2aa63100fb2bbaba3a5047394fa94ea88b79297462e87170
size 3463726504
unet/diffusion_pytorch_model.ink.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:071a70600da4d4098a7aa027cf481a55ae178b71f7d8afc9fd8804e0fae8a5ba
size 3463726504
unet/diffusion_pytorch_model.mofu.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:60e018978d8f7ed700a2330b3ea6760a57ebea6a3840ab78d5293586bdad21fa
size 3463726504
unet/diffusion_pytorch_model.radiance.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec38f83a1315990037bdd4cf842bbdeceef9ea580d9c9e2c9d80637e246564a9
size 3463726504
unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9d478c7f837e86d6777379d998726c5132f40a52bb302117511181fc073551f0
size 3463726504
vae/config.json ADDED
@@ -0,0 +1,31 @@
{
  "_class_name": "AutoencoderKL",
  "_diffusers_version": "0.17.0.dev0",
  "_name_or_path": "hakurei/waifu-diffusion",
  "act_fn": "silu",
  "block_out_channels": [
    128,
    256,
    512,
    512
  ],
  "down_block_types": [
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D"
  ],
  "in_channels": 3,
  "latent_channels": 4,
  "layers_per_block": 2,
  "norm_num_groups": 32,
  "out_channels": 3,
  "sample_size": 512,
  "scaling_factor": 0.18215,
  "up_block_types": [
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D"
  ]
}
vae/diffusion_pytorch_model.illusion.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6e91305cd81f7d1387694a231b63de771190b3f2c25591eb71a8c2525cb08397
size 334643268
vae/diffusion_pytorch_model.ink.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6e91305cd81f7d1387694a231b63de771190b3f2c25591eb71a8c2525cb08397
size 334643268
vae/diffusion_pytorch_model.mofu.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6e91305cd81f7d1387694a231b63de771190b3f2c25591eb71a8c2525cb08397
size 334643268
vae/diffusion_pytorch_model.radiance.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6e91305cd81f7d1387694a231b63de771190b3f2c25591eb71a8c2525cb08397
size 334643268
vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6e91305cd81f7d1387694a231b63de771190b3f2c25591eb71a8c2525cb08397
size 334643268