Alex Birch committed on May 16, 2023

Commit

5e22649

1 Parent(s): b8b279c

initial commit

Files changed (25)

README.md +128 -0
model_index.json +33 -0
scheduler/scheduler_config.json +18 -0
text_encoder/config.json +24 -0
text_encoder/model.illusion.safetensors +3 -0
text_encoder/model.ink.safetensors +3 -0
text_encoder/model.mofu.safetensors +3 -0
text_encoder/model.radiance.safetensors +3 -0
text_encoder/model.safetensors +3 -0
tokenizer/merges.txt +0 -0
tokenizer/special_tokens_map.json +24 -0
tokenizer/tokenizer_config.json +33 -0
tokenizer/vocab.json +0 -0
unet/config.json +65 -0
unet/diffusion_pytorch_model.illusion.safetensors +3 -0
unet/diffusion_pytorch_model.ink.safetensors +3 -0
unet/diffusion_pytorch_model.mofu.safetensors +3 -0
unet/diffusion_pytorch_model.radiance.safetensors +3 -0
unet/diffusion_pytorch_model.safetensors +3 -0
vae/config.json +31 -0
vae/diffusion_pytorch_model.illusion.safetensors +3 -0
vae/diffusion_pytorch_model.ink.safetensors +3 -0
vae/diffusion_pytorch_model.mofu.safetensors +3 -0
vae/diffusion_pytorch_model.radiance.safetensors +3 -0
vae/diffusion_pytorch_model.safetensors +3 -0

README.md CHANGED Viewed

@@ -1,3 +1,131 @@
 ---
 license: other
 ---
+# WD 1.5 Beta 3 (Diffusers-compatible)
+<img height="256px" src="https://birchlabs.co.uk/share/reimu-radiance.smol.jpg" title="Reimu in 'radiance' aesthetic"> <img height="256px" src="https://birchlabs.co.uk/share/sanae-radiance.smol.jpg" title="Sanae in 'radiance' aesthetic"> <img height="256px" src="https://birchlabs.co.uk/share/flandre-radiance.smol.jpg" title="Flandre in 'radiance' aesthetic">
+This unofficial repository hosts diffusers-compatible float16 checkpoints of WD 1.5 beta 3.
+Float16 is [all you need](https://twitter.com/Birchlabs/status/1599903883278663681) for inference.
+## Usage (via diffusers)
+```python
+# make sure you're logged in with `huggingface-cli login`
+from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
+from diffusers.pipelines.stable_diffusion import StableDiffusionPipelineOutput
+import torch
+from torch import Generator, compile
+from PIL import Image
+from typing import List
+# variant=None
+# variant='ink'
+# variant='mofu'
+variant='radiance'
+# variant='illusion'
+pipe: StableDiffusionPipeline = StableDiffusionPipeline.from_pretrained(
+  'Birchlabs/wd-1-5-beta3-unofficial',
+  torch_dtype=torch.float16,
+  variant=variant,
+)
+pipe.to('cuda')
+# torch.compile returns the compiled module, so assign it back to the pipeline
+pipe.unet = compile(pipe.unet, mode='reduce-overhead')
+# scheduler args documented here:
+# https://github.com/huggingface/diffusers/blob/0392eceba8d42b24fcecc56b2cc1f4582dbefcc4/src/diffusers/schedulers/scheduling_dpmsolver_multistep.py#L83
+scheduler = DPMSolverMultistepScheduler.from_config(
+  pipe.scheduler.config,
+  # sde-dpmsolver++ is very new. if your diffusers version doesn't have it: use 'dpmsolver++' instead.
+  algorithm_type='sde-dpmsolver++',
+  solver_order=2,
+  # solver_type='heun' may give a sharper image. Cheng Lu reckons midpoint is better.
+  solver_type='midpoint',
+  use_karras_sigmas=True,
+)
+pipe.scheduler = scheduler
+# WD1.5 was trained on area=896**2 and no side longer than 1152
+sqrt_area=896
+# >1 = portrait
+aspect_ratio = 1.143
+height = int(sqrt_area*aspect_ratio)
+width = sqrt_area**2//height
+prompt = 'artoria pendragon (fate), reddizen, 1girl, best aesthetic, best quality, blue dress, full body, white shirt, blonde hair, looking at viewer, hair between eyes, floating hair, green eyes, blue ribbon, long sleeves, juliet sleeves, light smile, hair ribbon, outdoors, painting (medium), traditional media'
+negative_prompt = 'lowres, bad anatomy, bad hands, missing fingers, extra fingers, blurry, mutation, deformed face, ugly, bad proportions, monster, cropped, worst quality, jpeg, bad posture, long body, long neck, jpeg artifacts, deleted, bad aesthetic, realistic, real life, instagram'
+out: StableDiffusionPipelineOutput = pipe.__call__(
+  prompt,
+  negative_prompt=negative_prompt,
+  height=height,
+  width=width,
+  num_inference_steps=22,
+  generator=Generator().manual_seed(1234)
+)
+images: List[Image.Image] = out.images
+img, *_ = images
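+# the output directory may not exist yet; create it before saving
+import os
+os.makedirs('out_pipe', exist_ok=True)
+img.save('out_pipe/saber.png')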
+```
+This should produce the following image:
+<img height="256px" src="https://birchlabs.co.uk/share/saber-radiance.smol.jpg" title="Saber in 'radiance' aesthetic">
+## How the WD 1.5 beta 3 CompVis checkpoints were converted
+I converted the official [CompVis-style checkpoints](https://huggingface.co/waifu-diffusion/wd-1-5-beta3) using [kohya's converter script](https://github.com/bmaltais/kohya_ss/blob/master/tools/convert_diffusers20_original_sd.py).
+To convert the five aesthetics, I added [converter support](https://github.com/Birch-san/diffusers-play/commit/b8b3cd31081e18a898d888efa7e13dc2a08908be) for [checkpoint variants](https://huggingface.co/docs/diffusers/using-diffusers/loading#checkpoint-variants).
+I [commented out](https://github.com/Birch-san/diffusers-play/blob/b8b3cd31081e18a898d888efa7e13dc2a08908be/src/kohya/library/model_util.py#L869-L874) the VAE conversion, because WD 1.5 beta 3 does not distribute a VAE. Instead, it reuses WD 1.4's VAE (checkpoints: [CompVis](https://huggingface.co/hakurei/waifu-diffusion-v1-4), [diffusers](https://huggingface.co/hakurei/waifu-diffusion/tree/main/vae)).
+I told the converter to [load WD 1.4's VAE](https://github.com/Birch-san/diffusers-play/blob/b8b3cd31081e18a898d888efa7e13dc2a08908be/src/kohya/library/model_util.py#L1065-L1066).
+I invoked my modified [`scripts/convert_diffusers20_original_sd.py`](https://github.com/Birch-san/diffusers-play/blob/b8b3cd31081e18a898d888efa7e13dc2a08908be/scripts/convert_diffusers20_original_sd.py) like so:
+```bash
+python scripts/convert_diffusers20_original_sd.py \
+--fp16 \
+--v2 \
+--unet_use_linear_projection \
+--use_safetensors \
+--reference_model stabilityai/stable-diffusion-2-1 \
+--variant illusion \
+in/wd-1-5-beta3/wd-beta3-base-fp16.safetensors \
+out/wd1-5-b3
+```
+The "base" aesthetic was a special case: for it, I did not pass any `--variant <whatever>` option.
+### Why is there a `vae` folder?
+The `vae` folder contains copies of WD 1.4's VAE, to make it easier to load Stable Diffusion via a diffusers [pipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines#readme).
+There is nothing special about the per-aesthetic VAE variants I provide. They're all copies of WD1.4's VAE.
+## Original model card
+![WD 1.5 Radiance](https://i.ibb.co/hYjgvGZ/00160-2195473148.png)
+For this release, we release five versions of the model:
+  - WD 1.5 Beta3 Base
+  - WD 1.5 Radiance
+  - WD 1.5 Ink
+  - WD 1.5 Mofu
+  - WD 1.5 Illusion
+The WD 1.5 Base model is only intended for training use. For generation, it is recommended to create your own finetunes and LoRAs on top of WD 1.5 Base, or to use one of the aesthetic models. More information and sample generations for the aesthetic models are in the release notes.
+### Release Notes
+https://saltacc.notion.site/WD-1-5-Beta-3-Release-Notes-1e35a0ed1bb24c5b93ec79c45c217f63
+### VAE
+WD 1.5 uses the same VAE as WD 1.4, which can be found here: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/vae/kl-f8-anime2.ckpt
+## License
+WD 1.5 is released under the Fair AI Public License 1.0-SD (https://freedevproject.org/faipl-1.0-sd/). If any derivative of this model is made, please share your changes accordingly. Special thanks to ronsor/undeleted (https://undeleted.ronsor.com/) for help with the license.
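
The README's usage snippet derives `height` and `width` from a target area and an aspect ratio. A minimal sketch of that arithmetic as a reusable helper follows. The function name `dims_for_aspect`, the `max_side` check (taken from the "no side longer than 1152" comment), and the rounding down to a multiple of 8 are assumptions added for illustration; they are not part of the original snippet.

```python
from typing import Tuple


def dims_for_aspect(
    sqrt_area: int = 896,
    aspect_ratio: float = 1.143,
    max_side: int = 1152,
    multiple: int = 8,
) -> Tuple[int, int]:
    """Return (height, width) covering roughly sqrt_area**2 pixels.

    aspect_ratio > 1 gives a portrait image, as in the README snippet.
    """
    # Same arithmetic as the README: scale the height, then solve for width.
    height = int(sqrt_area * aspect_ratio)
    width = sqrt_area ** 2 // height
    # Assumption: round down to a multiple of 8 so the latent grid divides evenly.
    height -= height % multiple
    width -= width % multiple
    if max(height, width) > max_side:
        raise ValueError(f"side {max(height, width)} exceeds {max_side}")
    return height, width


print(dims_for_aspect())                  # portrait, README defaults
print(dims_for_aspect(aspect_ratio=1.0))  # square
```

With the README's defaults this yields the same 1024 by 784 resolution as the inline arithmetic; the helper only adds the bounds check.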

model_index.json ADDED Viewed

	@@ -0,0 +1,33 @@

+{
+  "_class_name": "StableDiffusionPipeline",
+  "_diffusers_version": "0.17.0.dev0",
+  "feature_extractor": [
+    null,
+    null
+  ],
+  "requires_safety_checker": null,
+  "safety_checker": [
+    null,
+    null
+  ],
+  "scheduler": [
+    "diffusers",
+    "DDIMScheduler"
+  ],
+  "text_encoder": [
+    "transformers",
+    "CLIPTextModel"
+  ],
+  "tokenizer": [
+    "transformers",
+    "CLIPTokenizer"
+  ],
+  "unet": [
+    "diffusers",
+    "UNet2DConditionModel"
+  ],
+  "vae": [
+    "diffusers",
+    "AutoencoderKL"
+  ]
+}
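
`model_index.json` maps each pipeline component to a `[library, class]` pair, with `[null, null]` marking components this repository leaves out (here, the safety checker and feature extractor). The sketch below reads that layout using only the standard library. The inlined JSON is an abridged copy of the file above, and the helper name `list_components` is hypothetical; this mirrors the structure of the file, not diffusers' actual loading code.

```python
import json
from typing import Dict, Optional, Tuple

# Abridged copy of model_index.json from this commit.
MODEL_INDEX = """
{
  "_class_name": "StableDiffusionPipeline",
  "requires_safety_checker": null,
  "safety_checker": [null, null],
  "scheduler": ["diffusers", "DDIMScheduler"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""


def list_components(index_json: str) -> Dict[str, Optional[Tuple[str, str]]]:
    """Map component name -> (library, class), or None when the slot is empty."""
    index = json.loads(index_json)
    components: Dict[str, Optional[Tuple[str, str]]] = {}
    for name, value in index.items():
        # Keys starting with "_" are metadata; non-list values are plain flags.
        if name.startswith("_") or not isinstance(value, list):
            continue
        library, class_name = value
        components[name] = None if library is None else (library, class_name)
    return components


components = list_components(MODEL_INDEX)
for name, spec in components.items():
    print(name, "->", spec)
```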

scheduler/scheduler_config.json ADDED Viewed

	@@ -0,0 +1,18 @@

+{
+  "_class_name": "DDIMScheduler",
+  "_diffusers_version": "0.17.0.dev0",
+  "beta_end": 0.012,
+  "beta_schedule": "scaled_linear",
+  "beta_start": 0.00085,
+  "clip_sample": false,
+  "clip_sample_range": 1.0,
+  "dynamic_thresholding_ratio": 0.995,
+  "num_train_timesteps": 1000,
+  "prediction_type": "v_prediction",
+  "sample_max_value": 1.0,
+  "set_alpha_to_one": false,
+  "skip_prk_steps": true,
+  "steps_offset": 1,
+  "thresholding": false,
+  "trained_betas": null
+}
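
The scheduler config above specifies a `scaled_linear` beta schedule between `beta_start` and `beta_end` over 1000 training timesteps. As I understand diffusers' convention, `scaled_linear` interpolates linearly in the square root of beta and then squares the result. The sketch below reproduces that in plain Python, without importing diffusers, and derives the cumulative alpha products that samplers work from. Function names are illustrative.

```python
from typing import List


def scaled_linear_betas(
    beta_start: float = 0.00085,
    beta_end: float = 0.012,
    num_train_timesteps: int = 1000,
) -> List[float]:
    """Interpolate linearly in sqrt(beta), then square each value."""
    lo, hi = beta_start ** 0.5, beta_end ** 0.5
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]


def alphas_cumprod(betas: List[float]) -> List[float]:
    """Running product of (1 - beta_t), one entry per timestep."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


betas = scaled_linear_betas()
acp = alphas_cumprod(betas)
print(f"beta[0]={betas[0]:.5f}  beta[-1]={betas[-1]:.5f}")
print(f"alphas_cumprod[-1]={acp[-1]:.6f}")
```

Swapping the scheduler class at inference time, as the README does with `DPMSolverMultistepScheduler.from_config`, reuses these same training-time values.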

text_encoder/config.json ADDED Viewed

	@@ -0,0 +1,24 @@

+{
+  "architectures": [
+    "CLIPTextModel"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dropout": 0.0,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_size": 1024,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 77,
+  "model_type": "clip_text_model",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 23,
+  "pad_token_id": 1,
+  "projection_dim": 512,
+  "torch_dtype": "float32",
+  "transformers_version": "4.28.1",
+  "vocab_size": 49408
+}
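
The text encoder config is enough to estimate the parameter count of the `CLIPTextModel`. The sketch below assumes the usual CLIP text transformer layout (token and position embeddings; per layer, biased q/k/v/output projections, a two-layer MLP, and two layer norms; then a final layer norm). That layout is my assumption, not something stated in this commit. Dividing an LFS pointer size by the estimate gives a rough bytes-per-parameter figure, which is a quick way to check the dtype a file is stored in.

```python
def clip_text_param_count(
    vocab_size: int = 49408,
    max_positions: int = 77,
    hidden: int = 1024,
    intermediate: int = 4096,
    layers: int = 23,
) -> int:
    """Estimate CLIPTextModel parameters from the values in config.json."""
    embeddings = vocab_size * hidden + max_positions * hidden
    # q, k, v and output projections: a hidden x hidden weight plus a bias each.
    attention = 4 * (hidden * hidden + hidden)
    # Two linear layers: hidden -> intermediate -> hidden, both with biases.
    mlp = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    # Two layer norms per block, each with a weight and a bias vector.
    layer_norms = 2 * (2 * hidden)
    final_layer_norm = 2 * hidden
    return embeddings + layers * (attention + mlp + layer_norms) + final_layer_norm


params = clip_text_param_count()
lfs_size_bytes = 1361597016  # size from the text_encoder/model.safetensors pointer
print(f"{params:,} parameters")
print(f"{lfs_size_bytes / params:.3f} bytes per parameter")
```

For the pointer size listed in this commit, the ratio lands near 4 bytes per parameter, which would match the `"torch_dtype": "float32"` recorded in the config rather than 2-byte float16 storage. The safetensors header is included in the file size, so this is only an approximation.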

text_encoder/model.illusion.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7cb83640524c92b3f7e9052804d79bd2d3c66fa0a2a054b2d320a58cebaaf007
+size 1361597016
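
The `.safetensors` entries in this commit are Git LFS pointer files rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. A minimal sketch of parsing one follows; the `LfsPointer` type and `parse_lfs_pointer` helper are hypothetical names.

```python
from typing import NamedTuple


class LfsPointer(NamedTuple):
    version: str
    hash_algo: str
    digest: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, digest = fields["oid"].split(":", 1)
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


# Pointer contents copied from text_encoder/model.illusion.safetensors above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7cb83640524c92b3f7e9052804d79bd2d3c66fa0a2a054b2d320a58cebaaf007
size 1361597016
"""

ptr = parse_lfs_pointer(POINTER)
print(ptr.hash_algo, ptr.size)
```

To verify a downloaded file, one could hash it with `hashlib.sha256` and compare the hex digest and byte length against `ptr.digest` and `ptr.size`.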

text_encoder/model.ink.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a49f0414941adb09c6e5eac92a07f793efe961dc2e774cf7b77099509f7e86b8
+size 1361597016

text_encoder/model.mofu.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5656178376163495522baac1561c33762f732838c789905f8fb21f1dbb5dcaaf
+size 1361597016

text_encoder/model.radiance.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:322e025e4af0ec96cc376bdcf346ae74323ae0da64919592486e26de26f86669
+size 1361597016

text_encoder/model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e80386019ae1101aa0f841c2f0e62d0353fb583e4398691ccaec518cfd748240
+size 1361597016

tokenizer/merges.txt ADDED Viewed

The diff for this file is too large to render. See raw diff

tokenizer/special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,24 @@

+{
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "!",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer/tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,33 @@

+{
+  "add_prefix_space": false,
+  "bos_token": {
+    "__type": "AddedToken",
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "CLIPTokenizer",
+  "unk_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer/vocab.json ADDED Viewed

The diff for this file is too large to render. See raw diff

unet/config.json ADDED Viewed

	@@ -0,0 +1,65 @@

+{
+  "_class_name": "UNet2DConditionModel",
+  "_diffusers_version": "0.17.0.dev0",
+  "act_fn": "silu",
+  "addition_embed_type": null,
+  "addition_embed_type_num_heads": 64,
+  "attention_head_dim": [
+    5,
+    10,
+    20,
+    20
+  ],
+  "block_out_channels": [
+    320,
+    640,
+    1280,
+    1280
+  ],
+  "center_input_sample": false,
+  "class_embed_type": null,
+  "class_embeddings_concat": false,
+  "conv_in_kernel": 3,
+  "conv_out_kernel": 3,
+  "cross_attention_dim": 1024,
+  "cross_attention_norm": null,
+  "down_block_types": [
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "DownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "dual_cross_attention": false,
+  "encoder_hid_dim": null,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_only_cross_attention": null,
+  "mid_block_scale_factor": 1,
+  "mid_block_type": "UNetMidBlock2DCrossAttn",
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "out_channels": 4,
+  "projection_class_embeddings_input_dim": null,
+  "resnet_out_scale_factor": 1.0,
+  "resnet_skip_time_act": false,
+  "resnet_time_scale_shift": "default",
+  "sample_size": 64,
+  "time_cond_proj_dim": null,
+  "time_embedding_act_fn": null,
+  "time_embedding_dim": null,
+  "time_embedding_type": "positional",
+  "timestep_post_act": null,
+  "up_block_types": [
+    "UpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D"
+  ],
+  "upcast_attention": false,
+  "use_linear_projection": true
+}
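
One detail in the UNet config is easy to misread. As far as I understand diffusers' `UNet2DConditionModel`, when `num_attention_heads` is absent from the config, the `attention_head_dim` values are interpreted as the number of heads per block rather than the width of each head. Under that assumption, the sketch below derives the per-head width for each block from the values above.

```python
# Values copied from unet/config.json in this commit.
block_out_channels = [320, 640, 1280, 1280]
# Assumption: these are head counts per block, despite the key's name.
attention_head_dim = [5, 10, 20, 20]

head_widths = []
for channels, heads in zip(block_out_channels, attention_head_dim):
    # Each block's channels must split evenly across its heads.
    if channels % heads != 0:
        raise ValueError(f"{channels} channels do not divide into {heads} heads")
    head_widths.append(channels // heads)
    print(f"{channels} channels / {heads} heads = {channels // heads} per head")
```

If the assumption holds, every block ends up with the same per-head width, with the head count growing alongside the channel count.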

unet/diffusion_pytorch_model.illusion.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:395b558e74d9b60f2aa63100fb2bbaba3a5047394fa94ea88b79297462e87170
+size 3463726504

unet/diffusion_pytorch_model.ink.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:071a70600da4d4098a7aa027cf481a55ae178b71f7d8afc9fd8804e0fae8a5ba
+size 3463726504

unet/diffusion_pytorch_model.mofu.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:60e018978d8f7ed700a2330b3ea6760a57ebea6a3840ab78d5293586bdad21fa
+size 3463726504

unet/diffusion_pytorch_model.radiance.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec38f83a1315990037bdd4cf842bbdeceef9ea580d9c9e2c9d80637e246564a9
+size 3463726504

unet/diffusion_pytorch_model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9d478c7f837e86d6777379d998726c5132f40a52bb302117511181fc073551f0
+size 3463726504

vae/config.json ADDED Viewed

	@@ -0,0 +1,31 @@

+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.17.0.dev0",
+  "_name_or_path": "hakurei/waifu-diffusion",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 512,
+  "scaling_factor": 0.18215,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}
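
The VAE config implies the relationship between image size and latent size. In diffusers' Stable Diffusion pipelines, the spatial downscale factor is computed as `2 ** (len(block_out_channels) - 1)`, which is 8 for the four blocks listed here. The sketch below applies that to the 1024 by 784 resolution produced by the README's usage snippet; the helper name `latent_shape` is hypothetical.

```python
from typing import Sequence, Tuple


def latent_shape(
    height: int,
    width: int,
    block_out_channels: Sequence[int] = (128, 256, 512, 512),
    latent_channels: int = 4,
) -> Tuple[int, int, int]:
    """Return (channels, height, width) of the latent for a given image size."""
    # One 2x downsample between consecutive encoder blocks.
    scale = 2 ** (len(block_out_channels) - 1)
    if height % scale or width % scale:
        raise ValueError(f"image sides must be multiples of {scale}")
    return latent_channels, height // scale, width // scale


print(latent_shape(1024, 784))  # (4, 128, 98)
```

The same factor relates the UNet's `sample_size` of 64 to a default image side of 512 pixels.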

vae/diffusion_pytorch_model.illusion.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e91305cd81f7d1387694a231b63de771190b3f2c25591eb71a8c2525cb08397
+size 334643268

vae/diffusion_pytorch_model.ink.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e91305cd81f7d1387694a231b63de771190b3f2c25591eb71a8c2525cb08397
+size 334643268

vae/diffusion_pytorch_model.mofu.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e91305cd81f7d1387694a231b63de771190b3f2c25591eb71a8c2525cb08397
+size 334643268

vae/diffusion_pytorch_model.radiance.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e91305cd81f7d1387694a231b63de771190b3f2c25591eb71a8c2525cb08397
+size 334643268

vae/diffusion_pytorch_model.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e91305cd81f7d1387694a231b63de771190b3f2c25591eb71a8c2525cb08397
+size 334643268
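
The README states that the per-aesthetic VAE files are all copies of WD 1.4's VAE. The LFS pointers in this commit are consistent with that: every VAE variant lists the same object id, whereas the UNet variants each list a different one. A sketch of that check follows, with the oids copied from the pointers above; the dictionary layout and the `distinct_oids` helper are my own.

```python
from typing import Dict

VAE_OID = "6e91305cd81f7d1387694a231b63de771190b3f2c25591eb71a8c2525cb08397"

# sha256 oids from the vae/ pointers ("base" is the file without a variant suffix).
VAE_OIDS: Dict[str, str] = {
    "illusion": VAE_OID,
    "ink": VAE_OID,
    "mofu": VAE_OID,
    "radiance": VAE_OID,
    "base": VAE_OID,
}

# sha256 oids from the unet/ pointers.
UNET_OIDS: Dict[str, str] = {
    "illusion": "395b558e74d9b60f2aa63100fb2bbaba3a5047394fa94ea88b79297462e87170",
    "ink": "071a70600da4d4098a7aa027cf481a55ae178b71f7d8afc9fd8804e0fae8a5ba",
    "mofu": "60e018978d8f7ed700a2330b3ea6760a57ebea6a3840ab78d5293586bdad21fa",
    "radiance": "ec38f83a1315990037bdd4cf842bbdeceef9ea580d9c9e2c9d80637e246564a9",
    "base": "9d478c7f837e86d6777379d998726c5132f40a52bb302117511181fc073551f0",
}


def distinct_oids(oids: Dict[str, str]) -> int:
    """Count how many different file contents the variants point at."""
    return len(set(oids.values()))


print("distinct VAE files:", distinct_oids(VAE_OIDS))
print("distinct UNet files:", distinct_oids(UNET_OIDS))
```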