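The files in this directory were used to examine how the output images differ between models trained with [kohya's fine-tuning scripts](https://github.com/kohya-ss/sd-scripts) under identical settings, including the seed, and a model trained with the same settings except for the seed.    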

# Procedure
Starting from DDD_pre2.ckpt, described in [DeDeDe](https://huggingface.co/nakayama/DeDeDe), I fine-tuned on the images and captions in [DeDeDeDataset](https://huggingface.co/datasets/nakayama/DeDeDeDataset)
with the following settings.    

```
REM Step 1: sort the images into resolution buckets and cache their latents
py finetune\prepare_buckets_latents.py train_data meta_clean.json meta_lat.json DDD_pre2.ckpt ^
  --batch_size 12 ^
  --max_resolution 768,768 ^
  --max_bucket_reso 1280 ^
  --flip_aug ^
  --mixed_precision no

REM Step 2: fine-tune DDD_pre2.ckpt on the cached latents
accelerate launch --num_cpu_threads_per_process 16 fine_tune.py --pretrained_model_name_or_path=DDD_pre2.ckpt ^
  --in_json meta_lat.json ^
  --train_data_dir=train_data ^
  --output_dir=fine_tuned ^
  --shuffle_caption ^
  --train_batch_size=4 ^
  --learning_rate=5e-6 ^
  --max_train_steps=60000 ^
  --use_8bit_adam ^
  --xformers ^
  --mixed_precision=bf16 ^
  --save_every_n_epochs=1 ^
  --save_precision=float ^
  --clip_skip=2 ^
  --max_token_length=150 ^
  --seed=42
```
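
`prepare_buckets_latents.py` groups images into aspect-ratio buckets before caching their latents. The sketch below is a simplified illustration of that idea, not the script's own code: it assumes a 64-pixel step and a 256-pixel minimum side (assumed values, not taken from the command above), keeps every bucket's area within `--max_resolution` (768×768), and caps each side at `--max_bucket_reso` (1280).

```python
import math


def make_buckets(max_reso=(768, 768), min_side=256, max_side=1280, step=64):
    """Enumerate (width, height) buckets whose area stays within max_reso.

    Simplified illustration of aspect-ratio bucketing; min_side and step
    are assumed values, not read from sd-scripts.
    """
    # Work in units of `step` pixels so every side stays divisible by it.
    max_area_units = (max_reso[0] // step) * (max_reso[1] // step)
    buckets = set()

    # Always include the square bucket matching the target area.
    square = int(math.sqrt(max_area_units)) * step
    buckets.add((square, square))

    width = min_side
    while width <= max_side:
        # Largest height (a multiple of step) keeping width*height within the area budget.
        height = min(max_side, (max_area_units // (width // step)) * step)
        buckets.add((width, height))
        buckets.add((height, width))  # mirrored portrait/landscape bucket
        width += step
    return sorted(buckets)


if __name__ == "__main__":
    for w, h in make_buckets():
        print(f"{w}x{h}  aspect={w / h:.2f}")
```

In a scheme like this, each image is then resized into the bucket whose aspect ratio is closest to its own, so non-square images are not forced into a 768×768 crop.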

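With these settings I produced DDTest_last_1st.ckpt and DDTest_last_2nd.ckpt, and additionally DDTest_last_3rd.ckpt with the seed changed to 41 and everything else unchanged.    
DeDeTest_*.ckpt are these three models after applying the merge from step 4 of the procedure described in [DeDeDe](https://huggingface.co/nakayama/DeDeDe).    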
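
Because the 3rd run is meant to differ from the first two only in `--seed`, it can help to build the `fine_tune.py` arguments from a single definition so nothing else drifts between runs. A minimal sketch, assuming a hypothetical helper `finetune_args`; it only builds the argument list and does not launch training.

```python
# Settings shared by all three runs, copied from the fine_tune.py command above.
COMMON_OPTIONS = {
    "pretrained_model_name_or_path": "DDD_pre2.ckpt",
    "in_json": "meta_lat.json",
    "train_data_dir": "train_data",
    "output_dir": "fine_tuned",
    "train_batch_size": "4",
    "learning_rate": "5e-6",
    "max_train_steps": "60000",
    "mixed_precision": "bf16",
    "save_every_n_epochs": "1",
    "save_precision": "float",
    "clip_skip": "2",
    "max_token_length": "150",
}
# Boolean switches that take no value.
COMMON_FLAGS = ["shuffle_caption", "use_8bit_adam", "xformers"]


def finetune_args(seed: int) -> list[str]:
    """Hypothetical helper: build the accelerate/fine_tune.py argv for one run."""
    argv = ["accelerate", "launch", "--num_cpu_threads_per_process", "16", "fine_tune.py"]
    argv += [f"--{key}={value}" for key, value in COMMON_OPTIONS.items()]
    argv += [f"--{flag}" for flag in COMMON_FLAGS]
    argv.append(f"--seed={seed}")  # the only setting that varies between runs
    return argv


def differing_args(a: list[str], b: list[str]) -> list[str]:
    """Arguments present in exactly one of the two argument lists."""
    return sorted(set(a) ^ set(b))


if __name__ == "__main__":
    print(differing_args(finetune_args(42), finetune_args(41)))  # ['--seed=41', '--seed=42']
```

Passing `finetune_args(41)` to `subprocess.run(..., check=True)` would launch the seed-41 run; keeping the shared options in one place makes it harder for an unintended change to slip into what should be a seed-only comparison.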

# Output comparison
<img src="https://huggingface.co/nakayama/DeDeTestModels/resolve/main/FineTuningTest/img/img01.png" style="max-width:400px;" width="75%"/>
<img src="https://huggingface.co/nakayama/DeDeTestModels/resolve/main/FineTuningTest/img/img02.png" style="max-width:400px;" width="75%"/>

```
masterpiece, best quality, masterpiece, asuka langley sitting cross legged on a chair
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts,signature, watermark, username, blurry, artist name
Steps: 28, Sampler: Euler, CFG scale: 12, Seed: 2870305590, Size: 512x512, Clip skip: 2, ENSD: 31337
```
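
Whether two checkpoints trained with the same seed (for example DDTest_last_1st.ckpt and DDTest_last_2nd.ckpt) are byte-identical can be checked before comparing any images. A minimal sketch using only the standard library; `sha256_of` and `same_bytes` are hypothetical helper names. Matching hashes show the files are identical, while differing hashes do not by themselves prove the weights differ, since serialization can vary; comparing the tensors would be more precise.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True if both files have the same size and the same SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # cheap early exit before hashing large checkpoints
    return sha256_of(a) == sha256_of(b)


if __name__ == "__main__":
    pair = ("DDTest_last_1st.ckpt", "DDTest_last_2nd.ckpt")
    if all(Path(p).exists() for p in pair):
        print(same_bytes(*pair))
```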

In addition, using DDTest_last_*.ckpt and [DDD_pre3](https://huggingface.co/nakayama/DeDeDe/blob/main/README.md), I generated 10,000 images per model with the prompt below (seeds 42 through 10041) and compared the distance between each pair of image sets with [DaFID512](https://github.com/birdManIkioiShota/DaFID-512).

```
masterpiece, best quality,detailed anime style of 1girl
Negative prompt: 3d, flat shading, flat color, retro style, 1980s, 1990s, 2000s, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, inaccurate limb
Steps: 5, Sampler: Euler a, CFG scale: 7.5, Seed: 42, Size: 512x512, Clip skip: 2, ENSD: 31337
```
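
Every model is rendered with the same seed list, so two image sets differ only in the checkpoint that produced them. A minimal bookkeeping sketch; `output_path` and the `out/` folder layout are hypothetical, and the actual rendering with the prompt and settings above is left out.

```python
FIRST_SEED = 42
NUM_IMAGES = 10_000
# Model labels from the comparison; the folder layout below is hypothetical.
MODELS = ["DDTest_last_1st", "DDTest_last_2nd", "DDTest_last_3rd", "DDD_pre3"]


def seeds() -> range:
    """Seeds 42..10041 inclusive, i.e. exactly NUM_IMAGES values."""
    return range(FIRST_SEED, FIRST_SEED + NUM_IMAGES)


def output_path(model: str, seed: int) -> str:
    """Hypothetical layout: one folder per model, one PNG per seed."""
    return f"out/{model}/{seed:05d}.png"


def jobs():
    """Yield (model, seed, path) for every image to render."""
    for model in MODELS:
        for seed in seeds():
            yield model, seed, output_path(model, seed)


if __name__ == "__main__":
    s = seeds()
    print(len(s), s[0], s[-1])  # 10000 42 10041
```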

|DaFID512|1st|2nd|3rd|DDD_pre3|
|---|:---|:---|:---|:---|
|1st|-|0.0167055713199602|0.2883947343568263|1.0370061922117397|
|2nd|-|-|0.2616578791587312|0.9435170318307868|
|3rd|-|-|-|0.6212258719526709|
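
The table fills in only the upper triangle because the distance between two image sets does not depend on their order. A sketch that mirrors the reported values into a full lookup and ranks the pairs; the numbers are copied from the table, and `distance` and `ranked_pairs` are illustrative helper names.

```python
from itertools import combinations

# Upper-triangle DaFID512 values copied from the table above.
UPPER = {
    ("1st", "2nd"): 0.0167055713199602,
    ("1st", "3rd"): 0.2883947343568263,
    ("1st", "DDD_pre3"): 1.0370061922117397,
    ("2nd", "3rd"): 0.2616578791587312,
    ("2nd", "DDD_pre3"): 0.9435170318307868,
    ("3rd", "DDD_pre3"): 0.6212258719526709,
}
MODELS = ["1st", "2nd", "3rd", "DDD_pre3"]


def distance(a: str, b: str) -> float:
    """Symmetric lookup: d(a, b) == d(b, a), and d(a, a) == 0."""
    if a == b:
        return 0.0
    return UPPER[(a, b)] if (a, b) in UPPER else UPPER[(b, a)]


def ranked_pairs():
    """All model pairs sorted from closest to farthest."""
    return sorted(combinations(MODELS, 2), key=lambda p: distance(*p))


if __name__ == "__main__":
    for a, b in ranked_pairs():
        print(f"{a:>8} vs {b:<8} {distance(a, b):.4f}")
```

Ranked this way, the same-seed pair (1st, 2nd) is roughly 16 to 17 times closer than either of them is to the seed-41 model (3rd).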

# Remarks
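* Regarding the slight variation in output even with the same seed, [Kohya kindly pointed out on Twitter](https://twitter.com/nakayama_ukgk/status/1629757752527712256) that it can be reduced by specifying the appropriate options and rechecking the training environment.
* The DaFID512 distance from DDD_pre3 is larger than the distances among the other models, presumably because the dataset used to train DDD_pre3 was the pre-release version, which still contained images suspected of being NSFW.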
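
On the first point above, one generic reason a fixed seed does not guarantee identical weights is that floating-point addition is not associative, so GPU kernels that accumulate in a non-fixed order (for example via atomic adds) can produce slightly different sums from run to run. This is a general illustration, not a diagnosis of this particular training setup; `ordered_sum` is a hypothetical helper.

```python
def ordered_sum(values):
    """Sum left to right, the way a single sequential loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers summed in two different orders.
a = ordered_sum([1e16, 1.0, -1e16])  # 1.0 is absorbed by 1e16 (spacing there is 2.0): result 0.0
b = ordered_sum([1e16, -1e16, 1.0])  # the large terms cancel first: result 1.0

if __name__ == "__main__":
    print(a, b)  # 0.0 1.0
    print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))  # 0.6000000000000001 0.6
```

Over many training steps such tiny rounding differences can accumulate, which is consistent with same-seed runs diverging slightly rather than being exactly reproducible.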