Tests on larger image dimensions?

#26
by n-Arno - opened

Most SDXL-derived models can now generate larger images directly, and I regularly generate 832x1216 images.

Using your fixed VAE together with stable-diffusion.cpp yields strange "bar effects" on the images (see attached examples). I removed every other possible explanation, such as quantization and upscaling, to make sure your VAE is the only factor; the only "thing" I still do is use tiled decoding of the latent image.

Is it possible that the weight changes affect larger images and you didn't see it in your tests? (The model used is derived from https://civitai.com/models/257749/pony-diffusion-v6-xl)

Or should I look elsewhere for an explanation?

2024-11-14.09-55.png
2024-11-14.10-20.png

Hmm, it's technically possible! I ran finetuning at lower resolution and only did very brief testing at higher-res before release.

That said, the bar effect appears to be specifically localized at y = 1024px or so, which is not what I would have expected from a VAE bug (VAE bugs would usually manifest as an issue with global color/contrast or local texture). The bar looks much more like what I would expect from a mistake in the tiled decoding code.
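A rough way to sanity-check the tiled-decoding hypothesis is to work out where tile boundaries would land in pixel space. The sketch below is only an illustration: the function names are made up, and the tile size of 128 latent cells is a hypothetical value picked for the example, not stable-diffusion.cpp's actual setting. The only fact it relies on is that the SDXL VAE upsamples latents by a factor of 8, so an 832x1216 image has a 104x152 latent.

```python
# Illustrative sketch only: tile size and overlap are hypothetical values,
# not the settings used by stable-diffusion.cpp or any other codebase.
VAE_SCALE = 8  # SD/SDXL VAEs decode each latent cell to an 8x8 pixel patch


def tile_spans(latent_len, tile, overlap):
    """Return (start, end) latent ranges of the tiles covering one axis."""
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    spans = []
    start = 0
    while True:
        end = min(start + tile, latent_len)  # the last tile is clipped
        spans.append((start, end))
        if end == latent_len:
            return spans
        start += stride


def seam_bands_px(image_len, tile, overlap):
    """Pixel ranges where two neighbouring tiles meet or overlap."""
    spans = tile_spans(image_len // VAE_SCALE, tile, overlap)
    bands = []
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        bands.append((next_start * VAE_SCALE, prev_end * VAE_SCALE))
    return bands


if __name__ == "__main__":
    # Height 1216 px, hypothetical 128-cell tiles, no overlap.
    print(seam_bands_px(1216, tile=128, overlap=0))
    # Width 832 px fits in a single tile, so there is no vertical seam.
    print(seam_bands_px(832, tile=128, overlap=0))
```

Under these assumed settings the only seam along the height falls at y = 1024 and the width has none, which would match a single horizontal bar. If the real tile size or overlap differs, the seam positions move accordingly, so the actual values should be read from the decoder's source before drawing conclusions.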

I ran a quick test in Diffusers using animaginexl (couldn't figure out how to load ponyv6) and wasn't able to reproduce the bar effect.
https://colab.research.google.com/drive/1eU_qLsf2Ipcmb7A9pQlXdwx-34LT7jLo?usp=sharing

image.png

Questions whose answers would help debug further:

  1. Are you able to reproduce this effect using Diffusers or any other codebases?
  2. Are you able to run stable-diffusion.cpp using the SDXL 0.9 VAE at fp32 precision with tiled decoding, and does it avoid this issue?
  3. Are you able to save the latents pre-decode somehow (as a safetensors file, fp16 binary blob, whatever)? This would make it easy to load the exact latents in colab and then check decoded results with both VAEs.
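For question 3, a minimal sketch of what reading an "fp16 binary blob" could look like on the Python side. The layout is an assumption made for illustration (raw little-endian half floats, channel-major, no header), and the helper names are hypothetical; whatever code dumps the latents would need to write the same layout. It uses only the standard library, so the nested list would still have to be converted to a tensor before decoding.

```python
import struct


def latent_shape(image_width, image_height, channels=4, scale=8):
    """Latent (channels, height, width) for an SDXL image of the given size."""
    return channels, image_height // scale, image_width // scale


def save_latents_fp16(path, latents):
    """Write a [C][H][W] nested list as a flat little-endian fp16 blob.

    Assumed layout for this sketch: channel-major order, no header.
    """
    flat = [v for channel in latents for row in channel for v in row]
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(flat)}e", *flat))


def load_latents_fp16(path, channels, height, width):
    """Read the blob back into a [C][H][W] nested list of Python floats."""
    count = channels * height * width
    with open(path, "rb") as f:
        data = f.read()
    if len(data) != count * 2:  # fp16 is 2 bytes per value
        raise ValueError(f"expected {count * 2} bytes, got {len(data)}")
    flat = struct.unpack(f"<{count}e", data)
    latents = []
    for c in range(channels):
        rows = []
        for y in range(height):
            offset = (c * height + y) * width
            rows.append(list(flat[offset:offset + width]))
        latents.append(rows)
    return latents
```

For an 832x1216 image, `latent_shape(832, 1216)` gives `(4, 152, 104)`, so a blob in this layout would be 4 * 152 * 104 * 2 = 126,464 bytes. The size check in `load_latents_fp16` catches a mismatched shape early.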

Thanks for the quick answer! I was using the CLI from stable-diffusion.cpp, which does not provide a lot of flexibility in the pipeline. I'll try calling the DLL directly to see if I can get more data.

(To be honest, I have since tested it in automatic1111/forge and didn't get the problem, so your hint that it may come from tiled decoding is probably right.)
