A fine-tuned VAE decoder for Stable Diffusion 1.5 that improves pixel art generation by aliasing the decoder's output.
Fine-tuning used 50,000 images for 1 epoch with an effective batch size of 12. I preprocessed each image by quantizing every 8x8 tile to its average color. Training used only MSE loss with a learning rate of 1e-5 and took about 4 hours on an RTX 3090. The training set was generated with other Stable Diffusion models and consists mostly of cartoon-like images.
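
The preprocessing script itself is not included here, so the following is only a minimal sketch of the 8x8 tile quantization step. It assumes each image is an H x W x C uint8 NumPy array whose height and width are multiples of 8. The function name `quantize_tiles` and the rounding of the mean back to integers are my assumptions, not the original code.

```python
import numpy as np


def quantize_tiles(image: np.ndarray, tile: int = 8) -> np.ndarray:
    """Replace every tile x tile block of an H x W x C image with its mean color.

    Hypothetical helper: the name and rounding behaviour are assumptions.
    """
    h, w, c = image.shape
    if h % tile or w % tile:
        raise ValueError("image height and width must be multiples of the tile size")

    # View the image as (tile rows, tile height, tile cols, tile width, channels).
    blocks = image.reshape(h // tile, tile, w // tile, tile, c).astype(np.float32)

    # Average over the two within-tile axes, keeping them so the result broadcasts.
    means = blocks.mean(axis=(1, 3), keepdims=True)

    # Paint each tile with its mean color and restore the original layout.
    flat = np.broadcast_to(means, blocks.shape).reshape(h, w, c)
    return np.rint(flat).astype(image.dtype)
```

Applied to a 512x512 image, this produces a 64x64 grid of flat-colored tiles, which would then serve as the reconstruction target for the MSE loss.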