---
license: cc-by-nc-4.0
language:
- en
tags:
- stable cascade
---

# Stable-Cascade FP16 fix

**A modified version of [Stable-Cascade](https://huggingface.co/stabilityai/stable-cascade) that is compatible with fp16 inference**

## Demo

| FP16 | BF16 |
| - | - |
|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/fkWNY15JQbfh5pe1SY7wS.png)|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/XpfqkimqJTeDjggTaV4Mt.png)|

LPIPS difference: 0.088

| FP16 | BF16 |
| - | - |
|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/muOkoNjVK6CFv2rs6QyBr.png)|![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/rrgb8yMuJDyjJu6wd366j.png)|

LPIPS difference: 0.012

## How

After checking the L1 norm of each hidden state, I found that the last block group (8, 24, 24, 8 <- this one) makes the hidden states grow larger and larger.

So I apply a transformation to the TimestepBlock to directly rescale the hidden state. (This is possible because it is not a residual block.)

How the transformation is done is written in the modified "stable_cascade.py". You can put the file into the stable-cascade branch of kohya-ss/sd-scripts and uncomment the relevant parts to check the weights or do the conversion yourself.

### FP8

Some people may know the FP8 quantization used for SDXL inference on low-VRAM cards. The technique can be applied to this model too.
However, since the last block group is basically ruined, it is recommended to skip it when converting:
```python
import torch

# generator_c is the already-loaded model from the surrounding script.
FP8_TYPES = (torch.nn.Linear, torch.nn.Conv2d, torch.nn.MultiheadAttention)

for name, module in generator_c.named_modules():
    # Skip the last block group and leave it in its original dtype.
    if "up_blocks.1" in name:
        continue
    # Cast only the weight-heavy layers to FP8.
    if isinstance(module, FP8_TYPES):
        module.to(torch.float8_e5m2)
```

This sample code should convert about 70% of the weights to FP8. (Using FP8 weights with a scale is a better solution, and implementing that is recommended.)

I have tried different transformation settings that are more FP8-friendly, but the difference from the original model is more significant.

FP8 Demo (Same Seed):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/630593e2fca1d8d92b81d2a1/wPoZeWGGhcPMck45--y_X.png)

## Notice

The modified model is not compatible with LoRA/LyCORIS trained on the original weights.
(Actually it can be made compatible: just apply the same transformation. I'm considering rewriting the script so that it uses key names to determine what to do.)

ControlNets will not be compatible either, unless you also apply the needed transformation to them.

I don't want to do all of this by myself, so I hope others will do it.

## License

Stable-Cascade is published under a non-commercial license, so I use CC-BY-NC 4.0 to publish this model.

**The source code used to make this model is published under the Apache-2.0 license.**