---
license: other
license_name: stabilityai-ai-community
license_link: >-
  https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- Text-to-Image
- IP-Adapter
- StableDiffusion3Pipeline
- image-generation
- Stable Diffusion
base_model:
- stabilityai/stable-diffusion-3.5-large
---

# SD3.5-Large-IP-Adapter

This repository contains an IP-Adapter for the SD3.5-Large model, released by researchers from [InstantX Team](https://huggingface.co/InstantX). The reference image is treated just like a text condition, so it may not always be followed closely, or it may interfere with the text prompt. We hope you enjoy the model. Have fun, and share your creative works with us [on Twitter](https://x.com/instantx_ai).

# Model Card

This is a regular IP-Adapter in which the new layers are added to all 38 blocks. We use [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) to encode the image for its superior performance, and adopt a TimeResampler to project the image features. The number of image tokens is set to 64.

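To put the 64 image tokens in context, here is a small arithmetic sketch. It assumes the encoder emits one token per non-overlapping 14x14 patch of a 384x384 input (as the checkpoint name suggests) and that the resampler consumes that patch sequence; `patch_token_count` is a hypothetical helper, not part of this repository.

```python
def patch_token_count(image_size: int, patch_size: int) -> int:
    """Count non-overlapping square patches in a square image (assumed layout)."""
    per_side = image_size // patch_size  # 384 // 14 = 27
    return per_side * per_side


encoder_tokens = patch_token_count(384, 14)  # assumption: size and patch from the checkpoint name
adapter_tokens = 64                          # image token number stated in this model card

# Prints the two token counts and the approximate compression ratio.
print(encoder_tokens, adapter_tokens, round(encoder_tokens / adapter_tokens, 1))  # 729 64 11.4
```

Under these assumptions, the projection condenses several hundred patch features into a much shorter sequence before it reaches the added layers.
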
# Showcases

<div class="container">
<img src="./teasers/0.png" width="1024"/>
<img src="./teasers/1.png" width="1024"/>
</div>

# Inference

The code has not been integrated into diffusers yet, so please use the local files in this repository for now.

```python
import torch
from PIL import Image

# Local files from this repository (not yet part of diffusers).
from models.transformer_sd3 import SD3Transformer2DModel
from pipeline_stable_diffusion_3_ipa import StableDiffusion3Pipeline

model_path = 'stabilityai/stable-diffusion-3.5-large'
ip_adapter_path = './ip-adapter.bin'
image_encoder_path = "google/siglip-so400m-patch14-384"

# Load the SD3.5-Large transformer with the local model class.
transformer = SD3Transformer2DModel.from_pretrained(
    model_path, subfolder="transformer", torch_dtype=torch.bfloat16
)

# Build the pipeline around that transformer and move it to the GPU.
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_path, transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# Load the IP-Adapter weights and the SigLIP image encoder (64 image tokens).
pipe.init_ipadapter(
    ip_adapter_path=ip_adapter_path,
    image_encoder_path=image_encoder_path,
    nb_token=64,
)

# Reference image used as the image prompt.
ref_img = Image.open('./assets/1.jpg').convert('RGB')

# Note: SD3.5 Large is sensitive to high-resolution generation such as 1536x1536.
image = pipe(
    width=1024,
    height=1024,
    prompt='a cat',
    negative_prompt="lowres, low quality, worst quality",
    num_inference_steps=24,
    guidance_scale=5.0,
    generator=torch.Generator("cuda").manual_seed(42),
    clip_image=ref_img,
    ipadapter_scale=0.5,
).images[0]
image.save('./result.jpg')
```

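The comment in the snippet above warns that SD3.5 Large is sensitive to high-resolution generation such as 1536x1536. One way to guard against that is to shrink a requested size before passing it to the pipeline. The following is a minimal sketch only: `fit_resolution` is a hypothetical helper (not part of this repository or diffusers), and both the 1024x1024 pixel budget and the rounding to multiples of 16 are assumptions rather than documented limits.

```python
import math


def fit_resolution(width, height, max_pixels=1024 * 1024, multiple=16):
    """Shrink (width, height) to fit a pixel budget, roughly keeping the aspect ratio.

    Hypothetical helper: `max_pixels` and `multiple` are assumed defaults,
    not values documented for SD3.5 Large.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if max_pixels < multiple * multiple:
        raise ValueError("max_pixels is too small for the requested multiple")

    # Scale is capped at 1, so sizes within the budget are not scaled up.
    scale = min(1.0, math.sqrt(max_pixels / (width * height)))

    # Snap each side to the nearest multiple (this may nudge a side slightly).
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)

    # Snapping can overshoot the budget; trim the longer side until it fits.
    while new_w * new_h > max_pixels:
        if new_w >= new_h:
            new_w -= multiple
        else:
            new_h -= multiple
    return new_w, new_h


width, height = fit_resolution(1536, 1536)
print(width, height)  # 1024 1024
```

The returned values can then be passed as `width` and `height` in the `pipe(...)` call shown above.
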
# License

The model is released under the [stabilityai-ai-community](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md) license. All copyrights reserved.

# Acknowledgements

This project is sponsored by [HuggingFace](https://huggingface.co/) and [fal.ai](https://fal.ai/).

# Citation

If you find this project useful in your research, please cite us via

```
@misc{sd35-large-ipa,
    author = {InstantX Team},
    title = {InstantX SD3.5-Large IP-Adapter Page},
    year = {2024},
}
```