IP-Adapter
IP-Adapter is a lightweight adapter that enables prompting a diffusion model with an image. This method decouples the cross-attention layers of the image and text features. The image features are generated from an image encoder.
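The decoupling described above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the function names, the single-query setup, and the `scale` weighting of the image branch are assumptions made for illustration. The same query attends to text features and to image features through two separate attention passes, and the two results are added together.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    weights = softmax(scores)
    return [sum(w * value[i] for w, value in zip(weights, values)) for i in range(len(values[0]))]


def decoupled_cross_attention(query, text_keys, text_values, image_keys, image_values, scale=1.0):
    """Attend to text and image features separately, then sum the two outputs.

    `scale` (an assumed name) weights the image branch; 0.0 ignores the image prompt.
    """
    text_out = attend(query, text_keys, text_values)
    image_out = attend(query, image_keys, image_values)
    return [t + scale * i for t, i in zip(text_out, image_out)]


# Toy features: two text tokens and one image token, all 2-dimensional.
query = [1.0, 0.0]
text_keys, text_values = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
image_keys, image_values = [[1.0, 0.0]], [[2.0, 2.0]]
output = decoupled_cross_attention(query, text_keys, text_values, image_keys, image_values, scale=0.5)
```

Because the image branch has its own keys and values, setting `scale` to 0.0 in this sketch recovers plain text cross-attention without touching the text branch.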
Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter loading guide, and see how to use it in the usage guide.
IPAdapterMixin
Mixin for handling IP Adapters.
load_ip_adapter
( pretrained_model_name_or_path_or_dict: typing.Union[str, typing.List[str], typing.Dict[str, torch.Tensor]], subfolder: typing.Union[str, typing.List[str]], weight_name: typing.Union[str, typing.List[str]], image_encoder_folder: typing.Optional[str] = 'image_encoder', **kwargs )
Parameters
- `pretrained_model_name_or_path_or_dict` (`str` or `List[str]` or `os.PathLike` or `List[os.PathLike]` or `dict` or `List[dict]`): Can be one of:
  - A string, the model id (for example `google/ddpm-celebahq-256`) of a pretrained model hosted on the Hub.
  - A path to a directory (for example `./my_model_directory`) containing the model weights saved with ModelMixin.save_pretrained().
  - A torch state dict.
- `subfolder` (`str` or `List[str]`): The subfolder location of a model file within a larger model repository on the Hub or locally. If a list is passed, it should have the same length as `weight_name`.
- `weight_name` (`str` or `List[str]`): The name of the weight file to load. If a list is passed, it should have the same length as `subfolder`.
- `image_encoder_folder` (`str`, optional, defaults to `image_encoder`): The subfolder location of the image encoder within a larger model repository on the Hub or locally. Pass `None` to not load the image encoder. If the image encoder is located in a folder inside `subfolder`, you only need to pass the name of the folder that contains the image encoder weights, for example `image_encoder_folder="image_encoder"`. If the image encoder is located in a folder other than `subfolder`, you should pass the path to the folder that contains the image encoder weights, for example `image_encoder_folder="different_subfolder/image_encoder"`.
- `cache_dir` (`Union[str, os.PathLike]`, optional): Path to a directory where a downloaded pretrained model configuration is cached if the standard cache is not used.
- `force_download` (`bool`, optional, defaults to `False`): Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
- `proxies` (`Dict[str, str]`, optional): A dictionary of proxy servers to use by protocol or endpoint, for example `{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
- `local_files_only` (`bool`, optional, defaults to `False`): Whether to only load local model weights and configuration files or not. If set to `True`, the model won't be downloaded from the Hub.
- `token` (`str` or `bool`, optional): The token to use as HTTP bearer authorization for remote files. If `True`, the token generated from `diffusers-cli login` (stored in `~/.huggingface`) is used.
- `revision` (`str`, optional, defaults to `"main"`): The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier allowed by Git.
- `low_cpu_mem_usage` (`bool`, optional, defaults to `True` if torch version >= 1.9.0 else `False`): Speed up model loading by loading only the pretrained weights and not initializing the weights. This also tries to not use more than 1x model size in CPU memory (including peak memory) while loading the model. Only supported for PyTorch >= 1.9.0. If you are using an older version of PyTorch, setting this argument to `True` will raise an error.
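Two of the argument rules above are easy to get wrong: the list forms of `subfolder` and `weight_name` must line up, and `image_encoder_folder` is read either as a folder name inside `subfolder` or as a full path. The sketch below restates those rules as plain Python. Both helper names are hypothetical and not part of the library, and the test for "a path rather than a bare name" (the presence of a `/`) is an assumption.

```python
from pathlib import PurePosixPath


def pair_subfolders_and_weights(subfolder, weight_name):
    """Normalize both arguments to lists and pair them up (hypothetical helper)."""
    subfolders = subfolder if isinstance(subfolder, list) else [subfolder]
    weight_names = weight_name if isinstance(weight_name, list) else [weight_name]
    if len(subfolders) != len(weight_names):
        raise ValueError("`subfolder` and `weight_name` must have the same length.")
    return list(zip(subfolders, weight_names))


def resolve_image_encoder_path(subfolder, image_encoder_folder="image_encoder"):
    """Return the repository-relative image encoder folder, or None to skip loading it."""
    if image_encoder_folder is None:
        return None
    if "/" not in image_encoder_folder:
        # A bare folder name is assumed to live inside `subfolder`.
        return (PurePosixPath(subfolder) / image_encoder_folder).as_posix()
    # Anything containing a slash is assumed to be a path from the repository root.
    return PurePosixPath(image_encoder_folder).as_posix()


pairs = pair_subfolders_and_weights(["models", "models"], ["first.bin", "second.bin"])
inside = resolve_image_encoder_path("models", "image_encoder")
elsewhere = resolve_image_encoder_path("models", "different_subfolder/image_encoder")
```

The folder and file names here (`models`, `first.bin`, `second.bin`) are placeholders, not names of real checkpoints.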
set_ip_adapter_scale
Sets IP-Adapter scales per transformer block. The input scale can be a single config or a list of configs for granular control over each IP-Adapter's behavior. A config can be a float or a dictionary.
Example:
```python
# `pipeline` is assumed to be a pipeline with an IP-Adapter already loaded.

# To use original IP-Adapter
scale = 1.0
pipeline.set_ip_adapter_scale(scale)

# To use style block only
scale = {
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

# To use style+layout blocks
scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

# To use style and layout from 2 reference images
scales = [{"down": {"block_2": [0.0, 1.0]}}, {"up": {"block_0": [0.0, 1.0, 0.0]}}]
pipeline.set_ip_adapter_scale(scales)
```
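To make the config shapes above more concrete, here is a rough sketch of how a float or dictionary config could be read for one attention layer. The helpers `as_config_list` and `scale_for_layer` are hypothetical, and the rule that blocks missing from a dictionary fall back to 0.0 is an assumption; it is what would let a "style block only" config switch off every other block.

```python
def as_config_list(scale):
    """Wrap a single config in a list so that there is one entry per IP-Adapter."""
    return scale if isinstance(scale, list) else [scale]


def scale_for_layer(config, part, block, layer, default=0.0):
    """Look up the scale for one attention layer (hypothetical helper).

    `part` is "down", "mid" or "up"; `block` is a key such as "block_0";
    `layer` indexes the per-block list of scales.
    """
    if not isinstance(config, dict):
        # A plain float applies everywhere.
        return float(config)
    part_config = config.get(part, default)
    if not isinstance(part_config, dict):
        return float(part_config)
    block_config = part_config.get(block, default)
    if isinstance(block_config, list):
        return float(block_config[layer])
    return float(block_config)


style_only = {"up": {"block_0": [0.0, 1.0, 0.0]}}
active = scale_for_layer(style_only, "up", "block_0", 1)
inactive = scale_for_layer(style_only, "down", "block_2", 1)
```

In this sketch only the middle attention layer of `up/block_0` receives a non-zero scale, while a float config such as `1.0` applies the same scale to every layer.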
unload_ip_adapter
Unloads the IP Adapter weights.
IPAdapterMaskProcessor
class diffusers.image_processor.IPAdapterMaskProcessor
( do_resize: bool = True, vae_scale_factor: int = 8, resample: str = 'lanczos', do_normalize: bool = False, do_binarize: bool = True, do_convert_grayscale: bool = True )
Parameters
- `do_resize` (`bool`, optional, defaults to `True`): Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`.
- `vae_scale_factor` (`int`, optional, defaults to `8`): VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor.
- `resample` (`str`, optional, defaults to `lanczos`): Resampling filter to use when resizing the image.
- `do_normalize` (`bool`, optional, defaults to `False`): Whether to normalize the image to [-1,1].
- `do_binarize` (`bool`, optional, defaults to `True`): Whether to binarize the image to 0/1.
- `do_convert_grayscale` (`bool`, optional, defaults to `True`): Whether to convert the images to grayscale format.
Image processor for IP Adapter image masks.
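The effect of `do_resize`, `do_binarize`, and `do_normalize` can be sketched on plain numbers. These helpers are hypothetical stand-ins rather than the processor's own code; they assume pixel values already scaled to [0, 1] and a binarization threshold of 0.5.

```python
def snap_to_multiple(size, vae_scale_factor=8):
    """Round a height or width down to the nearest multiple of the VAE scale factor."""
    return size - size % vae_scale_factor


def binarize(pixels, threshold=0.5):
    """Map each pixel to 0.0 or 1.0 (the 0.5 threshold is an assumption)."""
    return [1.0 if p >= threshold else 0.0 for p in pixels]


def normalize(pixels):
    """Rescale pixel values from [0, 1] to [-1, 1]."""
    return [2.0 * p - 1.0 for p in pixels]


height, width = snap_to_multiple(515), snap_to_multiple(770)
mask_row = binarize([0.2, 0.5, 0.9])
```

With the defaults listed above, a mask is resized and binarized but not normalized, so its values stay in {0, 1}.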
downsample
( mask: Tensor, batch_size: int, num_queries: int, value_embed_dim: int ) → torch.Tensor
Parameters
- `mask` (`torch.Tensor`): The input mask tensor generated with `IPAdapterMaskProcessor.preprocess()`.
- `batch_size` (`int`): The batch size.
- `num_queries` (`int`): The number of queries.
- `value_embed_dim` (`int`): The dimensionality of the value embeddings.
Returns
`torch.Tensor`: The downsampled mask tensor.
Downsamples the provided mask tensor to match the expected dimensions for scaled dot-product attention. If the aspect ratio of the mask does not match the aspect ratio of the output image, a warning is issued.
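One way to picture the aspect-ratio check is to derive a target mask size from the mask's own aspect ratio and compare its area with `num_queries`. The sketch below is only an illustration under that assumption; the helper names are hypothetical, and the real method also resizes, repeats, and reshapes the tensor, which is omitted here.

```python
import math


def target_mask_size(height, width, num_queries):
    """Pick a (height, width) with the mask's aspect ratio and about num_queries cells."""
    ratio = width / height
    mask_h = int(math.sqrt(num_queries / ratio))
    # Bump the height by one when it does not divide num_queries evenly.
    mask_h += int(num_queries % mask_h != 0)
    mask_w = num_queries // mask_h
    return mask_h, mask_w


def aspect_ratio_matches(height, width, num_queries):
    """True when the derived mask size covers exactly num_queries positions."""
    mask_h, mask_w = target_mask_size(height, width, num_queries)
    return mask_h * mask_w == num_queries


square = target_mask_size(1024, 1024, 4096)
portrait_mismatch = aspect_ratio_matches(512, 512, 6144)
```

A mismatch such as a square mask paired with 6144 queries is the kind of situation in which a warning would be expected.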