Attention Processor
An attention processor is a class for applying different types of attention mechanisms.
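The pattern can be sketched in plain Python: an attention layer holds a processor object and delegates the computation to it, so swapping the processor changes the behavior without touching the layer. `ToyAttention`, `IdentityProcessor`, and `DoublingProcessor` are hypothetical names used only for illustration, and the toy processors stand in for real attention math.

```python
class ToyAttention:
    """Hypothetical stand-in for an attention layer that delegates to a processor."""

    def __init__(self, processor):
        self.processor = processor

    def set_processor(self, processor):
        # Swap the attention implementation without changing the layer itself.
        self.processor = processor

    def forward(self, hidden_states):
        # The layer passes itself to the processor so the processor can
        # read the layer's weights and settings if it needs them.
        return self.processor(self, hidden_states)


class IdentityProcessor:
    """Toy processor: returns a copy of the input (not real attention)."""

    def __call__(self, attn, hidden_states):
        return list(hidden_states)


class DoublingProcessor:
    """Toy processor: doubles every value (not real attention)."""

    def __call__(self, attn, hidden_states):
        return [2 * x for x in hidden_states]


attn = ToyAttention(IdentityProcessor())
first = attn.forward([1.0, 2.0])

attn.set_processor(DoublingProcessor())
second = attn.forward([1.0, 2.0])
```

The processors listed below follow the same idea: each one is a different implementation of the attention computation for the same layer.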
AttnProcessor
Default processor for performing attention-related computations.
AttnProcessor2_0
Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0).
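For reference, scaled dot-product attention computes softmax(QKᵀ / √d) V. The following is a minimal pure-Python sketch of that formula for a single head and no mask; it is not the PyTorch implementation, and the function names are chosen for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, key, value):
    """query: [n_q][d], key: [n_k][d], value: [n_k][d_v] -> [n_q][d_v]."""
    scale = 1.0 / math.sqrt(len(query[0]))
    output = []
    for q in query:
        # Similarity of this query with every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in key]
        weights = softmax(scores)
        # Weighted average of the value rows.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, value)) for j in range(len(value[0]))]
        )
    return output


# Two identical keys receive equal weight, so the output averages the values.
result = scaled_dot_product_attention(
    query=[[1.0, 0.0]],
    key=[[1.0, 0.0], [1.0, 0.0]],
    value=[[2.0], [4.0]],
)
```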
AttnAddedKVProcessor
Processor for performing attention-related computations with extra learnable key and value matrices for the text encoder.
AttnAddedKVProcessor2_0
Processor for performing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0), with extra learnable key and value matrices for the text encoder.
CrossFrameAttnProcessor
class diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor
( batch_size = 2 )
Cross-frame attention processor. Each frame attends to the first frame.
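A rough sketch of the idea, assuming it means that every frame keeps its own queries but uses the keys and values of frame 0. This is a plain-Python illustration with hypothetical function names, not the processor's actual code.

```python
import math


def attention(query, key, value):
    """Single-head scaled dot-product attention on nested lists."""
    scale = 1.0 / math.sqrt(len(query[0]))
    output = []
    for q in query:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in key]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        output.append(
            [sum(w * v[j] for w, v in zip(weights, value)) for j in range(len(value[0]))]
        )
    return output


def cross_frame_attention(frame_queries, frame_keys, frame_values):
    """Each frame's queries attend to the keys and values of the first frame."""
    first_keys, first_values = frame_keys[0], frame_values[0]
    return [attention(q, first_keys, first_values) for q in frame_queries]


# Two frames with one token each; frame 1's own values are never used.
outputs = cross_frame_attention(
    frame_queries=[[[1.0, 0.0]], [[0.0, 1.0]]],
    frame_keys=[[[1.0, 0.0]], [[0.0, 1.0]]],
    frame_values=[[[1.0, 2.0]], [[9.0, 9.0]]],
)
```

Because all frames read the same keys and values, their outputs stay tied to the content of the first frame.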
CustomDiffusionAttnProcessor
class diffusers.models.attention_processor.CustomDiffusionAttnProcessor
( train_kv: bool = True, train_q_out: bool = True, hidden_size: typing.Optional[int] = None, cross_attention_dim: typing.Optional[int] = None, out_bias: bool = True, dropout: float = 0.0 )
Parameters
- train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
- train_q_out (bool, defaults to True) — Whether to newly train query matrices corresponding to the latent image features.
- hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
- cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
- out_bias (bool, defaults to True) — Whether to include the bias parameter in train_q_out.
- dropout (float, optional, defaults to 0.0) — The dropout probability to use.
Processor for implementing attention for the Custom Diffusion method.
CustomDiffusionAttnProcessor2_0
class diffusers.models.attention_processor.CustomDiffusionAttnProcessor2_0
( train_kv: bool = True, train_q_out: bool = True, hidden_size: typing.Optional[int] = None, cross_attention_dim: typing.Optional[int] = None, out_bias: bool = True, dropout: float = 0.0 )
Parameters
- train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
- train_q_out (bool, defaults to True) — Whether to newly train query matrices corresponding to the latent image features.
- hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
- cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
- out_bias (bool, defaults to True) — Whether to include the bias parameter in train_q_out.
- dropout (float, optional, defaults to 0.0) — The dropout probability to use.
Processor for implementing attention for the Custom Diffusion method using PyTorch 2.0’s memory-efficient scaled dot-product attention.
CustomDiffusionXFormersAttnProcessor
class diffusers.models.attention_processor.CustomDiffusionXFormersAttnProcessor
( train_kv: bool = True, train_q_out: bool = False, hidden_size: typing.Optional[int] = None, cross_attention_dim: typing.Optional[int] = None, out_bias: bool = True, dropout: float = 0.0, attention_op: typing.Optional[typing.Callable] = None )
Parameters
- train_kv (bool, defaults to True) — Whether to newly train the key and value matrices corresponding to the text features.
- train_q_out (bool, defaults to False) — Whether to newly train query matrices corresponding to the latent image features.
- hidden_size (int, optional, defaults to None) — The hidden size of the attention layer.
- cross_attention_dim (int, optional, defaults to None) — The number of channels in the encoder_hidden_states.
- out_bias (bool, defaults to True) — Whether to include the bias parameter in train_q_out.
- dropout (float, optional, defaults to 0.0) — The dropout probability to use.
- attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to set this to None and allow xFormers to choose the best operator.
Processor for implementing memory-efficient attention using xFormers for the Custom Diffusion method.
FusedAttnProcessor2_0
Processor for implementing scaled dot-product attention (enabled by default if you’re using PyTorch 2.0). It uses fused projection layers. For self-attention modules, all projection matrices (i.e., query, key, value) are fused. For cross-attention modules, key and value projection matrices are fused.
This API is 🧪 experimental and may change in the future.
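To illustrate what fusing projection layers means, the sketch below stacks the query, key, and value weight matrices into one matrix, applies a single projection, and splits the result. It assumes bias-free linear layers with weights stored as `[out_features][in_features]`; the helper names are hypothetical and this is not the processor's actual code.

```python
def linear(weight, x):
    """Bias-free linear layer: weight is [out_features][in_features]."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def fuse_weights(*weights):
    """Stack projection matrices along the output dimension."""
    return [row for weight in weights for row in weight]


def split(vector, sizes):
    """Cut a flat vector into consecutive chunks of the given sizes."""
    chunks, start = [], 0
    for size in sizes:
        chunks.append(vector[start:start + size])
        start += size
    return chunks


w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[2.0, 0.0], [0.0, 2.0]]
w_v = [[0.0, 1.0], [1.0, 0.0]]
x = [3.0, 4.0]

# Self-attention: query, key, and value all read the same input,
# so one fused projection replaces three separate ones.
w_qkv = fuse_weights(w_q, w_k, w_v)
q, k, v = split(linear(w_qkv, x), [len(w_q), len(w_k), len(w_v)])
```

In cross-attention the query reads the hidden states while the key and value read the encoder hidden states, which is why only the key and value matrices can share one fused projection there.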
SlicedAttnProcessor
class diffusers.models.attention_processor.SlicedAttnProcessor
( slice_size: int )
Processor for implementing sliced attention.
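A minimal sketch of sliced attention, assuming the slicing runs over the combined batch-and-head dimension so that only `slice_size` attention maps are computed per step. Pure Python does not show the memory saving itself; the sketch only shows that the sliced result matches the unsliced one. Function names are illustrative.

```python
import math


def attention(query, key, value):
    """Single-head scaled dot-product attention on nested lists."""
    scale = 1.0 / math.sqrt(len(query[0]))
    output = []
    for q in query:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in key]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        output.append(
            [sum(w * v[j] for w, v in zip(weights, value)) for j in range(len(value[0]))]
        )
    return output


def sliced_attention(queries, keys, values, slice_size):
    """Process the batch-head dimension in chunks of `slice_size`."""
    outputs = []
    for start in range(0, len(queries), slice_size):
        end = start + slice_size
        # Only this chunk's attention maps exist at the same time.
        for q, k, v in zip(queries[start:end], keys[start:end], values[start:end]):
            outputs.append(attention(q, k, v))
    return outputs


# Three batch-head entries, each with one query token and two key/value tokens.
queries = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]]
keys = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(3)]
values = [[[1.0], [2.0]] for _ in range(3)]

sliced = sliced_attention(queries, keys, values, slice_size=2)
```

A smaller `slice_size` lowers peak memory at the cost of more sequential steps.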
SlicedAttnAddedKVProcessor
class diffusers.models.attention_processor.SlicedAttnAddedKVProcessor
( slice_size )
Processor for implementing sliced attention with extra learnable key and value matrices for the text encoder.
XFormersAttnProcessor
class diffusers.models.attention_processor.XFormersAttnProcessor
( attention_op: typing.Optional[typing.Callable] = None )
Parameters
- attention_op (Callable, optional, defaults to None) — The base operator to use as the attention operator. It is recommended to set this to None and allow xFormers to choose the best operator.
Processor for implementing memory-efficient attention using xFormers.
AttnProcessorNPU
Processor for implementing flash attention using torch_npu. torch_npu supports only the fp16 and bf16 data types. With fp32 inputs, F.scaled_dot_product_attention is used instead, which provides little acceleration on NPU.