Diffusers documentation

AllegroTransformer3DModel

AllegroTransformer3DModel is a Diffusion Transformer model for 3D (video) data from Allegro. It was introduced in Allegro: Open the Black Box of Commercial-Level Video Generation Model by RhymesAI.

The model can be loaded with the following code snippet.

import torch
from diffusers import AllegroTransformer3DModel

# Load the transformer weights in bfloat16 and move the model to the GPU
transformer = AllegroTransformer3DModel.from_pretrained("rhymes-ai/Allegro", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")

AllegroTransformer3DModel

class diffusers.AllegroTransformer3DModel

(
    patch_size: int = 2,
    patch_size_t: int = 1,
    num_attention_heads: int = 24,
    attention_head_dim: int = 96,
    in_channels: int = 4,
    out_channels: int = 4,
    num_layers: int = 32,
    dropout: float = 0.0,
    cross_attention_dim: int = 2304,
    attention_bias: bool = True,
    sample_height: int = 90,
    sample_width: int = 160,
    sample_frames: int = 22,
    activation_fn: str = 'gelu-approximate',
    norm_elementwise_affine: bool = False,
    norm_eps: float = 1e-06,
    caption_channels: int = 4096,
    interpolation_scale_h: float = 2.0,
    interpolation_scale_w: float = 2.0,
    interpolation_scale_t: float = 2.2
)
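The defaults above fix the model width and the size of the token sequence it attends over. The following is a minimal sketch of that arithmetic in plain Python. It assumes (not confirmed by this page) that the transformer width is num_attention_heads × attention_head_dim, and that each latent video is split into non-overlapping patches of patch_size_t frames by patch_size × patch_size pixels. The helper names inner_dim and num_tokens are hypothetical.

```python
# Default hyperparameters copied from the AllegroTransformer3DModel signature.
config = {
    "patch_size": 2,
    "patch_size_t": 1,
    "num_attention_heads": 24,
    "attention_head_dim": 96,
    "cross_attention_dim": 2304,
    "sample_height": 90,
    "sample_width": 160,
    "sample_frames": 22,
}


def inner_dim(cfg):
    # Assumption: hidden width = number of heads * per-head dimension.
    return cfg["num_attention_heads"] * cfg["attention_head_dim"]


def num_tokens(cfg):
    # Assumption: non-overlapping patches of
    # patch_size_t (time) x patch_size (height) x patch_size (width).
    t = cfg["sample_frames"] // cfg["patch_size_t"]
    h = cfg["sample_height"] // cfg["patch_size"]
    w = cfg["sample_width"] // cfg["patch_size"]
    return t * h * w


print(inner_dim(config))   # 2304
print(num_tokens(config))  # 79200
```

With the defaults, the hypothetical inner width (24 × 96 = 2304) coincides with cross_attention_dim, and the patch grid is 22 × 45 × 80 = 79,200 tokens per sample. This gives a rough sense of why the sequence length dominates the attention cost.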

Transformer2DModelOutput

class diffusers.models.modeling_outputs.Transformer2DModelOutput

( sample: torch.Tensor )

Parameters

  • sample (torch.Tensor of shape (batch_size, num_channels, height, width), or (batch_size, num_vector_embeds - 1, num_latent_pixels) if Transformer2DModel is discrete) — The hidden states output, conditioned on the encoder_hidden_states input. If discrete, it contains probability distributions for the unnoised latent pixels.

The output of Transformer2DModel.