Token size limit
Hello, I would like to ask what the prompt token limit is in SD3. Is it 2 x 77, or have I misunderstood? Thanks in advance.
For now it is 77, and this applies to all three text encoders. There's a PR to raise the limit for the T5 only, which can go as high as 512, but for the CLIP ones it will still be 77.
The usable tokens are 75; the other two are for bos and eos. Also, the 2 x 77 means that each CLIP model uses 77 tokens, and since there are two of them, this gives 2 x 77.
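To make the 75 + 2 arithmetic concrete, here is a minimal sketch of how one CLIP window is filled. The function name `frame_clip_tokens` is hypothetical, and the bos/eos ids and the choice of eos as the padding id are assumptions for illustration, so check them against your own tokenizer.

```python
# Assumed special-token ids for a CLIP tokenizer (illustrative; verify on yours).
BOS_ID = 49406
EOS_ID = 49407
MAX_LENGTH = 77  # full CLIP window, including bos and eos


def frame_clip_tokens(content_ids, max_length=MAX_LENGTH, pad_id=EOS_ID):
    """Keep at most max_length - 2 content tokens, wrap them in bos/eos, pad to max_length."""
    kept = list(content_ids)[: max_length - 2]  # 75 usable tokens at most
    framed = [BOS_ID] + kept + [EOS_ID]
    framed += [pad_id] * (max_length - len(framed))
    return framed


short = frame_clip_tokens(range(10))   # 10 content tokens, rest is padding
long = frame_clip_tokens(range(200))   # anything past 75 content tokens is dropped
print(len(short), len(long))
```

Both results have length 77, and the long prompt keeps only its first 75 content tokens. Each of the two CLIP encoders gets its own window like this, which is where 2 x 77 comes from.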
Because the example prompts have more than 77 tokens, I previously modified diffusers to support T5 prompts of up to 512 tokens.
But unfortunately this space is rarely used by anyone.
https://huggingface.co/spaces/vilarin/sd3m-long
This almost works:
from compel import Compel, ReturnedEmbeddingsType

# truncate_long_prompts=False keeps prompts longer than the CLIP window instead of cutting them off.
compel = Compel(
    truncate_long_prompts=False,
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    # Only the second CLIP encoder supplies the pooled embedding.
    requires_pooled=[False, True],
)

conditioning, pooled = compel(prompt)
negative_embed, negative_pooled = compel(negative_prompt)
# Pad so the positive and negative embeddings have the same sequence length.
[conditioning, negative_embed] = compel.pad_conditioning_tensors_to_same_length(
    [conditioning, negative_embed])

pipe = pipeline(
    output_type='pil',
    num_inference_steps=num_inference_steps,
    num_images_per_prompt=num_images_per_prompt,
    width=512,
    height=512,
    prompt_embeds=conditioning,
    pooled_prompt_embeds=pooled,
    negative_prompt_embeds=negative_embed,
    negative_pooled_prompt_embeds=negative_pooled,
).images
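The padding step matters because, with truncation disabled, a long prompt and a short negative prompt end up with different sequence lengths. The sketch below only illustrates that arithmetic under the assumption that a long prompt is split into 75-token chunks, each wrapped in bos/eos. It is not compel's actual implementation, and the helper names are hypothetical.

```python
import math

CLIP_WINDOW = 77                        # tokens per encoder pass, including bos/eos
CONTENT_PER_WINDOW = CLIP_WINDOW - 2    # 75 usable tokens per chunk


def chunk_content_ids(content_ids, per_chunk=CONTENT_PER_WINDOW):
    """Split content token ids into consecutive chunks of at most per_chunk tokens."""
    ids = list(content_ids)
    if not ids:
        return [[]]  # an empty prompt still occupies one window
    return [ids[i:i + per_chunk] for i in range(0, len(ids), per_chunk)]


def embedding_length(num_content_tokens):
    """Sequence length after encoding every chunk in its own 77-token window."""
    chunks = max(1, math.ceil(num_content_tokens / CONTENT_PER_WINDOW))
    return chunks * CLIP_WINDOW


prompt_len = embedding_length(180)    # 3 chunks
negative_len = embedding_length(12)   # 1 chunk
print(prompt_len, negative_len)
```

Under these assumptions the two embeddings would be 231 and 77 tokens long, so the shorter one has to be padded before both can be passed to the pipeline together.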
How can I use the T5? Is there an example on how to do that?
Edit: using both prompt and prompt_3 (T5):
image = pipe(
    prompt=prompt,
    prompt_3=prompt_3,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
The documentation for this is still only on the main branch, so until the next release, this is the link.
If you want to use it with low VRAM, there's documentation about that too.