Multimodal Image-to-Video
Erase any object just by naming it!
Generate spatial audio from images (and optionally text)
Text-to-3D & image-to-3D
Media understanding