forgeml/viton_hd
A collection of useful datasets for training diffusion models.
Note 11.6k images of cloth-human pairs, including masks and poses.
Note Large-scale synthetic image-prompt pairs, generated using Stable Diffusion.
Note 2.8M real-world image-text pairs. The text is *not* from a VLM or a captioning model, though.
Note Text-to-3D dataset filtered from Objaverse. The viewer doesn't work, but it has 365k pairs iirc.
Note A collection of diverse prompts across multiple domains.