arxiv:2312.16204

Iterative Prompt Relabeling for diffusion model with RLDF

Published on Dec 23, 2023

Authors:

Tianjun Zhang

Abstract

Diffusion models have shown impressive performance in many domains, including image generation, time series prediction, and reinforcement learning, where they outperform traditional GAN and transformer-based methods. However, their ability to follow natural language instructions (e.g., spatial relationships between objects, generating complex scenes) is still unsatisfactory, and improving it is an important research area. Prior works adopt reinforcement learning (RL) to adjust the behavior of diffusion models. However, RL methods not only require careful reward design and complex hyperparameter tuning, but also fail to incorporate rich natural language feedback. In this work, we propose iterative prompt relabeling (IP-RLDF), a novel algorithm that aligns images to text through iterative image sampling and prompt relabeling. IP-RLDF first samples a batch of images conditioned on the text, then relabels the text prompts of unmatched text-image pairs with classifier feedback. We conduct thorough experiments on three different models, SDv2, GLIGEN, and SDXL, testing their ability to generate images that follow instructions. With IP-RLDF, we achieve an absolute improvement of up to 15.22% on the challenging spatial relation VISOR benchmark, demonstrating superior performance compared to previous RL methods.
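The sample-then-relabel step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Pair`, `relabel_batch`, `run_rounds`, and the callbacks `generate`, `describe`, `matches`, and `update` are all hypothetical, and the toy stand-ins use strings in place of images. How the model is updated between rounds is not stated in the abstract, so `update` is left as an assumed, caller-supplied hook.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Pair:
    prompt: str      # text paired with the image after (possible) relabeling
    image: Any       # sampled image (a plain string in the toy demo below)
    relabeled: bool  # True if the original prompt was replaced


def relabel_batch(prompts: List[str],
                  generate: Callable[[str], Any],
                  describe: Callable[[Any], str],
                  matches: Callable[[str, Any], bool]) -> List[Pair]:
    """Sample one image per prompt; relabel the prompt of every mismatched pair."""
    pairs = []
    for prompt in prompts:
        image = generate(prompt)
        if matches(prompt, image):
            pairs.append(Pair(prompt, image, False))
        else:
            # Assumed form of "classifier feedback": a text description of
            # what the image actually shows replaces the original prompt.
            pairs.append(Pair(describe(image), image, True))
    return pairs


def run_rounds(prompts, generate, describe, matches, update, rounds):
    """Repeat sampling and relabeling; `update` is a hypothetical hook that
    returns a new generator from the old one and the relabeled pairs."""
    history = []
    for _ in range(rounds):
        pairs = relabel_batch(prompts, generate, describe, matches)
        history.append(pairs)
        generate = update(generate, pairs)
    return history


# Toy stand-ins: an "image" is just a string describing its content.
def toy_generate(prompt: str) -> str:
    # A deliberately faulty generator that flips spatial relations.
    return prompt.replace("left of", "right of")


def toy_describe(image: str) -> str:
    return image


def toy_matches(prompt: str, image: str) -> bool:
    return prompt == image


prompts = ["a cat left of a dog", "a red ball"]
pairs = relabel_batch(prompts, toy_generate, toy_describe, toy_matches)
for p in pairs:
    print(p.relabeled, "|", p.prompt)
```

In this toy run the first prompt is relabeled to match what was actually generated, while the second pair is kept unchanged. In a real setting, `generate` would wrap a diffusion model and `matches`/`describe` would wrap a detector or classifier; those choices are assumptions here.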
