good image quality but bad prompt adherence

#7
by balikita - opened

prompt: photo of the pope on top of an orca whale. he is speeding and the photo around him is blurry. year 1970

result:

image.png

It's your prompt that's the problem. Try something like:
A vintage photograph from 1970 showing the Pope riding on top of a massive orca whale. The scene is dynamic, with the Pope speeding across the water, creating a sense of motion. The background and surroundings are blurred due to the high speed, enhancing the focus on the Pope and the orca. The Pope is dressed in his traditional white papal garments, and the photo has a slightly faded, nostalgic quality typical of 1970s photography.
image.png

Nope. It should understand my prompt if prompt adherence is there.

@balikita why should it, when your prompt is unclear and even wrong?

> @balikita why should it, when your prompt is unclear and even wrong?

Because that prompt works well in Midjourney, so it's not wrong.

Dude, adapt your prompt to the model, not the other way around. It's always like this: 1.5 and XL need different prompting too, and so does this one. So move on and change your prompt; it's not going to work with your "pope on top", whatever that means. Just because it works in MJ doesn't mean it will work with anything else.

An LLM is needed to upsample your prompt so the model can understand it.
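A minimal sketch of what that upsampling step could look like, assuming you already have some way to call an LLM. The `llm` callable, the `upsample_prompt` name, and the instruction template are all hypothetical, not part of Kolors or any library:

```python
from typing import Callable

# Hypothetical instruction template; tune the wording for your own LLM.
UPSAMPLE_TEMPLATE = (
    "Rewrite the following image prompt as one detailed paragraph. "
    "State the subject, the action, the setting, any motion or camera "
    "effects, and the era explicitly. Do not add new subjects.\n\n"
    "Prompt: {prompt}"
)


def upsample_prompt(prompt: str, llm: Callable[[str], str]) -> str:
    """Ask an LLM to expand a terse prompt.

    `llm` is any function that takes a text request and returns text
    (an assumption; plug in whatever client you use).
    Falls back to the original prompt if the LLM returns nothing.
    """
    cleaned = prompt.strip()
    request = UPSAMPLE_TEMPLATE.format(prompt=cleaned)
    expanded = llm(request).strip()
    return expanded or cleaned
```

You would then pass the returned text to the image model in place of the short prompt, much like the longer "vintage photograph" rewrite earlier in this thread.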

> Dude, adapt your prompt to the model, not the other way around. It's always like this: 1.5 and XL need different prompting too, and so does this one. So move on and change your prompt; it's not going to work with your "pope on top", whatever that means. Just because it works in MJ doesn't mean it will work with anything else.

That shouldn't be the case, though. When it is, it means the model itself is failing to evolve and improve. The whole point of improved prompt coherency is that the model should eventually interpret a prompt the way a human naturally would. Having to use weird-ass negatives, as in SD3's failure case, shouldn't be the norm.

I changed it to "riding on top", and it works in some cases:
photo of the pope riding on top of an orca whale. he is speeding and the photo around him is blurry. year 1970

Test_00144_.png

I don't totally agree with balikita, but after lots of tests it's true that Kolors' prompt adherence is not very good.

I have the same experience: great image quality, but prompt adherence is very poor compared to other models. I have played around with many inference parameters, and the issue is consistent. I even tried translating the prompt into Chinese, since the model seems to be optimized for Mandarin, but it did not help. Occasionally the model gets it right and produces the correct result, so it does seem to understand the nuances in the prompt, but that understanding is not consistent. I also tried LLM-improved prompts as suggested above, but found that the model seems to ignore most of the details in the enhanced prompt. For example, given the same seed, two completely different prompts can produce the exact same image as long as certain "magic" keywords are present. So, basically, you need some luck to get the results you are looking for. It would be nice if the results were more predictable and consistent, but currently the model is lacking in this respect compared to other models.
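To put a number on "you need some luck", one option is to run the same prompt over a fixed list of seeds and count how often the result matches. This is only a sketch: `generate` and `matches` are hypothetical stand-ins for your own image pipeline and your own check (manual review, a captioning model, etc.), not real Kolors APIs:

```python
from typing import Any, Callable, Iterable


def adherence_rate(
    prompt: str,
    seeds: Iterable[int],
    generate: Callable[[str, int], Any],
    matches: Callable[[Any], bool],
) -> float:
    """Fraction of seeds for which the generated image matches the prompt.

    `generate(prompt, seed)` should return an image for a fixed seed and
    `matches(image)` should say whether it depicts what was asked for.
    Both are assumptions supplied by the caller.
    """
    seed_list = list(seeds)
    if not seed_list:
        raise ValueError("at least one seed is required")
    hits = sum(1 for seed in seed_list if matches(generate(prompt, seed)))
    return hits / len(seed_list)
```

Running this for the original prompt and for a rewritten one over the same seeds gives a rough like-for-like comparison instead of judging from a single lucky or unlucky image.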

