Good image quality but bad prompt adherence
It's your bad prompt. Try something like this instead: "A vintage photograph from 1970 showing the Pope riding on top of a massive orca whale. The scene is dynamic, with the Pope speeding across the water, creating a sense of motion. The background and surroundings are blurred due to the high speed, enhancing the focus on the Pope and the orca. The Pope is dressed in his traditional white papal garments, and the photo has a slightly faded, nostalgic quality typical of 1970s photography."
Nope. If the model really had prompt adherence, it would understand my prompt as written.
Dude, adapt your prompt to the model, not the other way around. It's always like this: 1.5 and XL need different prompting too, and so does this one. So move on and change your prompt; it's not going to work with your "pope on top" phrasing, whatever that means. Just because it works in MJ doesn't mean it will work with anything else.
An LLM is needed to upsample your prompt so the model can understand it.
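A minimal sketch of what that upsampling step could look like, assuming you have an LLM API handy (the model name, system instruction, and `upsample_prompt` helper here are my own assumptions, not anything this model ships with):

```python
# Hypothetical sketch: use an LLM to "upsample" a short prompt into the
# detailed, caption-style phrasing these models tend to be trained on.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = (
    "Rewrite the user's image prompt as a single detailed caption: "
    "subject first, then action, setting, lighting, and photo style. "
    "Keep every detail the user asked for; do not invent new subjects."
)

def upsample_prompt(short_prompt: str) -> str:
    # Assumption: any capable chat model works here; this one is a placeholder.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": short_prompt},
        ],
    )
    return response.choices[0].message.content

print(upsample_prompt("pope riding an orca, 1970s photo"))
```

When comparing the raw prompt against the upsampled one, keep the seed fixed so the prompt is the only thing that changes.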
That shouldn't be the case, though. When it is, it means the model itself is failing to evolve and improve. The whole point of improved prompt coherence is that the model should eventually resolve a prompt the way a human would naturally read it. Having to use weird-ass negatives, like in SD3's fail case, shouldn't be the norm.
I have the same experience: great image quality, but prompt adherence is very poor compared to other models. I have been playing around with many inference parameters, and the issue is consistent. I even tried translating the prompt into Chinese, since the model seems to be optimized for Mandarin, but it did not help.

Randomly the model gets it and produces the correct result, so it does seem to understand the nuances in the prompt, but the understanding is not consistent. I also tried LLM-improved prompts as suggested above, but found that the model seems to ignore most of the details in the enhanced prompt. For example, given the same seed, two completely different prompts can produce the exact same image if certain "magic" keywords are present. (A minimal way to reproduce this test is sketched below.)

So, basically, you need some luck to get the results you are looking for. It would be nice if the results were more predictable and consistent, but currently the model is lacking in this respect compared to other models.
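For anyone who wants to reproduce that fixed-seed A/B test, here is a rough sketch using diffusers' generic pipeline loader. The model id is a placeholder, and loading it via `DiffusionPipeline` is my assumption about how you're running inference:

```python
# Fixed-seed A/B test: if two very different prompts produce near-identical
# images on the same seed, the model is keying on a few "magic" tokens
# rather than the full prompt.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "some-org/some-model",  # placeholder: swap in the checkpoint you're testing
    torch_dtype=torch.float16,
).to("cuda")

SEED = 42
prompts = [
    "a vintage 1970 photograph of the Pope riding a massive orca at high speed",
    "a watercolor painting of a lighthouse on a rocky coast at sunset",
]

for prompt in prompts:
    # Re-seed before every generation so only the prompt differs.
    generator = torch.Generator(device="cuda").manual_seed(SEED)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"{SEED}_{prompt[:24].replace(' ', '_')}.png")
```

If the two saved images look alike, the seed, not the prompt, is doing the work, which matches what I'm seeing.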