Poor performance using Hugging Face Qwen-VL (not chat)

#10 opened by sqrti

Hello, I am using Qwen-VL (not the chat version) for inference, but I found the model's performance to be very poor, which is a big contrast with the reported results. I am guessing there is some problem with my prompt, but I can't find it.

My code is:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL", device_map="cuda", trust_remote_code=True).eval()

prompt = ("If your privacy was suddenly put at risk, would you instinctively opt for the "
          "'Private' button or succumb to the pressure of social approval by pressing 'Public'?")
query = tokenizer.from_list_format([
        {'image': image_pth},  # path to the input image
        {'text': prompt + ' Answer is : '},
])
inputs = tokenizer(query, return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(pred[0][inputs['input_ids'].shape[1]:].cpu(), skip_special_tokens=True)

The response from the model is just "50-50". I am wondering whether I should append something to the prompt to elicit a clearer answer, but when I reviewed the evaluation code on GitHub, it seems no extra prompt is added there. Am I missing something?
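For comparison, here is a minimal sketch of how I understand the base (non-chat) model is usually prompted, following the completion-style example on the model card. The caption-with-grounding instruction below is copied from that example; whether a free-form question plus 'Answer is : ' should work the same way is exactly what I'm unsure about.

# Sketch based on the usage example from the Qwen/Qwen-VL model card;
# the instruction text is from that example, not from my own task.
query = tokenizer.from_list_format([
        {'image': image_pth},  # same image path as above
        {'text': 'Generate the caption in English with grounding:'},
])
inputs = tokenizer(query, return_tensors='pt').to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred[0].cpu(), skip_special_tokens=False))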

Thank you for your assistance.
