Using multiple prompts during inference
#28
by ksooklall · opened
I understand we can use individual task prompts like `prompt = "<OD>"` and `prompt = "<CAPTION>"`, but is there a way to run both tasks in a single pass? I don't want to run inference twice. Something like:
```python
prompt = "<OD_CAPTION>"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, torch_dtype)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
```
and the result would be:
```python
result = {
    "<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": ["label1", "label2", ...]},
    "<CAPTION>": "A green car parked in front of a yellow building.",
}
```
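In case a combined task token isn't supported, one workaround is to run each task prompt separately and merge the parsed outputs into a single dict keyed by task token, which gives exactly the result shape above. This is a minimal sketch: `run_task` is a hypothetical stand-in for the `processor(...)` / `model.generate(...)` / post-processing calls in the snippet, and the returned values here are dummies for illustration.

```python
def run_task(prompt, image):
    # Hypothetical placeholder: in a real pipeline this would call
    # processor(text=prompt, images=image, ...), model.generate(...),
    # and the processor's post-processing to parse the decoded text.
    if prompt == "<OD>":
        return {"bboxes": [[10, 20, 110, 220]], "labels": ["car"]}
    if prompt == "<CAPTION>":
        return "A green car parked in front of a yellow building."
    raise ValueError(f"unknown task prompt: {prompt}")


def run_tasks(prompts, image):
    # Merge per-task results into one dict keyed by task token,
    # matching the combined-result shape requested above.
    return {p: run_task(p, image) for p in prompts}


result = run_tasks(["<OD>", "<CAPTION>"], image=None)
```

This still runs the model once per task under the hood, but batching the prompts (one image repeated per prompt) in a single `generate` call could amortize the cost if the processor accepts lists of texts and images.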