`TypeError` When Processing Text and Image Batch with `processor`
Issue Summary
I'm encountering a `TypeError` when trying to process a batch of data using the `processor` in my code. The error occurs when I attempt to process both text and image data together in a batch.
Code Snippet
```python
texts = []
images = []
for example in examples:
    prompt = "my prompt"
    placeholder = f"<|image_{1}|>\n"
    messages = [
        {"role": "user", "content": placeholder + prompt},
    ]
    text = processor.tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    )
    texts.append(text)
    images.append(example["image"])
batch = processor(texts, images, return_tensors="pt", padding=True)
```
The traceback ends inside Python's `re` module:

```
return _compile(pattern, flags).split(string, maxsplit)
TypeError: expected string or bytes-like object
```
Detailed Description
The error occurs when creating a batch by passing a list of texts and a list of images to the processor. `apply_chat_template` with `tokenize=False` returns a string, so `texts` is a list of strings; the processor then raises a `TypeError` about expecting a string or bytes-like object, which suggests its internal text handling expects a single string rather than a list. The issue is only triggered when batching text and image data together.
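For reference, the failure mode can be reproduced directly with Python's `re` module, where the traceback ends. This is only an illustration of the error message, not the processor's actual code path, and the pattern is a guess at an image-placeholder regex:

```python
import re

# Passing a list of strings where re.split expects a single string
# raises the same error as in the traceback above.
texts = ["<|image_1|>\nmy prompt", "<|image_1|>\nmy prompt"]
re.split(r"<\|image_\d+\|>", texts)
# TypeError: expected string or bytes-like object
```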
Expected Behavior
The processor should correctly handle the provided text and image inputs and return the appropriate tensors without raising an error.
Environment
Python Version: 3.9
OS: Ubuntu
I have the same issue. My use case is a single question per image, but I want to run it in batched mode to increase throughput.
Hi, thanks for your interest. Batch mode is not supported in the processor.
I got batched inference with text + images to work!
One has to tokenize each image+prompt pair individually into a tensor, and then stack & pad these tensors into a large tensor to feed the model.
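A minimal sketch of that idea follows. The positional `processor(text, images, ...)` call mirrors the snippet above; the `collate` function name, the output keys `input_ids` and `pixel_values`, and the left-padding choice are assumptions to adapt to your model:

```python
import torch

def collate(examples, processor, prompt="my prompt"):
    # Tokenize each image+prompt pair individually, one processor
    # call per sample, to avoid the unsupported list-of-texts path.
    per_sample = []
    for example in examples:
        messages = [{"role": "user", "content": f"<|image_{1}|>\n{prompt}"}]
        text = processor.tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=False
        )
        per_sample.append(processor(text, [example["image"]], return_tensors="pt"))

    # Assumes the tokenizer defines a pad token.
    pad_id = processor.tokenizer.pad_token_id
    max_len = max(s["input_ids"].shape[1] for s in per_sample)

    input_ids, attention_mask = [], []
    for s in per_sample:
        ids = s["input_ids"][0]
        n_pad = max_len - ids.shape[0]
        # Left-pad so every prompt ends at the same position for generation.
        input_ids.append(
            torch.cat([torch.full((n_pad,), pad_id, dtype=ids.dtype), ids])
        )
        attention_mask.append(
            torch.cat([torch.zeros(n_pad, dtype=torch.long),
                       torch.ones(ids.shape[0], dtype=torch.long)])
        )

    return {
        "input_ids": torch.stack(input_ids),
        "attention_mask": torch.stack(attention_mask),
        # Assumes every image yields pixel_values of the same shape.
        "pixel_values": torch.cat([s["pixel_values"] for s in per_sample]),
    }
```

Left-padding is the usual choice for batched generation with decoder-only models, so no pad tokens sit between each prompt and its generated continuation.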
Here is the code to get going now: https://gist.github.com/tomasruizt/21cfd764f8d89a7802bf32537af55bbe
I tested that each image does not leak into the other prompts by permuting the prompts across the images and evaluating the answers qualitatively. If you find any errors, please let me know.
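For anyone who wants to reproduce that check, here is a rough sketch; `generate_answers` is a hypothetical helper standing in for whatever batched-inference function you use:

```python
import random

def check_no_leakage(images, prompts, generate_answers):
    # `generate_answers(images, prompts)` is a hypothetical helper that
    # returns one answer per (image, prompt) pair from the batched pipeline.
    order = list(range(len(prompts)))
    random.shuffle(order)
    shuffled = [prompts[i] for i in order]
    answers = generate_answers(images, shuffled)
    # If batching leaked context across samples, an answer would describe
    # a neighboring image instead of its own; inspect these manually.
    for img_idx, ans in enumerate(answers):
        print(f"image {img_idx} + prompt {order[img_idx]}: {ans}")
```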