how to perform batch inference for llama-3.2-1B-Instruct

#28
by helmoz

When using the model for inference, I tried passing a batch of inputs, but some of the outputs in the batch were generated incorrectly. For certain prompts, a long run of blank spaces is generated before the answer, while others skip the "assistant" header and generate the answer directly. Does the model actually support batched (parallel) processing of inputs? What is the correct way to perform batch inference?
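For reference, the symptoms described here (long runs of blank/padding tokens before the answer) are commonly caused by right padding during generation. Below is a minimal sketch of left-padded batch inference with the `transformers` library; the prompts and generation settings are only illustrative, and `apply_chat_template` batching assumes a reasonably recent `transformers` version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Decoder-only models should be padded on the LEFT for generation;
# Llama ships without a dedicated pad token, so reuse the EOS token.
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token

# Illustrative prompts only.
conversations = [
    [{"role": "user", "content": "What is the capital of France?"}],
    [{"role": "user", "content": "Explain beam search in one sentence."}],
]

# Build chat-formatted strings, then tokenize as one padded batch.
texts = tokenizer.apply_chat_template(
    conversations, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(texts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        pad_token_id=tokenizer.eos_token_id,
    )

# Strip the prompt tokens so only the newly generated text is decoded.
generated = outputs[:, inputs["input_ids"].shape[1]:]
for text in tokenizer.batch_decode(generated, skip_special_tokens=True):
    print(text)
```

With left padding and the attention mask produced by the tokenizer, every sequence in the batch starts generating from its own last real token, so no leading padding leaks into the decoded output.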
