how to perform batch inference for llama-3.2-1B-Instruct

#28
by helmoz

When using the model for inference, I tried passing a batch of inputs, but some of the outputs in the batch were generated incorrectly. For certain prompts, a long run of blank spaces is generated before the answer, while others skip the "assistant" header and generate the answer directly. Does the model actually support batched (parallel) processing of inputs? What is the correct way to perform batch inference?
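For reference, the symptoms described here (long runs of blank/padding tokens before the answer) are commonly caused by right padding during generation. Below is a minimal sketch of left-padded batch inference with the `transformers` library; the prompts and generation settings are only illustrative, and `apply_chat_template` batching assumes a reasonably recent `transformers` version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Decoder-only models should be padded on the LEFT for generation;
# Llama ships without a dedicated pad token, so reuse the EOS token.
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token

# Illustrative prompts only.
conversations = [
    [{"role": "user", "content": "What is the capital of France?"}],
    [{"role": "user", "content": "Explain beam search in one sentence."}],
]

# Build chat-formatted strings, then tokenize as one padded batch.
texts = tokenizer.apply_chat_template(
    conversations, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(texts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        pad_token_id=tokenizer.eos_token_id,
    )

# Strip the prompt tokens so only the newly generated text is decoded.
generated = outputs[:, inputs["input_ids"].shape[1]:]
for text in tokenizer.batch_decode(generated, skip_special_tokens=True):
    print(text)
```

With left padding and the attention mask produced by the tokenizer, every sequence in the batch starts generating from its own last real token, so no leading padding leaks into the decoded output.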
