microsoft/llmlingua-2 · Fix error when force_tokens includes multi-word sequence to preserve

Oct 16

Currently, an error occurs when force_tokens includes a sequence that contains spaces and that sequence appears in the prompt.
The cause is the statement word, label = line.split(label_sep), where line is, for example, The answer is 1 (here "The answer is" is the sequence in force_tokens and 1 is its label).
line.split(label_sep) returns ['The', 'answer', 'is', '1'], which is too many values to unpack into word, label, so Python raises a ValueError.

The fix is to split only at the last occurrence of label_sep, that is, the first one counting from the right.
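
A minimal sketch of the failure and the fix, assuming label_sep is a single space as in the example above. The helper name split_word_label is hypothetical and used only for illustration; the merged commit may be written differently.

```python
LABEL_SEP = " "  # assumption: the separator is a single space, as in the example


def split_word_label(line, label_sep=LABEL_SEP):
    """Hypothetical helper: split a 'word label' line at the last separator only."""
    # rsplit with maxsplit=1 splits once, starting from the right, so any
    # separators inside the word part are left untouched.
    word, label = line.rsplit(label_sep, 1)
    return word, label


line = "The answer is 1"

# Splitting on every separator yields four pieces, so unpacking into
# two names raises ValueError.
try:
    word, label = line.split(LABEL_SEP)
    failed = False
except ValueError:
    failed = True

word, label = split_word_label(line)
print(failed, repr(word), repr(label))  # True 'The answer is' '1'
```

Single-word lines such as hello 0 behave the same as before, because they contain only one separator.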

Fix error when force_tokens includes multi-word sequence to preserve (c07e7a4b)

qianhuiwu changed pull request status to merged 13 days ago

qianhuiwu

Microsoft org 13 days ago

Thanks. Merged the fix for the string split.