IndexError: index out of range in self
I get this error when using the example code. The last line in the stack trace is this:
Lib\site-packages\torch\nn\functional.py", line 2267, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
The only thing I changed is that I used a longer input text. I think the input is too long. How can I fix this? Can I set a maximum length somehow?
I was facing the same issue. For longer texts, I worked around it by slicing the input into two pieces, summarizing each piece separately, merging the two summaries, and then summarizing the merged text one more time. The drawback is that I think a lot of information gets lost this way.
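Roughly, a sketch of what I mean, assuming a Hugging Face summarization pipeline (the model name is just an example, and note that splitting in half by word count doesn't guarantee each half fits the model's token limit):

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def two_pass_summary(text):
    # First pass: split roughly in half at a word boundary and summarize each half.
    words = text.split()
    mid = len(words) // 2
    halves = [" ".join(words[:mid]), " ".join(words[mid:])]
    partial = [summarizer(h, truncation=True)[0]["summary_text"] for h in halves]
    # Second pass: merge the two partial summaries and summarize once more.
    return summarizer(" ".join(partial), truncation=True)[0]["summary_text"]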
That's right. The error happens because the model's position embeddings only cover a fixed maximum sequence length (512 tokens for many BERT-style models), so a longer input indexes past the embedding table. I usually break my input text into chunks of 500 tokens to stay safely under that limit.
from transformers import AutoTokenizer

# Any tokenizer that matches your model works; bert-base-uncased is just an example.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def chunk_text_with_context(text, context, max_tokens=500):
    words = text.split()
    chunks = []
    current_chunk = [context]
    current_length = len(tokenizer.encode(context, add_special_tokens=False))
    for word in words:
        # Encoding word by word only approximates the true token count, since
        # subword tokenizers can merge tokens differently across word boundaries.
        word_length = len(tokenizer.encode(word, add_special_tokens=False))
        if current_length + word_length <= max_tokens:
            current_chunk.append(word)
            current_length += word_length
        else:
            # Chunk is full: store it and start a new one, re-seeded with the context.
            chunks.append(" ".join(current_chunk))
            current_chunk = [context, word]
            current_length = len(tokenizer.encode(context, add_special_tokens=False)) + word_length
    # Add the last chunk if there's any content left
    if current_chunk:
        chunks.append(" ".join(current_chunk))
    return chunks
The above code is for when you want to prepend a shared context string to each chunk.
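For example, to summarize a long document with it (again assuming a Hugging Face summarization pipeline; the model name and context string are just examples):

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

long_text = "..."  # your long input text
chunks = chunk_text_with_context(long_text, context="Excerpt from a longer article:")
summaries = [summarizer(c, truncation=True)[0]["summary_text"] for c in chunks]
combined_summary = " ".join(summaries)

And to answer the original question directly: yes, you can also just cap the length at tokenization time, though that silently drops everything past the limit:

# Truncate instead of chunking; anything beyond max_length is discarded.
encoded = tokenizer(long_text, truncation=True, max_length=512, return_tensors="pt")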