Problems due to the model having no predefined "max_length".

#3
by singlewaver - opened

from bertopic import BERTopic
from bertopic.representation import ZeroShotClassification

representation_model = ZeroShotClassification(candidate_topics, model="BSC-LT/sciroshot")
topic_model = BERTopic(verbose=True, representation_model=representation_model)
topics, probabilities = topic_model.fit_transform(abstracts)
The warning I get: "Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation."

Hi @singlewaver ,
try setting the tokenizer's model_max_length after you load it:

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_path = "BSC-LT/sciroshot"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.model_max_length = 512  # the important part

sciroshot_classifier = pipeline("zero-shot-classification",
                                model=model, tokenizer=tokenizer,
                                device=0, truncation=True, max_length=512)

I hope this can help you!
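
To plug that pipeline back into BERTopic as in the original snippet: ZeroShotClassification should also accept an already-built transformers pipeline object in place of a model id string (check this against your BERTopic version). A minimal sketch, where candidate_topics and abstracts are placeholders for your own labels and documents:

from bertopic import BERTopic
from bertopic.representation import ZeroShotClassification

candidate_topics = ["physics", "biology", "computer science"]  # placeholder labels
representation_model = ZeroShotClassification(candidate_topics, model=sciroshot_classifier)
topic_model = BERTopic(verbose=True, representation_model=representation_model)
topics, probabilities = topic_model.fit_transform(abstracts)  # abstracts: your documents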

Language Technologies Unit @ Barcelona Supercomputing Center org

Hi! Sorry, we missed this issue :(

I've just added model_max_length to the config file, so from now on there's no need to set it manually after loading the tokenizer.
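
A quick way to verify the updated config is picked up (assuming you re-download or refresh the locally cached files):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BSC-LT/sciroshot")
print(tokenizer.model_max_length)  # expected: 512 per the fix above; a huge sentinel value means an old cached config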
