Problems due to the model having no predefined "max_length".

#3
by singlewaver - opened

from bertopic import BERTopic
from bertopic.representation import ZeroShotClassification

representation_model = ZeroShotClassification(candidate_topics, model="BSC-LT/sciroshot")
topic_model = BERTopic(verbose=True, representation_model=representation_model)
topics, probabilities = topic_model.fit_transform(abstracts)
The warning I get: "Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation."

Hi @singlewaver ,
try setting the tokenizer's model_max_length after you load it:

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_path = "BSC-LT/sciroshot"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.model_max_length = 512  # the important part

sciroshot_classifier = pipeline("zero-shot-classification",
                                model=model, tokenizer=tokenizer,
                                device=0, truncation=True, max_length=512)

I hope this can help you!
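
To plug that pipeline back into BERTopic as in the original snippet: ZeroShotClassification should also accept an already-built transformers pipeline object in place of a model id string (check this against your BERTopic version). A minimal sketch, where candidate_topics and abstracts are placeholders for your own labels and documents:

from bertopic import BERTopic
from bertopic.representation import ZeroShotClassification

candidate_topics = ["physics", "biology", "computer science"]  # placeholder labels
representation_model = ZeroShotClassification(candidate_topics, model=sciroshot_classifier)
topic_model = BERTopic(verbose=True, representation_model=representation_model)
topics, probabilities = topic_model.fit_transform(abstracts)  # abstracts: your documents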

Language Technologies Unit @ Barcelona Supercomputing Center org

Hi! Sorry, we missed this issue :(

I've just added model_max_length to the config file, so from now on there's no need to set it manually after loading the tokenizer.
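
A quick way to verify the updated config is picked up (assuming you re-download or refresh the locally cached files):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BSC-LT/sciroshot")
print(tokenizer.model_max_length)  # expected: 512 per the fix above; a huge sentinel value means an old cached config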
