Problems due to the model having no predefined "max_length".
#3 · opened by singlewaver
representation_model = ZeroShotClassification(candidate_topics, model="BSC-LT/sciroshot")
topic_model = BERTopic(verbose=True, representation_model=representation_model)
topics, probabilities = topic_model.fit_transform(abstracts)
Warning: Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Hi @singlewaver, try setting the tokenizer's model_max_length after you load it:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_path = "BSC-LT/sciroshot"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.model_max_length = 512  # the important part

sciroshot_classifier = pipeline(
    "zero-shot-classification",
    model=model,
    tokenizer=tokenizer,
    device=0,
    truncation=True,
    max_length=512,
)
I hope this can help you!
Hi! Sorry, we missed this issue :(
I've just added model_max_length to the model's config file, so from now on there's no need to set it manually after loading the tokenizer.
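For reference, that fix amounts to an entry like the following in the model repo's tokenizer_config.json (the surrounding keys shown here are illustrative, not the repo's actual file):

```json
{
  "model_max_length": 512,
  "truncation_side": "right"
}
```

With this in place, AutoTokenizer.from_pretrained("BSC-LT/sciroshot") picks up the 512-token limit automatically, so truncation=True no longer triggers the "no predefined maximum length" warning.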