Does this model use all the 768*num_tokens vectors or just the [CLS] one?

#2
by raquelhortab - opened

Hi (fair warning: I am new to using transformers),
I am learning to use transformers and I am working on a text classification task. I have been advised to use only the [CLS] token vector (I understand it represents the whole sentence), since that is quicker than using all 768*num_tokens vectors. I was wondering if this model does that.

Also, if you don't mind my "newbieness": if I evaluate the model directly (model.evaluate(tokenized_data)) I get around 80% accuracy, but if I try to fine-tune it with the fit function it drops to 50%. I posted a detailed question on StackOverflow in case someone here can help: https://stackoverflow.com/questions/72704214/transformers-pretrained-models-accuracy-decreases-after-fine-tuning

Hey @raquelhortab ,

Good question! This model has indeed been fine-tuned using only the [CLS] token, which you can see here: https://github.com/huggingface/transformers/blob/cc5c061e346365252458126abb699b87cda5dcc0/src/transformers/models/distilbert/modeling_distilbert.py#L759

It uses only the first token (the [CLS] token) as the "pooled output".
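As a minimal sketch of what the linked `modeling_distilbert.py` code does: the encoder produces one 768-dimensional vector per token, and the classification head simply slices out the vector at position 0 (the [CLS] token) as the pooled representation. The tensor below is a random stand-in for the encoder output, not a real model call:

```python
import torch

# Stand-in for DistilBERT's last_hidden_state: (batch, num_tokens, hidden_dim).
# In the real model this comes from the encoder; here it is random data
# just to illustrate the pooling step.
batch, num_tokens, hidden_dim = 2, 10, 768
hidden_state = torch.randn(batch, num_tokens, hidden_dim)

# This mirrors the line in DistilBertForSequenceClassification:
# only the first ([CLS]) token's vector is kept as the "pooled output".
pooled_output = hidden_state[:, 0]  # shape: (batch, hidden_dim)

print(hidden_state.shape)   # torch.Size([2, 10, 768])
print(pooled_output.shape)  # torch.Size([2, 768])
```

So during fine-tuning the classifier only ever sees that single 768-dimensional vector per example, which is why it is cheaper than feeding all num_tokens vectors forward.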

Regarding your second question, I don't really know - I'd recommend using the forum to get an answer for this :-) https://discuss.huggingface.co/
