[CLS] for downstream tasks.
#4, opened by jinyuan22
I noticed that a [CLS] token is added to the sequence. Was it used during training? Can I use its embedding as a feature for downstream tasks?
Hi jinyuan22,
In the paper, we use the mean of all the token embeddings, CLS excluded, as the feature for downstream tasks. You can use the CLS token embedding instead, and you will probably get good performance too, but you might not reproduce the exact results reported in the paper with that approach.
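To make the difference between the two options concrete, here is a minimal library-free sketch, not the code used for the paper. It assumes the [CLS] token sits at position 0 and that an attention mask marks padding with 0; both are assumptions to check against your tokenizer. The function names `mean_pool_excluding_cls` and `cls_embedding` are hypothetical helpers, and the embeddings are toy values.

```python
from typing import List, Sequence


def mean_pool_excluding_cls(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
    cls_index: int = 0,  # assumption: [CLS] is the first token
) -> List[float]:
    """Average the embeddings of all real tokens, skipping [CLS] and padding."""
    kept = [
        vec
        for i, (vec, keep) in enumerate(zip(token_embeddings, attention_mask))
        if keep and i != cls_index
    ]
    if not kept:
        raise ValueError("no tokens left after excluding [CLS] and padding")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def cls_embedding(
    token_embeddings: Sequence[Sequence[float]],
    cls_index: int = 0,  # assumption: [CLS] is the first token
) -> List[float]:
    """Return the embedding at the [CLS] position as the sequence feature."""
    return list(token_embeddings[cls_index])


# Toy input: [CLS], two real tokens, one padding token; 2-dimensional embeddings.
embeddings = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 1, 0]

mean_feature = mean_pool_excluding_cls(embeddings, mask)  # [2.0, 3.0]
cls_feature = cls_embedding(embeddings)                   # [9.0, 9.0]
```

With real model outputs you would apply the same logic to the last hidden states, typically with tensor operations rather than Python lists.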
Hope this helps!