Pooling method: mean vs last?
Same as the title: which one should I choose for inference or training?
We recommend using the last-token pooling method; please refer to the example code in the model introduction.
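For readers who don't want to dig through the model card right away, here is a minimal sketch of last-token pooling in PyTorch. The function name and the padding handling are illustrative and may differ in detail from the model introduction's example code:

```python
import torch
from torch import Tensor

def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    """Take the hidden state of the last non-padding token as the embedding."""
    # If the batch is left-padded, the final position of every row is a real
    # token, so the last hidden state can be used directly.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    # Otherwise (right padding), index the last non-padding position per row.
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[
        torch.arange(batch_size, device=last_hidden_states.device),
        sequence_lengths,
    ]
```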
I noted that in the original GTE paper, "Towards General Text Embeddings with Multi-stage Contrastive Learning" (Section 3.1, Model Architecture), mean pooling is used. However, in gte-Qwen2-7B-instruct, last-token pooling is used, as shown in the example code and the config file. Is there any literature reference or practical experience that could be shared on this design choice? It looks like bidirectional embedding models typically use mean pooling (as in the original GTE paper with BERT), while last-token pooling is more common for decoder-only LLM-based embedding models.
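For comparison with the last-token sketch above, this is the mask-aware mean pooling typically used with bidirectional encoders such as BERT-based GTE. It is an illustrative sketch, not code taken from the GTE release:

```python
import torch
from torch import Tensor

def mean_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    """Average the hidden states over real (non-padding) tokens only."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_states.dtype)
    summed = (last_hidden_states * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts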