Datasets used in the fine-tuning stage
#16 opened by yushi
Hi authors, thanks for the great work! Could you share the scope of the data used in the fine-tuning stage? Specifically, which tasks from the MTEB benchmark are included in the training data?
The data used in the fine-tuning stage is essentially the same as that described in the paper (https://arxiv.org/abs/2308.03281). GTE v1.5 additionally uses some synthetic data generated by an LLM, and this synthetic data is not drawn from the MTEB benchmarks.