naver/trecdl22-crossencoder-rankT53b-repro

cadurosar committed on Jul 12, 2023

Commit 5f8fbb5 • 1 Parent(s): ff4e33d

Update README.md

Files changed (1)

README.md +2 -1

@@ -6,7 +6,8 @@ Our best attempt at reproducing [RankT5 Enc-Softmax](https://arxiv.org/pdf/2210.
 1. We use a SPLADE first stage for the negatives vs GTR in the paper
 2. We train using PyTorch vs Flax in the paper
-3. We use the original t5-3b vs Flan T5-3b in the paper
+3. ~~We use the original t5-3b vs Flan T5-3b in the paper~~
+4. The head is not exactly the same: here we add Linear->LayerNorm->Linear and mistakenly omit a nonlinearity, whereas the original paper uses just a dense layer. Fixing this should improve our performance, because the extra layers are currently not used correctly
 This leads to what seems to be slightly worse performance (42.8 vs 43.? in the paper), and it seems slightly worse on BEIR as well.
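
The head difference described in item 4 can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's actual code: the class names, the hidden size of 16, and the choice and placement of the activation (GELU after the LayerNorm) are all assumptions made for the example. Only the layer sequence Linear->LayerNorm->Linear and the paper's single dense layer come from the README text.

```python
import torch
from torch import nn

HIDDEN = 16  # hypothetical size, chosen small for illustration only


class DenseHead(nn.Module):
    """Single dense layer, as the README says the original paper uses."""

    def __init__(self, hidden: int = HIDDEN):
        super().__init__()
        self.score = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)  # one relevance score per input row


class HeadAsDescribed(nn.Module):
    """Linear -> LayerNorm -> Linear with no activation (the README's 'mistake')."""

    def __init__(self, hidden: int = HIDDEN):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


class HeadWithActivation(nn.Module):
    """One possible fix: same layers plus an activation (GELU is an assumption)."""

    def __init__(self, hidden: int = HIDDEN):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.GELU(),  # assumed activation and placement, not from the README
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def n_params(module: nn.Module) -> int:
    """Count the trainable parameters of a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)
```

All three heads map a batch of pooled encoder states of shape `(batch, HIDDEN)` to one score per row. The multi-layer variants carry more parameters than the single dense layer, which is the extra capacity the README refers to.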