nicolauduran45 committed · Commit 3b4f543 · Parent(s): 5edc3dd

Update README.md

README.md CHANGED
@@ -13,7 +13,7 @@ models-in particular, in low-resource situations. Considering the fact that the
 which is different from the one that would be expected to be found in free natural language, we explore whether our affiliation span identification and
 NER models would benefit from being fine-tuned from models that have been *further pre-trained* on raw affiliation strings for the masked token prediction task.
 
-We
+We adapt models to 10 million random raw affiliation strings from OpenAlex, reporting perplexity on 50k randomly held-out affiliation strings.
 In what follows, we refer to our adapted models as AffilRoBERTa (adapted RoBERTa model) and AffilXLM (adapted XLM-RoBERTa).
 
 Specific details of the adaptive pre-training procedure can be found in [Duran-Silva *et al.* (2024)](https://aclanthology.org/2024.sdp-1.13.pdf).
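The updated README says the models are further pre-trained on raw affiliation strings with masked token prediction and evaluated by perplexity on 50k randomly held-out strings; the exact procedure is in the cited paper. The following is a minimal, stdlib-only sketch of those three ingredients, assuming standard RoBERTa-style masking (about 15% of positions selected as targets; of those, 80% replaced by a mask token, 10% by a random token, 10% left unchanged) and perplexity defined as the exponential of the mean negative log-likelihood over the masked targets. The function names, the `"<mask>"` string and the seed are hypothetical illustrations, not the authors' code.

```python
import math
import random

# Hypothetical mask symbol; RoBERTa-family tokenizers use "<mask>".
MASK_TOKEN = "<mask>"


def split_held_out(strings, n_held_out, seed=0):
    """Randomly hold out n_held_out strings for evaluation (assumed procedure)."""
    rng = random.Random(seed)
    shuffled = list(strings)
    rng.shuffle(shuffled)
    return shuffled[n_held_out:], shuffled[:n_held_out]  # (train, held-out)


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """RoBERTa-style masking (assumed): returns (corrupted inputs, labels).

    labels[i] is the original token at selected positions and None elsewhere,
    so only selected positions contribute to the training loss.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_TOKEN)        # 80%: replace with mask
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)               # 10%: keep original
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels


def perplexity(target_log_probs):
    """exp(mean negative log-likelihood) over the predicted target tokens."""
    if not target_log_probs:
        raise ValueError("perplexity needs at least one target token")
    mean_nll = -sum(target_log_probs) / len(target_log_probs)
    return math.exp(mean_nll)
```

For example, a model that assigns probability 0.5 to every masked target gets a perplexity of about 2, and one that is uniform over k candidates gets about k. In a real setup the log-probabilities would come from the model's predictions at the masked positions of the held-out strings.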