eval results reproducibility
#5, opened by diana-onutu
How did you handle the misalignment between tokens and NER tags that appears after tokenization? For example, if the word "Japan" has the NER tag "B-LOC", what do the tags look like once it is tokenized as "JA", "#PA", "#N"? Do you re-align the NER tags as "B-LOC", "I-LOC", "I-LOC"?

I'm trying to reproduce your evaluation results, but most of my scores fall between 0.5 and 0.7 (except accuracy). When these metrics are calculated, is performance on the "O" label also evaluated?
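To make the question concrete, here is a minimal sketch of the two re-alignment schemes I can think of. It is only an illustration, not the preprocessing the model authors confirmed using. The names `align_labels`, `word_ids`, and `propagate` are hypothetical; `word_ids` is assumed to be a per-subword index of the source word (with `None` for special tokens), similar to what fast tokenizers can provide.

```python
from typing import List, Optional


def align_labels(
    word_tags: List[str],
    word_ids: List[Optional[int]],
    propagate: bool = True,
) -> List[Optional[str]]:
    """Expand word-level NER tags to subword level.

    word_ids[i] is the index of the word that subword i came from,
    or None for special tokens. A None in the output marks a position
    that should be ignored by the loss and the metrics (assumption).
    """
    aligned: List[Optional[str]] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            # Special token: no label.
            aligned.append(None)
        elif wid != previous:
            # First subword of a word keeps the word's tag.
            aligned.append(word_tags[wid])
        elif propagate:
            # Later subwords: turn B-X into I-X, keep I-X and O unchanged.
            tag = word_tags[wid]
            aligned.append("I-" + tag[2:] if tag.startswith("B-") else tag)
        else:
            # Later subwords are ignored entirely.
            aligned.append(None)
        previous = wid
    return aligned


# "Japan" (B-LOC) split into three subwords, followed by one "O" word,
# with a special token on each side.
word_tags = ["B-LOC", "O"]
word_ids = [None, 0, 0, 0, 1, None]

print(align_labels(word_tags, word_ids, propagate=True))
print(align_labels(word_tags, word_ids, propagate=False))
```

With `propagate=True` the three pieces of "Japan" are labelled "B-LOC", "I-LOC", "I-LOC"; with `propagate=False` only the first piece keeps a label. If evaluation uses a different scheme than training did, the scores will probably not match, so it would help to know which one was used here.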