Results and training data

by msperka - opened Jan 3

Jan 3

Thanks for publicizing this!

In the paper (Table 2) you reported a nice F1 score of 0.8701, and it was also mentioned that the training was done on the NEMO corpus.
Have there been any changes to this since the paper's publication? (I'm asking because I was on WhatsApp giving credit to NEMO for providing over 90% of the training data. Is there more that you are able to share?)

Shaltiel

DICTA: The Israel Center for Text Analysis org Jan 3

The reported scores in the paper were for a model trained and tested solely on the NEMO corpus.
This model, however, was trained on a much larger corpus, of which NEMO was actually a very small percentage; most of the training data was provided by the IAHLT project.

Shaltiel changed discussion status to closed Jan 3

Shaltiel changed discussion status to open Jan 3

msperka

Jan 3

May I ask about the F1 results now?

Shaltiel

DICTA: The Israel Center for Text Analysis org Jan 3

Results are similar but harder to estimate, since the IAHLT corpus includes additional tags which aren't included in the NEMO corpus.
We are going to release a detailed document with experiments in the coming weeks. On a much larger test corpus (a subset of the IAHLT corpus) with more domains the overall F1 reaches 0.84.
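For readers wondering why extra tags make the comparison harder, here is a minimal sketch, not the authors' evaluation code. It assumes exact-match, micro-averaged entity-level F1. The function name `entity_f1`, the spans, and the tag names (`PER`, `ORG`, `TIMEX`) are hypothetical illustrations, not taken from the NEMO or IAHLT corpora. The same predictions score differently depending on whether tags outside a shared tag set are counted.

```python
from typing import Iterable, Optional, Set, Tuple

# An entity is (start_token, end_token, tag). All spans and tags are made up.
Entity = Tuple[int, int, str]


def entity_f1(gold: Iterable[Entity], pred: Iterable[Entity],
              allowed_tags: Optional[Set[str]] = None) -> float:
    """Micro-averaged exact-match F1, optionally restricted to allowed_tags."""
    gold_set = set(gold)
    pred_set = set(pred)
    if allowed_tags is not None:
        # Drop entities whose tag is outside the shared tag set.
        gold_set = {e for e in gold_set if e[2] in allowed_tags}
        pred_set = {e for e in pred_set if e[2] in allowed_tags}
    tp = len(gold_set & pred_set)  # entities matching on span and tag
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: "TIMEX" stands in for a tag present in only one corpus.
gold = [(0, 1, "PER"), (3, 4, "ORG"), (6, 7, "TIMEX")]
pred = [(0, 1, "PER"), (3, 4, "ORG"), (9, 10, "TIMEX")]

full = entity_f1(gold, pred)                                  # all tags counted
shared = entity_f1(gold, pred, allowed_tags={"PER", "ORG"})   # shared tags only
print(f"all tags: {full:.3f}, shared tags only: {shared:.3f}")
```

In this toy case the score over all tags is lower than the score over the shared tags alone, which is one reason a single F1 number is hard to compare across corpora with different tag inventories.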

Shaltiel

DICTA: The Israel Center for Text Analysis org Jan 3

(By contrast, the model trained on NEMO alone does significantly worse on this test corpus.)

msperka

Jan 3

Thank you!

msperka changed discussion status to closed Jan 3
