Commit 9b8116e by TajaKuzman (1 parent: ac6c965)
Update README

README.md CHANGED
@@ -114,9 +114,11 @@ widget:
 
 # X-GENRE classifier - multilingual text genre classifier
 
-Text classification model based on [`xlm-roberta-base`](https://huggingface.co/xlm-roberta-base) and fine-tuned on a combination of three genre datasets: Slovene [GINCO](http://hdl.handle.net/11356/1467) dataset (Kuzman et al., 2022), the English [CORE](https://github.com/TurkuNLP/CORE-corpus) dataset (Egbert et al., 2015) and the English [FTD](https://github.com/ssharoff/genre-keras) dataset (Sharoff, 2018). The model can be used for automatic genre identification, applied to any text in a language, supported by the `xlm-roberta-base`.
+Text classification model based on [`xlm-roberta-base`](https://huggingface.co/xlm-roberta-base) and fine-tuned on a combination of three genre datasets: Slovene [GINCO](http://hdl.handle.net/11356/1467) dataset (Kuzman et al., 2022), the English [CORE](https://github.com/TurkuNLP/CORE-corpus) dataset (Egbert et al., 2015) and the English [FTD](https://github.com/ssharoff/genre-keras) dataset (Sharoff, 2018). The model can be used for automatic genre identification, applied to any text in a language, supported by the `xlm-roberta-base`.
 
-
+The details on the model development, the datasets and the model's in-dataset, cross-dataset and multilingual performance are provided in the paper [Automatic Genre Identification for Robust Enrichment of Massive Text Collections: Investigation of Classification Methods in the Era of Large Language Models](https://www.mdpi.com/2504-4990/5/3/59) (Kuzman et al., 2023).
+
+If you use the model, please cite the paper:
 
 ```
 @article{kuzman2023automatic,
@@ -135,7 +137,7 @@ If you use the model, please cite the paper which describes creation of the X-GE
 
 We set up a benchmark for evaluating robustness of automatic genre identification models to test their usability for the automatic enrichment of large text collections with genre information. You are welcome to request the test dataset and submit your entry at the [benchmark's GitHub repository](https://github.com/TajaKuzman/AGILE-Automatic-Genre-Identification-Benchmark/tree/main).
 
-In
+In an out-of-dataset scenario (evaluating a model on a manually-annotated English dataset on which it was not trained), the model outperforms all other technologies:
 
 | | micro F1 | macro F1 | accuracy |
 |:----------------------------|-----------:|-----------:|-----------:|
@@ -267,6 +269,8 @@ model_args= {
 
 ```
 
+## Citation
+
 If you use the model, please cite the paper which describes creation of the X-GENRE dataset and the genre classifier:
 
 ```