Text Classification
Transformers
PyTorch
Safetensors
xlm-roberta
genre
text-genre
Inference Endpoints
TajaKuzman committed on
Commit 9b8116e
1 Parent(s): ac6c965

Update README

Files changed (1)
  1. README.md +7 -3
README.md CHANGED
@@ -114,9 +114,11 @@ widget:
 
 # X-GENRE classifier - multilingual text genre classifier
 
-Text classification model based on [`xlm-roberta-base`](https://huggingface.co/xlm-roberta-base) and fine-tuned on a combination of three genre datasets: Slovene [GINCO](http://hdl.handle.net/11356/1467) dataset (Kuzman et al., 2022), the English [CORE](https://github.com/TurkuNLP/CORE-corpus) dataset (Egbert et al., 2015) and the English [FTD](https://github.com/ssharoff/genre-keras) dataset (Sharoff, 2018). The model can be used for automatic genre identification, applied to any text in a language, supported by the `xlm-roberta-base`. The details on the model development, the datasets and the model's in-dataset, cross-dataset and multilingual performance are provided in described in details in the paper [Automatic Genre Identification for Robust Enrichment of Massive Text Collections: Investigation of Classification Methods in the Era of Large Language Models](https://www.mdpi.com/2504-4990/5/3/59) (Kuzman et al., 2023).
+Text classification model based on [`xlm-roberta-base`](https://huggingface.co/xlm-roberta-base) and fine-tuned on a combination of three genre datasets: Slovene [GINCO](http://hdl.handle.net/11356/1467) dataset (Kuzman et al., 2022), the English [CORE](https://github.com/TurkuNLP/CORE-corpus) dataset (Egbert et al., 2015) and the English [FTD](https://github.com/ssharoff/genre-keras) dataset (Sharoff, 2018). The model can be used for automatic genre identification, applied to any text in a language, supported by the `xlm-roberta-base`.
 
-If you use the model, please cite the paper which describes creation of the X-GENRE dataset and the genre classifier:
+The details on the model development, the datasets and the model's in-dataset, cross-dataset and multilingual performance are provided in the paper [Automatic Genre Identification for Robust Enrichment of Massive Text Collections: Investigation of Classification Methods in the Era of Large Language Models](https://www.mdpi.com/2504-4990/5/3/59) (Kuzman et al., 2023).
+
+If you use the model, please cite the paper:
 
 ```
 @article{kuzman2023automatic,
@@ -135,7 +137,7 @@ If you use the model, please cite the paper which describes creation of the X-GE
 
 We set up a benchmark for evaluating robustness of automatic genre identification models to test their usability for the automatic enrichment of large text collections with genre information. You are welcome to request the test dataset and submit your entry at the [benchmark's GitHub repository](https://github.com/TajaKuzman/AGILE-Automatic-Genre-Identification-Benchmark/tree/main).
 
-In a out-of-dataset scenario (evaluating a model on a manually-annotated English dataset on which it was not trained), the model outperforms all other technologies:
+In an out-of-dataset scenario (evaluating a model on a manually-annotated English dataset on which it was not trained), the model outperforms all other technologies:
 
 | | micro F1 | macro F1 | accuracy |
 |:----------------------------|-----------:|-----------:|-----------:|
@@ -267,6 +269,8 @@ model_args= {
 
 ```
 
+## Citation
+
 If you use the model, please cite the paper which describes creation of the X-GENRE dataset and the genre classifier:
 
 ```
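The updated description says the classifier can be applied to any text in a language supported by `xlm-roberta-base`. As a minimal illustration of what that looks like in practice, the sketch below loads the model with the Hugging Face `transformers` pipeline; the repository ID used here is an assumption, not something stated in this commit, so replace it with this model's actual Hub ID.

```python
# Minimal inference sketch. The repo ID below is assumed for illustration;
# substitute the actual Hub ID of this model.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="classla/xlm-roberta-base-multilingual-text-genre-classifier",
)

texts = [
    "Mix the flour and sugar, add two eggs, and bake for 30 minutes at 180 °C.",
    "The city council voted yesterday to approve the new public transport plan.",
]

# Each prediction is a dict with the top genre label and its confidence score;
# truncation keeps long web documents within the model's input length.
for text, prediction in zip(texts, classifier(texts, truncation=True)):
    print(f"{prediction['label']} ({prediction['score']:.3f}): {text[:60]}")
```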