Multilingual classification model to detect texts from the political science domain

Accuracy: 0.978

Predicts 2 classes:

class	description	precision	recall	f1-score	support
politics	political science	0.975	0.978	0.976	2143
multi	other scientific domains	0.981	0.979	0.980	2583
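The f1-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal Python check recomputes it from the precision and recall values in the table. Because those inputs are already rounded to three decimals, the last digit of a recomputed score can occasionally differ from the reported one.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall (precision, recall) per class, copied from the evaluation table
reported = {
    "politics": (0.975, 0.978),
    "multi": (0.981, 0.979),
}

for label, (p, r) in reported.items():
    print(f"{label}: f1 = {f1_score(p, r):.3f}")
# politics: f1 = 0.976
# multi: f1 = 0.980
```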

Evaluation by class and language:

class	description	language	precision	recall	f1-score	support
politics	political science	English	0.989	0.993	0.991	1212
multi	other scientific domains	English	0.992	0.989	0.991	1164
politics	political science	German	0.952	0.958	0.955	783
multi	other scientific domains	German	0.957	0.951	0.954	776
politics	political science	French	0.979	0.959	0.969	148
multi	other scientific domains	French	0.991	0.995	0.993	643
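Recall is the number of correct predictions divided by the support. So the overall recall of a class is the support-weighted average of its per-language recalls, and the per-language supports add up to the overall support. The sketch below uses only numbers from the two tables to check that they are consistent. Precision cannot be combined this way, because its denominator (the number of texts predicted as a class) is not reported per language.

```python
# Per-language (recall, support) pairs, copied from the evaluation table
per_language = {
    "politics": {"English": (0.993, 1212), "German": (0.958, 783), "French": (0.959, 148)},
    "multi": {"English": (0.989, 1164), "German": (0.951, 776), "French": (0.995, 643)},
}


def aggregate(rows):
    """Return (total support, support-weighted recall) over all languages."""
    total = sum(support for _, support in rows.values())
    # recall * support approximates the number of correctly classified texts
    approx_correct = sum(recall * support for recall, support in rows.values())
    return total, approx_correct / total


for label, rows in per_language.items():
    support, recall = aggregate(rows)
    print(f"{label}: support = {support}, recall = {recall:.3f}")
# politics: support = 2143, recall = 0.978
# multi: support = 2583, recall = 0.979
```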

Based on the BERT multilingual base model (uncased)

This model is a multilingual version of our SSciBERT_politics model. It was fine-tuned on a dataset of 14,178 abstracts of scientific articles in three languages (English, German and French), retrieved from the BASE and POLLUX collections. The BASE abstracts were labelled "politics" or "multi" according to their Dewey Decimal Classification (DDC). Abstracts from several major political science journals in the POLLUX dataset were labelled "politics".
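The labelling rule described above can be sketched as a small function. This is a hypothetical illustration, not the authors' preprocessing code. The card does not say which DDC notations were treated as political science, so the sketch assumes the DDC 320-329 range ("Political science"). The record fields `source` and `ddc` and the function name `label_record` are also invented for the example.

```python
from typing import Optional


def label_record(source: str, ddc: Optional[str]) -> Optional[str]:
    """Hypothetical labelling rule for a single abstract.

    source: "POLLUX" or "BASE" (assumed field name and values)
    ddc: DDC notation such as "320.9", or None if missing (assumed field)
    """
    if source == "POLLUX":
        # Abstracts from the political science journals in POLLUX
        return "politics"
    if source == "BASE" and ddc:
        # Assumption: DDC 320-329 (political science) maps to "politics",
        # every other DDC class to "multi"
        return "politics" if ddc.strip().startswith("32") else "multi"
    # No usable label for this record
    return None
```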

Usage

Requires: transformers and PyTorch (pip install transformers torch)

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('kalawinka/bert-base-ml-politics')
model = AutoModelForSequenceClassification.from_pretrained('kalawinka/bert-base-ml-politics')

# Texts longer than 512 tokens (the BERT input limit) are truncated
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, max_length=512, truncation=True)

pipe("""Verschiedene Arten der Art und Weise: zu ihrer Positionierung im Deutschen und Englischen Ausgehend von der Annahme, daß die Stellung der Adverbiale ihre semantischen Relationen zum Rest des Satzes widerspiegelt, wird gezeigt, daß die traditionelle Klasse der Adverbiale der Art und Weise in verschiedene Klassen zerfällt; in die bei dem (finalen) Verb stehenden prozeßbezogenen Adverbiale, andererseits in die subjektbezogenen und ereignisbezogenen Adverbiale, die höher im Satz stehen. Adverbiale der "Art und Weise" dieser unterschiedlichen Gruppen zeigen nicht nur im Deutschen, sondern auch im Englischen und Französischen ein unterschiedliches Stellungsverhalten. Unterschiede, die sich zwischen diesen Sprachen hinsichtlich der Stellung dieser Adverbien beobachten lassen, sind auf Unterschiede in den Satzstrukturen zurückzuführen. Proceeding from the assumption that the positions of adverbials reflect their semantic relations to the rest of the sentence, it is shown that the traditional class of manner adverbs can be divided into several classes: on the one hand there are process-related adverbs which are closely related to (final) verbs, on the other subject-oriented and event-related adverbs occurring higher in the sentence. "Manner adverbs" of these different groups can occupy different positions in German and English as well as in French. It can be argued that differences in adverb positions between these languages are the result of different sentence structures.""")

This produces the following output:

[{'label': 'multi', 'score': 0.9998677968978882}]
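By default the pipeline returns only the top label and its score. Assuming the scores come from a softmax over the two labels (the transformers default for a model with more than one label), the probability of "politics" is 1 minus the "multi" score, which makes it easy to apply a custom threshold when filtering a batch. The helpers `politics_probability` and `filter_political` below are hypothetical and not part of transformers; `pipe` is the pipeline created above, which returns one prediction dict per text when called on a list of strings.

```python
from typing import Callable, Dict, List, Sequence

Prediction = Dict[str, object]


def politics_probability(prediction: Prediction) -> float:
    """Estimate P(politics) from a top-1 prediction of the two-class model.

    Assumes softmax scores over exactly the labels "politics" and "multi".
    """
    score = float(prediction["score"])
    return score if prediction["label"] == "politics" else 1.0 - score


def filter_political(
    pipe: Callable[[List[str]], List[Prediction]],
    texts: Sequence[str],
    threshold: float = 0.5,
) -> List[str]:
    """Keep the texts whose estimated P(politics) reaches the threshold."""
    predictions = pipe(list(texts))
    return [
        text
        for text, prediction in zip(texts, predictions)
        if politics_probability(prediction) >= threshold
    ]
```

For example, `filter_political(pipe, abstracts, threshold=0.9)` keeps only the abstracts (a hypothetical list of strings) that the model assigns to "politics" with at least 90% probability. Applied to the output shown above, `politics_probability` gives roughly 0.00013.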