Gerwin committed on
Commit
543320d
1 Parent(s): c6b99ba

update tags

Files changed (1)
  1. README.md +2 -4
README.md CHANGED
@@ -5,8 +5,6 @@ language:
 tags:
 - bert
 - legal
-- dutch
-- english
 license: apache-2.0
 metrics:
 - F1
@@ -28,7 +26,7 @@ model = TFAutoModel.from_pretrained("Gerwin/legal-bert-dutch-english") # Tensor
 ```
 
 ## Benchmarks
-The thesis lists various benchmarks. Here are a couple of comparisons between popular BERT models and this model. The fine-tuning procedures for these benchmarks are identical for each pre-trained model, and are more explained in the thesis. You may be able to achieve higher scores for individual models by optimizing fine-tuning procedures. The table shows the weighted F1 scores.
+Here are a couple of comparisons between popular BERT models and this model. The fine-tuning procedures for these benchmarks are identical for each pre-trained model, and are more explained in the thesis. You may be able to achieve higher scores for individual models by optimizing fine-tuning procedures. The table shows the weighted F1 scores.
 
 ### Legal topic classification
 | Model | [Multi-EURLEX (NL)](https://huggingface.co/datasets/multi_eurlex) |
@@ -47,7 +45,7 @@ The thesis lists various benchmarks. Here are a couple of comparisons between po
 
 
 ### Multi-class classification (Rabobank)
-This dataset is not open-source, but it is still an interesting case since the dataset contains both Dutch and English long legal documents that have to be classified. The dataset only consisted of 8000 documents (2000 Dutch & 6000 English) with a total of 30 classes. Using a combined architecture of a Dutch and English BERT model was not beneficial, since documents from both languages could belong to the same class.
+This dataset is not open-source, but it is still an interesting case since the dataset contains both Dutch and English legal documents that have to be classified. The dataset consists of 8000 long legal documents (2000 Dutch & 6000 English) with a total of 30 classes. Using a combined architecture of a Dutch and English BERT model was not beneficial, since documents from both languages could belong to the same class.
 
 | Model | Rabobank |
 | ---------------------------------- | ---------------------------------- |