---

# Legal BERT model applicable for Dutch and English

A BERT model further trained from [mBERT](https://huggingface.co/bert-base-multilingual-uncased) on legal documents. The thesis can be downloaded [here](https://www.ru.nl/publish/pages/769526/gerwin_de_kruijf.pdf).

## Data

The model was further trained in the same way as [EurlexBERT](https://huggingface.co/nlpaueb/bert-base-uncased-eurlex): regulations, decisions, directives, and parliamentary questions were acquired in both Dutch and English. A total of 184k documents, around 295M words, was used to further train the model. This is less than 9% of the size of the data used to train the original BERT model.

Further training was done for 60k steps, since it showed better results.

```python
from transformers import AutoTokenizer, AutoModel, TFAutoModel
tokenizer = AutoTokenizer.from_pretrained("Gerwin/legal-bert-dutch-english")
model = AutoModel.from_pretrained("Gerwin/legal-bert-dutch-english") # PyTorch
model = TFAutoModel.from_pretrained("Gerwin/legal-bert-dutch-english")  # TensorFlow
```
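A common way to turn the loaded model's token outputs into a single sentence embedding is mean pooling over the non-padding tokens. This is a minimal sketch of that pattern, not a method prescribed by the thesis; dummy tensors stand in for the model outputs so the snippet runs without downloading the weights.

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)        # (batch, 1)
    return summed / counts

# In practice these come from the model loaded above:
#   inputs = tokenizer("Een juridische zin.", return_tensors="pt")
#   outputs = model(**inputs)
#   emb = mean_pool(outputs.last_hidden_state, inputs["attention_mask"])
# Dummy tensors with BERT-base's hidden size (768) are used here instead.
hidden = torch.randn(1, 4, 768)
mask = torch.tensor([[1, 1, 1, 0]])  # last token is padding
emb = mean_pool(hidden, mask)
print(emb.shape)  # torch.Size([1, 768])
```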
## Benchmarks
The thesis lists various benchmarks. Here are a couple of comparisons between popular BERT models and this model. The fine-tuning procedure for these benchmarks is identical for each pre-trained model and is explained in more detail in the thesis. You may be able to achieve higher scores for individual models by optimizing the fine-tuning procedure. The table shows the weighted F1 scores.
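For reference, weighted F1 averages the per-class F1 scores with weights equal to each class's share of the true labels. A minimal pure-Python sketch (the toy labels are illustrative, not thesis data; `sklearn.metrics.f1_score(..., average="weighted")` computes the same quantity):

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    total = len(y_true)
    score = 0.0
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        support = sum(1 for t in y_true if t == c)
        score += (support / total) * f1
    return score

# Toy labels (not from the thesis): three document classes
y_true = ["reg", "reg", "dir", "dec"]
y_pred = ["reg", "dir", "dir", "dec"]
print(round(weighted_f1(y_true, y_pred), 3))  # → 0.75
```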
### Legal topic classification
| Model | [Multi-EURLEX (NL)](https://huggingface.co/datasets/multi_eurlex) |