ANTOUN Wissam committed
Commit f460d39 (parent: 109da43)
added model files

- .gitattributes +1 -0
- README.md +94 -0
- added_tokens.json +1 -0
- ckpt-3400.ckpt-137.data-00000-of-00001 +3 -0
- ckpt-3400.ckpt-137.index +3 -0
- config.json +40 -0
- pytorch_model.bin +3 -0
- runs/.gitkeep +0 -0
- runs/eval/events.out.tfevents.1658662619.nefgpu54.32602.453.v2 +3 -0
- runs/eval/events.out.tfevents.1658780993.nefgpu54.28912.453.v2 +3 -0
- runs/eval/events.out.tfevents.1658782266.nefgpu54.32401.453.v2 +3 -0
- runs/eval/events.out.tfevents.1658814681.nefgpu54.80450.453.v2 +3 -0
- runs/eval/events.out.tfevents.1658822505.nefgpu54.33886.453.v2 +3 -0
- runs/eval/events.out.tfevents.1658978757.nefgpu54.33349.455.v2 +3 -0
- runs/eval/events.out.tfevents.1659006537.nefgpu54.32289.455.v2 +3 -0
- runs/eval/events.out.tfevents.1659006973.nefgpu54.34472.455.v2 +3 -0
- runs/eval/events.out.tfevents.1659133070.nefgpu54.1618.455.v2 +3 -0
- runs/eval/events.out.tfevents.1659427595.nefgpu54.33725.455.v2 +3 -0
- runs/eval/events.out.tfevents.1659993621.nefgpu54.15278.455.v2 +3 -0
- runs/train/p1/events.out.tfevents.1658662619.nefgpu54.32602.461.v2 +3 -0
- runs/train/p1/events.out.tfevents.1658780993.nefgpu54.28912.461.v2 +3 -0
- runs/train/p1/events.out.tfevents.1658782266.nefgpu54.32401.461.v2 +3 -0
- runs/train/p1/events.out.tfevents.1658814681.nefgpu54.80450.461.v2 +3 -0
- runs/train/p1/events.out.tfevents.1658822505.nefgpu54.33886.461.v2 +3 -0
- runs/train/p1/events.out.tfevents.1658978757.nefgpu54.33349.463.v2 +3 -0
- runs/train/p1/events.out.tfevents.1659006537.nefgpu54.32289.463.v2 +3 -0
- runs/train/p1/events.out.tfevents.1659006973.nefgpu54.34472.463.v2 +3 -0
- runs/train/p2/events.out.tfevents.1659133070.nefgpu54.1618.463.v2 +3 -0
- runs/train/p2/events.out.tfevents.1659427595.nefgpu54.33725.463.v2 +3 -0
- runs/train/p2/events.out.tfevents.1659993621.nefgpu54.15278.463.v2 +3 -0
- runs/training_summary.txt +24 -0
- special_tokens_map.json +1 -0
- spm.model +3 -0
- tokenizer.json +0 -0
- tokenizer_config.json +17 -0
.gitattributes
CHANGED
@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.ckpt-* filter=lfs diff=lfs merge=lfs -text
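
The new `*.ckpt-*` rule is what routes the TensorFlow checkpoint files added in this commit through Git LFS. As a rough sketch (not Git's actual matcher), Python's `fnmatch.fnmatchcase` can approximate which paths a slash-free gitattributes pattern catches, since Git matches such patterns against the file's basename. The pattern list is only the excerpt visible in this hunk, and `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Excerpt of the LFS patterns visible in this hunk (lines 32-35 of .gitattributes)
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "*.ckpt-*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for patterns that contain no slash.

    Git matches such patterns against the basename only, so strip directories first.
    """
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in patterns)


for path in [
    "ckpt-3400.ckpt-137.index",
    "ckpt-3400.ckpt-137.data-00000-of-00001",
    "runs/eval/events.out.tfevents.1658662619.nefgpu54.32602.453.v2",
    "config.json",
]:
    print(f"{path}: {is_lfs_tracked(path)}")
```

With these four patterns, the two checkpoint files and the TensorBoard event file match, while `config.json` does not.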
README.md
CHANGED
@@ -1,3 +1,97 @@
---
license: mit
---

# CamemBERTa: A French language model based on DeBERTa V3

CamemBERTa is a French language model based on DeBERTa V3, which is a DeBERTa V2 model with ELECTRA-style pretraining using the Replaced Token Detection (RTD) objective.
RTD uses a generator model, trained with the MLM objective, to replace masked tokens with plausible candidates, and a discriminator model trained to detect which tokens were replaced by the generator.
Usually the generator and discriminator share the same embedding matrix, but the authors of DeBERTa V3 propose a technique called gradient-disentangled embedding sharing (GDES) that disentangles the gradients of the shared embedding between the generator and the discriminator.
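
To make the RTD objective concrete, the sketch below builds the discriminator's training labels for a single toy sentence. It is not taken from the CamemBERTa pre-training code: the sentence, the masked positions, and the generator's samples are all hypothetical, and a real implementation works on token IDs in batches. Following ELECTRA, a position where the generator happens to reproduce the original token is labeled as original, not replaced.

```python
from typing import List

MASK = "[MASK]"


def mask_tokens(tokens: List[str], positions: List[int]) -> List[str]:
    """Generator input: the sentence with the selected positions masked."""
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked


def corrupt(tokens: List[str], positions: List[int], samples: List[str]) -> List[str]:
    """Discriminator input: each masked position is filled with the generator's sample."""
    corrupted = list(tokens)
    for i, sample in zip(positions, samples):
        corrupted[i] = sample
    return corrupted


def rtd_labels(original: List[str], corrupted: List[str]) -> List[int]:
    """1 = replaced, 0 = original. A correct guess by the generator counts as original."""
    return [int(o != c) for o, c in zip(original, corrupted)]


original = ["le", "chat", "dort", "sur", "le", "canapé"]
positions = [1, 5]          # hypothetical masked positions
samples = ["chat", "lit"]   # hypothetical generator samples for those positions

generator_input = mask_tokens(original, positions)
discriminator_input = corrupt(original, positions, samples)
labels = rtd_labels(original, discriminator_input)

print(generator_input)      # ['le', '[MASK]', 'dort', 'sur', 'le', '[MASK]']
print(discriminator_input)  # ['le', 'chat', 'dort', 'sur', 'le', 'lit']
print(labels)               # [0, 0, 0, 0, 0, 1]
```

The discriminator is then trained with a per-token binary classification loss against `labels`, while the generator is trained with the usual MLM loss on the masked positions.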

*This is the first publicly available implementation of DeBERTa V3, and the first publicly available DeBERTa V3 model outside of the original Microsoft release.*

Preprint paper: https://inria.hal.science/hal-03963729/

Pre-training code: https://gitlab.inria.fr/almanach/CamemBERTa

## How to use CamemBERTa

Our pretrained weights are available on the Hugging Face model hub. You can load them with the following code:

```python
from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM

# Discriminator: the main CamemBERTa model, used for downstream fine-tuning
CamemBERTa = AutoModel.from_pretrained("almanach/camemberta-base")
tokenizer = AutoTokenizer.from_pretrained("almanach/camemberta-base")

# Generator: the MLM model trained alongside the discriminator during pre-training
CamemBERTa_gen = AutoModelForMaskedLM.from_pretrained("almanach/camemberta-base-generator")
tokenizer_gen = AutoTokenizer.from_pretrained("almanach/camemberta-base-generator")
```

We also include the TF2 weights, including the weights of the RTD head for the discriminator and of the MLM head for the generator.
CamemBERTa is compatible with most fine-tuning scripts from the transformers library.

## Pretraining Setup

The model was trained on the French subset of the CCNet corpus (the same subset used in CamemBERT and PaGNOL) and is available on the Hugging Face model hub: CamemBERTa and CamemBERTa Generator.
To speed up the pre-training experiments, pre-training was split into two phases.
In phase 1, the model is trained with a maximum sequence length of 128 tokens for 10,000 steps, with 2,000 warm-up steps and a very large batch size of 67,584.
In phase 2, the maximum sequence length is increased to the full model capacity of 512 tokens for 3,300 steps, with 200 warm-up steps and a batch size of 27,648.
In total, the model sees roughly 133B tokens, compared to 419B tokens for CamemBERT-CCNet, which was trained for 100K steps; this represents roughly 30% of CamemBERT's full training.
To have a fair comparison, we trained a RoBERTa model, CamemBERT30%, using exactly the same pretraining setup but with the MLM objective.
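
The 133B figure can be reproduced from the two phases as steps × batch size × sequence length. This assumes every sequence in a batch is filled to the phase's maximum length, which is our assumption rather than something stated above; `tokens_seen` is a hypothetical helper.

```python
# (steps, batch size in sequences, max sequence length) for each phase, as listed above
PHASES = {
    "phase 1": (10_000, 67_584, 128),
    "phase 2": (3_300, 27_648, 512),
}
CAMEMBERT_CCNET_TOKENS = 419e9  # tokens seen by CamemBERT-CCNet, as stated above


def tokens_seen(steps: int, batch_size: int, seq_len: int) -> int:
    # Assumes every sequence is filled to the maximum length (an upper bound otherwise)
    return steps * batch_size * seq_len


per_phase = {name: tokens_seen(*cfg) for name, cfg in PHASES.items()}
total = sum(per_phase.values())

for name, n in per_phase.items():
    print(f"{name}: {n / 1e9:.1f}B tokens")
print(f"total: {total / 1e9:.1f}B tokens")
print(f"share of CamemBERT-CCNet: {total / CAMEMBERT_CCNET_TOKENS:.0%}")
```

This gives about 86.5B tokens for phase 1 and 46.7B for phase 2, or 133.2B in total, which is about 32% of the 419B tokens seen by CamemBERT-CCNet.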

## Pretraining Loss Curves

Check the TensorBoard logs and plots; the TensorBoard event files are in the `runs/` directory of this repository.

## Fine-tuning results

Datasets: POS tagging and dependency parsing (GSD, Rhapsodie, Sequoia, FSMB), NER (FTB), the FLUE benchmark (XNLI, CLS, PAWS-X), and the French Question Answering Dataset (FQuAD).

| Model             | UPOS      | LAS       | NER       | CLS       | PAWS-X    | XNLI      | F1 (FQuAD) | EM (FQuAD) |
|-------------------|-----------|-----------|-----------|-----------|-----------|-----------|------------|------------|
| CamemBERT (CCNet) | **97.59** | **88.69** | 89.97     | 94.62     | 91.36     | 81.95     | 80.98      | **62.51**  |
| CamemBERT (30%)   | 97.53     | 87.98     | **91.04** | 93.28     | 88.94     | 79.89     | 75.14      | 56.19      |
| CamemBERTa        | 97.57     | 88.55     | 90.33     | **94.92** | **91.67** | **82.00** | **81.15**  | 62.01      |

The following table compares CamemBERTa's performance on XNLI against other models under different training setups, demonstrating the data efficiency of CamemBERTa.

| Model             | XNLI (Acc.) | Training Steps | Tokens seen in pre-training | Dataset Size in Tokens |
|-------------------|-------------|----------------|-----------------------------|------------------------|
| mDeBERTa          | 84.4        | 500k           | 2T                          | 2.5T                   |
| CamemBERTa        | 82.0        | 33k            | 0.139T                      | 0.319T                 |
| XLM-R             | 81.4        | 1.5M           | 6T                          | 2.5T                   |
| CamemBERT - CCNet | 81.95       | 100k           | 0.419T                      | 0.319T                 |

*Note: The CamemBERTa training steps were adjusted for a batch size of 8192.*

## License

The public model weights are licensed under the MIT License.
This code is licensed under the Apache License 2.0.

## Citation

Paper accepted to Findings of ACL 2023.

You can use the preprint citation for now:

```
@article{antoun2023camemberta,
TITLE = {{Data-Efficient French Language Modeling with CamemBERTa}},
AUTHOR = {Antoun, Wissam and Sagot, Beno{\^i}t and Seddah, Djam{\'e}},
URL = {https://inria.hal.science/hal-03963729},
NOTE = {working paper or preprint},
YEAR = {2023},
MONTH = Jan,
PDF = {https://inria.hal.science/hal-03963729/file/French_DeBERTa___ACL_2023%20to%20be%20uploaded.pdf},
HAL_ID = {hal-03963729},
HAL_VERSION = {v1},
}
```

## Contact

Wissam Antoun: `wissam (dot) antoun (at) inria (dot) fr`

Benoit Sagot: `benoit (dot) sagot (at) inria (dot) fr`

Djame Seddah: `djame (dot) seddah (at) inria (dot) fr`
added_tokens.json
ADDED
@@ -0,0 +1 @@
{"[UNK]": 32001, "[PAD]": 32002}
ckpt-3400.ckpt-137.data-00000-of-00001
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec796fc8fbd5134ae5c0d5a612c60a3335075b94f83fee1ccf03bb78e5ffe2ee
size 1766899682
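
Large binaries in this commit are stored as Git LFS pointer files like the one above: a spec version, the SHA-256 `oid` of the real content, and its `size` in bytes. The sketch below, which uses only the Python standard library, checks a downloaded file against such a pointer. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers (not part of Git LFS or the Hugging Face tooling), and the local download path is an assumption.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and SHA-256 recorded in the pointer."""
    pointer = parse_lfs_pointer(pointer_text)
    algorithm, _, expected = pointer["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    # Cheap check first: the size must match exactly
    if os.path.getsize(path) != int(pointer["size"]):
        return False
    # Hash in chunks so multi-GB checkpoints do not have to fit in memory
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# Pointer for ckpt-3400.ckpt-137.data-00000-of-00001, copied from this commit
DATA_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ec796fc8fbd5134ae5c0d5a612c60a3335075b94f83fee1ccf03bb78e5ffe2ee
size 1766899682
"""

local_path = "ckpt-3400.ckpt-137.data-00000-of-00001"  # assumed download location
if os.path.exists(local_path):
    print(matches_pointer(local_path, DATA_POINTER))
```

Checking the size before hashing lets a truncated download fail fast without reading the whole file.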
ckpt-3400.ckpt-137.index
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:99c19dc261521f491d38078f77beba984f630cf4d936609186b4f4f89e437ec9
size 92494
config.json
ADDED
@@ -0,0 +1,40 @@
{
  "_name_or_path": "anondeb/debv3-base",
  "amp": true,
  "architectures": [
    "DebertaV2Model"
  ],
  "attention_probs_dropout_prob": 0.1,
  "conv_act": "gelu",
  "conv_kernel_size": 3,
  "embedding_size": 768,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_name": "camemberta-base",
  "model_type": "deberta-v2",
  "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 768,
  "pos_att_type": [
    "p2c",
    "c2p"
  ],
  "position_biased_input": false,
  "position_buckets": 256,
  "relative_attention": true,
  "share_att_key": true,
  "torch_dtype": "float32",
  "transformers_version": "4.20.1",
  "type_vocab_size": 0,
  "vocab_size": 32008
}
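
As a rough consistency check on this config, the parameter count can be estimated from its fields and compared with the size of `pytorch_model.bin` (447,289,360 bytes per its LFS pointer, i.e. about 111.8M float32 values). The layout assumed below is our reading of the Hugging Face DeBERTa-v2 implementation, not something stated in this repository: Q/K/V and output projections with biases, no separate positional projections because `share_att_key` is true, 2 × `position_buckets` relative embeddings, and one extra convolution layer because `conv_kernel_size` is 3. The helper functions are hypothetical.

```python
import json

# Excerpt of config.json above: only the fields that affect the parameter count
CONFIG = json.loads("""{
  "vocab_size": 32008, "hidden_size": 768, "embedding_size": 768,
  "intermediate_size": 3072, "num_hidden_layers": 12,
  "position_buckets": 256, "conv_kernel_size": 3,
  "position_biased_input": false, "type_vocab_size": 0, "share_att_key": true
}""")


def linear(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # weight matrix + bias


def layer_norm(h: int) -> int:
    return 2 * h  # gain + bias


def estimate_params(cfg: dict) -> int:
    h, inter = cfg["hidden_size"], cfg["intermediate_size"]
    # This sketch only covers the configuration above
    assert cfg["embedding_size"] == h and not cfg["position_biased_input"]
    assert cfg["type_vocab_size"] == 0 and cfg["share_att_key"]
    # Token embeddings + LayerNorm (no absolute-position or token-type embeddings)
    embeddings = cfg["vocab_size"] * h + layer_norm(h)
    # One encoder layer: Q, K, V, attention output, two FFN projections, two LayerNorms
    per_layer = 4 * linear(h, h) + linear(h, inter) + linear(inter, h) + 2 * layer_norm(h)
    # Relative position embeddings (2 * position_buckets rows) + their LayerNorm
    relative = 2 * cfg["position_buckets"] * h + layer_norm(h)
    # Convolution layer: kernel weights + bias + LayerNorm
    conv = h * h * cfg["conv_kernel_size"] + h + layer_norm(h)
    return embeddings + cfg["num_hidden_layers"] * per_layer + relative + conv


params = estimate_params(CONFIG)
values_in_checkpoint = 447_289_360 / 4  # pytorch_model.bin size / 4 bytes per float32
print(f"estimated parameters: {params:,}")
print(f"float32 values implied by file size: {values_in_checkpoint:,.0f}")
```

This gives 111,804,672 parameters, within about 0.02% of the value implied by the file size; the small remainder is plausibly serialization overhead and non-parameter buffers.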
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae6d1d5495d283736fbf8808c8177c364638abfef16c20f3d4a65683ef42f23e
size 447289360
runs/.gitkeep
ADDED
File without changes
runs/eval/events.out.tfevents.1658662619.nefgpu54.32602.453.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:abff33d04db4af03dd3c8fd067478e73a80d56dc326f7372a9cc18e8a776dc96
size 40
runs/eval/events.out.tfevents.1658780993.nefgpu54.28912.453.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:077bf001772d30e10dc5a0a725f4b82e27b4871249297630e8e6d80744c67cb7
size 40
runs/eval/events.out.tfevents.1658782266.nefgpu54.32401.453.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e43bf47fdcca2a060107d13f5519364caa48fbc21da07fec23d4d7c2202e928
size 40
runs/eval/events.out.tfevents.1658814681.nefgpu54.80450.453.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:28f1dd62d5aa1b4a4f1aebf1e3b9a195dcbaf3403684d15ae022645d734358cd
size 40
runs/eval/events.out.tfevents.1658822505.nefgpu54.33886.453.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e60ab04655c96abd22edca70f5f3a9b07ae7a2eba4be9064968339b2bc9f047
size 40
runs/eval/events.out.tfevents.1658978757.nefgpu54.33349.455.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8d53cf9d245fb7b4ce9b318701b924004b4315c6c1df5c56443f48ec2859a5f2
size 40
runs/eval/events.out.tfevents.1659006537.nefgpu54.32289.455.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a2831d6cc6060e6e01a9173ef21f2cd8388ff6e3a28be7ce52db5b7a97031de
size 40
runs/eval/events.out.tfevents.1659006973.nefgpu54.34472.455.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e4126a80bf4195601d8ac0494a2d13bea3200698e79fbe31b58888fad2fd578
size 40
runs/eval/events.out.tfevents.1659133070.nefgpu54.1618.455.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:263099010447ac4ef8020ef5df72b0070baaeee41e257191ac5ed9310e73d5fd
size 40
runs/eval/events.out.tfevents.1659427595.nefgpu54.33725.455.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6412c40d4909dfccca3144c46e5bc815af8d12955cffd16610259230abe68f39
size 40
runs/eval/events.out.tfevents.1659993621.nefgpu54.15278.455.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:84a318839fe4bf4bc681eae19c894f0cdc49a532d70a60aadf760fdaf3f0b544
size 40
runs/train/p1/events.out.tfevents.1658662619.nefgpu54.32602.461.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7902a3b23f6f324bb68e9fcac5172283e35e8b8366d39a26f790f33d9f164161
size 237508
runs/train/p1/events.out.tfevents.1658780993.nefgpu54.28912.461.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e42ed2c1aebb7df82aa3eea4a4e3fcf4b739cf29d01ea36f0546815e6d510d39
size 821
runs/train/p1/events.out.tfevents.1658782266.nefgpu54.32401.461.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e43bf47fdcca2a060107d13f5519364caa48fbc21da07fec23d4d7c2202e928
size 40
runs/train/p1/events.out.tfevents.1658814681.nefgpu54.80450.461.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:28f1dd62d5aa1b4a4f1aebf1e3b9a195dcbaf3403684d15ae022645d734358cd
size 40
runs/train/p1/events.out.tfevents.1658822505.nefgpu54.33886.461.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:028a860984ba3a4c84fdceb9408e969930c186b9b1e609c8d944fd4549848f35
size 237640
runs/train/p1/events.out.tfevents.1658978757.nefgpu54.33349.463.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8d53cf9d245fb7b4ce9b318701b924004b4315c6c1df5c56443f48ec2859a5f2
size 40
runs/train/p1/events.out.tfevents.1659006537.nefgpu54.32289.463.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a2831d6cc6060e6e01a9173ef21f2cd8388ff6e3a28be7ce52db5b7a97031de
size 40
runs/train/p1/events.out.tfevents.1659006973.nefgpu54.34472.463.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:afc73cfd2913a9332bda9226fe81e408c3a15ec2f074b1397507d4fc3ef16924
size 269320
runs/train/p2/events.out.tfevents.1659133070.nefgpu54.1618.463.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3e479289d46f17b0cd2af8c4d9210b70e5cf5c0b4b8c8f9fa9f288aa7fcec6e8
size 197908
runs/train/p2/events.out.tfevents.1659427595.nefgpu54.33725.463.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b704c973c7e26b0b484341b42cf4af42bd34d4de43950a8ef1d11568876f95d3
size 14296
runs/train/p2/events.out.tfevents.1659993621.nefgpu54.15278.463.v2
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e87c83148232ebd9dd3a75bc41b0868bc93bf307a863c263959ab586c53623a2
size 42808
runs/training_summary.txt
ADDED
@@ -0,0 +1,24 @@
{
  "total_training_steps": 3400,
  "train_loss": 7.284830570220947,
  "last_train_metrics_train_perf": 266.8984680175781,
  "last_train_metrics_total_loss": 7.284830570220947,
  "last_train_metrics_masked_lm_accuracy": 0.6975675821304321,
  "last_train_metrics_masked_lm_loss": 1.4581712484359741,
  "last_train_metrics_sampled_masked_lm_accuracy": 0.6167887449264526,
  "last_train_metrics_disc_loss": 0.1190570518374443,
  "last_train_metrics_disc_auc": 0.0,
  "last_train_metrics_disc_accuracy": 0.960852861404419,
  "last_train_metrics_disc_precision": 0.7765898704528809,
  "last_train_metrics_disc_recall": 0.38164031505584717,
  "eval_metrics_train_perf": 0.0,
  "eval_metrics_total_loss": 0.0,
  "eval_metrics_masked_lm_accuracy": 0.0,
  "eval_metrics_masked_lm_loss": 0.0,
  "eval_metrics_sampled_masked_lm_accuracy": 0.0,
  "eval_metrics_disc_loss": 0.0,
  "eval_metrics_disc_auc": 0.0,
  "eval_metrics_disc_accuracy": 0.0,
  "eval_metrics_disc_precision": 0.0,
  "eval_metrics_disc_recall": 0.0
}
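
The summary logs the discriminator's precision and recall but not their F1 score. Assuming both are computed for the "replaced" class (the summary does not say), F1 is their harmonic mean; the small sketch below derives it from the logged values. `f1_score` here is a hypothetical helper, not a function from the pre-training code.

```python
import json

# Excerpt of runs/training_summary.txt (its content is JSON)
SUMMARY = json.loads("""{
  "last_train_metrics_disc_precision": 0.7765898704528809,
  "last_train_metrics_disc_recall": 0.38164031505584717
}""")


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


disc_f1 = f1_score(
    SUMMARY["last_train_metrics_disc_precision"],
    SUMMARY["last_train_metrics_disc_recall"],
)
print(f"discriminator F1: {disc_f1:.3f}")
```

This comes out to about 0.51: at the last logged step, the discriminator is fairly precise when it flags a token as replaced, but it misses most replacements.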
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
{"bos_token": "[CLS]", "eos_token": "[SEP]", "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
spm.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eaf6658b4c5f33b1a6092e07deec6f921e4c6e87bf3068d109a2f1fd44849b50
size 808787
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,17 @@
{
  "do_lower_case": false,
  "bos_token": "[CLS]",
  "eos_token": "[SEP]",
  "unk_token": "[UNK]",
  "sep_token": "[SEP]",
  "pad_token": "[PAD]",
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "split_by_punct": false,
  "special_tokens_map_file": null,
  "name_or_path": "vocab/camembert-deberta/",
  "sp_model_kwargs": {},
  "tokenizer_file": null,
  "tokenizer_class": "DebertaV2Tokenizer",
  "vocab_type": "spm"
}