KoichiYasuoka committed on
Commit
c0e2ec7
1 Parent(s): 060a64d

initial release

README.md CHANGED
@@ -1,3 +1,39 @@
- ---
- license: cc-by-sa-4.0
- ---
+ ---
+ language:
+ - "en"
+ tags:
+ - "english"
+ - "token-classification"
+ - "pos"
+ - "dependency-parsing"
+ datasets:
+ - "universal_dependencies"
+ license: "cc-by-sa-4.0"
+ pipeline_tag: "token-classification"
+ ---
+
+ # roberta-large-english-upos
+
+ ## Model Description
+
+ This is a RoBERTa model pre-trained with [UD_English](https://universaldependencies.org/en/) for POS-tagging and dependency-parsing, derived from [roberta-large](https://huggingface.co/roberta-large). Every word is tagged by [UPOS](https://universaldependencies.org/u/pos/) (Universal Part-Of-Speech).
+
+ ## How to Use
+
+ ```py
+ from transformers import AutoTokenizer,AutoModelForTokenClassification
+ tokenizer=AutoTokenizer.from_pretrained("KoichiYasuoka/roberta-large-english-upos")
+ model=AutoModelForTokenClassification.from_pretrained("KoichiYasuoka/roberta-large-english-upos")
+ ```
+
+ or
+
+ ```py
+ import esupar
+ nlp=esupar.load("KoichiYasuoka/roberta-large-english-upos")
+ ```
+
+ ## See Also
+
+ [esupar](https://github.com/KoichiYasuoka/esupar): Tokenizer POS-tagger and Dependency-parser with BERT/RoBERTa models
+
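Editorial note on reading the labels: the README only shows how to load the model, while the `config.json` added in this commit defines labels such as `B-AUX+PART` / `I-AUX+PART` and a `task_specific_params["upos_multiword"]` table. The sketch below is one plausible way to post-process such predictions, not the author's or esupar's actual implementation. It assumes that `B-`/`I-` mark the first and following subword pieces of one word, and that a `+`-joined tag marks a surface form to be split using the table. The helper names `merge_pieces` and `split_multiword` and the hand-written input `pieces` are hypothetical; the three table entries are copied from the diff.

```python
# Small excerpt of task_specific_params["upos_multiword"] from config.json.
UPOS_MULTIWORD = {
    "AUX+PART": {"don't": ["do", "not"], "Can't": ["Ca", "n't"]},
    "ADP+PRON": {"init": ["in", "it"]},
}


def merge_pieces(pieces):
    """Merge (text, label) subword pieces into words.

    Assumption: "B-" starts a multi-piece word, "I-" continues it,
    and an unprefixed label is a word made of a single piece.
    """
    words = []
    for text, label in pieces:
        if label.startswith("I-") and words:
            prev_text, prev_label = words[-1]
            words[-1] = (prev_text + text, prev_label)
        else:
            tag = label[2:] if label.startswith("B-") else label
            words.append((text, tag))
    return words


def split_multiword(words, table):
    """Split words tagged like "AUX+PART" into one (form, tag) pair per tag.

    A word is split only if the table lists its surface form with as many
    parts as there are tags; otherwise it is passed through unchanged.
    """
    out = []
    for text, label in words:
        tags = label.split("+")
        parts = table.get(label, {}).get(text)
        if len(tags) > 1 and parts is not None and len(parts) == len(tags):
            out.extend(zip(parts, tags))
        else:
            out.append((text, label))
    return out


# Hand-written stand-in for per-piece model predictions (not real model output).
pieces = [("I", "PRON"), ("don", "B-AUX+PART"), ("'t", "I-AUX+PART"), ("know", "VERB")]
words = merge_pieces(pieces)
tagged = split_multiword(words, UPOS_MULTIWORD)
print(tagged)
```

In real use the `(text, label)` pairs would come from the model's per-token argmax mapped through `id2label`, and the full table would be read from the loaded configuration rather than typed in by hand.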
config.json ADDED
@@ -0,0 +1,2198 @@
1
+ {
2
+ "architectures": [
3
+ "RobertaForTokenClassification"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "bos_token_id": 0,
7
+ "classifier_dropout": null,
8
+ "eos_token_id": 2,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 1024,
12
+ "id2label": {
13
+ "0": "ADJ",
14
+ "1": "ADP",
15
+ "2": "ADP+DET",
16
+ "3": "ADP+PRON",
17
+ "4": "ADV",
18
+ "5": "ADV+AUX",
19
+ "6": "ADV+PART",
20
+ "7": "AUX",
21
+ "8": "AUX+PART",
22
+ "9": "B-ADJ",
23
+ "10": "B-ADJ+ADJ",
24
+ "11": "B-ADJ+NOUN",
25
+ "12": "B-ADJ+NOUN+NOUN",
26
+ "13": "B-ADJ+PART",
27
+ "14": "B-ADJ+PROPN",
28
+ "15": "B-ADJ+PUNCT",
29
+ "16": "B-ADP",
30
+ "17": "B-ADP+ADJ",
31
+ "18": "B-ADP+NOUN",
32
+ "19": "B-ADP+PRON",
33
+ "20": "B-ADV",
34
+ "21": "B-ADV+AUX",
35
+ "22": "B-ADV+PUNCT",
36
+ "23": "B-AUX",
37
+ "24": "B-AUX+ADV",
38
+ "25": "B-AUX+PART",
39
+ "26": "B-AUX+PART+VERB",
40
+ "27": "B-AUX+VERB",
41
+ "28": "B-CCONJ",
42
+ "29": "B-DET",
43
+ "30": "B-DET+AUX",
44
+ "31": "B-DET+NOUN",
45
+ "32": "B-INTJ",
46
+ "33": "B-INTJ+PUNCT",
47
+ "34": "B-NOUN",
48
+ "35": "B-NOUN+ADJ",
49
+ "36": "B-NOUN+ADP",
50
+ "37": "B-NOUN+AUX",
51
+ "38": "B-NOUN+NOUN",
52
+ "39": "B-NOUN+NOUN+VERB",
53
+ "40": "B-NOUN+PART",
54
+ "41": "B-NOUN+PROPN",
55
+ "42": "B-NOUN+PUNCT",
56
+ "43": "B-NOUN+SCONJ",
57
+ "44": "B-NOUN+VERB",
58
+ "45": "B-NUM",
59
+ "46": "B-PART",
60
+ "47": "B-PRON",
61
+ "48": "B-PRON+ADJ",
62
+ "49": "B-PRON+ADV",
63
+ "50": "B-PRON+AUX",
64
+ "51": "B-PRON+NOUN",
65
+ "52": "B-PRON+PART",
66
+ "53": "B-PRON+PRON",
67
+ "54": "B-PRON+VERB",
68
+ "55": "B-PROPN",
69
+ "56": "B-PROPN+ADP",
70
+ "57": "B-PROPN+AUX",
71
+ "58": "B-PROPN+PART",
72
+ "59": "B-PROPN+PROPN",
73
+ "60": "B-PROPN+PUNCT",
74
+ "61": "B-PROPN+PUNCT+PUNCT",
75
+ "62": "B-PROPN+VERB",
76
+ "63": "B-PUNCT",
77
+ "64": "B-PUNCT+PUNCT",
78
+ "65": "B-PUNCT+PUNCT+PUNCT",
79
+ "66": "B-PUNCT+SYM+PUNCT",
80
+ "67": "B-SCONJ",
81
+ "68": "B-SYM",
82
+ "69": "B-VERB",
83
+ "70": "B-VERB+ADJ",
84
+ "71": "B-VERB+ADJ+CCONJ",
85
+ "72": "B-VERB+ADP",
86
+ "73": "B-VERB+ADV",
87
+ "74": "B-VERB+ADV+PUNCT",
88
+ "75": "B-VERB+AUX",
89
+ "76": "B-VERB+CCONJ",
90
+ "77": "B-VERB+DET",
91
+ "78": "B-VERB+NOUN",
92
+ "79": "B-VERB+NOUN+CCONJ",
93
+ "80": "B-VERB+NOUN+NOUN",
94
+ "81": "B-VERB+PART",
95
+ "82": "B-VERB+PRON",
96
+ "83": "B-VERB+PRON+ADP",
97
+ "84": "B-VERB+PRON+ADV",
98
+ "85": "B-VERB+PROPN",
99
+ "86": "B-VERB+SCONJ",
100
+ "87": "B-VERB+VERB",
101
+ "88": "B-VERB+VERB+NOUN",
102
+ "89": "B-X",
103
+ "90": "B-X+PUNCT",
104
+ "91": "B-X+PUNCT+PUNCT",
105
+ "92": "B-X+X",
106
+ "93": "B-X+X+PRON",
107
+ "94": "CCONJ",
108
+ "95": "DET",
109
+ "96": "DET+NUM",
110
+ "97": "I-ADJ",
111
+ "98": "I-ADJ+ADJ",
112
+ "99": "I-ADJ+NOUN",
113
+ "100": "I-ADJ+NOUN+NOUN",
114
+ "101": "I-ADJ+PART",
115
+ "102": "I-ADJ+PROPN",
116
+ "103": "I-ADJ+PUNCT",
117
+ "104": "I-ADP",
118
+ "105": "I-ADP+ADJ",
119
+ "106": "I-ADP+NOUN",
120
+ "107": "I-ADP+PRON",
121
+ "108": "I-ADV",
122
+ "109": "I-ADV+AUX",
123
+ "110": "I-ADV+PUNCT",
124
+ "111": "I-AUX",
125
+ "112": "I-AUX+ADV",
126
+ "113": "I-AUX+PART",
127
+ "114": "I-AUX+PART+VERB",
128
+ "115": "I-AUX+VERB",
129
+ "116": "I-CCONJ",
130
+ "117": "I-DET",
131
+ "118": "I-DET+AUX",
132
+ "119": "I-DET+NOUN",
133
+ "120": "I-INTJ",
134
+ "121": "I-INTJ+PUNCT",
135
+ "122": "I-NOUN",
136
+ "123": "I-NOUN+ADJ",
137
+ "124": "I-NOUN+ADP",
138
+ "125": "I-NOUN+AUX",
139
+ "126": "I-NOUN+NOUN",
140
+ "127": "I-NOUN+NOUN+VERB",
141
+ "128": "I-NOUN+PART",
142
+ "129": "I-NOUN+PROPN",
143
+ "130": "I-NOUN+PUNCT",
144
+ "131": "I-NOUN+SCONJ",
145
+ "132": "I-NOUN+VERB",
146
+ "133": "I-NUM",
147
+ "134": "I-PART",
148
+ "135": "I-PRON",
149
+ "136": "I-PRON+ADJ",
150
+ "137": "I-PRON+ADV",
151
+ "138": "I-PRON+AUX",
152
+ "139": "I-PRON+NOUN",
153
+ "140": "I-PRON+PART",
154
+ "141": "I-PRON+PRON",
155
+ "142": "I-PRON+VERB",
156
+ "143": "I-PROPN",
157
+ "144": "I-PROPN+ADP",
158
+ "145": "I-PROPN+AUX",
159
+ "146": "I-PROPN+PART",
160
+ "147": "I-PROPN+PROPN",
161
+ "148": "I-PROPN+PUNCT",
162
+ "149": "I-PROPN+PUNCT+PUNCT",
163
+ "150": "I-PROPN+VERB",
164
+ "151": "I-PUNCT",
165
+ "152": "I-PUNCT+PUNCT",
166
+ "153": "I-PUNCT+PUNCT+PUNCT",
167
+ "154": "I-PUNCT+SYM+PUNCT",
168
+ "155": "I-SCONJ",
169
+ "156": "I-SYM",
170
+ "157": "I-VERB",
171
+ "158": "I-VERB+ADJ",
172
+ "159": "I-VERB+ADJ+CCONJ",
173
+ "160": "I-VERB+ADP",
174
+ "161": "I-VERB+ADV",
175
+ "162": "I-VERB+ADV+PUNCT",
176
+ "163": "I-VERB+AUX",
177
+ "164": "I-VERB+CCONJ",
178
+ "165": "I-VERB+DET",
179
+ "166": "I-VERB+NOUN",
180
+ "167": "I-VERB+NOUN+CCONJ",
181
+ "168": "I-VERB+NOUN+NOUN",
182
+ "169": "I-VERB+PART",
183
+ "170": "I-VERB+PRON",
184
+ "171": "I-VERB+PRON+ADP",
185
+ "172": "I-VERB+PRON+ADV",
186
+ "173": "I-VERB+PROPN",
187
+ "174": "I-VERB+SCONJ",
188
+ "175": "I-VERB+VERB",
189
+ "176": "I-VERB+VERB+NOUN",
190
+ "177": "I-X",
191
+ "178": "I-X+PUNCT",
192
+ "179": "I-X+PUNCT+PUNCT",
193
+ "180": "I-X+X",
194
+ "181": "I-X+X+PRON",
195
+ "182": "INTJ",
196
+ "183": "NOUN",
197
+ "184": "NOUN+AUX",
198
+ "185": "NOUN+PART",
199
+ "186": "NUM",
200
+ "187": "PART",
201
+ "188": "PRON",
202
+ "189": "PRON+AUX",
203
+ "190": "PRON+VERB",
204
+ "191": "PROPN",
205
+ "192": "PROPN+PART",
206
+ "193": "PUNCT",
207
+ "194": "PUNCT+PUNCT",
208
+ "195": "PUNCT+PUNCT+PUNCT",
209
+ "196": "PUNCT+SYM",
210
+ "197": "SCONJ",
211
+ "198": "SYM",
212
+ "199": "SYM+PUNCT",
213
+ "200": "SYM+SYM",
214
+ "201": "VERB",
215
+ "202": "VERB+ADP",
216
+ "203": "VERB+PART",
217
+ "204": "VERB+PRON",
218
+ "205": "VERB+VERB",
219
+ "206": "X",
220
+ "207": "X+X"
221
+ },
222
+ "initializer_range": 0.02,
223
+ "intermediate_size": 4096,
224
+ "label2id": {
225
+ "ADJ": 0,
226
+ "ADP": 1,
227
+ "ADP+DET": 2,
228
+ "ADP+PRON": 3,
229
+ "ADV": 4,
230
+ "ADV+AUX": 5,
231
+ "ADV+PART": 6,
232
+ "AUX": 7,
233
+ "AUX+PART": 8,
234
+ "B-ADJ": 9,
235
+ "B-ADJ+ADJ": 10,
236
+ "B-ADJ+NOUN": 11,
237
+ "B-ADJ+NOUN+NOUN": 12,
238
+ "B-ADJ+PART": 13,
239
+ "B-ADJ+PROPN": 14,
240
+ "B-ADJ+PUNCT": 15,
241
+ "B-ADP": 16,
242
+ "B-ADP+ADJ": 17,
243
+ "B-ADP+NOUN": 18,
244
+ "B-ADP+PRON": 19,
245
+ "B-ADV": 20,
246
+ "B-ADV+AUX": 21,
247
+ "B-ADV+PUNCT": 22,
248
+ "B-AUX": 23,
249
+ "B-AUX+ADV": 24,
250
+ "B-AUX+PART": 25,
251
+ "B-AUX+PART+VERB": 26,
252
+ "B-AUX+VERB": 27,
253
+ "B-CCONJ": 28,
254
+ "B-DET": 29,
255
+ "B-DET+AUX": 30,
256
+ "B-DET+NOUN": 31,
257
+ "B-INTJ": 32,
258
+ "B-INTJ+PUNCT": 33,
259
+ "B-NOUN": 34,
260
+ "B-NOUN+ADJ": 35,
261
+ "B-NOUN+ADP": 36,
262
+ "B-NOUN+AUX": 37,
263
+ "B-NOUN+NOUN": 38,
264
+ "B-NOUN+NOUN+VERB": 39,
265
+ "B-NOUN+PART": 40,
266
+ "B-NOUN+PROPN": 41,
267
+ "B-NOUN+PUNCT": 42,
268
+ "B-NOUN+SCONJ": 43,
269
+ "B-NOUN+VERB": 44,
270
+ "B-NUM": 45,
271
+ "B-PART": 46,
272
+ "B-PRON": 47,
273
+ "B-PRON+ADJ": 48,
274
+ "B-PRON+ADV": 49,
275
+ "B-PRON+AUX": 50,
276
+ "B-PRON+NOUN": 51,
277
+ "B-PRON+PART": 52,
278
+ "B-PRON+PRON": 53,
279
+ "B-PRON+VERB": 54,
280
+ "B-PROPN": 55,
281
+ "B-PROPN+ADP": 56,
282
+ "B-PROPN+AUX": 57,
283
+ "B-PROPN+PART": 58,
284
+ "B-PROPN+PROPN": 59,
285
+ "B-PROPN+PUNCT": 60,
286
+ "B-PROPN+PUNCT+PUNCT": 61,
287
+ "B-PROPN+VERB": 62,
288
+ "B-PUNCT": 63,
289
+ "B-PUNCT+PUNCT": 64,
290
+ "B-PUNCT+PUNCT+PUNCT": 65,
291
+ "B-PUNCT+SYM+PUNCT": 66,
292
+ "B-SCONJ": 67,
293
+ "B-SYM": 68,
294
+ "B-VERB": 69,
295
+ "B-VERB+ADJ": 70,
296
+ "B-VERB+ADJ+CCONJ": 71,
297
+ "B-VERB+ADP": 72,
298
+ "B-VERB+ADV": 73,
299
+ "B-VERB+ADV+PUNCT": 74,
300
+ "B-VERB+AUX": 75,
301
+ "B-VERB+CCONJ": 76,
302
+ "B-VERB+DET": 77,
303
+ "B-VERB+NOUN": 78,
304
+ "B-VERB+NOUN+CCONJ": 79,
305
+ "B-VERB+NOUN+NOUN": 80,
306
+ "B-VERB+PART": 81,
307
+ "B-VERB+PRON": 82,
308
+ "B-VERB+PRON+ADP": 83,
309
+ "B-VERB+PRON+ADV": 84,
310
+ "B-VERB+PROPN": 85,
311
+ "B-VERB+SCONJ": 86,
312
+ "B-VERB+VERB": 87,
313
+ "B-VERB+VERB+NOUN": 88,
314
+ "B-X": 89,
315
+ "B-X+PUNCT": 90,
316
+ "B-X+PUNCT+PUNCT": 91,
317
+ "B-X+X": 92,
318
+ "B-X+X+PRON": 93,
319
+ "CCONJ": 94,
320
+ "DET": 95,
321
+ "DET+NUM": 96,
322
+ "I-ADJ": 97,
323
+ "I-ADJ+ADJ": 98,
324
+ "I-ADJ+NOUN": 99,
325
+ "I-ADJ+NOUN+NOUN": 100,
326
+ "I-ADJ+PART": 101,
327
+ "I-ADJ+PROPN": 102,
328
+ "I-ADJ+PUNCT": 103,
329
+ "I-ADP": 104,
330
+ "I-ADP+ADJ": 105,
331
+ "I-ADP+NOUN": 106,
332
+ "I-ADP+PRON": 107,
333
+ "I-ADV": 108,
334
+ "I-ADV+AUX": 109,
335
+ "I-ADV+PUNCT": 110,
336
+ "I-AUX": 111,
337
+ "I-AUX+ADV": 112,
338
+ "I-AUX+PART": 113,
339
+ "I-AUX+PART+VERB": 114,
340
+ "I-AUX+VERB": 115,
341
+ "I-CCONJ": 116,
342
+ "I-DET": 117,
343
+ "I-DET+AUX": 118,
344
+ "I-DET+NOUN": 119,
345
+ "I-INTJ": 120,
346
+ "I-INTJ+PUNCT": 121,
347
+ "I-NOUN": 122,
348
+ "I-NOUN+ADJ": 123,
349
+ "I-NOUN+ADP": 124,
350
+ "I-NOUN+AUX": 125,
351
+ "I-NOUN+NOUN": 126,
352
+ "I-NOUN+NOUN+VERB": 127,
353
+ "I-NOUN+PART": 128,
354
+ "I-NOUN+PROPN": 129,
355
+ "I-NOUN+PUNCT": 130,
356
+ "I-NOUN+SCONJ": 131,
357
+ "I-NOUN+VERB": 132,
358
+ "I-NUM": 133,
359
+ "I-PART": 134,
360
+ "I-PRON": 135,
361
+ "I-PRON+ADJ": 136,
362
+ "I-PRON+ADV": 137,
363
+ "I-PRON+AUX": 138,
364
+ "I-PRON+NOUN": 139,
365
+ "I-PRON+PART": 140,
366
+ "I-PRON+PRON": 141,
367
+ "I-PRON+VERB": 142,
368
+ "I-PROPN": 143,
369
+ "I-PROPN+ADP": 144,
370
+ "I-PROPN+AUX": 145,
371
+ "I-PROPN+PART": 146,
372
+ "I-PROPN+PROPN": 147,
373
+ "I-PROPN+PUNCT": 148,
374
+ "I-PROPN+PUNCT+PUNCT": 149,
375
+ "I-PROPN+VERB": 150,
376
+ "I-PUNCT": 151,
377
+ "I-PUNCT+PUNCT": 152,
378
+ "I-PUNCT+PUNCT+PUNCT": 153,
379
+ "I-PUNCT+SYM+PUNCT": 154,
380
+ "I-SCONJ": 155,
381
+ "I-SYM": 156,
382
+ "I-VERB": 157,
383
+ "I-VERB+ADJ": 158,
384
+ "I-VERB+ADJ+CCONJ": 159,
385
+ "I-VERB+ADP": 160,
386
+ "I-VERB+ADV": 161,
387
+ "I-VERB+ADV+PUNCT": 162,
388
+ "I-VERB+AUX": 163,
389
+ "I-VERB+CCONJ": 164,
390
+ "I-VERB+DET": 165,
391
+ "I-VERB+NOUN": 166,
392
+ "I-VERB+NOUN+CCONJ": 167,
393
+ "I-VERB+NOUN+NOUN": 168,
394
+ "I-VERB+PART": 169,
395
+ "I-VERB+PRON": 170,
396
+ "I-VERB+PRON+ADP": 171,
397
+ "I-VERB+PRON+ADV": 172,
398
+ "I-VERB+PROPN": 173,
399
+ "I-VERB+SCONJ": 174,
400
+ "I-VERB+VERB": 175,
401
+ "I-VERB+VERB+NOUN": 176,
402
+ "I-X": 177,
403
+ "I-X+PUNCT": 178,
404
+ "I-X+PUNCT+PUNCT": 179,
405
+ "I-X+X": 180,
406
+ "I-X+X+PRON": 181,
407
+ "INTJ": 182,
408
+ "NOUN": 183,
409
+ "NOUN+AUX": 184,
410
+ "NOUN+PART": 185,
411
+ "NUM": 186,
412
+ "PART": 187,
413
+ "PRON": 188,
414
+ "PRON+AUX": 189,
415
+ "PRON+VERB": 190,
416
+ "PROPN": 191,
417
+ "PROPN+PART": 192,
418
+ "PUNCT": 193,
419
+ "PUNCT+PUNCT": 194,
420
+ "PUNCT+PUNCT+PUNCT": 195,
421
+ "PUNCT+SYM": 196,
422
+ "SCONJ": 197,
423
+ "SYM": 198,
424
+ "SYM+PUNCT": 199,
425
+ "SYM+SYM": 200,
426
+ "VERB": 201,
427
+ "VERB+ADP": 202,
428
+ "VERB+PART": 203,
429
+ "VERB+PRON": 204,
430
+ "VERB+VERB": 205,
431
+ "X": 206,
432
+ "X+X": 207
433
+ },
434
+ "layer_norm_eps": 1e-05,
435
+ "max_position_embeddings": 514,
436
+ "model_type": "roberta",
437
+ "num_attention_heads": 16,
438
+ "num_hidden_layers": 24,
439
+ "pad_token_id": 1,
440
+ "position_embedding_type": "absolute",
441
+ "task_specific_params": {
442
+ "upos_multiword": {
443
+ "ADJ+ADJ": {
444
+ "bigenough": [
445
+ "big",
446
+ "enough"
447
+ ],
448
+ "interestingsocial": [
449
+ "interesting",
450
+ "social"
451
+ ],
452
+ "longeight-inch": [
453
+ "long",
454
+ "eight-inch"
455
+ ],
456
+ "pressingsocial": [
457
+ "pressing",
458
+ "social"
459
+ ]
460
+ },
461
+ "ADJ+NOUN": {
462
+ "bigsource": [
463
+ "big",
464
+ "source"
465
+ ],
466
+ "contrastingseries": [
467
+ "contrasting",
468
+ "series"
469
+ ],
470
+ "distractingelements": [
471
+ "distracting",
472
+ "elements"
473
+ ],
474
+ "fascinatingshop": [
475
+ "fascinating",
476
+ "shop"
477
+ ],
478
+ "gruelingsanctions": [
479
+ "grueling",
480
+ "sanctions"
481
+ ],
482
+ "increasingsafety": [
483
+ "increasing",
484
+ "safety"
485
+ ],
486
+ "longexposures": [
487
+ "long",
488
+ "exposures"
489
+ ],
490
+ "longhair": [
491
+ "long",
492
+ "hair"
493
+ ],
494
+ "longhistory": [
495
+ "long",
496
+ "history"
497
+ ],
498
+ "ongoingsummaries": [
499
+ "ongoing",
500
+ "summaries"
501
+ ],
502
+ "pre-meetingsite": [
503
+ "pre-meeting",
504
+ "site"
505
+ ],
506
+ "rallyingsigns": [
507
+ "rallying",
508
+ "signs"
509
+ ],
510
+ "revenue-raisingservices": [
511
+ "revenue-raising",
512
+ "services"
513
+ ],
514
+ "self-questioningshrug": [
515
+ "self-questioning",
516
+ "shrug"
517
+ ],
518
+ "simperingsmile": [
519
+ "simpering",
520
+ "smile"
521
+ ],
522
+ "stronghints": [
523
+ "strong",
524
+ "hints"
525
+ ],
526
+ "wizardingsport": [
527
+ "wizarding",
528
+ "sport"
529
+ ]
530
+ },
531
+ "ADJ+PART": {
532
+ "elses": [
533
+ "else",
534
+ "s"
535
+ ]
536
+ },
537
+ "ADJ+PROPN": {
538
+ "Nationwidetints": [
539
+ "Nationwide",
540
+ "tints"
541
+ ]
542
+ },
543
+ "ADJ+PUNCT": {
544
+ "Jr..": [
545
+ "Jr.",
546
+ "."
547
+ ],
548
+ "e.g.:": [
549
+ "e.g.",
550
+ ":"
551
+ ],
552
+ "i.e.,": [
553
+ "i.e.",
554
+ ","
555
+ ]
556
+ },
557
+ "ADP+DET": {
558
+ "des": [
559
+ "de",
560
+ "les"
561
+ ]
562
+ },
563
+ "ADP+NOUN": {
564
+ "Infact": [
565
+ "In",
566
+ "fact"
567
+ ],
568
+ "overtime": [
569
+ "over",
570
+ "time"
571
+ ]
572
+ },
573
+ "ADP+PRON": {
574
+ "init": [
575
+ "in",
576
+ "it"
577
+ ]
578
+ },
579
+ "ADV+AUX": {
580
+ "Heres": [
581
+ "Here",
582
+ "s"
583
+ ],
584
+ "longhave": [
585
+ "long",
586
+ "have"
587
+ ]
588
+ },
589
+ "ADV+PART": {
590
+ "into": [
591
+ "in",
592
+ "to"
593
+ ]
594
+ },
595
+ "ADV+PUNCT": {
596
+ "E.g.,": [
597
+ "E.g.",
598
+ ","
599
+ ],
600
+ "e.g.,": [
601
+ "e.g.",
602
+ ","
603
+ ],
604
+ "i.e.,": [
605
+ "i.e.",
606
+ ","
607
+ ],
608
+ "i.e.:": [
609
+ "i.e.",
610
+ ":"
611
+ ]
612
+ },
613
+ "AUX+ADV": {
614
+ "cannot": [
615
+ "can",
616
+ "not"
617
+ ]
618
+ },
619
+ "AUX+PART": {
620
+ "Aren't": [
621
+ "Are",
622
+ "n't"
623
+ ],
624
+ "Aren\u2019t": [
625
+ "Are",
626
+ "n\u2019t"
627
+ ],
628
+ "CANT": [
629
+ "CA",
630
+ "NT"
631
+ ],
632
+ "Can't": [
633
+ "Ca",
634
+ "n't"
635
+ ],
636
+ "Cannot": [
637
+ "Can",
638
+ "not"
639
+ ],
640
+ "Can\u2019t": [
641
+ "Ca",
642
+ "n\u2019t"
643
+ ],
644
+ "DON'T": [
645
+ "DO",
646
+ "N'T"
647
+ ],
648
+ "DONT": [
649
+ "DO",
650
+ "NT"
651
+ ],
652
+ "Don't": [
653
+ "Do",
654
+ "n't"
655
+ ],
656
+ "Dont": [
657
+ "Do",
658
+ "nt"
659
+ ],
660
+ "Don\u2019t": [
661
+ "Do",
662
+ "n\u2019t"
663
+ ],
664
+ "Haven't": [
665
+ "Have",
666
+ "n't"
667
+ ],
668
+ "Isn't": [
669
+ "Is",
670
+ "n't"
671
+ ],
672
+ "Isn\u2019t": [
673
+ "Is",
674
+ "n\u2019t"
675
+ ],
676
+ "Won't": [
677
+ "Wo",
678
+ "n't"
679
+ ],
680
+ "ain't": [
681
+ "ai",
682
+ "n't"
683
+ ],
684
+ "aint": [
685
+ "ai",
686
+ "nt"
687
+ ],
688
+ "aren't": [
689
+ "are",
690
+ "not"
691
+ ],
692
+ "arent": [
693
+ "are",
694
+ "nt"
695
+ ],
696
+ "aren\u2019t": [
697
+ "are",
698
+ "n\u2019t"
699
+ ],
700
+ "can't": [
701
+ "can",
702
+ "not"
703
+ ],
704
+ "cannot": [
705
+ "can",
706
+ "not"
707
+ ],
708
+ "cant": [
709
+ "ca",
710
+ "nt"
711
+ ],
712
+ "can\u2019t": [
713
+ "ca",
714
+ "n\u2019t"
715
+ ],
716
+ "didn't": [
717
+ "did",
718
+ "n't"
719
+ ],
720
+ "didn\u2019t": [
721
+ "did",
722
+ "n\u2019t"
723
+ ],
724
+ "doesn't": [
725
+ "does",
726
+ "n't"
727
+ ],
728
+ "doesn\u2019t": [
729
+ "does",
730
+ "n\u2019t"
731
+ ],
732
+ "don't": [
733
+ "do",
734
+ "not"
735
+ ],
736
+ "dont": [
737
+ "do",
738
+ "nt"
739
+ ],
740
+ "don\u2019t": [
741
+ "do",
742
+ "n\u2019t"
743
+ ],
744
+ "haven't": [
745
+ "have",
746
+ "n't"
747
+ ],
748
+ "shan't": [
749
+ "sha",
750
+ "n't"
751
+ ],
752
+ "shouldn't": [
753
+ "should",
754
+ "not"
755
+ ],
756
+ "wasent": [
757
+ "wase",
758
+ "nt"
759
+ ],
760
+ "weren't": [
761
+ "were",
762
+ "n't"
763
+ ],
764
+ "weren\u2019t": [
765
+ "were",
766
+ "n\u2019t"
767
+ ],
768
+ "won't": [
769
+ "will",
770
+ "not"
771
+ ],
772
+ "wont": [
773
+ "wo",
774
+ "nt"
775
+ ],
776
+ "won\u2019t": [
777
+ "wo",
778
+ "n\u2019t"
779
+ ]
780
+ },
781
+ "AUX+PART+VERB": {
782
+ "dunno": [
783
+ "du",
784
+ "n",
785
+ "no"
786
+ ]
787
+ },
788
+ "AUX+VERB": {
789
+ "beingsaid": [
790
+ "being",
791
+ "said"
792
+ ],
793
+ "beingsent": [
794
+ "being",
795
+ "sent"
796
+ ],
797
+ "beingshipped": [
798
+ "being",
799
+ "shipped"
800
+ ],
801
+ "beingspoken": [
802
+ "being",
803
+ "spoken"
804
+ ],
805
+ "havingsaid": [
806
+ "having",
807
+ "said"
808
+ ]
809
+ },
810
+ "DET+AUX": {
811
+ "thes": [
812
+ "the",
813
+ "s"
814
+ ]
815
+ },
816
+ "DET+NOUN": {
817
+ "ALOT": [
818
+ "A",
819
+ "LOT"
820
+ ],
821
+ "Alot": [
822
+ "A",
823
+ "lot"
824
+ ],
825
+ "apart": [
826
+ "a",
827
+ "part"
828
+ ],
829
+ "awhile": [
830
+ "a",
831
+ "while"
832
+ ],
833
+ "sometime": [
834
+ "some",
835
+ "time"
836
+ ]
837
+ },
838
+ "DET+NUM": {
839
+ "everyone": [
840
+ "every",
841
+ "one"
842
+ ]
843
+ },
844
+ "INTJ+PUNCT": {
845
+ "etc.'": [
846
+ "etc.",
847
+ "'"
848
+ ],
849
+ "ta',": [
850
+ "ta'",
851
+ ","
852
+ ]
853
+ },
854
+ "NOUN+ADJ": {
855
+ "nothingspecial": [
856
+ "nothing",
857
+ "special"
858
+ ]
859
+ },
860
+ "NOUN+ADP": {
861
+ "flagon": [
862
+ "flag",
863
+ "on"
864
+ ],
865
+ "groundsof": [
866
+ "grounds",
867
+ "of"
868
+ ],
869
+ "hashtagon": [
870
+ "hashtag",
871
+ "on"
872
+ ],
873
+ "meetingsince": [
874
+ "meeting",
875
+ "since"
876
+ ]
877
+ },
878
+ "NOUN+AUX": {
879
+ "breathingshould": [
880
+ "breathing",
881
+ "should"
882
+ ],
883
+ "doghas": [
884
+ "dog",
885
+ "has"
886
+ ],
887
+ "mythmakingshould": [
888
+ "mythmaking",
889
+ "should"
890
+ ]
891
+ },
892
+ "NOUN+NOUN": {
893
+ "Drivingschool": [
894
+ "Driving",
895
+ "school"
896
+ ],
897
+ "bakingsheet": [
898
+ "baking",
899
+ "sheet"
900
+ ],
901
+ "bakingsoda": [
902
+ "baking",
903
+ "soda"
904
+ ],
905
+ "counselingservices": [
906
+ "counseling",
907
+ "services"
908
+ ],
909
+ "datingservice": [
910
+ "dating",
911
+ "service"
912
+ ],
913
+ "doghouse": [
914
+ "dog",
915
+ "house"
916
+ ],
917
+ "drivingschool": [
918
+ "driving",
919
+ "school"
920
+ ],
921
+ "engineeringservices": [
922
+ "engineering",
923
+ "services"
924
+ ],
925
+ "eveningschedule": [
926
+ "evening",
927
+ "schedule"
928
+ ],
929
+ "kingsnake": [
930
+ "king",
931
+ "snake"
932
+ ],
933
+ "kingsnakes": [
934
+ "king",
935
+ "snakes"
936
+ ],
937
+ "lightingshowroom": [
938
+ "lighting",
939
+ "showroom"
940
+ ],
941
+ "lightingsources": [
942
+ "lighting",
943
+ "sources"
944
+ ],
945
+ "loggingsites": [
946
+ "logging",
947
+ "sites"
948
+ ],
949
+ "mpgnumber": [
950
+ "mpg",
951
+ "number"
952
+ ],
953
+ "plughole": [
954
+ "plug",
955
+ "hole"
956
+ ],
957
+ "runningshorts": [
958
+ "running",
959
+ "shorts"
960
+ ],
961
+ "tagsets": [
962
+ "tag",
963
+ "sets"
964
+ ],
965
+ "testingschedule": [
966
+ "testing",
967
+ "schedule"
968
+ ],
969
+ "towingservices": [
970
+ "towing",
971
+ "services"
972
+ ],
973
+ "trainingsession": [
974
+ "training",
975
+ "session"
976
+ ],
977
+ "writingschedule": [
978
+ "writing",
979
+ "schedule"
980
+ ],
981
+ "writingsystem": [
982
+ "writing",
983
+ "system"
984
+ ]
985
+ },
986
+ "NOUN+NOUN+VERB": {
987
+ "RecruitingMeetingscheduled": [
988
+ "Recruiting",
989
+ "Meeting",
990
+ "scheduled"
991
+ ]
992
+ },
993
+ "NOUN+PART": {
994
+ "DAUGHTERS": [
995
+ "DAUGHTER",
996
+ "S"
997
+ ],
998
+ "Kids": [
999
+ "Kid",
1000
+ "s"
1001
+ ],
1002
+ "Mares": [
1003
+ "Mare",
1004
+ "s"
1005
+ ],
1006
+ "Smokers": [
1007
+ "Smoker",
1008
+ "s"
1009
+ ],
1010
+ "Travelers": [
1011
+ "Traveler",
1012
+ "s"
1013
+ ],
1014
+ "animals": [
1015
+ "animal",
1016
+ "s"
1017
+ ],
1018
+ "bachelors": [
1019
+ "bachelor",
1020
+ "s"
1021
+ ],
1022
+ "bakers": [
1023
+ "baker",
1024
+ "s"
1025
+ ],
1026
+ "beginners": [
1027
+ "beginner",
1028
+ "s"
1029
+ ],
1030
+ "bettas": [
1031
+ "betta",
1032
+ "s"
1033
+ ],
1034
+ "boys": [
1035
+ "boy",
1036
+ "s"
1037
+ ],
1038
+ "cars": [
1039
+ "car",
1040
+ "s"
1041
+ ],
1042
+ "cats": [
1043
+ "cat",
1044
+ "s"
1045
+ ],
1046
+ "cycles": [
1047
+ "cycle",
1048
+ "s"
1049
+ ],
1050
+ "dads": [
1051
+ "dad",
1052
+ "s"
1053
+ ],
1054
+ "doctors": [
1055
+ "doctor",
1056
+ "s"
1057
+ ],
1058
+ "dogs": [
1059
+ "dog",
1060
+ "s"
1061
+ ],
1062
+ "drivers": [
1063
+ "driver",
1064
+ "s"
1065
+ ],
1066
+ "friends": [
1067
+ "friend",
1068
+ "s"
1069
+ ],
1070
+ "grandmas": [
1071
+ "grandma",
1072
+ "s"
1073
+ ],
1074
+ "horses": [
1075
+ "horse",
1076
+ "s"
1077
+ ],
1078
+ "humans": [
1079
+ "human",
1080
+ "s"
1081
+ ],
1082
+ "males": [
1083
+ "male",
1084
+ "s"
1085
+ ],
1086
+ "manufacturers": [
1087
+ "manufacturer",
1088
+ "s"
1089
+ ],
1090
+ "mares": [
1091
+ "mare",
1092
+ "s"
1093
+ ],
1094
+ "nights": [
1095
+ "night",
1096
+ "s"
1097
+ ],
1098
+ "owners": [
1099
+ "owner",
1100
+ "s"
1101
+ ],
1102
+ "peoples": [
1103
+ "people",
1104
+ "s"
1105
+ ],
1106
+ "persons": [
1107
+ "person",
1108
+ "s"
1109
+ ],
1110
+ "scammers": [
1111
+ "scammer",
1112
+ "s"
1113
+ ],
1114
+ "sons": [
1115
+ "son",
1116
+ "s"
1117
+ ],
1118
+ "teams": [
1119
+ "team",
1120
+ "s"
1121
+ ],
1122
+ "todays": [
1123
+ "today",
1124
+ "s"
1125
+ ],
1126
+ "trainers": [
1127
+ "trainer",
1128
+ "s"
1129
+ ],
1130
+ "visitors": [
1131
+ "visitor",
1132
+ "s"
1133
+ ],
1134
+ "wits": [
1135
+ "wit",
1136
+ "s"
1137
+ ],
1138
+ "workers": [
1139
+ "worker",
1140
+ "s"
1141
+ ],
1142
+ "years": [
1143
+ "year",
1144
+ "s"
1145
+ ]
1146
+ },
1147
+ "NOUN+PUNCT": {
1148
+ "Ed.:": [
1149
+ "Ed.",
1150
+ ":"
1151
+ ],
1152
+ "Fax.(": [
1153
+ "Fax.",
1154
+ "("
1155
+ ],
1156
+ "a.m.,": [
1157
+ "a.m.",
1158
+ ","
1159
+ ],
1160
+ "lb.,": [
1161
+ "lb.",
1162
+ ","
1163
+ ],
1164
+ "mins.,": [
1165
+ "mins.",
1166
+ ","
1167
+ ],
1168
+ "oz.,": [
1169
+ "oz.",
1170
+ ","
1171
+ ],
1172
+ "p.m.,": [
1173
+ "p.m.",
1174
+ ","
1175
+ ]
1176
+ },
1177
+ "NOUN+SCONJ": {
1178
+ "buildingsince": [
1179
+ "building",
1180
+ "since"
1181
+ ]
1182
+ },
1183
+ "NOUN+VERB": {
1184
+ "dogeat": [
1185
+ "dog",
1186
+ "eat"
1187
+ ],
1188
+ "morningserves": [
1189
+ "morning",
1190
+ "serves"
1191
+ ],
1192
+ "thingsounded": [
1193
+ "thing",
1194
+ "sounded"
1195
+ ]
1196
+ },
1197
+ "PRON+ADJ": {
1198
+ "everythingset": [
1199
+ "everything",
1200
+ "set"
1201
+ ],
1202
+ "somethingsuch": [
1203
+ "something",
1204
+ "such"
1205
+ ]
1206
+ },
1207
+ "PRON+ADV": {
1208
+ "somethingsometime": [
1209
+ "something",
1210
+ "sometime"
1211
+ ]
1212
+ },
1213
+ "PRON+AUX": {
1214
+ "ITS": [
1215
+ "IT",
1216
+ "S"
1217
+ ],
1218
+ "Im": [
1219
+ "I",
1220
+ "m"
1221
+ ],
1222
+ "Its": [
1223
+ "It",
1224
+ "s"
1225
+ ],
1226
+ "Whats": [
1227
+ "What",
1228
+ "s"
1229
+ ],
1230
+ "Your": [
1231
+ "You",
1232
+ "r"
1233
+ ],
1234
+ "hes": [
1235
+ "he",
1236
+ "s"
1237
+ ],
1238
+ "id": [
1239
+ "i",
1240
+ "d"
1241
+ ],
1242
+ "im": [
1243
+ "i",
1244
+ "m"
1245
+ ],
1246
+ "its": [
1247
+ "it",
1248
+ "s"
1249
+ ],
1250
+ "iv": [
1251
+ "i",
1252
+ "v"
1253
+ ],
1254
+ "ive": [
1255
+ "i",
1256
+ "ve"
1257
+ ],
1258
+ "thats": [
1259
+ "that",
1260
+ "s"
1261
+ ],
1262
+ "their": [
1263
+ "thei",
1264
+ "r"
1265
+ ],
1266
+ "there": [
1267
+ "the",
1268
+ "re"
1269
+ ],
1270
+ "ur": [
1271
+ "u",
1272
+ "r"
1273
+ ],
1274
+ "your": [
1275
+ "you",
1276
+ "r"
1277
+ ]
1278
+ },
1279
+ "PRON+NOUN": {
1280
+ "alleconomy": [
1281
+ "all",
1282
+ "economy"
1283
+ ]
1284
+ },
1285
+ "PRON+PART": {
1286
+ "anyones": [
1287
+ "anyone",
1288
+ "s"
1289
+ ]
1290
+ },
1291
+ "PRON+PRON": {
1292
+ "everythingshe": [
1293
+ "everything",
1294
+ "she"
1295
+ ]
1296
+ },
1297
+ "PRON+VERB": {
1298
+ "Thats": [
1299
+ "That",
1300
+ "s"
1301
+ ],
1302
+ "Theres": [
1303
+ "There",
1304
+ "s"
1305
+ ],
1306
+ "everythingset": [
1307
+ "everything",
1308
+ "set"
1309
+ ],
1310
+ "iguz": [
1311
+ "i",
1312
+ "guz"
1313
+ ],
1314
+ "im": [
1315
+ "i",
1316
+ "m"
1317
+ ],
1318
+ "its": [
1319
+ "it",
1320
+ "s"
1321
+ ],
1322
+ "theres": [
1323
+ "there",
1324
+ "s"
1325
+ ],
1326
+ "youthank": [
1327
+ "you",
1328
+ "thank"
1329
+ ]
1330
+ },
1331
+ "PROPN+ADP": {
1332
+ "Pagin": [
1333
+ "Pag",
1334
+ "in"
1335
+ ],
1336
+ "Petersburgin": [
1337
+ "Petersburg",
1338
+ "in"
1339
+ ]
1340
+ },
1341
+ "PROPN+AUX": {
1342
+ "Hedwighad": [
1343
+ "Hedwig",
1344
+ "had"
1345
+ ]
1346
+ },
1347
+ "PROPN+PART": {
1348
+ "BJs": [
1349
+ "BJ",
1350
+ "s"
1351
+ ],
1352
+ "Chilis": [
1353
+ "Chili",
1354
+ "s"
1355
+ ],
1356
+ "Friscos": [
1357
+ "Frisco",
1358
+ "s"
1359
+ ],
1360
+ "Hams": [
1361
+ "Ham",
1362
+ "s"
1363
+ ],
1364
+ "Kobeys": [
1365
+ "Kobey",
1366
+ "s"
1367
+ ],
1368
+ "LWs": [
1369
+ "LW",
1370
+ "s"
1371
+ ],
1372
+ "Leonardos": [
1373
+ "Leonardo",
1374
+ "s"
1375
+ ],
1376
+ "Mortons": [
1377
+ "Morton",
1378
+ "s"
1379
+ ],
1380
+ "Travellers": [
1381
+ "Traveller",
1382
+ "s"
1383
+ ],
1384
+ "Valentines": [
1385
+ "Valentine",
1386
+ "s"
1387
+ ],
1388
+ "Years": [
1389
+ "Year",
1390
+ "s"
1391
+ ],
1392
+ "jacks": [
1393
+ "jack",
1394
+ "s"
1395
+ ]
1396
+ },
1397
+ "PROPN+PROPN": {
1398
+ "G&GAutomotive": [
1399
+ "G&G",
1400
+ "Automotive"
1401
+ ],
1402
+ "drivingschool": [
1403
+ "driving",
1404
+ "school"
1405
+ ]
1406
+ },
1407
+ "PROPN+PUNCT": {
1408
+ "B.,": [
1409
+ "B.",
1410
+ ","
1411
+ ],
1412
+ "B.A.\"": [
1413
+ "B.A.",
1414
+ "\""
1415
+ ],
1416
+ "D.C.,": [
1417
+ "D.C.",
1418
+ ","
1419
+ ],
1420
+ "Inc.\"": [
1421
+ "Inc.",
1422
+ "\""
1423
+ ],
1424
+ "M.,": [
1425
+ "M.",
1426
+ ","
1427
+ ],
1428
+ "N.O.?": [
1429
+ "N.O.",
1430
+ "?"
1431
+ ],
1432
+ "Que.,": [
1433
+ "Que.",
1434
+ ","
1435
+ ],
1436
+ "U.N.,": [
1437
+ "U.N.",
1438
+ ","
1439
+ ],
1440
+ "U.S.)": [
1441
+ "U.S.",
1442
+ ")"
1443
+ ],
1444
+ "U.S.-": [
1445
+ "U.S.",
1446
+ "-"
1447
+ ],
1448
+ "Va.-": [
1449
+ "Va.",
1450
+ "-"
1451
+ ]
1452
+ },
1453
+ "PROPN+PUNCT+PUNCT": {
1454
+ "W.H.\",": [
1455
+ "W.H.",
1456
+ "\"",
1457
+ ","
1458
+ ]
1459
+ },
1460
+ "PROPN+VERB": {
1461
+ "Orglive": [
1462
+ "Org",
1463
+ "live"
1464
+ ],
1465
+ "Pagyelped": [
1466
+ "Pag",
1467
+ "yelped"
1468
+ ]
1469
+ },
1470
+ "PUNCT+PUNCT": {
1471
+ "!\"": [
1472
+ "!",
1473
+ "\""
1474
+ ],
1475
+ "!'": [
1476
+ "!",
1477
+ "'"
1478
+ ],
1479
+ "!)": [
1480
+ "!",
1481
+ ")"
1482
+ ],
1483
+ "\"!": [
1484
+ "\"",
1485
+ "!"
1486
+ ],
1487
+ "\"\"": [
1488
+ "\"",
1489
+ "\""
1490
+ ],
1491
+ "\"(": [
1492
+ "\"",
1493
+ "("
1494
+ ],
1495
+ "\")": [
1496
+ "\"",
1497
+ ")"
1498
+ ],
1499
+ "\",": [
1500
+ "\"",
1501
+ ","
1502
+ ],
1503
+ "\"-": [
1504
+ "\"",
1505
+ "-"
1506
+ ],
1507
+ "\".": [
1508
+ "\"",
1509
+ "."
1510
+ ],
1511
+ "\"...": [
1512
+ "\"",
1513
+ "..."
1514
+ ],
1515
+ "\":": [
1516
+ "\"",
1517
+ ":"
1518
+ ],
1519
+ "\"[": [
1520
+ "\"",
1521
+ "["
1522
+ ],
1523
+ "')": [
1524
+ "'",
1525
+ ")"
1526
+ ],
1527
+ "',": [
1528
+ "'",
1529
+ ","
1530
+ ],
1531
+ "(\"": [
1532
+ "(",
1533
+ "\""
1534
+ ],
1535
+ "(\"\"": [
1536
+ "(",
1537
+ "\"\""
1538
+ ],
1539
+ "('": [
1540
+ "(",
1541
+ "'"
1542
+ ],
1543
+ "((": [
1544
+ "(",
1545
+ "("
1546
+ ],
1547
+ "([": [
1548
+ "(",
1549
+ "["
1550
+ ],
1551
+ ")\"": [
1552
+ ")",
1553
+ "\""
1554
+ ],
1555
+ ")(": [
1556
+ ")",
1557
+ "("
1558
+ ],
1559
+ "))": [
1560
+ ")",
1561
+ ")"
1562
+ ],
1563
+ "),": [
1564
+ ")",
1565
+ ","
1566
+ ],
1567
+ ").": [
1568
+ ")",
1569
+ "."
1570
+ ],
1571
+ ")...": [
1572
+ ")",
1573
+ "..."
1574
+ ],
1575
+ "):": [
1576
+ ")",
1577
+ ":"
1578
+ ],
1579
+ ");": [
1580
+ ")",
1581
+ ";"
1582
+ ],
1583
+ "*,": [
1584
+ "*",
1585
+ ","
1586
+ ],
1587
+ ",\"": [
1588
+ ",",
1589
+ "\""
1590
+ ],
1591
+ ",'": [
1592
+ ",",
1593
+ "'"
1594
+ ],
1595
+ ",''": [
1596
+ ",",
1597
+ "''"
1598
+ ],
1599
+ ",...": [
1600
+ ",",
1601
+ "..."
1602
+ ],
1603
+ "-\"": [
1604
+ "-",
1605
+ "\""
1606
+ ],
1607
+ ".\"": [
1608
+ ".",
1609
+ "\""
1610
+ ],
1611
+ ".'": [
1612
+ ".",
1613
+ "'"
1614
+ ],
1615
+ "..": [
1616
+ ".",
1617
+ "."
1618
+ ],
1619
+ "...\"": [
1620
+ "...",
1621
+ "\""
1622
+ ],
1623
+ "....": [
1624
+ "...",
1625
+ "."
1626
+ ],
1627
+ "?\"": [
1628
+ "?",
1629
+ "\""
1630
+ ],
1631
+ "?'": [
1632
+ "?",
1633
+ "'"
1634
+ ],
1635
+ "?)": [
1636
+ "?",
1637
+ ")"
1638
+ ],
1639
+ "?]": [
1640
+ "?",
1641
+ "]"
1642
+ ],
1643
+ "],": [
1644
+ "]",
1645
+ ","
1646
+ ],
1647
+ "];": [
1648
+ "]",
1649
+ ";"
1650
+ ]
1651
+ },
1652
+ "PUNCT+PUNCT+PUNCT": {
1653
+ "!),": [
1654
+ "!",
1655
+ ")",
1656
+ ","
1657
+ ],
1658
+ "\"),": [
1659
+ "\"",
1660
+ ")",
1661
+ ","
1662
+ ],
1663
+ "?),": [
1664
+ "?",
1665
+ ")",
1666
+ ","
1667
+ ],
1668
+ "]),": [
1669
+ "]",
1670
+ ")",
1671
+ ","
1672
+ ]
1673
+ },
1674
+ "PUNCT+SYM": {
1675
+ "($": [
1676
+ "(",
1677
+ "$"
1678
+ ]
1679
+ },
1680
+ "PUNCT+SYM+PUNCT": {
1681
+ "(%)": [
1682
+ "(",
1683
+ "%",
1684
+ ")"
1685
+ ]
1686
+ },
1687
+ "SYM+PUNCT": {
1688
+ "$,": [
1689
+ "$",
1690
+ ","
1691
+ ],
1692
+ "%)": [
1693
+ "%",
1694
+ ")"
1695
+ ],
1696
+ "%,": [
1697
+ "%",
1698
+ ","
1699
+ ],
1700
+ "-'": [
1701
+ "-",
1702
+ "'"
1703
+ ]
1704
+ },
1705
+ "SYM+SYM": {
1706
+ "-$": [
1707
+ "-",
1708
+ "$"
1709
+ ]
1710
+ },
1711
+ "VERB+ADJ": {
1712
+ "alteringspecific": [
1713
+ "altering",
1714
+ "specific"
1715
+ ],
1716
+ "doingshoddy": [
1717
+ "doing",
1718
+ "shoddy"
1719
+ ],
1720
+ "facingserious": [
1721
+ "facing",
1722
+ "serious"
1723
+ ],
1724
+ "legalizingsame": [
1725
+ "legalizing",
1726
+ "same"
1727
+ ],
1728
+ "mixinguppercase": [
1729
+ "mixing",
1730
+ "uppercase"
1731
+ ],
1732
+ "motivatingsyntactic": [
1733
+ "motivating",
1734
+ "syntactic"
1735
+ ],
1736
+ "outsourcingspecial": [
1737
+ "outsourcing",
1738
+ "special"
1739
+ ],
1740
+ "reinforcingsimilar": [
1741
+ "reinforcing",
1742
+ "similar"
1743
+ ],
1744
+ "showingsuperb": [
1745
+ "showing",
1746
+ "superb"
1747
+ ],
1748
+ "usingsimple": [
1749
+ "using",
1750
+ "simple"
1751
+ ]
1752
+ },
1753
+ "VERB+ADJ+CCONJ": {
1754
+ "lookingsmugand": [
1755
+ "looking",
1756
+ "smug",
1757
+ "and"
1758
+ ]
1759
+ },
1760
+ "VERB+ADP": {
1761
+ "Login": [
1762
+ "Log",
1763
+ "in"
1764
+ ],
1765
+ "gamingsince": [
1766
+ "gaming",
1767
+ "since"
1768
+ ],
1769
+ "goto": [
1770
+ "go",
1771
+ "to"
1772
+ ],
1773
+ "hummingsince": [
1774
+ "humming",
1775
+ "since"
1776
+ ],
1777
+ "investigatingsince": [
1778
+ "investigating",
1779
+ "since"
1780
+ ],
1781
+ "login": [
1782
+ "log",
1783
+ "in"
1784
+ ],
1785
+ "setup": [
1786
+ "set",
1787
+ "up"
1788
+ ]
1789
+ },
1790
+ "VERB+ADV": {
1791
+ "advancingslowly": [
1792
+ "advancing",
1793
+ "slowly"
1794
+ ],
1795
+ "behavingsplendidly": [
1796
+ "behaving",
1797
+ "splendidly"
1798
+ ],
1799
+ "bucklingslightly": [
1800
+ "buckling",
1801
+ "slightly"
1802
+ ],
1803
+ "contributingsubstantially": [
1804
+ "contributing",
1805
+ "substantially"
1806
+ ],
1807
+ "exultingeverywhere": [
1808
+ "exulting",
1809
+ "everywhere"
1810
+ ],
1811
+ "includingspecifically": [
1812
+ "including",
1813
+ "specifically"
1814
+ ],
1815
+ "movingsouthward": [
1816
+ "moving",
1817
+ "southward"
1818
+ ],
1819
+ "proposingspecifically": [
1820
+ "proposing",
1821
+ "specifically"
1822
+ ],
1823
+ "scavengingseriously": [
1824
+ "scavenging",
1825
+ "seriously"
1826
+ ],
1827
+ "swellingslightly": [
1828
+ "swelling",
1829
+ "slightly"
1830
+ ],
1831
+ "totalingsomewhere": [
1832
+ "totaling",
1833
+ "somewhere"
1834
+ ],
1835
+ "walkinguptown": [
1836
+ "walking",
1837
+ "uptown"
1838
+ ]
1839
+ },
1840
+ "VERB+ADV+PUNCT": {
1841
+ "studyinge.g.,": [
1842
+ "studying",
1843
+ "e.g.",
1844
+ ","
1845
+ ]
1846
+ },
1847
+ "VERB+AUX": {
1848
+ "Winningshall": [
1849
+ "Winning",
1850
+ "shall"
1851
+ ],
1852
+ "copyingshould": [
1853
+ "copying",
1854
+ "should"
1855
+ ]
1856
+ },
1857
+ "VERB+CCONJ": {
1858
+ "departingeither": [
1859
+ "departing",
1860
+ "either"
1861
+ ]
1862
+ },
1863
+ "VERB+DET": {
1864
+ "basingsome": [
1865
+ "basing",
1866
+ "some"
1867
+ ],
1868
+ "demonstratingsuch": [
1869
+ "demonstrating",
1870
+ "such"
1871
+ ],
1872
+ "discussingsome": [
1873
+ "discussing",
1874
+ "some"
1875
+ ],
1876
+ "doingevery": [
1877
+ "doing",
1878
+ "every"
1879
+ ],
1880
+ "doingsome": [
1881
+ "doing",
1882
+ "some"
1883
+ ],
1884
+ "dumpingsome": [
1885
+ "dumping",
1886
+ "some"
1887
+ ],
1888
+ "experiencingsome": [
1889
+ "experiencing",
1890
+ "some"
1891
+ ],
1892
+ "finishingsome": [
1893
+ "finishing",
1894
+ "some"
1895
+ ],
1896
+ "hostingsome": [
1897
+ "hosting",
1898
+ "some"
1899
+ ],
1900
+ "meetingeach": [
1901
+ "meeting",
1902
+ "each"
1903
+ ],
1904
+ "playingsome": [
1905
+ "playing",
1906
+ "some"
1907
+ ],
1908
+ "rangeach": [
1909
+ "rang",
1910
+ "each"
1911
+ ],
1912
+ "readingsome": [
1913
+ "reading",
1914
+ "some"
1915
+ ],
1916
+ "regardingsome": [
1917
+ "regarding",
1918
+ "some"
1919
+ ],
1920
+ "replacingsome": [
1921
+ "replacing",
1922
+ "some"
1923
+ ],
1924
+ "spendingsome": [
1925
+ "spending",
1926
+ "some"
1927
+ ],
1928
+ "usingsome": [
1929
+ "using",
1930
+ "some"
1931
+ ]
1932
+ },
1933
+ "VERB+NOUN": {
1934
+ "continuingsource": [
1935
+ "continuing",
1936
+ "source"
1937
+ ],
1938
+ "differingschedules": [
1939
+ "differing",
1940
+ "schedules"
1941
+ ],
1942
+ "doingscissors": [
1943
+ "doing",
1944
+ "scissors"
1945
+ ],
1946
+ "expandingsystem": [
1947
+ "expanding",
1948
+ "system"
1949
+ ],
1950
+ "expressingsadness": [
1951
+ "expressing",
1952
+ "sadness"
1953
+ ],
1954
+ "followingsuggestion": [
1955
+ "following",
1956
+ "suggestion"
1957
+ ],
1958
+ "formingeggs": [
1959
+ "forming",
1960
+ "eggs"
1961
+ ],
1962
+ "gettingsavage": [
1963
+ "getting",
1964
+ "savage"
1965
+ ],
1966
+ "gleamingsand": [
1967
+ "gleaming",
1968
+ "sand"
1969
+ ],
1970
+ "improvingsurveillance": [
1971
+ "improving",
1972
+ "surveillance"
1973
+ ],
1974
+ "meaningshell": [
1975
+ "meaning",
1976
+ "shell"
1977
+ ],
1978
+ "playingsports": [
1979
+ "playing",
1980
+ "sports"
1981
+ ],
1982
+ "printingerrors": [
1983
+ "printing",
1984
+ "errors"
1985
+ ],
1986
+ "producingshrubs": [
1987
+ "producing",
1988
+ "shrubs"
1989
+ ],
1990
+ "providingservices": [
1991
+ "providing",
1992
+ "services"
1993
+ ],
1994
+ "quittingsmoking": [
1995
+ "quitting",
1996
+ "smoking"
1997
+ ],
1998
+ "rushingslipstream": [
1999
+ "rushing",
2000
+ "slipstream"
2001
+ ],
2002
+ "seeingsomeone": [
2003
+ "seeing",
2004
+ "someone"
2005
+ ],
2006
+ "studyingsymmetry": [
2007
+ "studying",
2008
+ "symmetry"
2009
+ ]
2010
+ },
2011
+ "VERB+PART": {
2012
+ "Gonna": [
2013
+ "Gon",
2014
+ "na"
2015
+ ],
2016
+ "Gotta": [
2017
+ "Got",
2018
+ "ta"
2019
+ ],
2020
+ "aren't": [
2021
+ "are",
2022
+ "n't"
2023
+ ],
2024
+ "didn't": [
2025
+ "did",
2026
+ "n't"
2027
+ ],
2028
+ "doesn't": [
2029
+ "does",
2030
+ "n't"
2031
+ ],
2032
+ "don't": [
2033
+ "do",
2034
+ "n't"
2035
+ ],
2036
+ "don\u2019t": [
2037
+ "do",
2038
+ "n\u2019t"
2039
+ ],
2040
+ "gonna": [
2041
+ "gon",
2042
+ "na"
2043
+ ],
2044
+ "gotta": [
2045
+ "got",
2046
+ "ta"
2047
+ ],
2048
+ "haven't": [
2049
+ "have",
2050
+ "n't"
2051
+ ],
2052
+ "wana": [
2053
+ "wan",
2054
+ "a"
2055
+ ],
2056
+ "wanna": [
2057
+ "wan",
2058
+ "na"
2059
+ ]
2060
+ },
2061
+ "VERB+PRON": {
2062
+ "Lets": [
2063
+ "Let",
2064
+ "s"
2065
+ ],
2066
+ "callyou": [
2067
+ "call",
2068
+ "you"
2069
+ ],
2070
+ "crossingeach": [
2071
+ "crossing",
2072
+ "each"
2073
+ ],
2074
+ "doingeverything": [
2075
+ "doing",
2076
+ "everything"
2077
+ ],
2078
+ "expectingsomeone": [
2079
+ "expecting",
2080
+ "someone"
2081
+ ],
2082
+ "lets": [
2083
+ "let",
2084
+ "s"
2085
+ ],
2086
+ "slunghis": [
2087
+ "slung",
2088
+ "his"
2089
+ ]
2090
+ },
2091
+ "VERB+PRON+ADP": {
2092
+ "seeingeverythingaround": [
2093
+ "seeing",
2094
+ "everything",
2095
+ "around"
2096
+ ]
2097
+ },
2098
+ "VERB+PRON+ADV": {
2099
+ "screwingeverythingup": [
2100
+ "screwing",
2101
+ "everything",
2102
+ "up"
2103
+ ]
2104
+ },
2105
+ "VERB+PROPN": {
2106
+ "arrivingsalt": [
2107
+ "arriving",
2108
+ "salt"
2109
+ ],
2110
+ "departingsan": [
2111
+ "departing",
2112
+ "san"
2113
+ ],
2114
+ "leavingsan": [
2115
+ "leaving",
2116
+ "san"
2117
+ ],
2118
+ "leavingsunday": [
2119
+ "leaving",
2120
+ "sunday"
2121
+ ]
2122
+ },
2123
+ "VERB+SCONJ": {
2124
+ "decidewhether": [
2125
+ "decide",
2126
+ "whether"
2127
+ ]
2128
+ },
2129
+ "VERB+VERB": {
2130
+ "growingsuspended": [
2131
+ "growing",
2132
+ "suspended"
2133
+ ],
2134
+ "had": [
2135
+ "h",
2136
+ "ad"
2137
+ ]
2138
+ },
2139
+ "VERB+VERB+NOUN": {
2140
+ "crushingsleepingflowers": [
2141
+ "crushing",
2142
+ "sleeping",
2143
+ "flowers"
2144
+ ],
2145
+ "hostingvisitingschool": [
2146
+ "hosting",
2147
+ "visiting",
2148
+ "school"
2149
+ ]
2150
+ },
2151
+ "X+PUNCT": {
2152
+ "al.,": [
2153
+ "al.",
2154
+ ","
2155
+ ],
2156
+ "e.g.,": [
2157
+ "e.g.",
2158
+ ","
2159
+ ],
2160
+ "etc.)": [
2161
+ "etc.",
2162
+ ")"
2163
+ ],
2164
+ "etc.,": [
2165
+ "etc.",
2166
+ ","
2167
+ ],
2168
+ "etc..": [
2169
+ "etc.",
2170
+ "."
2171
+ ]
2172
+ },
2173
+ "X+X": {
2174
+ "'s": [
2175
+ "'",
2176
+ "s"
2177
+ ],
2178
+ ").doc": [
2179
+ ")",
2180
+ ".doc"
2181
+ ]
2182
+ },
2183
+ "X+X+PRON": {
2184
+ "http://i.imgur.com/T2zff.jpghttp://i.imgur.com/Xytex.jpgI": [
2185
+ "http://i.imgur.com/T2zff.jpg",
2186
+ "http://i.imgur.com/Xytex.jpg",
2187
+ "I"
2188
+ ]
2189
+ }
2190
+ }
2191
+ },
2192
+ "tokenizer_class": "RobertaTokenizer",
2193
+ "torch_dtype": "float32",
2194
+ "transformers_version": "4.14.1",
2195
+ "type_vocab_size": 1,
2196
+ "use_cache": true,
2197
+ "vocab_size": 50265
2198
+ }
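
The table that ends above groups fused surface forms (for example `don't` or `etc.,`) under a label made of UPOS tags joined with `+`, and maps each form to the pieces it is split into. The sketch below shows one way such a table could be consulted. The excerpt is copied from the entries above; the names `MULTIWORD_EXCERPT`, `build_index`, and `split_token` are hypothetical and are not taken from esupar or transformers, and the assumption that the n-th piece carries the n-th tag of the label is inferred from the data rather than confirmed by the source.

```python
# Small excerpt of the split table shown in config.json above.
# Outer keys are UPOS tags joined with "+"; inner dicts map a fused
# surface form to the list of pieces it is split into.
MULTIWORD_EXCERPT = {
    "VERB+PART": {"don't": ["do", "n't"], "gonna": ["gon", "na"]},
    "VERB+ADP": {"setup": ["set", "up"]},
    "X+PUNCT": {"etc.,": ["etc.", ","]},
    "PUNCT+PUNCT+PUNCT": {"?),": ["?", ")", ","]},
}


def build_index(table):
    """Flatten the table into {surface form: [(piece, upos), ...]}."""
    index = {}
    for tag_combo, forms in table.items():
        tags = tag_combo.split("+")
        for surface, pieces in forms.items():
            # Assumption: the n-th piece carries the n-th tag of the label.
            index[surface] = list(zip(pieces, tags))
    return index


def split_token(token, index):
    """Return (piece, upos) pairs for a listed fused token, else None."""
    return index.get(token)


INDEX = build_index(MULTIWORD_EXCERPT)
print(split_token("don't", INDEX))
print(split_token("?),", INDEX))
print(split_token("hello", INDEX))
```

Flattening the table once keeps each lookup a single dictionary access. A form listed under more than one label would keep only the last label seen, so a real consumer may need the tagger's predicted label to pick the right entry.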
merges.txt ADDED
The diff for this file is too large to render.
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c8041215de9d42b13ee725e09a9f290ac9cb468b62f59c9ca141d97c999a6b28
+ size 1418256753
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": false}}
supar.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19c0e45bcb6995ffa2b1a70117156f6f1e135f93d355d60c540b0c041849de72
+ size 1474825829
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "model_max_length": 512, "tokenizer_class": "RobertaTokenizer"}
vocab.json ADDED
The diff for this file is too large to render.