thefrigidliquidation committed on
Commit 347a529
1 Parent(s): 9f97b8a

Further trained model

Files changed (6)
  1. README.md +45 -1
  2. added_tokens.json +1039 -0
  3. config.json +3 -3
  4. pytorch_model.bin +2 -2
  5. tokenizer.json +2 -2
  6. tokenizer_config.json +2 -2
README.md CHANGED
@@ -11,4 +11,48 @@ license: cc-by-nc-4.0
 
  This model was fine-tuned on light and web novel for Japanese to English translation.
 
- Can translate sentences and paragraphs not exceeding 128 tokens.
+ It can translate sentences and paragraphs up to 512 tokens.
+
+
+ ## Usage
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ tokenizer = AutoTokenizer.from_pretrained("thefrigidliquidation/nllb-jaen-1.3B-lightnovels")
+ model = AutoModelForSeq2SeqLM.from_pretrained("thefrigidliquidation/nllb-jaen-1.3B-lightnovels")
+
+ generated_tokens = model.generate(
+ **inputs,
+ forced_bos_token_id=tokenizer.lang_code_to_id[tokenizer.tgt_lang],
+ max_new_tokens=1024,
+ no_repeat_ngram_size=6,
+ ).cpu()
+
+ translated_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]
+ ```
+
+ Generating with diverse beam search seems to work best. Add the following to `model.generate`:
+ ```python
+ num_beams=8,
+ num_beam_groups=4,
+ do_sample=False,
+ ```
+
+
+ ## Glossary
+ You can provide up to 10 custom translations for nouns and character names at runtime. To do so, surround the Japanese term with term tokens. Prefix the word with one of <t0>, <t1>, ..., <t9> and suffix the word with </t>. The term will be translated as the prefix term token which can then be string replaced.
+
+ For example, in "マイン、ルッツが迎えに来たよ" if you wish to have "マイン" translated as "Myne" you would replace "マイン" with "<t0>マイン</t>". The model will translate "<t0>マイン</t>、ルッツが迎えに来たよ" as "<t0>, Lutz is here to pick you up." Then simply do a string replacement on the output, replacing "<t0>" with "Myne".
+
+
+ ## Honorifics
+ You can force the model to generate or ignore honorifics.
+
+ ```python
+ # default, the model decides whether to use honorifics
+ tokenizer.tgt_lang = "jpn_Jpan"
+ # no honorifics, the model is discouraged from using honorifics
+ tokenizer.tgt_lang = "zsm_Latn"
+ # honorifics, the model is encouraged to use honorifics
+ tokenizer.tgt_lang = "zul_Latn"
+ ```
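The README's usage snippet passes `**inputs` to `model.generate` but never defines `inputs`, so it does not run as written. Below is a minimal sketch of a complete call that also applies the README's diverse beam search settings and its honorific switch. It assumes a `transformers` install that can load this checkpoint. The names `MODEL_ID`, `HONORIFIC_MODES`, `generation_kwargs`, `load` and `translate` are hypothetical. The sketch looks up the forced BOS ID with `tokenizer.convert_tokens_to_ids` instead of `tokenizer.lang_code_to_id`, which later `transformers` releases have deprecated.

```python
# Hypothetical helper names; settings mirror the README's Usage and Honorifics sections.
MODEL_ID = "thefrigidliquidation/nllb-jaen-1.3B-lightnovels"

# Target-language codes that the README repurposes as honorific switches.
HONORIFIC_MODES = {
    "auto": "jpn_Jpan",  # README default: the model decides
    "off": "zsm_Latn",   # honorifics discouraged
    "on": "zul_Latn",    # honorifics encouraged
}


def generation_kwargs(forced_bos_token_id: int, diverse: bool = True) -> dict:
    """Keyword arguments for model.generate, taken from the README."""
    kwargs = {
        "forced_bos_token_id": forced_bos_token_id,
        "max_new_tokens": 1024,
        "no_repeat_ngram_size": 6,
    }
    if diverse:
        # Diverse (grouped) beam search, which the README reports works best.
        kwargs.update(num_beams=8, num_beam_groups=4, do_sample=False)
    return kwargs


def load(model_id: str = MODEL_ID):
    """Return (tokenizer, model); downloads the ~5.5 GB checkpoint on first use."""
    # Imported lazily so the pure helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def translate(text: str, tokenizer, model, honorifics: str = "auto", diverse: bool = True) -> str:
    """Translate one Japanese passage into English."""
    tokenizer.tgt_lang = HONORIFIC_MODES[honorifics]
    # The README never defines `inputs`; tokenize the source text here.
    inputs = tokenizer(text, return_tensors="pt")
    # Language codes are ordinary tokens in the NLLB vocabulary, so this should
    # give the same ID as the README's tokenizer.lang_code_to_id[tokenizer.tgt_lang].
    bos_id = tokenizer.convert_tokens_to_ids(tokenizer.tgt_lang)
    generated = model.generate(**inputs, **generation_kwargs(bos_id, diverse)).cpu()
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

A typical call would be `tokenizer, model = load()` followed by `translate("マイン、ルッツが迎えに来たよ", tokenizer, model, honorifics="off")`. The generation settings are kept in a plain dict so that diverse beam search can be turned off with `diverse=False` without touching the rest of the call.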
added_tokens.json ADDED
@@ -0,0 +1,1039 @@
+ {
+ "</fs>": 257229,
+ "</t>": 257230,
+ "<fs>": 257228,
+ "<t0>": 257231,
+ "<t1>": 257232,
+ "<t2>": 257233,
+ "<t3>": 257234,
+ "<t4>": 257235,
+ "<t5>": 257236,
+ "<t6>": 257237,
+ "<t7>": 257238,
+ "<t8>": 257239,
+ "<t9>": 257240,
+ "⦅": 257198,
+ "〆": 256692,
+ "〒": 256881,
+ "〘": 257139,
+ "そ嚼": 257013,
+ "ひ翠": 257194,
+ "ゎ": 257125,
+ "゠": 257211,
+ "ヲ": 256383,
+ "丙": 257136,
+ "丞": 256747,
+ "丼": 256438,
+ "乖": 256767,
+ "仄": 256379,
+ "伎": 256723,
+ "伽": 256564,
+ "伽噺": 256717,
+ "佃": 257227,
+ "佇": 256265,
+ "侘": 256894,
+ "侭": 256715,
+ "俎": 257142,
+ "俟": 257014,
+ "俵": 256648,
+ "俸": 256990,
+ "倅": 257017,
+ "倖": 256693,
+ "倶": 256398,
+ "倹": 256980,
+ "偃": 257212,
+ "偲": 256955,
+ "傀": 256739,
+ "傀儡": 256489,
+ "傅": 256891,
+ "傻": 257204,
+ "僥": 256700,
+ "僥倖": 256760,
+ "僭": 256628,
+ "僻": 256709,
+ "儂": 256215,
+ "儚": 256309,
+ "儡": 256745,
+ "兎": 256236,
+ "兎鎧": 257202,
+ "兜": 256248,
+ "兜蟹": 257115,
+ "冑": 256333,
+ "冨": 256565,
+ "冴": 256267,
+ "冶": 256231,
+ "冽": 256927,
+ "凛": 256227,
+ "凜": 256338,
+ "凧": 257034,
+ "凪": 256335,
+ "凹": 256369,
+ "刎": 256512,
+ "刮": 257032,
+ "刹": 256272,
+ "剋": 256802,
+ "剌": 256913,
+ "剝": 256233,
+ "劈": 256979,
+ "劾": 256844,
+ "勅": 256598,
+ "匡": 256241,
+ "卍": 257149,
+ "卯": 256678,
+ "厠": 257065,
+ "厩": 256538,
+ "厰": 256938,
+ "叢": 256813,
+ "叭": 257173,
+ "吶": 256817,
+ "吼": 256368,
+ "吽": 256964,
+ "呃": 257177,
+ "呉": 256790,
+ "呐": 257216,
+ "呗": 256873,
+ "呟": 256205,
+ "呵": 256513,
+ "呷": 256469,
+ "呻": 256230,
+ "咀": 256511,
+ "咀嚼": 256544,
+ "咄": 256282,
+ "咄嗟": 256264,
+ "咆": 256337,
+ "咆吼": 257049,
+ "咆哮": 256294,
+ "咢": 257188,
+ "咤": 256495,
+ "咥": 256366,
+ "咦": 257106,
+ "咫": 256968,
+ "咱": 256641,
+ "咽": 256410,
+ "哂": 257051,
+ "哄": 256503,
+ "哮": 256348,
+ "唖": 256298,
+ "啄": 257150,
+ "啖": 256721,
+ "啖呵": 257036,
+ "啜": 256389,
+ "啞": 256382,
+ "啼": 256869,
+ "喔": 256849,
+ "嗄": 256792,
+ "嗇": 257196,
+ "嗚": 256399,
+ "嗚咽": 256560,
+ "嗟": 256257,
+ "嗤": 256320,
+ "嗯": 256753,
+ "嗾": 257214,
+ "嘯": 256652,
+ "嘶": 256707,
+ "噌": 256319,
+ "噎": 256899,
+ "噓": 256212,
+ "噤": 256363,
+ "噺": 256764,
+ "嚙": 256219,
+ "嚢": 256708,
+ "嚥": 256801,
+ "嚼": 256579,
+ "囀": 256827,
+ "囂": 256996,
+ "囃": 256694,
+ "囓": 256662,
+ "囮": 256313,
+ "坤": 257080,
+ "坩堝": 256886,
+ "埒": 256372,
+ "埜": 256246,
+ "埠": 256908,
+ "埴": 256521,
+ "埼": 256575,
+ "堀": 256214,
+ "堰": 256591,
+ "堺": 256779,
+ "塀": 256345,
+ "塁": 256719,
+ "塒": 256972,
+ "塡": 256549,
+ "塹": 257071,
+ "塹壕": 256758,
+ "墾": 256729,
+ "壕": 256763,
+ "壜": 256966,
+ "壬": 256228,
+ "壱": 257183,
+ "壷": 256548,
+ "壺": 256270,
+ "妳": 256623,
+ "妾": 256240,
+ "姑": 256547,
+ "姜": 256635,
+ "婉": 256897,
+ "婿": 256311,
+ "媚": 256436,
+ "媛": 256889,
+ "嫋": 257178,
+ "嫡": 256529,
+ "嫣": 256936,
+ "嬌": 256468,
+ "嬪": 257090,
+ "嬲": 256571,
+ "孵": 256620,
+ "宋": 256868,
+ "宦": 256371,
+ "寇": 256783,
+ "寞": 257021,
+ "寥": 256784,
+ "屁": 256629,
+ "屑": 256365,
+ "屓": 256780,
+ "屯": 256501,
+ "屹": 256663,
+ "岬": 256443,
+ "峠": 256822,
+ "峨": 257137,
+ "峯": 257068,
+ "崑": 257164,
+ "崑崙": 257043,
+ "崗": 257157,
+ "崙": 257165,
+ "嵩": 256589,
+ "嶋": 256329,
+ "嶮": 257206,
+ "嶺": 256447,
+ "嶺醍": 256740,
+ "嶽": 256999,
+ "巌": 256733,
+ "巳": 256614,
+ "巷": 256646,
+ "帛": 256706,
+ "帥": 256435,
+ "帷": 256583,
+ "幇": 257084,
+ "幌": 256496,
+ "幡": 256315,
+ "庵": 256425,
+ "庸": 256552,
+ "弐": 257050,
+ "弑": 257119,
+ "弔": 256603,
+ "弘": 256224,
+ "弛": 256430,
+ "弩": 256612,
+ "彌": 257054,
+ "彿": 256478,
+ "徊": 256670,
+ "徘": 256697,
+ "徘徊": 256567,
+ "徨": 256316,
+ "徽": 256607,
+ "忖": 257063,
+ "忸": 256847,
+ "忸怩": 256901,
+ "怩": 256848,
+ "恃": 257070,
+ "恍": 256572,
+ "恙": 257042,
+ "悄": 256775,
+ "悍": 256466,
+ "悧": 256762,
+ "悴": 256752,
+ "悸": 256650,
+ "惣": 256845,
+ "愴": 256785,
+ "愾": 256698,
+ "慄": 256286,
+ "慇": 256793,
+ "慇懃": 256816,
+ "慟": 256605,
+ "憊": 256659,
+ "憔": 256768,
+ "憔悴": 256788,
+ "憚": 256446,
+ "憬": 256505,
+ "憮": 256431,
+ "憺": 256864,
+ "懃": 256799,
+ "懊": 256695,
+ "懣": 256916,
+ "懦": 256993,
+ "懺": 256573,
+ "戟": 256479,
+ "扁": 257099,
+ "扈": 256741,
+ "抉": 256285,
+ "拌": 256965,
+ "拮": 256482,
+ "拱": 257193,
+ "拵": 256604,
+ "捌": 256297,
+ "捩": 256535,
+ "捲": 256414,
+ "捺": 256924,
+ "捻": 256225,
+ "掏": 257123,
+ "掣": 257140,
+ "掬": 256499,
+ "揄": 256627,
+ "揉": 256234,
+ "揶": 256622,
+ "揶揄": 256600,
+ "搅拌": 256984,
+ "搔": 256232,
+ "搦": 256527,
+ "摑": 256211,
+ "摺": 256394,
+ "撑": 257217,
+ "撓": 256917,
+ "撥": 256377,
+ "撹": 256734,
+ "撻": 257220,
+ "擢": 256726,
+ "擱": 256923,
+ "擲": 256310,
+ "擽": 256910,
+ "攀": 256836,
+ "攣": 256306,
+ "攪": 256465,
+ "攪拌": 256770,
+ "攫": 256269,
+ "斂": 257052,
+ "斃": 256453,
+ "斟": 256971,
+ "斡": 256656,
+ "昌": 256455,
+ "昴": 256714,
+ "晃": 256773,
+ "晦": 256661,
+ "暈": 257215,
+ "曙": 257108,
+ "曳": 256703,
+ "曽": 256451,
+ "朕": 256976,
+ "朦": 256595,
+ "朦朧": 256609,
+ "朧": 256408,
+ "朶": 256581,
+ "杓": 256728,
+ "杞": 256412,
+ "杳": 257117,
+ "杵": 257131,
+ "枡": 257126,
+ "枷": 256351,
+ "柑": 256806,
+ "柑橘": 256956,
+ "柘": 256934,
+ "柚": 256798,
+ "柩": 256866,
+ "柵": 256289,
+ "柿": 256887,
+ "栂": 256912,
+ "栅": 257081,
+ "栖": 256879,
+ "栞": 256669,
+ "桂": 256312,
+ "桐": 256213,
+ "桓": 256884,
+ "桔": 256675,
+ "桝": 256911,
+ "桟": 256428,
+ "桧": 257027,
+ "桿": 256687,
+ "梃": 257007,
+ "梓": 256396,
+ "梟": 256789,
+ "梢": 256658,
+ "梳": 256508,
+ "梶": 256619,
+ "棹": 257011,
+ "椀": 256542,
+ "椋": 257059,
+ "椒": 256439,
+ "椰": 257219,
+ "椿": 256419,
+ "楊": 256743,
+ "楓": 256247,
+ "楔": 256592,
+ "楕": 256673,
+ "楠": 256599,
+ "楯": 256626,
+ "榊": 257097,
+ "榎": 256634,
+ "槃": 257130,
+ "槌": 256251,
+ "樋": 256526,
+ "樫": 256287,
+ "樵": 256888,
+ "橇": 257046,
+ "橘": 256442,
+ "橙": 256361,
+ "檀": 256742,
+ "檄": 256854,
+ "檎": 256331,
+ "檜": 256373,
+ "櫂": 256210,
+ "櫓": 256689,
+ "櫛": 256235,
+ "欅": 257201,
+ "欒": 256925,
+ "殲": 256276,
+ "毅": 256357,
+ "毘": 257035,
+ "毟": 256494,
+ "毬": 257005,
+ "氈": 257195,
+ "氾": 256639,
+ "汀": 256445,
+ "汐": 257012,
+ "沁": 256649,
+ "沐": 257057,
+ "沫": 256336,
+ "沱": 256960,
+ "沽": 256876,
+ "洒": 256303,
+ "洟": 256862,
+ "浚": 257009,
+ "浩": 256288,
+ "浩浩": 256969,
+ "涎": 256473,
+ "涛": 256637,
+ "涨": 257205,
+ "涸": 256685,
+ "淀": 256346,
+ "淆": 257222,
+ "淑": 256359,
+ "淳": 256613,
+ "渠": 257072,
+ "渾": 256300,
+ "湘": 257110,
+ "湛": 256358,
+ "溌": 257022,
+ "滂": 256988,
+ "滓": 256416,
+ "滔": 256696,
+ "滲": 256221,
+ "滸": 257067,
+ "漉": 256810,
+ "漕": 256391,
+ "漣": 257143,
+ "漿": 256833,
+ "潟": 257184,
+ "澪": 256307,
+ "澱": 256539,
+ "澹": 256737,
+ "濁": 256242,
+ "濘": 256861,
+ "濛": 257096,
+ "濠": 256953,
+ "濤": 256545,
+ "瀉": 256820,
+ "瀑": 256781,
+ "瀟": 256896,
+ "瀟洒": 257015,
+ "瀧": 256949,
+ "瀾": 256987,
+ "灸": 256954,
+ "炒": 256339,
+ "炙": 256429,
+ "炬": 256610,
+ "炬燵": 256720,
+ "炯": 257045,
+ "烙": 256718,
+ "烹": 256998,
+ "焔": 256393,
+ "焙": 257056,
+ "煌": 256220,
+ "煎": 256520,
+ "熔": 256800,
+ "熾": 256418,
+ "燎": 256546,
+ "燐": 256388,
+ "燕": 256536,
+ "燦": 256611,
+ "燭": 256340,
+ "燵": 256690,
+ "燻": 256360,
+ "燼": 256853,
+ "牒": 256858,
+ "牝": 256839,
+ "牡": 256777,
+ "牡蠣": 257133,
+ "狛": 257092,
+ "狡": 256516,
+ "狡猾": 256722,
+ "狢": 257160,
+ "狸": 256491,
+ "狽": 256308,
+ "猊": 256618,
+ "猥": 256557,
+ "猴": 256883,
+ "猾": 256667,
+ "獪": 256914,
+ "獰": 256362,
+ "玲": 256518,
+ "玲瓏": 257225,
+ "珈": 257181,
+ "珈琲": 256602,
+ "珪": 257048,
+ "琉": 256846,
+ "琢": 256349,
+ "琥": 256299,
+ "琲": 257158,
+ "瑕": 256850,
+ "瑕疵": 257132,
+ "瑙": 256830,
+ "瑛": 256818,
+ "瑠": 256341,
+ "瑣": 256950,
+ "瑾": 256554,
+ "璽": 257135,
+ "瓢": 257134,
+ "瓢箪": 257019,
+ "甕": 256991,
+ "畔": 256550,
+ "畝": 257168,
+ "疇": 256370,
+ "疚": 257001,
+ "疵": 257002,
+ "痍": 256488,
+ "痒": 256356,
+ "痔": 257109,
+ "痙": 256475,
+ "痙攣": 256449,
+ "痣": 256562,
+ "瘡": 257111,
+ "瘴": 256347,
+ "瘴焔": 256933,
+ "癇": 256593,
+ "癇癪": 256814,
+ "癪": 256411,
+ "皐": 256249,
+ "皙": 256874,
+ "皺": 256275,
+ "盃": 256638,
+ "盥": 256851,
+ "盪": 256890,
+ "眇": 256506,
+ "眈": 256838,
+ "眦": 256577,
+ "眩": 256222,
+ "眩暈": 256563,
+ "眸": 256258,
+ "睥": 256590,
+ "睥睨": 256754,
+ "睨": 256206,
+ "睫": 256461,
+ "睾": 257073,
+ "瞠": 256328,
+ "瞥": 256263,
+ "瞰": 256502,
+ "瞼": 256271,
+ "矜": 256385,
+ "砥": 256859,
+ "硝": 256325,
+ "碁": 256671,
+ "碧": 256334,
+ "磊": 257076,
+ "磋": 257053,
+ "磋琢": 257163,
+ "磔": 256702,
+ "磺": 257028,
+ "礫": 256274,
+ "祇": 256647,
+ "祓": 256531,
+ "祟": 256759,
+ "祠": 256498,
+ "祢": 256559,
+ "禄": 256578,
+ "禊": 256937,
+ "禎": 256952,
+ "禿": 256483,
+ "秣": 257127,
+ "秦": 256665,
+ "稜": 256757,
+ "稟": 257186,
+ "穣": 256454,
+ "穹": 256644,
+ "穽": 257047,
+ "窩": 256594,
+ "窪": 256386,
+ "窯": 256616,
+ "窶": 257091,
+ "竄": 256630,
+ "竈": 256553,
+ "竦": 256229,
+ "竪": 256929,
+ "竿": 256586,
+ "笏": 256975,
+ "笠": 256417,
+ "笥": 256989,
+ "笨": 256892,
+ "笹": 256480,
+ "笹藪": 256712,
+ "筍": 257191,
+ "筏": 256940,
+ "筐": 256749,
+ "箍": 256997,
+ "箒": 256421,
+ "箔": 256645,
+ "箕": 256824,
+ "箝": 257016,
+ "箪": 257008,
+ "箪笥": 256857,
+ "箸": 256322,
+ "篝": 256654,
+ "篠": 256420,
+ "篠窪": 257094,
+ "篩": 257207,
+ "篭": 256375,
+ "簀": 256653,
+ "簒": 256657,
+ "簞": 257221,
+ "簞笥": 257192,
+ "簪": 256467,
+ "簾": 256782,
+ "籐": 256821,
+ "籾": 257151,
+ "粟": 256533,
+ "粟楠": 257000,
+ "粥": 256470,
+ "糠": 257024,
+ "糺": 257102,
+ "紆": 256746,
+ "紡": 256261,
+ "紺": 256262,
+ "紺碧": 256878,
+ "絢": 256631,
+ "絨": 256321,
+ "絹": 256330,
+ "綜": 256732,
+ "綯": 257146,
+ "綸": 256556,
+ "綽": 256921,
+ "綾": 256208,
+ "緋": 256422,
+ "緋鯉": 256982,
+ "緘": 257182,
+ "緞": 257156,
+ "緻": 256404,
+ "縊": 256904,
+ "縋": 256281,
+ "縒": 257088,
+ "縞": 256666,
+ "縷": 256686,
+ "縺": 256961,
+ "繍": 256327,
+ "繕": 256253,
+ "繡": 256683,
+ "繭": 256664,
+ "纂": 257004,
+ "缝": 257189,
+ "缪": 256676,
+ "罅": 256472,
+ "羹": 257077,
+ "翠": 256279,
+ "翠苓": 256566,
+ "翡": 256524,
+ "翡翠": 256463,
+ "翳": 256457,
+ "耄": 257060,
+ "聡": 256342,
+ "聳": 256522,
+ "聾": 256985,
+ "肛": 257147,
+ "肴": 256655,
+ "胤": 256476,
+ "胱": 257058,
+ "胴": 256237,
+ "脛": 256580,
+ "腋": 257018,
+ "腑": 256326,
+ "腥": 257083,
+ "腱": 256691,
+ "膂": 256440,
+ "膳": 256423,
+ "膵": 256973,
+ "膿": 256796,
+ "臀": 256918,
+ "臍": 256815,
+ "臙": 256935,
+ "臼": 256625,
+ "舐": 256223,
+ "舫": 257100,
+ "舳": 256907,
+ "舵": 256433,
+ "舷": 256608,
+ "艶": 256239,
+ "芋": 256314,
+ "芒": 256574,
+ "芙": 256977,
+ "芙蓉": 256407,
+ "芥": 256724,
+ "芦": 256292,
+ "芹": 257129,
+ "芻": 256500,
+ "苅": 257176,
+ "苑": 256794,
+ "苓": 257086,
+ "苔": 256374,
+ "苫": 256713,
+ "苺": 256395,
+ "茅": 256588,
+ "茜": 256284,
+ "茫": 256462,
+ "茸": 256493,
+ "茹": 256448,
+ "荊": 256946,
+ "荻": 257098,
+ "荼": 257010,
+ "莢": 256585,
+ "莪": 256660,
+ "莵": 256731,
+ "菅": 256803,
+ "菊": 256273,
+ "菖蒲": 257200,
+ "菫": 256291,
+ "菱": 256842,
+ "萌": 256280,
+ "萎": 256353,
+ "萝": 256317,
+ "葦": 256994,
+ "葱": 256826,
+ "葵": 256252,
+ "葺": 257159,
+ "蒐": 256558,
+ "蒜": 257037,
+ "蒲": 257141,
+ "蓉": 256736,
+ "蓑": 256776,
+ "蓬": 256943,
+ "蔦": 256456,
+ "蔭": 256584,
+ "蕎": 256786,
+ "蕗": 256397,
+ "薔": 256278,
+ "薙": 256245,
+ "薫": 256381,
+ "藁": 256364,
+ "藪": 256528,
+ "藹": 256840,
+ "虱": 256974,
+ "蚕": 256332,
+ "蚤": 257112,
+ "蚯蚓": 256867,
+ "蛆": 256582,
+ "蛍": 256485,
+ "蛙": 256392,
+ "蛞蝓": 257162,
+ "蛟": 257154,
+ "蛭": 256727,
+ "蛸": 256882,
+ "蛹": 257155,
+ "蛾": 256677,
+ "蜃": 256679,
+ "蜥蜴": 256244,
+ "蜻蛉": 256981,
+ "蝉": 256930,
+ "蝋": 256424,
+ "蝋燭": 256534,
+ "蝎": 257085,
+ "蝗": 256632,
+ "蝸": 256597,
+ "蝿": 256902,
+ "蟄": 257128,
+ "蟠": 256797,
+ "蟬": 256725,
+ "蟷螂": 256860,
+ "蟹": 256384,
+ "蟻": 256444,
+ "蠅": 256537,
+ "蠍": 256716,
+ "蠕": 257107,
+ "蠟": 256490,
+ "蠟燭": 256738,
+ "蠱": 256680,
+ "衒": 256769,
+ "衾": 256809,
+ "袁": 257175,
+ "袂": 256863,
+ "袈": 256477,
+ "袈裟": 256367,
+ "袱": 256871,
+ "袴": 256774,
+ "裟": 256507,
+ "裾": 256243,
+ "褄": 256856,
+ "褐": 256290,
+ "褥": 257166,
+ "褪": 256390,
+ "褸": 257062,
+ "襖": 256606,
+ "襞": 257179,
+ "襤": 257061,
+ "襤褸": 256905,
+ "覯": 257171,
+ "覿": 256807,
+ "訃": 257180,
+ "訛": 256804,
+ "訥": 256837,
+ "訶": 256808,
+ "詈": 256711,
+ "詠": 256218,
+ "詣": 256509,
+ "誂": 256805,
+ "誅": 256831,
+ "誑": 256621,
+ "誼": 256674,
+ "諌": 256684,
+ "諍": 256541,
+ "諒": 257026,
+ "諤": 257190,
+ "諫": 256523,
+ "諳": 257039,
+ "諺": 256909,
+ "謁": 256293,
+ "謐": 256525,
+ "謡": 256302,
+ "譚": 256352,
+ "譫": 257041,
+ "豹": 256387,
+ "貂": 256633,
+ "賎": 256995,
+ "贋": 256452,
+ "贔": 256772,
+ "贔屓": 256596,
+ "贤": 257044,
+ "赌": 256958,
+ "赚": 257038,
+ "趙": 257020,
+ "趙迂": 256561,
+ "跋": 256699,
+ "跋扈": 256841,
+ "踝": 257197,
+ "踵": 256255,
+ "蹂": 256415,
+ "蹂躙": 256344,
+ "蹄": 256406,
+ "蹲": 256409,
+ "躓": 256555,
+ "躙": 256354,
+ "躯": 256350,
+ "躰": 257209,
+ "躱": 256283,
+ "躾": 256405,
+ "軀": 256318,
+ "軋": 256260,
+ "軋轢": 256855,
+ "軛": 257226,
+ "輌": 256825,
+ "輓": 257172,
+ "輜": 256919,
+ "輻": 257218,
+ "輿": 256540,
+ "轍": 256484,
+ "轡": 256761,
+ "轢": 256426,
+ "辣": 256343,
+ "辰": 256301,
+ "辰巳": 257079,
+ "辻": 256532,
+ "辻褄": 256835,
+ "迂": 256268,
+ "迂闊": 256376,
+ "迭": 257148,
+ "迸": 256304,
+ "逅": 256682,
+ "逞": 256486,
+ "逡": 256323,
+ "逬": 257213,
+ "遁": 256617,
+ "遽": 256510,
+ "邂": 256681,
+ "邂逅": 256672,
+ "邏": 256688,
+ "鄙": 256640,
+ "鄭": 256515,
+ "酊": 257030,
+ "酋": 257185,
+ "酎": 256865,
+ "酢": 256519,
+ "酩": 257029,
+ "酩酊": 256778,
+ "醍": 256259,
+ "醍醐": 256872,
+ "醐": 256906,
+ "醤": 256434,
+ "醬": 256750,
+ "釜": 256497,
+ "鈎": 257066,
+ "鉈": 256487,
+ "鉤": 256458,
+ "鉾": 257078,
+ "銚": 256756,
+ "銛": 257040,
+ "銜": 257167,
+ "鋏": 256636,
+ "鋒": 256569,
+ "鋤": 256944,
+ "鋲": 256834,
+ "鋸": 256710,
+ "錆": 256305,
+ "錐": 256492,
+ "錐揉": 257033,
+ "錘": 256875,
+ "錦": 256877,
+ "錦鯉": 257075,
+ "錨": 256880,
+ "鍔": 256471,
+ "鍬": 256829,
+ "鍮": 256843,
+ "鍼": 256978,
+ "鎌": 256254,
+ "鎌鼬": 257113,
+ "鎚": 256400,
+ "鎧": 256207,
+ "鎧兜": 256748,
+ "鎬": 256942,
+ "鏃": 256543,
+ "鏖": 256744,
+ "鏤": 257116,
+ "鐙": 256967,
+ "鑢": 257161,
+ "鑼": 256939,
+ "鑽": 256517,
+ "閂": 256832,
+ "閏": 256464,
+ "間冴": 257174,
+ "閨": 256895,
+ "閾": 257187,
+ "闊": 256324,
+ "闖": 256791,
+ "闢": 256926,
+ "闹": 257138,
+ "阜": 256986,
+ "隈": 256481,
+ "隘": 257003,
+ "隧": 257023,
+ "隼": 256459,
+ "雁": 256403,
+ "雉": 257145,
+ "雫": 256216,
+ "雹": 256928,
+ "霆": 256701,
+ "霜": 256530,
+ "霞": 256277,
+ "霰": 257006,
+ "霹": 256970,
+ "靂": 256962,
+ "靄": 256355,
+ "靡": 256504,
+ "靭": 256402,
+ "靱": 256704,
+ "鞄": 256226,
+ "鞍": 256460,
+ "鞘": 256250,
+ "鞴": 257101,
+ "韋": 256601,
+ "韜晦": 257105,
+ "韻": 256437,
+ "頚": 256931,
+ "頰": 256209,
+ "頴": 257152,
+ "頷": 256204,
+ "頽": 256705,
+ "顎": 256217,
+ "顎髭": 257089,
+ "顎鬚": 256920,
+ "顚": 256624,
+ "顛": 256576,
+ "顰": 256266,
+ "颇": 257203,
+ "颯": 256450,
+ "颶": 257118,
+ "飄": 256474,
+ "飆": 257223,
+ "飴": 256378,
+ "餃": 256812,
+ "餉": 256651,
+ "餞": 256771,
+ "餡": 256947,
+ "饅": 256568,
+ "饐": 257210,
+ "饗": 256870,
+ "駁": 256828,
+ "駈": 257114,
+ "駿": 256401,
+ "騾": 257069,
+ "驀": 256885,
+ "驟": 257087,
+ "骰": 256957,
+ "髏": 257121,
+ "髑": 257169,
+ "髑髏": 256735,
+ "髭": 256256,
+ "髯": 257144,
+ "鬚": 256941,
+ "鬣": 256945,
+ "鬨": 256642,
+ "鬩": 257170,
+ "魁": 256755,
+ "魃": 257124,
+ "魄": 256668,
+ "魍": 257122,
+ "魍魎": 257199,
+ "魎": 256932,
+ "魏": 256893,
+ "魏嶽": 256751,
+ "魑": 257025,
+ "魘": 257031,
+ "鮫": 256570,
+ "鮭": 256959,
+ "鯉": 256765,
+ "鯖": 257064,
+ "鯛": 257074,
+ "鰊": 256983,
+ "鰐": 256903,
+ "鰓": 257082,
+ "鰭": 256963,
+ "鰹": 257153,
+ "鰻": 256730,
+ "鱈": 256615,
+ "鱗": 256238,
+ "鲱": 256819,
+ "鳳": 256787,
+ "鳳凰": 256441,
+ "鳶": 256551,
+ "鴨": 256587,
+ "鵜": 256514,
+ "鵞絨": 257095,
+ "鵠": 256823,
+ "鵺": 256951,
+ "鶴": 256295,
+ "鷲": 256296,
+ "鷲摑": 256992,
+ "鷺": 257208,
+ "鸚鵡": 257103,
+ "鹵": 256811,
+ "鹸": 256380,
+ "鹼": 256898,
+ "麒麟": 257055,
+ "麓": 256427,
+ "麹": 257104,
+ "麺": 256432,
+ "麺麭": 257093,
+ "麾": 256900,
+ "黛": 256922,
+ "黴": 256915,
+ "鼬": 257224,
+ "鼾": 257120,
+ "齎": 256643,
+ "齟": 256766,
+ "齟齬": 256852,
+ "齧": 256413,
+ "齬": 256795,
+ "𠮟": 256948
+ }
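The `<t0>` to `<t9>` and `</t>` entries above are the term tokens the README's Glossary section relies on, which is why each one has its own ID. The pre- and post-processing that the README describes can be sketched in plain Python as follows. `wrap_terms` and `restore_terms` are hypothetical helper names, and the sketch only does the string handling. It does not call the model.

```python
import re

# Term tokens listed in added_tokens.json: <t0> ... <t9> open a term, </t> closes it.
MAX_TERMS = 10


def wrap_terms(text: str, glossary: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Mark each Japanese glossary term in `text` as <tN>term</t>.

    Returns the marked-up text and a map from each term token to its English form.
    """
    if len(glossary) > MAX_TERMS:
        raise ValueError(f"at most {MAX_TERMS} glossary terms are supported")
    tag_for = {ja: f"<t{i}>" for i, ja in enumerate(glossary)}
    english_for = {tag_for[ja]: en for ja, en in glossary.items()}
    if not glossary:
        return text, english_for
    # A single regex pass with longer terms listed first: when two terms start at
    # the same position the longer one wins, and no term is wrapped twice.
    pattern = re.compile("|".join(map(re.escape, sorted(glossary, key=len, reverse=True))))
    marked = pattern.sub(lambda m: f"{tag_for[m.group(0)]}{m.group(0)}</t>", text)
    return marked, english_for


def restore_terms(translation: str, english_for: dict[str, str]) -> str:
    """Replace each term token left in the model output with its English form."""
    for tag, english in english_for.items():
        translation = translation.replace(tag, english)
    return translation


# The README's own example:
marked, terms = wrap_terms("マイン、ルッツが迎えに来たよ", {"マイン": "Myne"})
print(marked)  # <t0>マイン</t>、ルッツが迎えに来たよ
# According to the README, the model turns `marked` into "<t0>, Lutz is here to pick you up."
print(restore_terms("<t0>, Lutz is here to pick you up.", terms))  # Myne, Lutz is here to pick you up.
```

Doing the substitution in one pass, rather than one `str.replace` per term, avoids wrapping a short term that also occurs inside a longer glossary entry.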
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "_name_or_path": "llm-ft-lightnovels/nllb-1b3-lnft2-final",
+ "_name_or_path": "nllb-1b3-lnft2.4/final",
  "activation_dropout": 0.0,
  "activation_function": "relu",
  "architectures": [
@@ -28,7 +28,7 @@
  "pad_token_id": 1,
  "scale_embedding": true,
  "torch_dtype": "float32",
- "transformers_version": "4.22.0.dev0",
+ "transformers_version": "4.25.1",
  "use_cache": false,
- "vocab_size": 256206
+ "vocab_size": 257241
  }
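The new `vocab_size` of 257241 appears to line up with `added_tokens.json`. That file has 1037 entries whose IDs seem to run from 256204 to 257240, and 257240 + 1 = 257241. A small consistency check along those lines might look like the sketch below. It assumes local copies of the two JSON files from this commit, and `check_added_tokens` and `check_repo_files` are hypothetical names.

```python
import json


def check_added_tokens(added: dict[str, int], vocab_size: int) -> tuple[int, int]:
    """Validate added-token IDs against a model's vocab_size.

    Returns (lowest_id, highest_id) if the IDs are unique, contiguous and all
    smaller than vocab_size; raises ValueError otherwise.
    """
    ids = sorted(added.values())
    if not ids:
        raise ValueError("no added tokens")
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate token IDs")
    lowest, highest = ids[0], ids[-1]
    if highest - lowest + 1 != len(ids):
        raise ValueError("token IDs are not contiguous")
    if highest >= vocab_size:
        raise ValueError(f"ID {highest} does not fit in vocab_size={vocab_size}")
    return lowest, highest


def check_repo_files(added_tokens_path: str, config_path: str) -> tuple[int, int]:
    """Run check_added_tokens on local added_tokens.json and config.json files."""
    with open(added_tokens_path, encoding="utf-8") as f:
        added = json.load(f)
    with open(config_path, encoding="utf-8") as f:
        vocab_size = json.load(f)["vocab_size"]
    return check_added_tokens(added, vocab_size)
```

On a checkout of this commit, `check_repo_files("added_tokens.json", "config.json")` would be expected to return `(256204, 257240)`, provided the IDs are indeed contiguous as the entry count suggests.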
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:85ed1617966b9c9ea0ce5d260e5a172164f72f0880c9c6ac12220bf689901456
- size 5482902982
+ oid sha256:10ab3397ca2e368ed8060607113e62a05b4c7140caf2c4abd4c94a6bae94ad41
+ size 5495548056
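Both `pytorch_model.bin` and `tokenizer.json` are stored as Git LFS pointers, so the diff shows only each new file's SHA-256 and byte size. Below is a standard-library sketch for checking a downloaded copy against those two values. `parse_lfs_pointer`, `matches_pointer` and `POINTER` are hypothetical names, and the parser only handles the simple `key value` lines shown in these pointers.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Return (sha256_hex, size_in_bytes) from the text of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    return digest, int(fields["size"])


def matches_pointer(path: str, digest: str, size: int, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's byte size and SHA-256 digest."""
    if os.path.getsize(path) != size:
        return False  # cheap check first; hashing a multi-GB file takes a while
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so the whole file never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == digest


# The new pytorch_model.bin pointer from this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:10ab3397ca2e368ed8060607113e62a05b4c7140caf2c4abd4c94a6bae94ad41
size 5495548056
"""
```

With a local download, `matches_pointer("pytorch_model.bin", *parse_lfs_pointer(POINTER))` checks the size first and only hashes the roughly 5.5 GB file when the size already matches.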
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3a0054af1fd37599939637d51108be192eef687e6e370ccaeaa92c773f394fb0
- size 17331294
+ oid sha256:73176d471f0de64785a4c076e56b7b5ca3b4bdaa1bd8a252f8ccad6aa0b112e2
+ size 17518141
tokenizer_config.json CHANGED
@@ -12,13 +12,13 @@
  "single_word": false
  },
  "model_max_length": 1024,
- "name_or_path": "llm-ft-lightnovels/nllb-1b3-lnft2-final",
+ "name_or_path": "facebook/nllb-200-distilled-600M",
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "sp_model_kwargs": {},
  "special_tokens_map_file": null,
  "src_lang": "jpn_Jpan",
  "tgt_lang": "eng_Latn",
- "tokenizer_class": "NllbTokenizer",
+ "tokenizer_class": "TrainingNllbTokenizer",
  "unk_token": "<unk>"
  }