jykim310 committed on
Commit
b85fe2e
1 Parent(s): 4d80672
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
mlc-chat-config.json ADDED
@@ -0,0 +1,84 @@
+ {
+   "model_type": "roberta",
+   "quantization": "q0f32",
+   "model_config": {
+     "attention_probs_dropout_prob": 0.1,
+     "bos_token_id": 0,
+     "eos_token_id": 2,
+     "hidden_act": "gelu",
+     "hidden_dropout_prob": 0.1,
+     "hidden_size": 768,
+     "initializer_range": 0.02,
+     "intermediate_size": 3072,
+     "layer_norm_eps": 1e-05,
+     "max_position_embeddings": 514,
+     "model_type": "roberta",
+     "num_attention_heads": 12,
+     "num_hidden_layers": 12,
+     "pad_token_id": 1,
+     "type_vocab_size": 1,
+     "vocab_size": 50265,
+     "num_labels": 2,
+     "classifier_dropout": null,
+     "chunk_size_feed_forward": 0,
+     "is_decoder": false,
+     "add_cross_attention": false,
+     "use_return_dict": false,
+     "context_window_size": 768,
+     "prefill_chunk_size": 0,
+     "max_batch_size": 80,
+     "tensor_parallel_shards": 1,
+     "dtype": "float32"
+   },
+   "vocab_size": 50265,
+   "context_window_size": 768,
+   "sliding_window_size": -1,
+   "prefill_chunk_size": 0,
+   "attention_sink_size": -1,
+   "tensor_parallel_shards": 1,
+   "mean_gen_len": 128,
+   "max_gen_len": 512,
+   "shift_fill_factor": 0.3,
+   "temperature": 0,
+   "presence_penalty": 0.0,
+   "frequency_penalty": 0.0,
+   "repetition_penalty": 1.0,
+   "top_p": 0.95,
+   "conv_template": {
+     "name": "roberta",
+     "system_template": "{system_message}",
+     "system_message": "",
+     "add_role_after_system_message": true,
+     "roles": {
+       "user": "",
+       "assistant": ""
+     },
+     "role_templates": {
+       "user": "{user_message}",
+       "assistant": "{assistant_message}",
+       "tool": "{tool_message}"
+     },
+     "messages": [],
+     "seps": [
+       "</s>"
+     ],
+     "role_content_sep": "",
+     "role_empty_sep": "",
+     "stop_str": [],
+     "stop_token_ids": [
+       2
+     ],
+     "function_string": "",
+     "use_function_calling": false,
+     "image_token_index": -1
+   },
+   "pad_token_id": 1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "tokenizer_files": [
+     "vocab.json",
+     "merges.txt",
+     "tokenizer_config.json"
+   ],
+   "version": "0.1.0"
+ }
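
The top level of mlc-chat-config.json repeats several values that also appear under `model_config` (token ids, `vocab_size`, `context_window_size`, `prefill_chunk_size`, `tensor_parallel_shards`). The following is a minimal sketch of a cross-check, assuming these mirrored keys are meant to agree (the diff itself does not state this). The excerpt is trimmed to the compared fields, and `MIRRORED_KEYS` and `find_mismatches` are hypothetical names introduced only for illustration.

```python
import json

# Excerpt of the config added in this commit, trimmed to the compared fields.
CONFIG_TEXT = """
{
  "model_config": {
    "bos_token_id": 0,
    "eos_token_id": 2,
    "pad_token_id": 1,
    "vocab_size": 50265,
    "context_window_size": 768,
    "prefill_chunk_size": 0,
    "tensor_parallel_shards": 1
  },
  "vocab_size": 50265,
  "context_window_size": 768,
  "prefill_chunk_size": 0,
  "tensor_parallel_shards": 1,
  "pad_token_id": 1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "conv_template": {"stop_token_ids": [2]}
}
"""

# Assumption: keys that appear both at the top level and inside model_config
# are expected to carry the same value.
MIRRORED_KEYS = [
    "bos_token_id",
    "eos_token_id",
    "pad_token_id",
    "vocab_size",
    "context_window_size",
    "prefill_chunk_size",
    "tensor_parallel_shards",
]


def find_mismatches(config):
    """Return (key, top_level_value, nested_value) for each mirrored key that differs."""
    nested = config["model_config"]
    return [
        (key, config[key], nested[key])
        for key in MIRRORED_KEYS
        if config[key] != nested[key]
    ]


config = json.loads(CONFIG_TEXT)
mismatches = find_mismatches(config)

# The conversation template lists stop tokens separately; check that the
# end-of-sequence id is among them.
eos_is_stop_token = config["eos_token_id"] in config["conv_template"]["stop_token_ids"]

print("mismatches:", mismatches)
print("eos in stop_token_ids:", eos_is_stop_token)
```

Running the same function on the full file (loaded with `json.load`) would flag any mirrored key whose two copies drift apart after a manual edit.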
ndarray-cache-b16.json ADDED
@@ -0,0 +1,2194 @@
+ {
+   "metadata": {
+     "ParamSize": 205,
+     "ParamBytes": 500957200.0,
+     "BitsPerParam": 32.0
+   },
+   "records": [
+     {
+       "dataPath": "params_shard_0.bin",
+       "format": "raw-shard",
+       "nbytes": 77207040,
+       "records": [
+         {
+           "name": "roberta.embeddings.word_embeddings.weight",
+           "shape": [
+             50265,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 77207040,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "822cb1e659d209d0c4ab502aa588a4a9"
+     },
+     {
+       "dataPath": "params_shard_1.bin",
+       "format": "raw-shard",
+       "nbytes": 32699912,
+       "records": [
+         {
+           "name": "latency_classifier.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 0
+         },
+         {
+           "name": "latency_classifier.dense.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 1536
+         },
+         {
+           "name": "latency_classifier.out_proj.bias",
+           "shape": [
+             2
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4,
+           "byteOffset": 1181184
+         },
+         {
+           "name": "latency_classifier.out_proj.weight",
+           "shape": [
+             2,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 3072,
+           "byteOffset": 1181188
+         },
+         {
+           "name": "quality_classifier.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 1184260
+         },
+         {
+           "name": "quality_classifier.dense.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 1185796
+         },
+         {
+           "name": "quality_classifier.out_proj.bias",
+           "shape": [
+             2
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4,
+           "byteOffset": 2365444
+         },
+         {
+           "name": "quality_classifier.out_proj.weight",
+           "shape": [
+             2,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 3072,
+           "byteOffset": 2365448
+         },
+         {
+           "name": "roberta.embeddings.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 2368520
+         },
+         {
+           "name": "roberta.embeddings.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 2370056
+         },
+         {
+           "name": "roberta.embeddings.position_embeddings.weight",
+           "shape": [
+             514,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 789504,
+           "byteOffset": 2371592
+         },
+         {
+           "name": "roberta.embeddings.token_type_embeddings.weight",
+           "shape": [
+             1,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 3161096
+         },
+         {
+           "name": "roberta.encoder.layer.0.attention.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 3162632
+         },
+         {
+           "name": "roberta.encoder.layer.0.attention.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 3164168
+         },
+         {
+           "name": "roberta.encoder.layer.0.attention.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 3165704
+         },
+         {
+           "name": "roberta.encoder.layer.0.attention.output.dense.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 3167240
+         },
+         {
+           "name": "roberta.encoder.layer.0.attention.self.key.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 4346888
+         },
+         {
+           "name": "roberta.encoder.layer.0.attention.self.key.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 4348424
+         },
+         {
+           "name": "roberta.encoder.layer.0.attention.self.query.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 5528072
+         },
+         {
+           "name": "roberta.encoder.layer.0.attention.self.query.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 5529608
+         },
+         {
+           "name": "roberta.encoder.layer.0.attention.self.value.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 6709256
+         },
+         {
+           "name": "roberta.encoder.layer.0.attention.self.value.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 6710792
+         },
+         {
+           "name": "roberta.encoder.layer.0.intermediate.dense.bias",
+           "shape": [
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 6144,
+           "byteOffset": 7890440
+         },
+         {
+           "name": "roberta.encoder.layer.0.intermediate.dense.weight",
+           "shape": [
+             3072,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 7896584
+         },
+         {
+           "name": "roberta.encoder.layer.0.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 12615176
+         },
+         {
+           "name": "roberta.encoder.layer.0.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 12616712
+         },
+         {
+           "name": "roberta.encoder.layer.0.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 12618248
+         },
+         {
+           "name": "roberta.encoder.layer.0.output.dense.weight",
+           "shape": [
+             768,
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 12619784
+         },
+         {
+           "name": "roberta.encoder.layer.1.attention.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 17338376
+         },
+         {
+           "name": "roberta.encoder.layer.1.attention.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 17339912
+         },
+         {
+           "name": "roberta.encoder.layer.1.attention.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 17341448
+         },
+         {
+           "name": "roberta.encoder.layer.1.attention.output.dense.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 17342984
+         },
+         {
+           "name": "roberta.encoder.layer.1.attention.self.key.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 18522632
+         },
+         {
+           "name": "roberta.encoder.layer.1.attention.self.key.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 18524168
+         },
+         {
+           "name": "roberta.encoder.layer.1.attention.self.query.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 19703816
+         },
+         {
+           "name": "roberta.encoder.layer.1.attention.self.query.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 19705352
+         },
+         {
+           "name": "roberta.encoder.layer.1.attention.self.value.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 20885000
+         },
+         {
+           "name": "roberta.encoder.layer.1.attention.self.value.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 20886536
+         },
+         {
+           "name": "roberta.encoder.layer.1.intermediate.dense.bias",
+           "shape": [
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 6144,
+           "byteOffset": 22066184
+         },
+         {
+           "name": "roberta.encoder.layer.1.intermediate.dense.weight",
+           "shape": [
+             3072,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 22072328
+         },
+         {
+           "name": "roberta.encoder.layer.1.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 26790920
+         },
+         {
+           "name": "roberta.encoder.layer.1.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 26792456
+         },
+         {
+           "name": "roberta.encoder.layer.1.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 26793992
+         },
+         {
+           "name": "roberta.encoder.layer.1.output.dense.weight",
+           "shape": [
+             768,
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 26795528
+         },
+         {
+           "name": "roberta.encoder.layer.10.attention.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 31514120
+         },
+         {
+           "name": "roberta.encoder.layer.10.attention.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 31515656
+         },
+         {
+           "name": "roberta.encoder.layer.10.attention.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 31517192
+         },
+         {
+           "name": "roberta.encoder.layer.10.attention.output.dense.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 31518728
+         },
+         {
+           "name": "roberta.encoder.layer.10.attention.self.key.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 32698376
+         }
+       ],
+       "md5sum": "c773b3066463e8b7ca2c687935fdd38d"
+     },
+     {
+       "dataPath": "params_shard_2.bin",
+       "format": "raw-shard",
+       "nbytes": 31899648,
+       "records": [
+         {
+           "name": "roberta.encoder.layer.10.attention.self.key.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 0
+         },
+         {
+           "name": "roberta.encoder.layer.10.attention.self.query.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 1179648
+         },
+         {
+           "name": "roberta.encoder.layer.10.attention.self.query.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 1181184
+         },
+         {
+           "name": "roberta.encoder.layer.10.attention.self.value.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 2360832
+         },
+         {
+           "name": "roberta.encoder.layer.10.attention.self.value.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 2362368
+         },
+         {
+           "name": "roberta.encoder.layer.10.intermediate.dense.bias",
+           "shape": [
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 6144,
+           "byteOffset": 3542016
+         },
+         {
+           "name": "roberta.encoder.layer.10.intermediate.dense.weight",
+           "shape": [
+             3072,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 3548160
+         },
+         {
+           "name": "roberta.encoder.layer.10.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 8266752
+         },
+         {
+           "name": "roberta.encoder.layer.10.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 8268288
+         },
+         {
+           "name": "roberta.encoder.layer.10.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 8269824
+         },
+         {
+           "name": "roberta.encoder.layer.10.output.dense.weight",
+           "shape": [
+             768,
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 8271360
+         },
+         {
+           "name": "roberta.encoder.layer.11.attention.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 12989952
+         },
+         {
+           "name": "roberta.encoder.layer.11.attention.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 12991488
+         },
+         {
+           "name": "roberta.encoder.layer.11.attention.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 12993024
+         },
+         {
+           "name": "roberta.encoder.layer.11.attention.output.dense.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 12994560
+         },
+         {
+           "name": "roberta.encoder.layer.11.attention.self.key.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 14174208
+         },
+         {
+           "name": "roberta.encoder.layer.11.attention.self.key.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 14175744
+         },
+         {
+           "name": "roberta.encoder.layer.11.attention.self.query.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 15355392
+         },
+         {
+           "name": "roberta.encoder.layer.11.attention.self.query.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 15356928
+         },
+         {
+           "name": "roberta.encoder.layer.11.attention.self.value.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 16536576
+         },
+         {
+           "name": "roberta.encoder.layer.11.attention.self.value.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 16538112
+         },
+         {
+           "name": "roberta.encoder.layer.11.intermediate.dense.bias",
+           "shape": [
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 6144,
+           "byteOffset": 17717760
+         },
+         {
+           "name": "roberta.encoder.layer.11.intermediate.dense.weight",
+           "shape": [
+             3072,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 17723904
+         },
+         {
+           "name": "roberta.encoder.layer.11.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 22442496
+         },
+         {
+           "name": "roberta.encoder.layer.11.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 22444032
+         },
+         {
+           "name": "roberta.encoder.layer.11.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 22445568
+         },
+         {
+           "name": "roberta.encoder.layer.11.output.dense.weight",
+           "shape": [
+             768,
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 22447104
+         },
+         {
+           "name": "roberta.encoder.layer.2.attention.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 27165696
+         },
+         {
+           "name": "roberta.encoder.layer.2.attention.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 27167232
+         },
+         {
+           "name": "roberta.encoder.layer.2.attention.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 27168768
+         },
+         {
+           "name": "roberta.encoder.layer.2.attention.output.dense.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 27170304
+         },
+         {
+           "name": "roberta.encoder.layer.2.attention.self.key.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 28349952
+         },
+         {
+           "name": "roberta.encoder.layer.2.attention.self.key.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 28351488
+         },
+         {
+           "name": "roberta.encoder.layer.2.attention.self.query.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 29531136
+         },
+         {
+           "name": "roberta.encoder.layer.2.attention.self.query.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 29532672
+         },
+         {
+           "name": "roberta.encoder.layer.2.attention.self.value.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 30712320
+         },
+         {
+           "name": "roberta.encoder.layer.2.attention.self.value.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 30713856
+         },
+         {
+           "name": "roberta.encoder.layer.2.intermediate.dense.bias",
+           "shape": [
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 6144,
+           "byteOffset": 31893504
+         }
+       ],
+       "md5sum": "cd967b7aa914de66a898f89fcac8f9dc"
+     },
+     {
+       "dataPath": "params_shard_3.bin",
+       "format": "raw-shard",
+       "nbytes": 33074688,
+       "records": [
+         {
+           "name": "roberta.encoder.layer.2.intermediate.dense.weight",
+           "shape": [
+             3072,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 0
+         },
+         {
+           "name": "roberta.encoder.layer.2.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 4718592
+         },
+         {
+           "name": "roberta.encoder.layer.2.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 4720128
+         },
+         {
+           "name": "roberta.encoder.layer.2.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 4721664
+         },
+         {
+           "name": "roberta.encoder.layer.2.output.dense.weight",
+           "shape": [
+             768,
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 4723200
+         },
+         {
+           "name": "roberta.encoder.layer.3.attention.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 9441792
+         },
+         {
+           "name": "roberta.encoder.layer.3.attention.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 9443328
+         },
+         {
+           "name": "roberta.encoder.layer.3.attention.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 9444864
+         },
+         {
+           "name": "roberta.encoder.layer.3.attention.output.dense.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 9446400
+         },
+         {
+           "name": "roberta.encoder.layer.3.attention.self.key.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 10626048
+         },
+         {
+           "name": "roberta.encoder.layer.3.attention.self.key.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 10627584
+         },
+         {
+           "name": "roberta.encoder.layer.3.attention.self.query.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 11807232
+         },
+         {
+           "name": "roberta.encoder.layer.3.attention.self.query.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 11808768
+         },
+         {
+           "name": "roberta.encoder.layer.3.attention.self.value.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 12988416
+         },
+         {
+           "name": "roberta.encoder.layer.3.attention.self.value.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 12989952
+         },
+         {
+           "name": "roberta.encoder.layer.3.intermediate.dense.bias",
+           "shape": [
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 6144,
+           "byteOffset": 14169600
+         },
+         {
+           "name": "roberta.encoder.layer.3.intermediate.dense.weight",
+           "shape": [
+             3072,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 14175744
+         },
+         {
+           "name": "roberta.encoder.layer.3.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 18894336
+         },
+         {
+           "name": "roberta.encoder.layer.3.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 18895872
+         },
+         {
+           "name": "roberta.encoder.layer.3.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 18897408
+         },
+         {
+           "name": "roberta.encoder.layer.3.output.dense.weight",
+           "shape": [
+             768,
+             3072
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 4718592,
+           "byteOffset": 18898944
+         },
+         {
+           "name": "roberta.encoder.layer.4.attention.output.LayerNorm.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 23617536
+         },
+         {
+           "name": "roberta.encoder.layer.4.attention.output.LayerNorm.weight",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 23619072
+         },
+         {
+           "name": "roberta.encoder.layer.4.attention.output.dense.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 23620608
+         },
+         {
+           "name": "roberta.encoder.layer.4.attention.output.dense.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 23622144
+         },
+         {
+           "name": "roberta.encoder.layer.4.attention.self.key.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 24801792
+         },
+         {
+           "name": "roberta.encoder.layer.4.attention.self.key.weight",
+           "shape": [
+             768,
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1179648,
+           "byteOffset": 24803328
+         },
+         {
+           "name": "roberta.encoder.layer.4.attention.self.query.bias",
+           "shape": [
+             768
+           ],
+           "dtype": "bfloat16",
+           "format": "raw",
+           "nbytes": 1536,
+           "byteOffset": 25982976
+         },
+         {
+           "name": "roberta.encoder.layer.4.attention.self.query.weight",
1244
+ "shape": [
1245
+ 768,
1246
+ 768
1247
+ ],
1248
+ "dtype": "bfloat16",
1249
+ "format": "raw",
1250
+ "nbytes": 1179648,
1251
+ "byteOffset": 25984512
1252
+ },
1253
+ {
1254
+ "name": "roberta.encoder.layer.4.attention.self.value.bias",
1255
+ "shape": [
1256
+ 768
1257
+ ],
1258
+ "dtype": "bfloat16",
1259
+ "format": "raw",
1260
+ "nbytes": 1536,
1261
+ "byteOffset": 27164160
1262
+ },
1263
+ {
1264
+ "name": "roberta.encoder.layer.4.attention.self.value.weight",
1265
+ "shape": [
1266
+ 768,
1267
+ 768
1268
+ ],
1269
+ "dtype": "bfloat16",
1270
+ "format": "raw",
1271
+ "nbytes": 1179648,
1272
+ "byteOffset": 27165696
1273
+ },
1274
+ {
1275
+ "name": "roberta.encoder.layer.4.intermediate.dense.bias",
1276
+ "shape": [
1277
+ 3072
1278
+ ],
1279
+ "dtype": "bfloat16",
1280
+ "format": "raw",
1281
+ "nbytes": 6144,
1282
+ "byteOffset": 28345344
1283
+ },
1284
+ {
1285
+ "name": "roberta.encoder.layer.4.intermediate.dense.weight",
1286
+ "shape": [
1287
+ 3072,
1288
+ 768
1289
+ ],
1290
+ "dtype": "bfloat16",
1291
+ "format": "raw",
1292
+ "nbytes": 4718592,
1293
+ "byteOffset": 28351488
1294
+ },
1295
+ {
1296
+ "name": "roberta.encoder.layer.4.output.LayerNorm.bias",
1297
+ "shape": [
1298
+ 768
1299
+ ],
1300
+ "dtype": "bfloat16",
1301
+ "format": "raw",
1302
+ "nbytes": 1536,
1303
+ "byteOffset": 33070080
1304
+ },
1305
+ {
1306
+ "name": "roberta.encoder.layer.4.output.LayerNorm.weight",
1307
+ "shape": [
1308
+ 768
1309
+ ],
1310
+ "dtype": "bfloat16",
1311
+ "format": "raw",
1312
+ "nbytes": 1536,
1313
+ "byteOffset": 33071616
1314
+ },
1315
+ {
1316
+ "name": "roberta.encoder.layer.4.output.dense.bias",
1317
+ "shape": [
1318
+ 768
1319
+ ],
1320
+ "dtype": "bfloat16",
1321
+ "format": "raw",
1322
+ "nbytes": 1536,
1323
+ "byteOffset": 33073152
1324
+ }
1325
+ ],
1326
+ "md5sum": "f1598d0a023d40021c0400ee9f49844d"
1327
+ },
1328
+ {
1329
+ "dataPath": "params_shard_4.bin",
1330
+ "format": "raw-shard",
1331
+ "nbytes": 33074688,
1332
+ "records": [
1333
+ {
1334
+ "name": "roberta.encoder.layer.4.output.dense.weight",
1335
+ "shape": [
1336
+ 768,
1337
+ 3072
1338
+ ],
1339
+ "dtype": "bfloat16",
1340
+ "format": "raw",
1341
+ "nbytes": 4718592,
1342
+ "byteOffset": 0
1343
+ },
1344
+ {
1345
+ "name": "roberta.encoder.layer.5.attention.output.LayerNorm.bias",
1346
+ "shape": [
1347
+ 768
1348
+ ],
1349
+ "dtype": "bfloat16",
1350
+ "format": "raw",
1351
+ "nbytes": 1536,
1352
+ "byteOffset": 4718592
1353
+ },
1354
+ {
1355
+ "name": "roberta.encoder.layer.5.attention.output.LayerNorm.weight",
1356
+ "shape": [
1357
+ 768
1358
+ ],
1359
+ "dtype": "bfloat16",
1360
+ "format": "raw",
1361
+ "nbytes": 1536,
1362
+ "byteOffset": 4720128
1363
+ },
1364
+ {
1365
+ "name": "roberta.encoder.layer.5.attention.output.dense.bias",
1366
+ "shape": [
1367
+ 768
1368
+ ],
1369
+ "dtype": "bfloat16",
1370
+ "format": "raw",
1371
+ "nbytes": 1536,
1372
+ "byteOffset": 4721664
1373
+ },
1374
+ {
1375
+ "name": "roberta.encoder.layer.5.attention.output.dense.weight",
1376
+ "shape": [
1377
+ 768,
1378
+ 768
1379
+ ],
1380
+ "dtype": "bfloat16",
1381
+ "format": "raw",
1382
+ "nbytes": 1179648,
1383
+ "byteOffset": 4723200
1384
+ },
1385
+ {
1386
+ "name": "roberta.encoder.layer.5.attention.self.key.bias",
1387
+ "shape": [
1388
+ 768
1389
+ ],
1390
+ "dtype": "bfloat16",
1391
+ "format": "raw",
1392
+ "nbytes": 1536,
1393
+ "byteOffset": 5902848
1394
+ },
1395
+ {
1396
+ "name": "roberta.encoder.layer.5.attention.self.key.weight",
1397
+ "shape": [
1398
+ 768,
1399
+ 768
1400
+ ],
1401
+ "dtype": "bfloat16",
1402
+ "format": "raw",
1403
+ "nbytes": 1179648,
1404
+ "byteOffset": 5904384
1405
+ },
1406
+ {
1407
+ "name": "roberta.encoder.layer.5.attention.self.query.bias",
1408
+ "shape": [
1409
+ 768
1410
+ ],
1411
+ "dtype": "bfloat16",
1412
+ "format": "raw",
1413
+ "nbytes": 1536,
1414
+ "byteOffset": 7084032
1415
+ },
1416
+ {
1417
+ "name": "roberta.encoder.layer.5.attention.self.query.weight",
1418
+ "shape": [
1419
+ 768,
1420
+ 768
1421
+ ],
1422
+ "dtype": "bfloat16",
1423
+ "format": "raw",
1424
+ "nbytes": 1179648,
1425
+ "byteOffset": 7085568
1426
+ },
1427
+ {
1428
+ "name": "roberta.encoder.layer.5.attention.self.value.bias",
1429
+ "shape": [
1430
+ 768
1431
+ ],
1432
+ "dtype": "bfloat16",
1433
+ "format": "raw",
1434
+ "nbytes": 1536,
1435
+ "byteOffset": 8265216
1436
+ },
1437
+ {
1438
+ "name": "roberta.encoder.layer.5.attention.self.value.weight",
1439
+ "shape": [
1440
+ 768,
1441
+ 768
1442
+ ],
1443
+ "dtype": "bfloat16",
1444
+ "format": "raw",
1445
+ "nbytes": 1179648,
1446
+ "byteOffset": 8266752
1447
+ },
1448
+ {
1449
+ "name": "roberta.encoder.layer.5.intermediate.dense.bias",
1450
+ "shape": [
1451
+ 3072
1452
+ ],
1453
+ "dtype": "bfloat16",
1454
+ "format": "raw",
1455
+ "nbytes": 6144,
1456
+ "byteOffset": 9446400
1457
+ },
1458
+ {
1459
+ "name": "roberta.encoder.layer.5.intermediate.dense.weight",
1460
+ "shape": [
1461
+ 3072,
1462
+ 768
1463
+ ],
1464
+ "dtype": "bfloat16",
1465
+ "format": "raw",
1466
+ "nbytes": 4718592,
1467
+ "byteOffset": 9452544
1468
+ },
1469
+ {
1470
+ "name": "roberta.encoder.layer.5.output.LayerNorm.bias",
1471
+ "shape": [
1472
+ 768
1473
+ ],
1474
+ "dtype": "bfloat16",
1475
+ "format": "raw",
1476
+ "nbytes": 1536,
1477
+ "byteOffset": 14171136
1478
+ },
1479
+ {
1480
+ "name": "roberta.encoder.layer.5.output.LayerNorm.weight",
1481
+ "shape": [
1482
+ 768
1483
+ ],
1484
+ "dtype": "bfloat16",
1485
+ "format": "raw",
1486
+ "nbytes": 1536,
1487
+ "byteOffset": 14172672
1488
+ },
1489
+ {
1490
+ "name": "roberta.encoder.layer.5.output.dense.bias",
1491
+ "shape": [
1492
+ 768
1493
+ ],
1494
+ "dtype": "bfloat16",
1495
+ "format": "raw",
1496
+ "nbytes": 1536,
1497
+ "byteOffset": 14174208
1498
+ },
1499
+ {
1500
+ "name": "roberta.encoder.layer.5.output.dense.weight",
1501
+ "shape": [
1502
+ 768,
1503
+ 3072
1504
+ ],
1505
+ "dtype": "bfloat16",
1506
+ "format": "raw",
1507
+ "nbytes": 4718592,
1508
+ "byteOffset": 14175744
1509
+ },
1510
+ {
1511
+ "name": "roberta.encoder.layer.6.attention.output.LayerNorm.bias",
1512
+ "shape": [
1513
+ 768
1514
+ ],
1515
+ "dtype": "bfloat16",
1516
+ "format": "raw",
1517
+ "nbytes": 1536,
1518
+ "byteOffset": 18894336
1519
+ },
1520
+ {
1521
+ "name": "roberta.encoder.layer.6.attention.output.LayerNorm.weight",
1522
+ "shape": [
1523
+ 768
1524
+ ],
1525
+ "dtype": "bfloat16",
1526
+ "format": "raw",
1527
+ "nbytes": 1536,
1528
+ "byteOffset": 18895872
1529
+ },
1530
+ {
1531
+ "name": "roberta.encoder.layer.6.attention.output.dense.bias",
1532
+ "shape": [
1533
+ 768
1534
+ ],
1535
+ "dtype": "bfloat16",
1536
+ "format": "raw",
1537
+ "nbytes": 1536,
1538
+ "byteOffset": 18897408
1539
+ },
1540
+ {
1541
+ "name": "roberta.encoder.layer.6.attention.output.dense.weight",
1542
+ "shape": [
1543
+ 768,
1544
+ 768
1545
+ ],
1546
+ "dtype": "bfloat16",
1547
+ "format": "raw",
1548
+ "nbytes": 1179648,
1549
+ "byteOffset": 18898944
1550
+ },
1551
+ {
1552
+ "name": "roberta.encoder.layer.6.attention.self.key.bias",
1553
+ "shape": [
1554
+ 768
1555
+ ],
1556
+ "dtype": "bfloat16",
1557
+ "format": "raw",
1558
+ "nbytes": 1536,
1559
+ "byteOffset": 20078592
1560
+ },
1561
+ {
1562
+ "name": "roberta.encoder.layer.6.attention.self.key.weight",
1563
+ "shape": [
1564
+ 768,
1565
+ 768
1566
+ ],
1567
+ "dtype": "bfloat16",
1568
+ "format": "raw",
1569
+ "nbytes": 1179648,
1570
+ "byteOffset": 20080128
1571
+ },
1572
+ {
1573
+ "name": "roberta.encoder.layer.6.attention.self.query.bias",
1574
+ "shape": [
1575
+ 768
1576
+ ],
1577
+ "dtype": "bfloat16",
1578
+ "format": "raw",
1579
+ "nbytes": 1536,
1580
+ "byteOffset": 21259776
1581
+ },
1582
+ {
1583
+ "name": "roberta.encoder.layer.6.attention.self.query.weight",
1584
+ "shape": [
1585
+ 768,
1586
+ 768
1587
+ ],
1588
+ "dtype": "bfloat16",
1589
+ "format": "raw",
1590
+ "nbytes": 1179648,
1591
+ "byteOffset": 21261312
1592
+ },
1593
+ {
1594
+ "name": "roberta.encoder.layer.6.attention.self.value.bias",
1595
+ "shape": [
1596
+ 768
1597
+ ],
1598
+ "dtype": "bfloat16",
1599
+ "format": "raw",
1600
+ "nbytes": 1536,
1601
+ "byteOffset": 22440960
1602
+ },
1603
+ {
1604
+ "name": "roberta.encoder.layer.6.attention.self.value.weight",
1605
+ "shape": [
1606
+ 768,
1607
+ 768
1608
+ ],
1609
+ "dtype": "bfloat16",
1610
+ "format": "raw",
1611
+ "nbytes": 1179648,
1612
+ "byteOffset": 22442496
1613
+ },
1614
+ {
1615
+ "name": "roberta.encoder.layer.6.intermediate.dense.bias",
1616
+ "shape": [
1617
+ 3072
1618
+ ],
1619
+ "dtype": "bfloat16",
1620
+ "format": "raw",
1621
+ "nbytes": 6144,
1622
+ "byteOffset": 23622144
1623
+ },
1624
+ {
1625
+ "name": "roberta.encoder.layer.6.intermediate.dense.weight",
1626
+ "shape": [
1627
+ 3072,
1628
+ 768
1629
+ ],
1630
+ "dtype": "bfloat16",
1631
+ "format": "raw",
1632
+ "nbytes": 4718592,
1633
+ "byteOffset": 23628288
1634
+ },
1635
+ {
1636
+ "name": "roberta.encoder.layer.6.output.LayerNorm.bias",
1637
+ "shape": [
1638
+ 768
1639
+ ],
1640
+ "dtype": "bfloat16",
1641
+ "format": "raw",
1642
+ "nbytes": 1536,
1643
+ "byteOffset": 28346880
1644
+ },
1645
+ {
1646
+ "name": "roberta.encoder.layer.6.output.LayerNorm.weight",
1647
+ "shape": [
1648
+ 768
1649
+ ],
1650
+ "dtype": "bfloat16",
1651
+ "format": "raw",
1652
+ "nbytes": 1536,
1653
+ "byteOffset": 28348416
1654
+ },
1655
+ {
1656
+ "name": "roberta.encoder.layer.6.output.dense.bias",
1657
+ "shape": [
1658
+ 768
1659
+ ],
1660
+ "dtype": "bfloat16",
1661
+ "format": "raw",
1662
+ "nbytes": 1536,
1663
+ "byteOffset": 28349952
1664
+ },
1665
+ {
1666
+ "name": "roberta.encoder.layer.6.output.dense.weight",
1667
+ "shape": [
1668
+ 768,
1669
+ 3072
1670
+ ],
1671
+ "dtype": "bfloat16",
1672
+ "format": "raw",
1673
+ "nbytes": 4718592,
1674
+ "byteOffset": 28351488
1675
+ },
1676
+ {
1677
+ "name": "roberta.encoder.layer.7.attention.output.LayerNorm.bias",
1678
+ "shape": [
1679
+ 768
1680
+ ],
1681
+ "dtype": "bfloat16",
1682
+ "format": "raw",
1683
+ "nbytes": 1536,
1684
+ "byteOffset": 33070080
1685
+ },
1686
+ {
1687
+ "name": "roberta.encoder.layer.7.attention.output.LayerNorm.weight",
1688
+ "shape": [
1689
+ 768
1690
+ ],
1691
+ "dtype": "bfloat16",
1692
+ "format": "raw",
1693
+ "nbytes": 1536,
1694
+ "byteOffset": 33071616
1695
+ },
1696
+ {
1697
+ "name": "roberta.encoder.layer.7.attention.output.dense.bias",
1698
+ "shape": [
1699
+ 768
1700
+ ],
1701
+ "dtype": "bfloat16",
1702
+ "format": "raw",
1703
+ "nbytes": 1536,
1704
+ "byteOffset": 33073152
1705
+ }
1706
+ ],
1707
+ "md5sum": "d7db2ff4e87fb3b13f5a0f5e9c00d5e9"
1708
+ },
1709
+ {
1710
+ "dataPath": "params_shard_5.bin",
1711
+ "format": "raw-shard",
1712
+ "nbytes": 33080832,
1713
+ "records": [
1714
+ {
1715
+ "name": "roberta.encoder.layer.7.attention.output.dense.weight",
1716
+ "shape": [
1717
+ 768,
1718
+ 768
1719
+ ],
1720
+ "dtype": "bfloat16",
1721
+ "format": "raw",
1722
+ "nbytes": 1179648,
1723
+ "byteOffset": 0
1724
+ },
1725
+ {
1726
+ "name": "roberta.encoder.layer.7.attention.self.key.bias",
1727
+ "shape": [
1728
+ 768
1729
+ ],
1730
+ "dtype": "bfloat16",
1731
+ "format": "raw",
1732
+ "nbytes": 1536,
1733
+ "byteOffset": 1179648
1734
+ },
1735
+ {
1736
+ "name": "roberta.encoder.layer.7.attention.self.key.weight",
1737
+ "shape": [
1738
+ 768,
1739
+ 768
1740
+ ],
1741
+ "dtype": "bfloat16",
1742
+ "format": "raw",
1743
+ "nbytes": 1179648,
1744
+ "byteOffset": 1181184
1745
+ },
1746
+ {
1747
+ "name": "roberta.encoder.layer.7.attention.self.query.bias",
1748
+ "shape": [
1749
+ 768
1750
+ ],
1751
+ "dtype": "bfloat16",
1752
+ "format": "raw",
1753
+ "nbytes": 1536,
1754
+ "byteOffset": 2360832
1755
+ },
1756
+ {
1757
+ "name": "roberta.encoder.layer.7.attention.self.query.weight",
1758
+ "shape": [
1759
+ 768,
1760
+ 768
1761
+ ],
1762
+ "dtype": "bfloat16",
1763
+ "format": "raw",
1764
+ "nbytes": 1179648,
1765
+ "byteOffset": 2362368
1766
+ },
1767
+ {
1768
+ "name": "roberta.encoder.layer.7.attention.self.value.bias",
1769
+ "shape": [
1770
+ 768
1771
+ ],
1772
+ "dtype": "bfloat16",
1773
+ "format": "raw",
1774
+ "nbytes": 1536,
1775
+ "byteOffset": 3542016
1776
+ },
1777
+ {
1778
+ "name": "roberta.encoder.layer.7.attention.self.value.weight",
1779
+ "shape": [
1780
+ 768,
1781
+ 768
1782
+ ],
1783
+ "dtype": "bfloat16",
1784
+ "format": "raw",
1785
+ "nbytes": 1179648,
1786
+ "byteOffset": 3543552
1787
+ },
1788
+ {
1789
+ "name": "roberta.encoder.layer.7.intermediate.dense.bias",
1790
+ "shape": [
1791
+ 3072
1792
+ ],
1793
+ "dtype": "bfloat16",
1794
+ "format": "raw",
1795
+ "nbytes": 6144,
1796
+ "byteOffset": 4723200
1797
+ },
1798
+ {
1799
+ "name": "roberta.encoder.layer.7.intermediate.dense.weight",
1800
+ "shape": [
1801
+ 3072,
1802
+ 768
1803
+ ],
1804
+ "dtype": "bfloat16",
1805
+ "format": "raw",
1806
+ "nbytes": 4718592,
1807
+ "byteOffset": 4729344
1808
+ },
1809
+ {
1810
+ "name": "roberta.encoder.layer.7.output.LayerNorm.bias",
1811
+ "shape": [
1812
+ 768
1813
+ ],
1814
+ "dtype": "bfloat16",
1815
+ "format": "raw",
1816
+ "nbytes": 1536,
1817
+ "byteOffset": 9447936
1818
+ },
1819
+ {
1820
+ "name": "roberta.encoder.layer.7.output.LayerNorm.weight",
1821
+ "shape": [
1822
+ 768
1823
+ ],
1824
+ "dtype": "bfloat16",
1825
+ "format": "raw",
1826
+ "nbytes": 1536,
1827
+ "byteOffset": 9449472
1828
+ },
1829
+ {
1830
+ "name": "roberta.encoder.layer.7.output.dense.bias",
1831
+ "shape": [
1832
+ 768
1833
+ ],
1834
+ "dtype": "bfloat16",
1835
+ "format": "raw",
1836
+ "nbytes": 1536,
1837
+ "byteOffset": 9451008
1838
+ },
1839
+ {
1840
+ "name": "roberta.encoder.layer.7.output.dense.weight",
1841
+ "shape": [
1842
+ 768,
1843
+ 3072
1844
+ ],
1845
+ "dtype": "bfloat16",
1846
+ "format": "raw",
1847
+ "nbytes": 4718592,
1848
+ "byteOffset": 9452544
1849
+ },
1850
+ {
1851
+ "name": "roberta.encoder.layer.8.attention.output.LayerNorm.bias",
1852
+ "shape": [
1853
+ 768
1854
+ ],
1855
+ "dtype": "bfloat16",
1856
+ "format": "raw",
1857
+ "nbytes": 1536,
1858
+ "byteOffset": 14171136
1859
+ },
1860
+ {
1861
+ "name": "roberta.encoder.layer.8.attention.output.LayerNorm.weight",
1862
+ "shape": [
1863
+ 768
1864
+ ],
1865
+ "dtype": "bfloat16",
1866
+ "format": "raw",
1867
+ "nbytes": 1536,
1868
+ "byteOffset": 14172672
1869
+ },
1870
+ {
1871
+ "name": "roberta.encoder.layer.8.attention.output.dense.bias",
1872
+ "shape": [
1873
+ 768
1874
+ ],
1875
+ "dtype": "bfloat16",
1876
+ "format": "raw",
1877
+ "nbytes": 1536,
1878
+ "byteOffset": 14174208
1879
+ },
1880
+ {
1881
+ "name": "roberta.encoder.layer.8.attention.output.dense.weight",
1882
+ "shape": [
1883
+ 768,
1884
+ 768
1885
+ ],
1886
+ "dtype": "bfloat16",
1887
+ "format": "raw",
1888
+ "nbytes": 1179648,
1889
+ "byteOffset": 14175744
1890
+ },
1891
+ {
1892
+ "name": "roberta.encoder.layer.8.attention.self.key.bias",
1893
+ "shape": [
1894
+ 768
1895
+ ],
1896
+ "dtype": "bfloat16",
1897
+ "format": "raw",
1898
+ "nbytes": 1536,
1899
+ "byteOffset": 15355392
1900
+ },
1901
+ {
1902
+ "name": "roberta.encoder.layer.8.attention.self.key.weight",
1903
+ "shape": [
1904
+ 768,
1905
+ 768
1906
+ ],
1907
+ "dtype": "bfloat16",
1908
+ "format": "raw",
1909
+ "nbytes": 1179648,
1910
+ "byteOffset": 15356928
1911
+ },
1912
+ {
1913
+ "name": "roberta.encoder.layer.8.attention.self.query.bias",
1914
+ "shape": [
1915
+ 768
1916
+ ],
1917
+ "dtype": "bfloat16",
1918
+ "format": "raw",
1919
+ "nbytes": 1536,
1920
+ "byteOffset": 16536576
1921
+ },
1922
+ {
1923
+ "name": "roberta.encoder.layer.8.attention.self.query.weight",
1924
+ "shape": [
1925
+ 768,
1926
+ 768
1927
+ ],
1928
+ "dtype": "bfloat16",
1929
+ "format": "raw",
1930
+ "nbytes": 1179648,
1931
+ "byteOffset": 16538112
1932
+ },
1933
+ {
1934
+ "name": "roberta.encoder.layer.8.attention.self.value.bias",
1935
+ "shape": [
1936
+ 768
1937
+ ],
1938
+ "dtype": "bfloat16",
1939
+ "format": "raw",
1940
+ "nbytes": 1536,
1941
+ "byteOffset": 17717760
1942
+ },
1943
+ {
1944
+ "name": "roberta.encoder.layer.8.attention.self.value.weight",
1945
+ "shape": [
1946
+ 768,
1947
+ 768
1948
+ ],
1949
+ "dtype": "bfloat16",
1950
+ "format": "raw",
1951
+ "nbytes": 1179648,
1952
+ "byteOffset": 17719296
1953
+ },
1954
+ {
1955
+ "name": "roberta.encoder.layer.8.intermediate.dense.bias",
1956
+ "shape": [
1957
+ 3072
1958
+ ],
1959
+ "dtype": "bfloat16",
1960
+ "format": "raw",
1961
+ "nbytes": 6144,
1962
+ "byteOffset": 18898944
1963
+ },
1964
+ {
1965
+ "name": "roberta.encoder.layer.8.intermediate.dense.weight",
1966
+ "shape": [
1967
+ 3072,
1968
+ 768
1969
+ ],
1970
+ "dtype": "bfloat16",
1971
+ "format": "raw",
1972
+ "nbytes": 4718592,
1973
+ "byteOffset": 18905088
1974
+ },
1975
+ {
1976
+ "name": "roberta.encoder.layer.8.output.LayerNorm.bias",
1977
+ "shape": [
1978
+ 768
1979
+ ],
1980
+ "dtype": "bfloat16",
1981
+ "format": "raw",
1982
+ "nbytes": 1536,
1983
+ "byteOffset": 23623680
1984
+ },
1985
+ {
1986
+ "name": "roberta.encoder.layer.8.output.LayerNorm.weight",
1987
+ "shape": [
1988
+ 768
1989
+ ],
1990
+ "dtype": "bfloat16",
1991
+ "format": "raw",
1992
+ "nbytes": 1536,
1993
+ "byteOffset": 23625216
1994
+ },
1995
+ {
1996
+ "name": "roberta.encoder.layer.8.output.dense.bias",
1997
+ "shape": [
1998
+ 768
1999
+ ],
2000
+ "dtype": "bfloat16",
2001
+ "format": "raw",
2002
+ "nbytes": 1536,
2003
+ "byteOffset": 23626752
2004
+ },
2005
+ {
2006
+ "name": "roberta.encoder.layer.8.output.dense.weight",
2007
+ "shape": [
2008
+ 768,
2009
+ 3072
2010
+ ],
2011
+ "dtype": "bfloat16",
2012
+ "format": "raw",
2013
+ "nbytes": 4718592,
2014
+ "byteOffset": 23628288
2015
+ },
2016
+ {
2017
+ "name": "roberta.encoder.layer.9.attention.output.LayerNorm.bias",
2018
+ "shape": [
2019
+ 768
2020
+ ],
2021
+ "dtype": "bfloat16",
2022
+ "format": "raw",
2023
+ "nbytes": 1536,
2024
+ "byteOffset": 28346880
2025
+ },
2026
+ {
2027
+ "name": "roberta.encoder.layer.9.attention.output.LayerNorm.weight",
2028
+ "shape": [
2029
+ 768
2030
+ ],
2031
+ "dtype": "bfloat16",
2032
+ "format": "raw",
2033
+ "nbytes": 1536,
2034
+ "byteOffset": 28348416
2035
+ },
2036
+ {
2037
+ "name": "roberta.encoder.layer.9.attention.output.dense.bias",
2038
+ "shape": [
2039
+ 768
2040
+ ],
2041
+ "dtype": "bfloat16",
2042
+ "format": "raw",
2043
+ "nbytes": 1536,
2044
+ "byteOffset": 28349952
2045
+ },
2046
+ {
2047
+ "name": "roberta.encoder.layer.9.attention.output.dense.weight",
2048
+ "shape": [
2049
+ 768,
2050
+ 768
2051
+ ],
2052
+ "dtype": "bfloat16",
2053
+ "format": "raw",
2054
+ "nbytes": 1179648,
2055
+ "byteOffset": 28351488
2056
+ },
2057
+ {
2058
+ "name": "roberta.encoder.layer.9.attention.self.key.bias",
2059
+ "shape": [
2060
+ 768
2061
+ ],
2062
+ "dtype": "bfloat16",
2063
+ "format": "raw",
2064
+ "nbytes": 1536,
2065
+ "byteOffset": 29531136
2066
+ },
2067
+ {
2068
+ "name": "roberta.encoder.layer.9.attention.self.key.weight",
2069
+ "shape": [
2070
+ 768,
2071
+ 768
2072
+ ],
2073
+ "dtype": "bfloat16",
2074
+ "format": "raw",
2075
+ "nbytes": 1179648,
2076
+ "byteOffset": 29532672
2077
+ },
2078
+ {
2079
+ "name": "roberta.encoder.layer.9.attention.self.query.bias",
2080
+ "shape": [
2081
+ 768
2082
+ ],
2083
+ "dtype": "bfloat16",
2084
+ "format": "raw",
2085
+ "nbytes": 1536,
2086
+ "byteOffset": 30712320
2087
+ },
2088
+ {
2089
+ "name": "roberta.encoder.layer.9.attention.self.query.weight",
2090
+ "shape": [
2091
+ 768,
2092
+ 768
2093
+ ],
2094
+ "dtype": "bfloat16",
2095
+ "format": "raw",
2096
+ "nbytes": 1179648,
2097
+ "byteOffset": 30713856
2098
+ },
2099
+ {
2100
+ "name": "roberta.encoder.layer.9.attention.self.value.bias",
2101
+ "shape": [
2102
+ 768
2103
+ ],
2104
+ "dtype": "bfloat16",
2105
+ "format": "raw",
2106
+ "nbytes": 1536,
2107
+ "byteOffset": 31893504
2108
+ },
2109
+ {
2110
+ "name": "roberta.encoder.layer.9.attention.self.value.weight",
2111
+ "shape": [
2112
+ 768,
2113
+ 768
2114
+ ],
2115
+ "dtype": "bfloat16",
2116
+ "format": "raw",
2117
+ "nbytes": 1179648,
2118
+ "byteOffset": 31895040
2119
+ },
2120
+ {
2121
+ "name": "roberta.encoder.layer.9.intermediate.dense.bias",
2122
+ "shape": [
2123
+ 3072
2124
+ ],
2125
+ "dtype": "bfloat16",
2126
+ "format": "raw",
2127
+ "nbytes": 6144,
2128
+ "byteOffset": 33074688
2129
+ }
2130
+ ],
2131
+ "md5sum": "16e5fe6cb9c4b8dfe68170b6b4a350be"
2132
+ },
2133
+ {
2134
+ "dataPath": "params_shard_6.bin",
2135
+ "format": "raw-shard",
2136
+ "nbytes": 9441792,
2137
+ "records": [
2138
+ {
2139
+ "name": "roberta.encoder.layer.9.intermediate.dense.weight",
2140
+ "shape": [
2141
+ 3072,
2142
+ 768
2143
+ ],
2144
+ "dtype": "bfloat16",
2145
+ "format": "raw",
2146
+ "nbytes": 4718592,
2147
+ "byteOffset": 0
2148
+ },
2149
+ {
2150
+ "name": "roberta.encoder.layer.9.output.LayerNorm.bias",
2151
+ "shape": [
2152
+ 768
2153
+ ],
2154
+ "dtype": "bfloat16",
2155
+ "format": "raw",
2156
+ "nbytes": 1536,
2157
+ "byteOffset": 4718592
2158
+ },
2159
+ {
2160
+ "name": "roberta.encoder.layer.9.output.LayerNorm.weight",
2161
+ "shape": [
2162
+ 768
2163
+ ],
2164
+ "dtype": "bfloat16",
2165
+ "format": "raw",
2166
+ "nbytes": 1536,
2167
+ "byteOffset": 4720128
2168
+ },
2169
+ {
2170
+ "name": "roberta.encoder.layer.9.output.dense.bias",
2171
+ "shape": [
2172
+ 768
2173
+ ],
2174
+ "dtype": "bfloat16",
2175
+ "format": "raw",
2176
+ "nbytes": 1536,
2177
+ "byteOffset": 4721664
2178
+ },
2179
+ {
2180
+ "name": "roberta.encoder.layer.9.output.dense.weight",
2181
+ "shape": [
2182
+ 768,
2183
+ 3072
2184
+ ],
2185
+ "dtype": "bfloat16",
2186
+ "format": "raw",
2187
+ "nbytes": 4718592,
2188
+ "byteOffset": 4723200
2189
+ }
2190
+ ],
2191
+ "md5sum": "5882c18b7fb291d522cc147ec14e7014"
2192
+ }
2193
+ ]
2194
+ }
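Note on the manifest above: within each shard, every record's `byteOffset` appears to equal the previous record's `byteOffset` plus its `nbytes`, and the final record ends exactly at the shard's `nbytes`. The sketch below checks those invariants for one shard entry. It is a minimal illustration, not part of MLC's tooling. The helper names `stored_itemsize` and `check_shard` are hypothetical, and it assumes that `f32-to-bf16` records are stored at 2 bytes per element, which matches the sizes listed in these manifests.

```python
from math import prod

# Bytes per element as stored on disk, keyed by the record's "dtype".
STORED_BYTES = {"bfloat16": 2, "float32": 4}


def stored_itemsize(record):
    """Return the on-disk bytes per element for one record (assumed layout)."""
    # Assumption: "f32-to-bf16" means float32 weights stored as 2-byte bfloat16.
    if record.get("format") == "f32-to-bf16":
        return 2
    return STORED_BYTES[record["dtype"]]


def check_shard(shard):
    """Return a list of human-readable problems found in one shard entry."""
    problems = []
    expected_offset = 0
    for rec in shard["records"]:
        # Records should be packed back to back, starting at offset 0.
        if rec["byteOffset"] != expected_offset:
            problems.append(
                f"{rec['name']}: byteOffset {rec['byteOffset']} != {expected_offset}"
            )
        # nbytes should equal element count times stored element size.
        size = prod(rec["shape"]) * stored_itemsize(rec)
        if size != rec["nbytes"]:
            problems.append(f"{rec['name']}: nbytes {rec['nbytes']} != {size}")
        expected_offset = rec["byteOffset"] + rec["nbytes"]
    # The last record should end exactly at the shard size.
    if expected_offset != shard["nbytes"]:
        problems.append(
            f"shard nbytes {shard['nbytes']} != end of last record {expected_offset}"
        )
    return problems


# The params_shard_6.bin entry from the manifest above, copied by hand.
shard_6 = {
    "dataPath": "params_shard_6.bin",
    "nbytes": 9441792,
    "records": [
        {"name": "roberta.encoder.layer.9.intermediate.dense.weight",
         "shape": [3072, 768], "dtype": "bfloat16", "format": "raw",
         "nbytes": 4718592, "byteOffset": 0},
        {"name": "roberta.encoder.layer.9.output.LayerNorm.bias",
         "shape": [768], "dtype": "bfloat16", "format": "raw",
         "nbytes": 1536, "byteOffset": 4718592},
        {"name": "roberta.encoder.layer.9.output.LayerNorm.weight",
         "shape": [768], "dtype": "bfloat16", "format": "raw",
         "nbytes": 1536, "byteOffset": 4720128},
        {"name": "roberta.encoder.layer.9.output.dense.bias",
         "shape": [768], "dtype": "bfloat16", "format": "raw",
         "nbytes": 1536, "byteOffset": 4721664},
        {"name": "roberta.encoder.layer.9.output.dense.weight",
         "shape": [768, 3072], "dtype": "bfloat16", "format": "raw",
         "nbytes": 4718592, "byteOffset": 4723200},
    ],
}

print(check_shard(shard_6))
```

To check a whole manifest, one could load the JSON file and call `check_shard` on each element of its top-level `records` list. The `md5sum` fields are not verified here because that requires the shard binaries themselves.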
ndarray-cache.json ADDED
@@ -0,0 +1,2194 @@
1
+ {
2
+ "metadata": {
3
+ "ParamSize": 205,
4
+ "ParamBytes": 500957200.0,
5
+ "BitsPerParam": 32.0
6
+ },
7
+ "records": [
8
+ {
9
+ "dataPath": "params_shard_0.bin",
10
+ "format": "raw-shard",
11
+ "nbytes": 77207040,
12
+ "records": [
13
+ {
14
+ "name": "roberta.embeddings.word_embeddings.weight",
15
+ "shape": [
16
+ 50265,
17
+ 768
18
+ ],
19
+ "dtype": "float32",
20
+ "format": "f32-to-bf16",
21
+ "nbytes": 77207040,
22
+ "byteOffset": 0
23
+ }
24
+ ],
25
+ "md5sum": "822cb1e659d209d0c4ab502aa588a4a9"
26
+ },
27
+ {
28
+ "dataPath": "params_shard_1.bin",
29
+ "format": "raw-shard",
30
+ "nbytes": 32699912,
31
+ "records": [
32
+ {
33
+ "name": "latency_classifier.dense.bias",
34
+ "shape": [
35
+ 768
36
+ ],
37
+ "dtype": "float32",
38
+ "format": "f32-to-bf16",
39
+ "nbytes": 1536,
40
+ "byteOffset": 0
41
+ },
42
+ {
43
+ "name": "latency_classifier.dense.weight",
44
+ "shape": [
45
+ 768,
46
+ 768
47
+ ],
48
+ "dtype": "float32",
49
+ "format": "f32-to-bf16",
50
+ "nbytes": 1179648,
51
+ "byteOffset": 1536
52
+ },
53
+ {
54
+ "name": "latency_classifier.out_proj.bias",
55
+ "shape": [
56
+ 2
57
+ ],
58
+ "dtype": "float32",
59
+ "format": "f32-to-bf16",
60
+ "nbytes": 4,
61
+ "byteOffset": 1181184
62
+ },
63
+ {
64
+ "name": "latency_classifier.out_proj.weight",
65
+ "shape": [
66
+ 2,
67
+ 768
68
+ ],
69
+ "dtype": "float32",
70
+ "format": "f32-to-bf16",
71
+ "nbytes": 3072,
72
+ "byteOffset": 1181188
73
+ },
74
+ {
75
+ "name": "quality_classifier.dense.bias",
76
+ "shape": [
77
+ 768
78
+ ],
79
+ "dtype": "float32",
80
+ "format": "f32-to-bf16",
81
+ "nbytes": 1536,
82
+ "byteOffset": 1184260
83
+ },
84
+ {
85
+ "name": "quality_classifier.dense.weight",
86
+ "shape": [
87
+ 768,
88
+ 768
89
+ ],
90
+ "dtype": "float32",
91
+ "format": "f32-to-bf16",
92
+ "nbytes": 1179648,
93
+ "byteOffset": 1185796
94
+ },
95
+ {
96
+ "name": "quality_classifier.out_proj.bias",
97
+ "shape": [
98
+ 2
99
+ ],
100
+ "dtype": "float32",
101
+ "format": "f32-to-bf16",
102
+ "nbytes": 4,
103
+ "byteOffset": 2365444
104
+ },
105
+ {
106
+ "name": "quality_classifier.out_proj.weight",
107
+ "shape": [
108
+ 2,
109
+ 768
110
+ ],
111
+ "dtype": "float32",
112
+ "format": "f32-to-bf16",
113
+ "nbytes": 3072,
114
+ "byteOffset": 2365448
115
+ },
116
+ {
117
+ "name": "roberta.embeddings.LayerNorm.bias",
118
+ "shape": [
119
+ 768
120
+ ],
121
+ "dtype": "float32",
122
+ "format": "f32-to-bf16",
123
+ "nbytes": 1536,
124
+ "byteOffset": 2368520
125
+ },
126
+ {
127
+ "name": "roberta.embeddings.LayerNorm.weight",
128
+ "shape": [
129
+ 768
130
+ ],
131
+ "dtype": "float32",
132
+ "format": "f32-to-bf16",
133
+ "nbytes": 1536,
134
+ "byteOffset": 2370056
135
+ },
136
+ {
137
+ "name": "roberta.embeddings.position_embeddings.weight",
138
+ "shape": [
139
+ 514,
140
+ 768
141
+ ],
142
+ "dtype": "float32",
143
+ "format": "f32-to-bf16",
144
+ "nbytes": 789504,
145
+ "byteOffset": 2371592
146
+ },
147
+ {
148
+ "name": "roberta.embeddings.token_type_embeddings.weight",
149
+ "shape": [
150
+ 1,
151
+ 768
152
+ ],
153
+ "dtype": "float32",
154
+ "format": "f32-to-bf16",
155
+ "nbytes": 1536,
156
+ "byteOffset": 3161096
157
+ },
158
+ {
159
+ "name": "roberta.encoder.layer.0.attention.output.LayerNorm.bias",
160
+ "shape": [
161
+ 768
162
+ ],
163
+ "dtype": "float32",
164
+ "format": "f32-to-bf16",
165
+ "nbytes": 1536,
166
+ "byteOffset": 3162632
167
+ },
168
+ {
169
+ "name": "roberta.encoder.layer.0.attention.output.LayerNorm.weight",
170
+ "shape": [
171
+ 768
172
+ ],
173
+ "dtype": "float32",
174
+ "format": "f32-to-bf16",
175
+ "nbytes": 1536,
176
+ "byteOffset": 3164168
177
+ },
178
+ {
179
+ "name": "roberta.encoder.layer.0.attention.output.dense.bias",
180
+ "shape": [
181
+ 768
182
+ ],
183
+ "dtype": "float32",
184
+ "format": "f32-to-bf16",
185
+ "nbytes": 1536,
186
+ "byteOffset": 3165704
187
+ },
188
+ {
189
+ "name": "roberta.encoder.layer.0.attention.output.dense.weight",
190
+ "shape": [
191
+ 768,
192
+ 768
193
+ ],
194
+ "dtype": "float32",
195
+ "format": "f32-to-bf16",
196
+ "nbytes": 1179648,
197
+ "byteOffset": 3167240
198
+ },
199
+ {
200
+ "name": "roberta.encoder.layer.0.attention.self.key.bias",
201
+ "shape": [
202
+ 768
203
+ ],
204
+ "dtype": "float32",
205
+ "format": "f32-to-bf16",
206
+ "nbytes": 1536,
207
+ "byteOffset": 4346888
208
+ },
209
+ {
210
+ "name": "roberta.encoder.layer.0.attention.self.key.weight",
211
+ "shape": [
212
+ 768,
213
+ 768
214
+ ],
215
+ "dtype": "float32",
216
+ "format": "f32-to-bf16",
217
+ "nbytes": 1179648,
218
+ "byteOffset": 4348424
219
+ },
220
+ {
221
+ "name": "roberta.encoder.layer.0.attention.self.query.bias",
222
+ "shape": [
223
+ 768
224
+ ],
225
+ "dtype": "float32",
226
+ "format": "f32-to-bf16",
227
+ "nbytes": 1536,
228
+ "byteOffset": 5528072
229
+ },
230
+ {
231
+ "name": "roberta.encoder.layer.0.attention.self.query.weight",
232
+ "shape": [
233
+ 768,
234
+ 768
235
+ ],
236
+ "dtype": "float32",
237
+ "format": "f32-to-bf16",
238
+ "nbytes": 1179648,
239
+ "byteOffset": 5529608
240
+ },
241
+ {
242
+ "name": "roberta.encoder.layer.0.attention.self.value.bias",
243
+ "shape": [
244
+ 768
245
+ ],
246
+ "dtype": "float32",
247
+ "format": "f32-to-bf16",
248
+ "nbytes": 1536,
249
+ "byteOffset": 6709256
250
+ },
251
+ {
252
+ "name": "roberta.encoder.layer.0.attention.self.value.weight",
253
+ "shape": [
254
+ 768,
255
+ 768
256
+ ],
257
+ "dtype": "float32",
258
+ "format": "f32-to-bf16",
259
+ "nbytes": 1179648,
260
+ "byteOffset": 6710792
261
+ },
262
+ {
263
+ "name": "roberta.encoder.layer.0.intermediate.dense.bias",
264
+ "shape": [
265
+ 3072
266
+ ],
267
+ "dtype": "float32",
268
+ "format": "f32-to-bf16",
269
+ "nbytes": 6144,
270
+ "byteOffset": 7890440
271
+ },
272
+ {
273
+ "name": "roberta.encoder.layer.0.intermediate.dense.weight",
274
+ "shape": [
275
+ 3072,
276
+ 768
277
+ ],
278
+ "dtype": "float32",
279
+ "format": "f32-to-bf16",
280
+ "nbytes": 4718592,
281
+ "byteOffset": 7896584
282
+ },
283
+ {
284
+ "name": "roberta.encoder.layer.0.output.LayerNorm.bias",
285
+ "shape": [
286
+ 768
287
+ ],
288
+ "dtype": "float32",
289
+ "format": "f32-to-bf16",
290
+ "nbytes": 1536,
291
+ "byteOffset": 12615176
292
+ },
293
+ {
294
+ "name": "roberta.encoder.layer.0.output.LayerNorm.weight",
295
+ "shape": [
296
+ 768
297
+ ],
298
+ "dtype": "float32",
299
+ "format": "f32-to-bf16",
300
+ "nbytes": 1536,
301
+ "byteOffset": 12616712
302
+ },
303
+ {
304
+ "name": "roberta.encoder.layer.0.output.dense.bias",
305
+ "shape": [
306
+ 768
307
+ ],
308
+ "dtype": "float32",
309
+ "format": "f32-to-bf16",
310
+ "nbytes": 1536,
311
+ "byteOffset": 12618248
312
+ },
313
+ {
314
+ "name": "roberta.encoder.layer.0.output.dense.weight",
315
+ "shape": [
316
+ 768,
317
+ 3072
318
+ ],
319
+ "dtype": "float32",
320
+ "format": "f32-to-bf16",
321
+ "nbytes": 4718592,
322
+ "byteOffset": 12619784
323
+ },
324
+ {
325
+ "name": "roberta.encoder.layer.1.attention.output.LayerNorm.bias",
326
+ "shape": [
327
+ 768
328
+ ],
329
+ "dtype": "float32",
330
+ "format": "f32-to-bf16",
331
+ "nbytes": 1536,
332
+ "byteOffset": 17338376
333
+ },
334
+ {
335
+ "name": "roberta.encoder.layer.1.attention.output.LayerNorm.weight",
336
+ "shape": [
337
+ 768
338
+ ],
339
+ "dtype": "float32",
340
+ "format": "f32-to-bf16",
341
+ "nbytes": 1536,
342
+ "byteOffset": 17339912
343
+ },
344
+ {
345
+ "name": "roberta.encoder.layer.1.attention.output.dense.bias",
346
+ "shape": [
347
+ 768
348
+ ],
349
+ "dtype": "float32",
350
+ "format": "f32-to-bf16",
351
+ "nbytes": 1536,
352
+ "byteOffset": 17341448
353
+ },
354
+ {
355
+ "name": "roberta.encoder.layer.1.attention.output.dense.weight",
356
+ "shape": [
357
+ 768,
358
+ 768
359
+ ],
360
+ "dtype": "float32",
361
+ "format": "f32-to-bf16",
362
+ "nbytes": 1179648,
363
+ "byteOffset": 17342984
364
+ },
365
+ {
366
+ "name": "roberta.encoder.layer.1.attention.self.key.bias",
367
+ "shape": [
368
+ 768
369
+ ],
370
+ "dtype": "float32",
371
+ "format": "f32-to-bf16",
372
+ "nbytes": 1536,
373
+ "byteOffset": 18522632
374
+ },
375
+ {
376
+ "name": "roberta.encoder.layer.1.attention.self.key.weight",
377
+ "shape": [
378
+ 768,
379
+ 768
380
+ ],
381
+ "dtype": "float32",
382
+ "format": "f32-to-bf16",
383
+ "nbytes": 1179648,
384
+ "byteOffset": 18524168
385
+ },
386
+ {
387
+ "name": "roberta.encoder.layer.1.attention.self.query.bias",
388
+ "shape": [
389
+ 768
390
+ ],
391
+ "dtype": "float32",
392
+ "format": "f32-to-bf16",
393
+ "nbytes": 1536,
394
+ "byteOffset": 19703816
395
+ },
396
+ {
397
+ "name": "roberta.encoder.layer.1.attention.self.query.weight",
398
+ "shape": [
399
+ 768,
400
+ 768
401
+ ],
402
+ "dtype": "float32",
403
+ "format": "f32-to-bf16",
404
+ "nbytes": 1179648,
405
+ "byteOffset": 19705352
406
+ },
407
+ {
408
+ "name": "roberta.encoder.layer.1.attention.self.value.bias",
409
+ "shape": [
410
+ 768
411
+ ],
412
+ "dtype": "float32",
413
+ "format": "f32-to-bf16",
414
+ "nbytes": 1536,
415
+ "byteOffset": 20885000
416
+ },
417
+ {
418
+ "name": "roberta.encoder.layer.1.attention.self.value.weight",
419
+ "shape": [
420
+ 768,
421
+ 768
422
+ ],
423
+ "dtype": "float32",
424
+ "format": "f32-to-bf16",
425
+ "nbytes": 1179648,
426
+ "byteOffset": 20886536
427
+ },
428
+ {
429
+ "name": "roberta.encoder.layer.1.intermediate.dense.bias",
430
+ "shape": [
431
+ 3072
432
+ ],
433
+ "dtype": "float32",
434
+ "format": "f32-to-bf16",
435
+ "nbytes": 6144,
436
+ "byteOffset": 22066184
437
+ },
438
+ {
439
+ "name": "roberta.encoder.layer.1.intermediate.dense.weight",
440
+ "shape": [
441
+ 3072,
442
+ 768
443
+ ],
444
+ "dtype": "float32",
445
+ "format": "f32-to-bf16",
446
+ "nbytes": 4718592,
447
+ "byteOffset": 22072328
448
+ },
449
+ {
450
+ "name": "roberta.encoder.layer.1.output.LayerNorm.bias",
451
+ "shape": [
452
+ 768
453
+ ],
454
+ "dtype": "float32",
455
+ "format": "f32-to-bf16",
456
+ "nbytes": 1536,
457
+ "byteOffset": 26790920
458
+ },
459
+ {
460
+ "name": "roberta.encoder.layer.1.output.LayerNorm.weight",
461
+ "shape": [
462
+ 768
463
+ ],
464
+ "dtype": "float32",
465
+ "format": "f32-to-bf16",
466
+ "nbytes": 1536,
467
+ "byteOffset": 26792456
468
+ },
469
+ {
470
+ "name": "roberta.encoder.layer.1.output.dense.bias",
471
+ "shape": [
472
+ 768
473
+ ],
474
+ "dtype": "float32",
475
+ "format": "f32-to-bf16",
476
+ "nbytes": 1536,
477
+ "byteOffset": 26793992
478
+ },
479
+ {
480
+ "name": "roberta.encoder.layer.1.output.dense.weight",
481
+ "shape": [
482
+ 768,
483
+ 3072
484
+ ],
485
+ "dtype": "float32",
486
+ "format": "f32-to-bf16",
487
+ "nbytes": 4718592,
488
+ "byteOffset": 26795528
489
+ },
490
+ {
491
+ "name": "roberta.encoder.layer.10.attention.output.LayerNorm.bias",
492
+ "shape": [
493
+ 768
494
+ ],
495
+ "dtype": "float32",
496
+ "format": "f32-to-bf16",
497
+ "nbytes": 1536,
498
+ "byteOffset": 31514120
499
+ },
500
+ {
501
+ "name": "roberta.encoder.layer.10.attention.output.LayerNorm.weight",
502
+ "shape": [
503
+ 768
504
+ ],
505
+ "dtype": "float32",
506
+ "format": "f32-to-bf16",
507
+ "nbytes": 1536,
508
+ "byteOffset": 31515656
509
+ },
510
+ {
511
+ "name": "roberta.encoder.layer.10.attention.output.dense.bias",
512
+ "shape": [
513
+ 768
514
+ ],
515
+ "dtype": "float32",
516
+ "format": "f32-to-bf16",
517
+ "nbytes": 1536,
518
+ "byteOffset": 31517192
519
+ },
520
+ {
521
+ "name": "roberta.encoder.layer.10.attention.output.dense.weight",
522
+ "shape": [
523
+ 768,
524
+ 768
525
+ ],
526
+ "dtype": "float32",
527
+ "format": "f32-to-bf16",
528
+ "nbytes": 1179648,
529
+ "byteOffset": 31518728
530
+ },
531
+ {
532
+ "name": "roberta.encoder.layer.10.attention.self.key.bias",
533
+ "shape": [
534
+ 768
535
+ ],
536
+ "dtype": "float32",
537
+ "format": "f32-to-bf16",
538
+ "nbytes": 1536,
539
+ "byteOffset": 32698376
540
+ }
541
+ ],
542
+ "md5sum": "c773b3066463e8b7ca2c687935fdd38d"
543
+ },
544
+ {
545
+ "dataPath": "params_shard_2.bin",
546
+ "format": "raw-shard",
547
+ "nbytes": 31899648,
548
+ "records": [
549
+ {
550
+ "name": "roberta.encoder.layer.10.attention.self.key.weight",
551
+ "shape": [
552
+ 768,
553
+ 768
554
+ ],
555
+ "dtype": "float32",
556
+ "format": "f32-to-bf16",
557
+ "nbytes": 1179648,
558
+ "byteOffset": 0
559
+ },
560
+ {
561
+ "name": "roberta.encoder.layer.10.attention.self.query.bias",
562
+ "shape": [
563
+ 768
564
+ ],
565
+ "dtype": "float32",
566
+ "format": "f32-to-bf16",
567
+ "nbytes": 1536,
568
+ "byteOffset": 1179648
569
+ },
570
+ {
571
+ "name": "roberta.encoder.layer.10.attention.self.query.weight",
572
+ "shape": [
573
+ 768,
574
+ 768
575
+ ],
576
+ "dtype": "float32",
577
+ "format": "f32-to-bf16",
578
+ "nbytes": 1179648,
579
+ "byteOffset": 1181184
580
+ },
581
+ {
582
+ "name": "roberta.encoder.layer.10.attention.self.value.bias",
583
+ "shape": [
584
+ 768
585
+ ],
586
+ "dtype": "float32",
587
+ "format": "f32-to-bf16",
588
+ "nbytes": 1536,
589
+ "byteOffset": 2360832
590
+ },
591
+ {
592
+ "name": "roberta.encoder.layer.10.attention.self.value.weight",
593
+ "shape": [
594
+ 768,
595
+ 768
596
+ ],
597
+ "dtype": "float32",
598
+ "format": "f32-to-bf16",
599
+ "nbytes": 1179648,
600
+ "byteOffset": 2362368
601
+ },
602
+ {
603
+ "name": "roberta.encoder.layer.10.intermediate.dense.bias",
604
+ "shape": [
605
+ 3072
606
+ ],
607
+ "dtype": "float32",
608
+ "format": "f32-to-bf16",
609
+ "nbytes": 6144,
610
+ "byteOffset": 3542016
611
+ },
612
+ {
613
+ "name": "roberta.encoder.layer.10.intermediate.dense.weight",
614
+ "shape": [
615
+ 3072,
616
+ 768
617
+ ],
618
+ "dtype": "float32",
619
+ "format": "f32-to-bf16",
620
+ "nbytes": 4718592,
621
+ "byteOffset": 3548160
622
+ },
623
+ {
624
+ "name": "roberta.encoder.layer.10.output.LayerNorm.bias",
625
+ "shape": [
626
+ 768
627
+ ],
628
+ "dtype": "float32",
629
+ "format": "f32-to-bf16",
630
+ "nbytes": 1536,
631
+ "byteOffset": 8266752
632
+ },
633
+ {
634
+ "name": "roberta.encoder.layer.10.output.LayerNorm.weight",
635
+ "shape": [
636
+ 768
637
+ ],
638
+ "dtype": "float32",
639
+ "format": "f32-to-bf16",
640
+ "nbytes": 1536,
641
+ "byteOffset": 8268288
642
+ },
643
+ {
644
+ "name": "roberta.encoder.layer.10.output.dense.bias",
645
+ "shape": [
646
+ 768
647
+ ],
648
+ "dtype": "float32",
649
+ "format": "f32-to-bf16",
650
+ "nbytes": 1536,
651
+ "byteOffset": 8269824
652
+ },
653
+ {
654
+ "name": "roberta.encoder.layer.10.output.dense.weight",
655
+ "shape": [
656
+ 768,
657
+ 3072
658
+ ],
659
+ "dtype": "float32",
660
+ "format": "f32-to-bf16",
661
+ "nbytes": 4718592,
662
+ "byteOffset": 8271360
663
+ },
664
+ {
665
+ "name": "roberta.encoder.layer.11.attention.output.LayerNorm.bias",
666
+ "shape": [
667
+ 768
668
+ ],
669
+ "dtype": "float32",
670
+ "format": "f32-to-bf16",
671
+ "nbytes": 1536,
672
+ "byteOffset": 12989952
673
+ },
674
+ {
675
+ "name": "roberta.encoder.layer.11.attention.output.LayerNorm.weight",
676
+ "shape": [
677
+ 768
678
+ ],
679
+ "dtype": "float32",
680
+ "format": "f32-to-bf16",
681
+ "nbytes": 1536,
682
+ "byteOffset": 12991488
683
+ },
684
+ {
685
+ "name": "roberta.encoder.layer.11.attention.output.dense.bias",
686
+ "shape": [
687
+ 768
688
+ ],
689
+ "dtype": "float32",
690
+ "format": "f32-to-bf16",
691
+ "nbytes": 1536,
692
+ "byteOffset": 12993024
693
+ },
694
+ {
695
+ "name": "roberta.encoder.layer.11.attention.output.dense.weight",
696
+ "shape": [
697
+ 768,
698
+ 768
699
+ ],
700
+ "dtype": "float32",
701
+ "format": "f32-to-bf16",
702
+ "nbytes": 1179648,
703
+ "byteOffset": 12994560
704
+ },
705
+ {
706
+ "name": "roberta.encoder.layer.11.attention.self.key.bias",
707
+ "shape": [
708
+ 768
709
+ ],
710
+ "dtype": "float32",
711
+ "format": "f32-to-bf16",
712
+ "nbytes": 1536,
713
+ "byteOffset": 14174208
714
+ },
715
+ {
716
+ "name": "roberta.encoder.layer.11.attention.self.key.weight",
717
+ "shape": [
718
+ 768,
719
+ 768
720
+ ],
721
+ "dtype": "float32",
722
+ "format": "f32-to-bf16",
723
+ "nbytes": 1179648,
724
+ "byteOffset": 14175744
725
+ },
726
+ {
727
+ "name": "roberta.encoder.layer.11.attention.self.query.bias",
728
+ "shape": [
729
+ 768
730
+ ],
731
+ "dtype": "float32",
732
+ "format": "f32-to-bf16",
733
+ "nbytes": 1536,
734
+ "byteOffset": 15355392
735
+ },
736
+ {
737
+ "name": "roberta.encoder.layer.11.attention.self.query.weight",
738
+ "shape": [
739
+ 768,
740
+ 768
741
+ ],
742
+ "dtype": "float32",
743
+ "format": "f32-to-bf16",
744
+ "nbytes": 1179648,
745
+ "byteOffset": 15356928
746
+ },
747
+ {
748
+ "name": "roberta.encoder.layer.11.attention.self.value.bias",
749
+ "shape": [
750
+ 768
751
+ ],
752
+ "dtype": "float32",
753
+ "format": "f32-to-bf16",
754
+ "nbytes": 1536,
755
+ "byteOffset": 16536576
756
+ },
757
+ {
758
+ "name": "roberta.encoder.layer.11.attention.self.value.weight",
759
+ "shape": [
760
+ 768,
761
+ 768
762
+ ],
763
+ "dtype": "float32",
764
+ "format": "f32-to-bf16",
765
+ "nbytes": 1179648,
766
+ "byteOffset": 16538112
767
+ },
768
+ {
769
+ "name": "roberta.encoder.layer.11.intermediate.dense.bias",
770
+ "shape": [
771
+ 3072
772
+ ],
773
+ "dtype": "float32",
774
+ "format": "f32-to-bf16",
775
+ "nbytes": 6144,
776
+ "byteOffset": 17717760
777
+ },
778
+ {
779
+ "name": "roberta.encoder.layer.11.intermediate.dense.weight",
780
+ "shape": [
781
+ 3072,
782
+ 768
783
+ ],
784
+ "dtype": "float32",
785
+ "format": "f32-to-bf16",
786
+ "nbytes": 4718592,
787
+ "byteOffset": 17723904
788
+ },
789
+ {
790
+ "name": "roberta.encoder.layer.11.output.LayerNorm.bias",
791
+ "shape": [
792
+ 768
793
+ ],
794
+ "dtype": "float32",
795
+ "format": "f32-to-bf16",
796
+ "nbytes": 1536,
797
+ "byteOffset": 22442496
798
+ },
799
+ {
800
+ "name": "roberta.encoder.layer.11.output.LayerNorm.weight",
801
+ "shape": [
802
+ 768
803
+ ],
804
+ "dtype": "float32",
805
+ "format": "f32-to-bf16",
806
+ "nbytes": 1536,
807
+ "byteOffset": 22444032
808
+ },
809
+ {
810
+ "name": "roberta.encoder.layer.11.output.dense.bias",
811
+ "shape": [
812
+ 768
813
+ ],
814
+ "dtype": "float32",
815
+ "format": "f32-to-bf16",
816
+ "nbytes": 1536,
817
+ "byteOffset": 22445568
818
+ },
819
+ {
820
+ "name": "roberta.encoder.layer.11.output.dense.weight",
821
+ "shape": [
822
+ 768,
823
+ 3072
824
+ ],
825
+ "dtype": "float32",
826
+ "format": "f32-to-bf16",
827
+ "nbytes": 4718592,
828
+ "byteOffset": 22447104
829
+ },
830
+ {
831
+ "name": "roberta.encoder.layer.2.attention.output.LayerNorm.bias",
832
+ "shape": [
833
+ 768
834
+ ],
835
+ "dtype": "float32",
836
+ "format": "f32-to-bf16",
837
+ "nbytes": 1536,
838
+ "byteOffset": 27165696
839
+ },
840
+ {
841
+ "name": "roberta.encoder.layer.2.attention.output.LayerNorm.weight",
842
+ "shape": [
843
+ 768
844
+ ],
845
+ "dtype": "float32",
846
+ "format": "f32-to-bf16",
847
+ "nbytes": 1536,
848
+ "byteOffset": 27167232
849
+ },
850
+ {
851
+ "name": "roberta.encoder.layer.2.attention.output.dense.bias",
852
+ "shape": [
853
+ 768
854
+ ],
855
+ "dtype": "float32",
856
+ "format": "f32-to-bf16",
857
+ "nbytes": 1536,
858
+ "byteOffset": 27168768
859
+ },
860
+ {
861
+ "name": "roberta.encoder.layer.2.attention.output.dense.weight",
862
+ "shape": [
863
+ 768,
864
+ 768
865
+ ],
866
+ "dtype": "float32",
867
+ "format": "f32-to-bf16",
868
+ "nbytes": 1179648,
869
+ "byteOffset": 27170304
870
+ },
871
+ {
872
+ "name": "roberta.encoder.layer.2.attention.self.key.bias",
873
+ "shape": [
874
+ 768
875
+ ],
876
+ "dtype": "float32",
877
+ "format": "f32-to-bf16",
878
+ "nbytes": 1536,
879
+ "byteOffset": 28349952
880
+ },
881
+ {
882
+ "name": "roberta.encoder.layer.2.attention.self.key.weight",
883
+ "shape": [
884
+ 768,
885
+ 768
886
+ ],
887
+ "dtype": "float32",
888
+ "format": "f32-to-bf16",
889
+ "nbytes": 1179648,
890
+ "byteOffset": 28351488
891
+ },
892
+ {
893
+ "name": "roberta.encoder.layer.2.attention.self.query.bias",
894
+ "shape": [
895
+ 768
896
+ ],
897
+ "dtype": "float32",
898
+ "format": "f32-to-bf16",
899
+ "nbytes": 1536,
900
+ "byteOffset": 29531136
901
+ },
902
+ {
903
+ "name": "roberta.encoder.layer.2.attention.self.query.weight",
904
+ "shape": [
905
+ 768,
906
+ 768
907
+ ],
908
+ "dtype": "float32",
909
+ "format": "f32-to-bf16",
910
+ "nbytes": 1179648,
911
+ "byteOffset": 29532672
912
+ },
913
+ {
914
+ "name": "roberta.encoder.layer.2.attention.self.value.bias",
915
+ "shape": [
916
+ 768
917
+ ],
918
+ "dtype": "float32",
919
+ "format": "f32-to-bf16",
920
+ "nbytes": 1536,
921
+ "byteOffset": 30712320
922
+ },
923
+ {
924
+ "name": "roberta.encoder.layer.2.attention.self.value.weight",
925
+ "shape": [
926
+ 768,
927
+ 768
928
+ ],
929
+ "dtype": "float32",
930
+ "format": "f32-to-bf16",
931
+ "nbytes": 1179648,
932
+ "byteOffset": 30713856
933
+ },
934
+ {
935
+ "name": "roberta.encoder.layer.2.intermediate.dense.bias",
936
+ "shape": [
937
+ 3072
938
+ ],
939
+ "dtype": "float32",
940
+ "format": "f32-to-bf16",
941
+ "nbytes": 6144,
942
+ "byteOffset": 31893504
943
+ }
944
+ ],
945
+ "md5sum": "cd967b7aa914de66a898f89fcac8f9dc"
946
+ },
947
+ {
948
+ "dataPath": "params_shard_3.bin",
949
+ "format": "raw-shard",
950
+ "nbytes": 33074688,
951
+ "records": [
952
+ {
953
+ "name": "roberta.encoder.layer.2.intermediate.dense.weight",
954
+ "shape": [
955
+ 3072,
956
+ 768
957
+ ],
958
+ "dtype": "float32",
959
+ "format": "f32-to-bf16",
960
+ "nbytes": 4718592,
961
+ "byteOffset": 0
962
+ },
963
+ {
964
+ "name": "roberta.encoder.layer.2.output.LayerNorm.bias",
965
+ "shape": [
966
+ 768
967
+ ],
968
+ "dtype": "float32",
969
+ "format": "f32-to-bf16",
970
+ "nbytes": 1536,
971
+ "byteOffset": 4718592
972
+ },
973
+ {
974
+ "name": "roberta.encoder.layer.2.output.LayerNorm.weight",
975
+ "shape": [
976
+ 768
977
+ ],
978
+ "dtype": "float32",
979
+ "format": "f32-to-bf16",
980
+ "nbytes": 1536,
981
+ "byteOffset": 4720128
982
+ },
983
+ {
984
+ "name": "roberta.encoder.layer.2.output.dense.bias",
985
+ "shape": [
986
+ 768
987
+ ],
988
+ "dtype": "float32",
989
+ "format": "f32-to-bf16",
990
+ "nbytes": 1536,
991
+ "byteOffset": 4721664
992
+ },
993
+ {
994
+ "name": "roberta.encoder.layer.2.output.dense.weight",
995
+ "shape": [
996
+ 768,
997
+ 3072
998
+ ],
999
+ "dtype": "float32",
1000
+ "format": "f32-to-bf16",
1001
+ "nbytes": 4718592,
1002
+ "byteOffset": 4723200
1003
+ },
1004
+ {
1005
+ "name": "roberta.encoder.layer.3.attention.output.LayerNorm.bias",
1006
+ "shape": [
1007
+ 768
1008
+ ],
1009
+ "dtype": "float32",
1010
+ "format": "f32-to-bf16",
1011
+ "nbytes": 1536,
1012
+ "byteOffset": 9441792
1013
+ },
1014
+ {
1015
+ "name": "roberta.encoder.layer.3.attention.output.LayerNorm.weight",
1016
+ "shape": [
1017
+ 768
1018
+ ],
1019
+ "dtype": "float32",
1020
+ "format": "f32-to-bf16",
1021
+ "nbytes": 1536,
1022
+ "byteOffset": 9443328
1023
+ },
1024
+ {
1025
+ "name": "roberta.encoder.layer.3.attention.output.dense.bias",
1026
+ "shape": [
1027
+ 768
1028
+ ],
1029
+ "dtype": "float32",
1030
+ "format": "f32-to-bf16",
1031
+ "nbytes": 1536,
1032
+ "byteOffset": 9444864
1033
+ },
1034
+ {
1035
+ "name": "roberta.encoder.layer.3.attention.output.dense.weight",
1036
+ "shape": [
1037
+ 768,
1038
+ 768
1039
+ ],
1040
+ "dtype": "float32",
1041
+ "format": "f32-to-bf16",
1042
+ "nbytes": 1179648,
1043
+ "byteOffset": 9446400
1044
+ },
1045
+ {
1046
+ "name": "roberta.encoder.layer.3.attention.self.key.bias",
1047
+ "shape": [
1048
+ 768
1049
+ ],
1050
+ "dtype": "float32",
1051
+ "format": "f32-to-bf16",
1052
+ "nbytes": 1536,
1053
+ "byteOffset": 10626048
1054
+ },
1055
+ {
1056
+ "name": "roberta.encoder.layer.3.attention.self.key.weight",
1057
+ "shape": [
1058
+ 768,
1059
+ 768
1060
+ ],
1061
+ "dtype": "float32",
1062
+ "format": "f32-to-bf16",
1063
+ "nbytes": 1179648,
1064
+ "byteOffset": 10627584
1065
+ },
1066
+ {
1067
+ "name": "roberta.encoder.layer.3.attention.self.query.bias",
1068
+ "shape": [
1069
+ 768
1070
+ ],
1071
+ "dtype": "float32",
1072
+ "format": "f32-to-bf16",
1073
+ "nbytes": 1536,
1074
+ "byteOffset": 11807232
1075
+ },
1076
+ {
1077
+ "name": "roberta.encoder.layer.3.attention.self.query.weight",
1078
+ "shape": [
1079
+ 768,
1080
+ 768
1081
+ ],
1082
+ "dtype": "float32",
1083
+ "format": "f32-to-bf16",
1084
+ "nbytes": 1179648,
1085
+ "byteOffset": 11808768
1086
+ },
1087
+ {
1088
+ "name": "roberta.encoder.layer.3.attention.self.value.bias",
1089
+ "shape": [
1090
+ 768
1091
+ ],
1092
+ "dtype": "float32",
1093
+ "format": "f32-to-bf16",
1094
+ "nbytes": 1536,
1095
+ "byteOffset": 12988416
1096
+ },
1097
+ {
1098
+ "name": "roberta.encoder.layer.3.attention.self.value.weight",
1099
+ "shape": [
1100
+ 768,
1101
+ 768
1102
+ ],
1103
+ "dtype": "float32",
1104
+ "format": "f32-to-bf16",
1105
+ "nbytes": 1179648,
1106
+ "byteOffset": 12989952
1107
+ },
1108
+ {
1109
+ "name": "roberta.encoder.layer.3.intermediate.dense.bias",
1110
+ "shape": [
1111
+ 3072
1112
+ ],
1113
+ "dtype": "float32",
1114
+ "format": "f32-to-bf16",
1115
+ "nbytes": 6144,
1116
+ "byteOffset": 14169600
1117
+ },
1118
+ {
1119
+ "name": "roberta.encoder.layer.3.intermediate.dense.weight",
1120
+ "shape": [
1121
+ 3072,
1122
+ 768
1123
+ ],
1124
+ "dtype": "float32",
1125
+ "format": "f32-to-bf16",
1126
+ "nbytes": 4718592,
1127
+ "byteOffset": 14175744
1128
+ },
1129
+ {
1130
+ "name": "roberta.encoder.layer.3.output.LayerNorm.bias",
1131
+ "shape": [
1132
+ 768
1133
+ ],
1134
+ "dtype": "float32",
1135
+ "format": "f32-to-bf16",
1136
+ "nbytes": 1536,
1137
+ "byteOffset": 18894336
1138
+ },
1139
+ {
1140
+ "name": "roberta.encoder.layer.3.output.LayerNorm.weight",
1141
+ "shape": [
1142
+ 768
1143
+ ],
1144
+ "dtype": "float32",
1145
+ "format": "f32-to-bf16",
1146
+ "nbytes": 1536,
1147
+ "byteOffset": 18895872
1148
+ },
1149
+ {
1150
+ "name": "roberta.encoder.layer.3.output.dense.bias",
1151
+ "shape": [
1152
+ 768
1153
+ ],
1154
+ "dtype": "float32",
1155
+ "format": "f32-to-bf16",
1156
+ "nbytes": 1536,
1157
+ "byteOffset": 18897408
1158
+ },
1159
+ {
1160
+ "name": "roberta.encoder.layer.3.output.dense.weight",
1161
+ "shape": [
1162
+ 768,
1163
+ 3072
1164
+ ],
1165
+ "dtype": "float32",
1166
+ "format": "f32-to-bf16",
1167
+ "nbytes": 4718592,
1168
+ "byteOffset": 18898944
1169
+ },
1170
+ {
1171
+ "name": "roberta.encoder.layer.4.attention.output.LayerNorm.bias",
1172
+ "shape": [
1173
+ 768
1174
+ ],
1175
+ "dtype": "float32",
1176
+ "format": "f32-to-bf16",
1177
+ "nbytes": 1536,
1178
+ "byteOffset": 23617536
1179
+ },
1180
+ {
1181
+ "name": "roberta.encoder.layer.4.attention.output.LayerNorm.weight",
1182
+ "shape": [
1183
+ 768
1184
+ ],
1185
+ "dtype": "float32",
1186
+ "format": "f32-to-bf16",
1187
+ "nbytes": 1536,
1188
+ "byteOffset": 23619072
1189
+ },
1190
+ {
1191
+ "name": "roberta.encoder.layer.4.attention.output.dense.bias",
1192
+ "shape": [
1193
+ 768
1194
+ ],
1195
+ "dtype": "float32",
1196
+ "format": "f32-to-bf16",
1197
+ "nbytes": 1536,
1198
+ "byteOffset": 23620608
1199
+ },
1200
+ {
1201
+ "name": "roberta.encoder.layer.4.attention.output.dense.weight",
1202
+ "shape": [
1203
+ 768,
1204
+ 768
1205
+ ],
1206
+ "dtype": "float32",
1207
+ "format": "f32-to-bf16",
1208
+ "nbytes": 1179648,
1209
+ "byteOffset": 23622144
1210
+ },
1211
+ {
1212
+ "name": "roberta.encoder.layer.4.attention.self.key.bias",
1213
+ "shape": [
1214
+ 768
1215
+ ],
1216
+ "dtype": "float32",
1217
+ "format": "f32-to-bf16",
1218
+ "nbytes": 1536,
1219
+ "byteOffset": 24801792
1220
+ },
1221
+ {
1222
+ "name": "roberta.encoder.layer.4.attention.self.key.weight",
1223
+ "shape": [
1224
+ 768,
1225
+ 768
1226
+ ],
1227
+ "dtype": "float32",
1228
+ "format": "f32-to-bf16",
1229
+ "nbytes": 1179648,
1230
+ "byteOffset": 24803328
1231
+ },
1232
+ {
1233
+ "name": "roberta.encoder.layer.4.attention.self.query.bias",
1234
+ "shape": [
1235
+ 768
1236
+ ],
1237
+ "dtype": "float32",
1238
+ "format": "f32-to-bf16",
1239
+ "nbytes": 1536,
1240
+ "byteOffset": 25982976
1241
+ },
1242
+ {
1243
+ "name": "roberta.encoder.layer.4.attention.self.query.weight",
1244
+ "shape": [
1245
+ 768,
1246
+ 768
1247
+ ],
1248
+ "dtype": "float32",
1249
+ "format": "f32-to-bf16",
1250
+ "nbytes": 1179648,
1251
+ "byteOffset": 25984512
1252
+ },
1253
+ {
1254
+ "name": "roberta.encoder.layer.4.attention.self.value.bias",
1255
+ "shape": [
1256
+ 768
1257
+ ],
1258
+ "dtype": "float32",
1259
+ "format": "f32-to-bf16",
1260
+ "nbytes": 1536,
1261
+ "byteOffset": 27164160
1262
+ },
1263
+ {
1264
+ "name": "roberta.encoder.layer.4.attention.self.value.weight",
1265
+ "shape": [
1266
+ 768,
1267
+ 768
1268
+ ],
1269
+ "dtype": "float32",
1270
+ "format": "f32-to-bf16",
1271
+ "nbytes": 1179648,
1272
+ "byteOffset": 27165696
1273
+ },
1274
+ {
1275
+ "name": "roberta.encoder.layer.4.intermediate.dense.bias",
1276
+ "shape": [
1277
+ 3072
1278
+ ],
1279
+ "dtype": "float32",
1280
+ "format": "f32-to-bf16",
1281
+ "nbytes": 6144,
1282
+ "byteOffset": 28345344
1283
+ },
1284
+ {
1285
+ "name": "roberta.encoder.layer.4.intermediate.dense.weight",
1286
+ "shape": [
1287
+ 3072,
1288
+ 768
1289
+ ],
1290
+ "dtype": "float32",
1291
+ "format": "f32-to-bf16",
1292
+ "nbytes": 4718592,
1293
+ "byteOffset": 28351488
1294
+ },
1295
+ {
1296
+ "name": "roberta.encoder.layer.4.output.LayerNorm.bias",
1297
+ "shape": [
1298
+ 768
1299
+ ],
1300
+ "dtype": "float32",
1301
+ "format": "f32-to-bf16",
1302
+ "nbytes": 1536,
1303
+ "byteOffset": 33070080
1304
+ },
1305
+ {
1306
+ "name": "roberta.encoder.layer.4.output.LayerNorm.weight",
1307
+ "shape": [
1308
+ 768
1309
+ ],
1310
+ "dtype": "float32",
1311
+ "format": "f32-to-bf16",
1312
+ "nbytes": 1536,
1313
+ "byteOffset": 33071616
1314
+ },
1315
+ {
1316
+ "name": "roberta.encoder.layer.4.output.dense.bias",
1317
+ "shape": [
1318
+ 768
1319
+ ],
1320
+ "dtype": "float32",
1321
+ "format": "f32-to-bf16",
1322
+ "nbytes": 1536,
1323
+ "byteOffset": 33073152
1324
+ }
1325
+ ],
1326
+ "md5sum": "f1598d0a023d40021c0400ee9f49844d"
1327
+ },
1328
+ {
1329
+ "dataPath": "params_shard_4.bin",
1330
+ "format": "raw-shard",
1331
+ "nbytes": 33074688,
1332
+ "records": [
1333
+ {
1334
+ "name": "roberta.encoder.layer.4.output.dense.weight",
1335
+ "shape": [
1336
+ 768,
1337
+ 3072
1338
+ ],
1339
+ "dtype": "float32",
1340
+ "format": "f32-to-bf16",
1341
+ "nbytes": 4718592,
1342
+ "byteOffset": 0
1343
+ },
1344
+ {
1345
+ "name": "roberta.encoder.layer.5.attention.output.LayerNorm.bias",
1346
+ "shape": [
1347
+ 768
1348
+ ],
1349
+ "dtype": "float32",
1350
+ "format": "f32-to-bf16",
1351
+ "nbytes": 1536,
1352
+ "byteOffset": 4718592
1353
+ },
1354
+ {
1355
+ "name": "roberta.encoder.layer.5.attention.output.LayerNorm.weight",
1356
+ "shape": [
1357
+ 768
1358
+ ],
1359
+ "dtype": "float32",
1360
+ "format": "f32-to-bf16",
1361
+ "nbytes": 1536,
1362
+ "byteOffset": 4720128
1363
+ },
1364
+ {
1365
+ "name": "roberta.encoder.layer.5.attention.output.dense.bias",
1366
+ "shape": [
1367
+ 768
1368
+ ],
1369
+ "dtype": "float32",
1370
+ "format": "f32-to-bf16",
1371
+ "nbytes": 1536,
1372
+ "byteOffset": 4721664
1373
+ },
1374
+ {
1375
+ "name": "roberta.encoder.layer.5.attention.output.dense.weight",
1376
+ "shape": [
1377
+ 768,
1378
+ 768
1379
+ ],
1380
+ "dtype": "float32",
1381
+ "format": "f32-to-bf16",
1382
+ "nbytes": 1179648,
1383
+ "byteOffset": 4723200
1384
+ },
1385
+ {
1386
+ "name": "roberta.encoder.layer.5.attention.self.key.bias",
1387
+ "shape": [
1388
+ 768
1389
+ ],
1390
+ "dtype": "float32",
1391
+ "format": "f32-to-bf16",
1392
+ "nbytes": 1536,
1393
+ "byteOffset": 5902848
1394
+ },
1395
+ {
1396
+ "name": "roberta.encoder.layer.5.attention.self.key.weight",
1397
+ "shape": [
1398
+ 768,
1399
+ 768
1400
+ ],
1401
+ "dtype": "float32",
1402
+ "format": "f32-to-bf16",
1403
+ "nbytes": 1179648,
1404
+ "byteOffset": 5904384
1405
+ },
1406
+ {
1407
+ "name": "roberta.encoder.layer.5.attention.self.query.bias",
1408
+ "shape": [
1409
+ 768
1410
+ ],
1411
+ "dtype": "float32",
1412
+ "format": "f32-to-bf16",
1413
+ "nbytes": 1536,
1414
+ "byteOffset": 7084032
1415
+ },
1416
+ {
1417
+ "name": "roberta.encoder.layer.5.attention.self.query.weight",
1418
+ "shape": [
1419
+ 768,
1420
+ 768
1421
+ ],
1422
+ "dtype": "float32",
1423
+ "format": "f32-to-bf16",
1424
+ "nbytes": 1179648,
1425
+ "byteOffset": 7085568
1426
+ },
1427
+ {
1428
+ "name": "roberta.encoder.layer.5.attention.self.value.bias",
1429
+ "shape": [
1430
+ 768
1431
+ ],
1432
+ "dtype": "float32",
1433
+ "format": "f32-to-bf16",
1434
+ "nbytes": 1536,
1435
+ "byteOffset": 8265216
1436
+ },
1437
+ {
1438
+ "name": "roberta.encoder.layer.5.attention.self.value.weight",
1439
+ "shape": [
1440
+ 768,
1441
+ 768
1442
+ ],
1443
+ "dtype": "float32",
1444
+ "format": "f32-to-bf16",
1445
+ "nbytes": 1179648,
1446
+ "byteOffset": 8266752
1447
+ },
1448
+ {
1449
+ "name": "roberta.encoder.layer.5.intermediate.dense.bias",
1450
+ "shape": [
1451
+ 3072
1452
+ ],
1453
+ "dtype": "float32",
1454
+ "format": "f32-to-bf16",
1455
+ "nbytes": 6144,
1456
+ "byteOffset": 9446400
1457
+ },
1458
+ {
1459
+ "name": "roberta.encoder.layer.5.intermediate.dense.weight",
1460
+ "shape": [
1461
+ 3072,
1462
+ 768
1463
+ ],
1464
+ "dtype": "float32",
1465
+ "format": "f32-to-bf16",
1466
+ "nbytes": 4718592,
1467
+ "byteOffset": 9452544
1468
+ },
1469
+ {
1470
+ "name": "roberta.encoder.layer.5.output.LayerNorm.bias",
1471
+ "shape": [
1472
+ 768
1473
+ ],
1474
+ "dtype": "float32",
1475
+ "format": "f32-to-bf16",
1476
+ "nbytes": 1536,
1477
+ "byteOffset": 14171136
1478
+ },
1479
+ {
1480
+ "name": "roberta.encoder.layer.5.output.LayerNorm.weight",
1481
+ "shape": [
1482
+ 768
1483
+ ],
1484
+ "dtype": "float32",
1485
+ "format": "f32-to-bf16",
1486
+ "nbytes": 1536,
1487
+ "byteOffset": 14172672
1488
+ },
1489
+ {
1490
+ "name": "roberta.encoder.layer.5.output.dense.bias",
1491
+ "shape": [
1492
+ 768
1493
+ ],
1494
+ "dtype": "float32",
1495
+ "format": "f32-to-bf16",
1496
+ "nbytes": 1536,
1497
+ "byteOffset": 14174208
1498
+ },
1499
+ {
1500
+ "name": "roberta.encoder.layer.5.output.dense.weight",
1501
+ "shape": [
1502
+ 768,
1503
+ 3072
1504
+ ],
1505
+ "dtype": "float32",
1506
+ "format": "f32-to-bf16",
1507
+ "nbytes": 4718592,
1508
+ "byteOffset": 14175744
1509
+ },
1510
+ {
1511
+ "name": "roberta.encoder.layer.6.attention.output.LayerNorm.bias",
1512
+ "shape": [
1513
+ 768
1514
+ ],
1515
+ "dtype": "float32",
1516
+ "format": "f32-to-bf16",
1517
+ "nbytes": 1536,
1518
+ "byteOffset": 18894336
1519
+ },
1520
+ {
1521
+ "name": "roberta.encoder.layer.6.attention.output.LayerNorm.weight",
1522
+ "shape": [
1523
+ 768
1524
+ ],
1525
+ "dtype": "float32",
1526
+ "format": "f32-to-bf16",
1527
+ "nbytes": 1536,
1528
+ "byteOffset": 18895872
1529
+ },
1530
+ {
1531
+ "name": "roberta.encoder.layer.6.attention.output.dense.bias",
1532
+ "shape": [
1533
+ 768
1534
+ ],
1535
+ "dtype": "float32",
1536
+ "format": "f32-to-bf16",
1537
+ "nbytes": 1536,
1538
+ "byteOffset": 18897408
1539
+ },
1540
+ {
1541
+ "name": "roberta.encoder.layer.6.attention.output.dense.weight",
1542
+ "shape": [
1543
+ 768,
1544
+ 768
1545
+ ],
1546
+ "dtype": "float32",
1547
+ "format": "f32-to-bf16",
1548
+ "nbytes": 1179648,
1549
+ "byteOffset": 18898944
1550
+ },
1551
+ {
1552
+ "name": "roberta.encoder.layer.6.attention.self.key.bias",
1553
+ "shape": [
1554
+ 768
1555
+ ],
1556
+ "dtype": "float32",
1557
+ "format": "f32-to-bf16",
1558
+ "nbytes": 1536,
1559
+ "byteOffset": 20078592
1560
+ },
1561
+ {
1562
+ "name": "roberta.encoder.layer.6.attention.self.key.weight",
1563
+ "shape": [
1564
+ 768,
1565
+ 768
1566
+ ],
1567
+ "dtype": "float32",
1568
+ "format": "f32-to-bf16",
1569
+ "nbytes": 1179648,
1570
+ "byteOffset": 20080128
1571
+ },
1572
+ {
1573
+ "name": "roberta.encoder.layer.6.attention.self.query.bias",
1574
+ "shape": [
1575
+ 768
1576
+ ],
1577
+ "dtype": "float32",
1578
+ "format": "f32-to-bf16",
1579
+ "nbytes": 1536,
1580
+ "byteOffset": 21259776
1581
+ },
1582
+ {
1583
+ "name": "roberta.encoder.layer.6.attention.self.query.weight",
1584
+ "shape": [
1585
+ 768,
1586
+ 768
1587
+ ],
1588
+ "dtype": "float32",
1589
+ "format": "f32-to-bf16",
1590
+ "nbytes": 1179648,
1591
+ "byteOffset": 21261312
1592
+ },
1593
+ {
1594
+ "name": "roberta.encoder.layer.6.attention.self.value.bias",
1595
+ "shape": [
1596
+ 768
1597
+ ],
1598
+ "dtype": "float32",
1599
+ "format": "f32-to-bf16",
1600
+ "nbytes": 1536,
1601
+ "byteOffset": 22440960
1602
+ },
1603
+ {
1604
+ "name": "roberta.encoder.layer.6.attention.self.value.weight",
1605
+ "shape": [
1606
+ 768,
1607
+ 768
1608
+ ],
1609
+ "dtype": "float32",
1610
+ "format": "f32-to-bf16",
1611
+ "nbytes": 1179648,
1612
+ "byteOffset": 22442496
1613
+ },
1614
+ {
1615
+ "name": "roberta.encoder.layer.6.intermediate.dense.bias",
1616
+ "shape": [
1617
+ 3072
1618
+ ],
1619
+ "dtype": "float32",
1620
+ "format": "f32-to-bf16",
1621
+ "nbytes": 6144,
1622
+ "byteOffset": 23622144
1623
+ },
1624
+ {
1625
+ "name": "roberta.encoder.layer.6.intermediate.dense.weight",
1626
+ "shape": [
1627
+ 3072,
1628
+ 768
1629
+ ],
1630
+ "dtype": "float32",
1631
+ "format": "f32-to-bf16",
1632
+ "nbytes": 4718592,
1633
+ "byteOffset": 23628288
1634
+ },
1635
+ {
1636
+ "name": "roberta.encoder.layer.6.output.LayerNorm.bias",
1637
+ "shape": [
1638
+ 768
1639
+ ],
1640
+ "dtype": "float32",
1641
+ "format": "f32-to-bf16",
1642
+ "nbytes": 1536,
1643
+ "byteOffset": 28346880
1644
+ },
1645
+ {
1646
+ "name": "roberta.encoder.layer.6.output.LayerNorm.weight",
1647
+ "shape": [
1648
+ 768
1649
+ ],
1650
+ "dtype": "float32",
1651
+ "format": "f32-to-bf16",
1652
+ "nbytes": 1536,
1653
+ "byteOffset": 28348416
1654
+ },
1655
+ {
1656
+ "name": "roberta.encoder.layer.6.output.dense.bias",
1657
+ "shape": [
1658
+ 768
1659
+ ],
1660
+ "dtype": "float32",
1661
+ "format": "f32-to-bf16",
1662
+ "nbytes": 1536,
1663
+ "byteOffset": 28349952
1664
+ },
1665
+ {
1666
+ "name": "roberta.encoder.layer.6.output.dense.weight",
1667
+ "shape": [
1668
+ 768,
1669
+ 3072
1670
+ ],
1671
+ "dtype": "float32",
1672
+ "format": "f32-to-bf16",
1673
+ "nbytes": 4718592,
1674
+ "byteOffset": 28351488
1675
+ },
1676
+ {
1677
+ "name": "roberta.encoder.layer.7.attention.output.LayerNorm.bias",
1678
+ "shape": [
1679
+ 768
1680
+ ],
1681
+ "dtype": "float32",
1682
+ "format": "f32-to-bf16",
1683
+ "nbytes": 1536,
1684
+ "byteOffset": 33070080
1685
+ },
1686
+ {
1687
+ "name": "roberta.encoder.layer.7.attention.output.LayerNorm.weight",
1688
+ "shape": [
1689
+ 768
1690
+ ],
1691
+ "dtype": "float32",
1692
+ "format": "f32-to-bf16",
1693
+ "nbytes": 1536,
1694
+ "byteOffset": 33071616
1695
+ },
1696
+ {
1697
+ "name": "roberta.encoder.layer.7.attention.output.dense.bias",
1698
+ "shape": [
1699
+ 768
1700
+ ],
1701
+ "dtype": "float32",
1702
+ "format": "f32-to-bf16",
1703
+ "nbytes": 1536,
1704
+ "byteOffset": 33073152
1705
+ }
1706
+ ],
1707
+ "md5sum": "d7db2ff4e87fb3b13f5a0f5e9c00d5e9"
1708
+ },
1709
+ {
1710
+ "dataPath": "params_shard_5.bin",
1711
+ "format": "raw-shard",
1712
+ "nbytes": 33080832,
1713
+ "records": [
1714
+ {
1715
+ "name": "roberta.encoder.layer.7.attention.output.dense.weight",
1716
+ "shape": [
1717
+ 768,
1718
+ 768
1719
+ ],
1720
+ "dtype": "float32",
1721
+ "format": "f32-to-bf16",
1722
+ "nbytes": 1179648,
1723
+ "byteOffset": 0
1724
+ },
1725
+ {
1726
+ "name": "roberta.encoder.layer.7.attention.self.key.bias",
1727
+ "shape": [
1728
+ 768
1729
+ ],
1730
+ "dtype": "float32",
1731
+ "format": "f32-to-bf16",
1732
+ "nbytes": 1536,
1733
+ "byteOffset": 1179648
1734
+ },
1735
+ {
1736
+ "name": "roberta.encoder.layer.7.attention.self.key.weight",
1737
+ "shape": [
1738
+ 768,
1739
+ 768
1740
+ ],
1741
+ "dtype": "float32",
1742
+ "format": "f32-to-bf16",
1743
+ "nbytes": 1179648,
1744
+ "byteOffset": 1181184
1745
+ },
1746
+ {
1747
+ "name": "roberta.encoder.layer.7.attention.self.query.bias",
1748
+ "shape": [
1749
+ 768
1750
+ ],
1751
+ "dtype": "float32",
1752
+ "format": "f32-to-bf16",
1753
+ "nbytes": 1536,
1754
+ "byteOffset": 2360832
1755
+ },
1756
+ {
1757
+ "name": "roberta.encoder.layer.7.attention.self.query.weight",
1758
+ "shape": [
1759
+ 768,
1760
+ 768
1761
+ ],
1762
+ "dtype": "float32",
1763
+ "format": "f32-to-bf16",
1764
+ "nbytes": 1179648,
1765
+ "byteOffset": 2362368
1766
+ },
1767
+ {
1768
+ "name": "roberta.encoder.layer.7.attention.self.value.bias",
1769
+ "shape": [
1770
+ 768
1771
+ ],
1772
+ "dtype": "float32",
1773
+ "format": "f32-to-bf16",
1774
+ "nbytes": 1536,
1775
+ "byteOffset": 3542016
1776
+ },
1777
+ {
1778
+ "name": "roberta.encoder.layer.7.attention.self.value.weight",
1779
+ "shape": [
1780
+ 768,
1781
+ 768
1782
+ ],
1783
+ "dtype": "float32",
1784
+ "format": "f32-to-bf16",
1785
+ "nbytes": 1179648,
1786
+ "byteOffset": 3543552
1787
+ },
1788
+ {
1789
+ "name": "roberta.encoder.layer.7.intermediate.dense.bias",
1790
+ "shape": [
1791
+ 3072
1792
+ ],
1793
+ "dtype": "float32",
1794
+ "format": "f32-to-bf16",
1795
+ "nbytes": 6144,
1796
+ "byteOffset": 4723200
1797
+ },
1798
+ {
1799
+ "name": "roberta.encoder.layer.7.intermediate.dense.weight",
1800
+ "shape": [
1801
+ 3072,
1802
+ 768
1803
+ ],
1804
+ "dtype": "float32",
1805
+ "format": "f32-to-bf16",
1806
+ "nbytes": 4718592,
1807
+ "byteOffset": 4729344
1808
+ },
1809
+ {
1810
+ "name": "roberta.encoder.layer.7.output.LayerNorm.bias",
1811
+ "shape": [
1812
+ 768
1813
+ ],
1814
+ "dtype": "float32",
1815
+ "format": "f32-to-bf16",
1816
+ "nbytes": 1536,
1817
+ "byteOffset": 9447936
1818
+ },
1819
+ {
1820
+ "name": "roberta.encoder.layer.7.output.LayerNorm.weight",
1821
+ "shape": [
1822
+ 768
1823
+ ],
1824
+ "dtype": "float32",
1825
+ "format": "f32-to-bf16",
1826
+ "nbytes": 1536,
1827
+ "byteOffset": 9449472
1828
+ },
1829
+ {
1830
+ "name": "roberta.encoder.layer.7.output.dense.bias",
1831
+ "shape": [
1832
+ 768
1833
+ ],
1834
+ "dtype": "float32",
1835
+ "format": "f32-to-bf16",
1836
+ "nbytes": 1536,
1837
+ "byteOffset": 9451008
1838
+ },
1839
+ {
1840
+ "name": "roberta.encoder.layer.7.output.dense.weight",
1841
+ "shape": [
1842
+ 768,
1843
+ 3072
1844
+ ],
1845
+ "dtype": "float32",
1846
+ "format": "f32-to-bf16",
1847
+ "nbytes": 4718592,
1848
+ "byteOffset": 9452544
1849
+ },
1850
+ {
1851
+ "name": "roberta.encoder.layer.8.attention.output.LayerNorm.bias",
1852
+ "shape": [
1853
+ 768
1854
+ ],
1855
+ "dtype": "float32",
1856
+ "format": "f32-to-bf16",
1857
+ "nbytes": 1536,
1858
+ "byteOffset": 14171136
1859
+ },
1860
+ {
1861
+ "name": "roberta.encoder.layer.8.attention.output.LayerNorm.weight",
1862
+ "shape": [
1863
+ 768
1864
+ ],
1865
+ "dtype": "float32",
1866
+ "format": "f32-to-bf16",
1867
+ "nbytes": 1536,
1868
+ "byteOffset": 14172672
1869
+ },
1870
+ {
1871
+ "name": "roberta.encoder.layer.8.attention.output.dense.bias",
1872
+ "shape": [
1873
+ 768
1874
+ ],
1875
+ "dtype": "float32",
1876
+ "format": "f32-to-bf16",
1877
+ "nbytes": 1536,
1878
+ "byteOffset": 14174208
1879
+ },
1880
+ {
1881
+ "name": "roberta.encoder.layer.8.attention.output.dense.weight",
1882
+ "shape": [
1883
+ 768,
1884
+ 768
1885
+ ],
1886
+ "dtype": "float32",
1887
+ "format": "f32-to-bf16",
1888
+ "nbytes": 1179648,
1889
+ "byteOffset": 14175744
1890
+ },
1891
+ {
1892
+ "name": "roberta.encoder.layer.8.attention.self.key.bias",
1893
+ "shape": [
1894
+ 768
1895
+ ],
1896
+ "dtype": "float32",
1897
+ "format": "f32-to-bf16",
1898
+ "nbytes": 1536,
1899
+ "byteOffset": 15355392
1900
+ },
1901
+ {
1902
+ "name": "roberta.encoder.layer.8.attention.self.key.weight",
1903
+ "shape": [
1904
+ 768,
1905
+ 768
1906
+ ],
1907
+ "dtype": "float32",
1908
+ "format": "f32-to-bf16",
1909
+ "nbytes": 1179648,
1910
+ "byteOffset": 15356928
1911
+ },
1912
+ {
1913
+ "name": "roberta.encoder.layer.8.attention.self.query.bias",
1914
+ "shape": [
1915
+ 768
1916
+ ],
1917
+ "dtype": "float32",
1918
+ "format": "f32-to-bf16",
1919
+ "nbytes": 1536,
1920
+ "byteOffset": 16536576
1921
+ },
1922
+ {
1923
+ "name": "roberta.encoder.layer.8.attention.self.query.weight",
1924
+ "shape": [
1925
+ 768,
1926
+ 768
1927
+ ],
1928
+ "dtype": "float32",
1929
+ "format": "f32-to-bf16",
1930
+ "nbytes": 1179648,
1931
+ "byteOffset": 16538112
1932
+ },
1933
+ {
1934
+ "name": "roberta.encoder.layer.8.attention.self.value.bias",
1935
+ "shape": [
1936
+ 768
1937
+ ],
1938
+ "dtype": "float32",
1939
+ "format": "f32-to-bf16",
1940
+ "nbytes": 1536,
1941
+ "byteOffset": 17717760
1942
+ },
1943
+ {
1944
+ "name": "roberta.encoder.layer.8.attention.self.value.weight",
1945
+ "shape": [
1946
+ 768,
1947
+ 768
1948
+ ],
1949
+ "dtype": "float32",
1950
+ "format": "f32-to-bf16",
1951
+ "nbytes": 1179648,
1952
+ "byteOffset": 17719296
1953
+ },
1954
+ {
1955
+ "name": "roberta.encoder.layer.8.intermediate.dense.bias",
1956
+ "shape": [
1957
+ 3072
1958
+ ],
1959
+ "dtype": "float32",
1960
+ "format": "f32-to-bf16",
1961
+ "nbytes": 6144,
1962
+ "byteOffset": 18898944
1963
+ },
1964
+ {
1965
+ "name": "roberta.encoder.layer.8.intermediate.dense.weight",
1966
+ "shape": [
1967
+ 3072,
1968
+ 768
1969
+ ],
1970
+ "dtype": "float32",
1971
+ "format": "f32-to-bf16",
1972
+ "nbytes": 4718592,
1973
+ "byteOffset": 18905088
1974
+ },
1975
+ {
1976
+ "name": "roberta.encoder.layer.8.output.LayerNorm.bias",
1977
+ "shape": [
1978
+ 768
1979
+ ],
1980
+ "dtype": "float32",
1981
+ "format": "f32-to-bf16",
1982
+ "nbytes": 1536,
1983
+ "byteOffset": 23623680
1984
+ },
1985
+ {
1986
+ "name": "roberta.encoder.layer.8.output.LayerNorm.weight",
1987
+ "shape": [
1988
+ 768
1989
+ ],
1990
+ "dtype": "float32",
1991
+ "format": "f32-to-bf16",
1992
+ "nbytes": 1536,
1993
+ "byteOffset": 23625216
1994
+ },
1995
+ {
1996
+ "name": "roberta.encoder.layer.8.output.dense.bias",
1997
+ "shape": [
1998
+ 768
1999
+ ],
2000
+ "dtype": "float32",
2001
+ "format": "f32-to-bf16",
2002
+ "nbytes": 1536,
2003
+ "byteOffset": 23626752
2004
+ },
2005
+ {
2006
+ "name": "roberta.encoder.layer.8.output.dense.weight",
2007
+ "shape": [
2008
+ 768,
2009
+ 3072
2010
+ ],
2011
+ "dtype": "float32",
2012
+ "format": "f32-to-bf16",
2013
+ "nbytes": 4718592,
2014
+ "byteOffset": 23628288
2015
+ },
2016
+ {
2017
+ "name": "roberta.encoder.layer.9.attention.output.LayerNorm.bias",
2018
+ "shape": [
2019
+ 768
2020
+ ],
2021
+ "dtype": "float32",
2022
+ "format": "f32-to-bf16",
2023
+ "nbytes": 1536,
2024
+ "byteOffset": 28346880
2025
+ },
2026
+ {
2027
+ "name": "roberta.encoder.layer.9.attention.output.LayerNorm.weight",
2028
+ "shape": [
2029
+ 768
2030
+ ],
2031
+ "dtype": "float32",
2032
+ "format": "f32-to-bf16",
2033
+ "nbytes": 1536,
2034
+ "byteOffset": 28348416
2035
+ },
2036
+ {
2037
+ "name": "roberta.encoder.layer.9.attention.output.dense.bias",
2038
+ "shape": [
2039
+ 768
2040
+ ],
2041
+ "dtype": "float32",
2042
+ "format": "f32-to-bf16",
2043
+ "nbytes": 1536,
2044
+ "byteOffset": 28349952
2045
+ },
2046
+ {
2047
+ "name": "roberta.encoder.layer.9.attention.output.dense.weight",
2048
+ "shape": [
2049
+ 768,
2050
+ 768
2051
+ ],
2052
+ "dtype": "float32",
2053
+ "format": "f32-to-bf16",
2054
+ "nbytes": 1179648,
2055
+ "byteOffset": 28351488
2056
+ },
2057
+ {
2058
+ "name": "roberta.encoder.layer.9.attention.self.key.bias",
2059
+ "shape": [
2060
+ 768
2061
+ ],
2062
+ "dtype": "float32",
2063
+ "format": "f32-to-bf16",
2064
+ "nbytes": 1536,
2065
+ "byteOffset": 29531136
2066
+ },
2067
+ {
2068
+ "name": "roberta.encoder.layer.9.attention.self.key.weight",
2069
+ "shape": [
2070
+ 768,
2071
+ 768
2072
+ ],
2073
+ "dtype": "float32",
2074
+ "format": "f32-to-bf16",
2075
+ "nbytes": 1179648,
2076
+ "byteOffset": 29532672
2077
+ },
2078
+ {
2079
+ "name": "roberta.encoder.layer.9.attention.self.query.bias",
2080
+ "shape": [
2081
+ 768
2082
+ ],
2083
+ "dtype": "float32",
2084
+ "format": "f32-to-bf16",
2085
+ "nbytes": 1536,
2086
+ "byteOffset": 30712320
2087
+ },
2088
+ {
2089
+ "name": "roberta.encoder.layer.9.attention.self.query.weight",
2090
+ "shape": [
2091
+ 768,
2092
+ 768
2093
+ ],
2094
+ "dtype": "float32",
2095
+ "format": "f32-to-bf16",
2096
+ "nbytes": 1179648,
2097
+ "byteOffset": 30713856
2098
+ },
2099
+ {
2100
+ "name": "roberta.encoder.layer.9.attention.self.value.bias",
2101
+ "shape": [
2102
+ 768
2103
+ ],
2104
+ "dtype": "float32",
2105
+ "format": "f32-to-bf16",
2106
+ "nbytes": 1536,
2107
+ "byteOffset": 31893504
2108
+ },
2109
+ {
2110
+ "name": "roberta.encoder.layer.9.attention.self.value.weight",
2111
+ "shape": [
2112
+ 768,
2113
+ 768
2114
+ ],
2115
+ "dtype": "float32",
2116
+ "format": "f32-to-bf16",
2117
+ "nbytes": 1179648,
2118
+ "byteOffset": 31895040
2119
+ },
2120
+ {
2121
+ "name": "roberta.encoder.layer.9.intermediate.dense.bias",
2122
+ "shape": [
2123
+ 3072
2124
+ ],
2125
+ "dtype": "float32",
2126
+ "format": "f32-to-bf16",
2127
+ "nbytes": 6144,
2128
+ "byteOffset": 33074688
2129
+ }
2130
+ ],
2131
+ "md5sum": "16e5fe6cb9c4b8dfe68170b6b4a350be"
2132
+ },
2133
+ {
2134
+ "dataPath": "params_shard_6.bin",
2135
+ "format": "raw-shard",
2136
+ "nbytes": 9441792,
2137
+ "records": [
2138
+ {
2139
+ "name": "roberta.encoder.layer.9.intermediate.dense.weight",
2140
+ "shape": [
2141
+ 3072,
2142
+ 768
2143
+ ],
2144
+ "dtype": "float32",
2145
+ "format": "f32-to-bf16",
2146
+ "nbytes": 4718592,
2147
+ "byteOffset": 0
2148
+ },
2149
+ {
2150
+ "name": "roberta.encoder.layer.9.output.LayerNorm.bias",
2151
+ "shape": [
2152
+ 768
2153
+ ],
2154
+ "dtype": "float32",
2155
+ "format": "f32-to-bf16",
2156
+ "nbytes": 1536,
2157
+ "byteOffset": 4718592
2158
+ },
2159
+ {
2160
+ "name": "roberta.encoder.layer.9.output.LayerNorm.weight",
2161
+ "shape": [
2162
+ 768
2163
+ ],
2164
+ "dtype": "float32",
2165
+ "format": "f32-to-bf16",
2166
+ "nbytes": 1536,
2167
+ "byteOffset": 4720128
2168
+ },
2169
+ {
2170
+ "name": "roberta.encoder.layer.9.output.dense.bias",
2171
+ "shape": [
2172
+ 768
2173
+ ],
2174
+ "dtype": "float32",
2175
+ "format": "f32-to-bf16",
2176
+ "nbytes": 1536,
2177
+ "byteOffset": 4721664
2178
+ },
2179
+ {
2180
+ "name": "roberta.encoder.layer.9.output.dense.weight",
2181
+ "shape": [
2182
+ 768,
2183
+ 3072
2184
+ ],
2185
+ "dtype": "float32",
2186
+ "format": "f32-to-bf16",
2187
+ "nbytes": 4718592,
2188
+ "byteOffset": 4723200
2189
+ }
2190
+ ],
2191
+ "md5sum": "5882c18b7fb291d522cc147ec14e7014"
2192
+ }
2193
+ ]
2194
+ }
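The records above appear to follow a simple layout. Each tensor is declared `float32` but carries the `f32-to-bf16` format, and the listed `nbytes` works out to two bytes per element. Each `byteOffset` starts where the previous record ended, and the last record ends at the shard's `nbytes`. A minimal Python sketch of that consistency check follows, using the `params_shard_6.bin` entries copied from the manifest. The helper name `check_shard` is hypothetical, and the two-bytes-per-element rule is inferred from the numbers shown here rather than taken from MLC documentation.

```python
from math import prod

# Assumption: "f32-to-bf16" records are stored at 2 bytes per element.
BYTES_PER_ELEMENT = 2

# Entries copied from the params_shard_6.bin section of the manifest.
SHARD_6 = {
    "dataPath": "params_shard_6.bin",
    "nbytes": 9441792,
    "records": [
        {"name": "roberta.encoder.layer.9.intermediate.dense.weight",
         "shape": [3072, 768], "nbytes": 4718592, "byteOffset": 0},
        {"name": "roberta.encoder.layer.9.output.LayerNorm.bias",
         "shape": [768], "nbytes": 1536, "byteOffset": 4718592},
        {"name": "roberta.encoder.layer.9.output.LayerNorm.weight",
         "shape": [768], "nbytes": 1536, "byteOffset": 4720128},
        {"name": "roberta.encoder.layer.9.output.dense.bias",
         "shape": [768], "nbytes": 1536, "byteOffset": 4721664},
        {"name": "roberta.encoder.layer.9.output.dense.weight",
         "shape": [768, 3072], "nbytes": 4718592, "byteOffset": 4723200},
    ],
}


def check_shard(shard):
    """Return a list of problems found; an empty list means consistent."""
    problems = []
    expected_offset = 0
    for rec in shard["records"]:
        # Size implied by the shape under the 2-bytes-per-element assumption.
        expected_nbytes = prod(rec["shape"]) * BYTES_PER_ELEMENT
        if rec["nbytes"] != expected_nbytes:
            problems.append(f"{rec['name']}: nbytes {rec['nbytes']} != {expected_nbytes}")
        # Records are expected to be packed back to back.
        if rec["byteOffset"] != expected_offset:
            problems.append(f"{rec['name']}: byteOffset {rec['byteOffset']} != {expected_offset}")
        expected_offset = rec["byteOffset"] + rec["nbytes"]
    if expected_offset != shard["nbytes"]:
        problems.append(f"shard ends at {expected_offset}, manifest says {shard['nbytes']}")
    return problems


print(check_shard(SHARD_6))
```

An empty list means the shapes, sizes, and offsets agree with one another. It says nothing about the bytes in the shard file itself, which is what the `md5sum` field is for.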
params_shard_0.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:963b60a4d19c3cdf6c82bf9a4cd4a5a8313dcdd996bec00c72ac4193874c4437
+ size 77207040
params_shard_1.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a80070c494b1a5ce873893843a7f545a7c3d3731f1a9c92066370e47889562a6
+ size 32699912
params_shard_2.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:712cd523528510284bb6fc4eb7eff3d90dff7d765d9aecfdc29d795c6381b29e
+ size 31899648
params_shard_3.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c0c00d4b5c8e00f182d08da43182cf89325812cfb02e70ff0c90c2526d808124
+ size 33074688
params_shard_4.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d759e01730ec1cac1eb6f2adb088af5617f3412a9ff2ce12a0c26df0960e9b8
+ size 33074688
params_shard_5.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e83e06d55a6bea871a641c4680a480fffeedb11c5bd4aab4615fcc867566c41a
+ size 33080832
params_shard_6.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:99dd464ff7ad4e7c87d5629569a34eab302d1502dbf0626af6968787c2bb6a26
+ size 9441792
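Each `params_shard_N.bin` entry in this commit is a Git LFS pointer, not the weights themselves: three `key value` lines giving the spec version, a SHA-256 object id, and the size in bytes. For the shards whose manifest entries are visible in this diff (for example `params_shard_6.bin`), the pointer `size` equals the manifest `nbytes`. The sketch below parses a pointer and checks a downloaded blob against it. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, info):
    """Check raw file bytes against the size and SHA-256 digest in a pointer."""
    if info["hash_algo"] != "sha256":
        raise ValueError("this sketch only handles sha256 object ids")
    return len(data) == info["size"] and hashlib.sha256(data).hexdigest() == info["digest"]


# Pointer text copied from the params_shard_6.bin entry.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:99dd464ff7ad4e7c87d5629569a34eab302d1502dbf0626af6968787c2bb6a26
size 9441792
"""

info = parse_lfs_pointer(POINTER)
MANIFEST_NBYTES = 9441792  # "nbytes" of params_shard_6.bin in the weight manifest
print(info["size"] == MANIFEST_NBYTES)
```

To verify a real download, read the shard in binary mode and pass its bytes to `matches_pointer`. For large files, hashing in chunks would be kinder to memory than reading everything at once.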
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50264": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "errors": "replace",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "tokenizer_class": "RobertaTokenizer",
+   "unk_token": "<unk>"
+ }
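The special-token ids in `added_tokens_decoder` should line up with the ids declared in `mlc-chat-config.json` from the same commit (`bos_token_id` 0, `pad_token_id` 1, `eos_token_id` 2). A minimal Python sketch of that cross-check follows. The JSON is trimmed to the fields the check needs, and the names `MODEL_IDS` and `mismatches` are illustrative only.

```python
import json

# Trimmed copy of tokenizer_config.json: only the fields this check reads.
TOKENIZER_CONFIG = json.loads("""
{
  "added_tokens_decoder": {
    "0": {"content": "<s>"},
    "1": {"content": "<pad>"},
    "2": {"content": "</s>"},
    "3": {"content": "<unk>"},
    "50264": {"content": "<mask>"}
  },
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "<pad>",
  "unk_token": "<unk>",
  "mask_token": "<mask>"
}
""")

# Ids taken from the "model_config" block of mlc-chat-config.json.
MODEL_IDS = {"bos_token": 0, "pad_token": 1, "eos_token": 2}

# Invert the decoder table: token string -> integer id.
token_to_id = {
    entry["content"]: int(token_id)
    for token_id, entry in TOKENIZER_CONFIG["added_tokens_decoder"].items()
}

# Collect every role whose tokenizer id disagrees with the model config.
mismatches = [
    (role, token_to_id[TOKENIZER_CONFIG[role]], expected)
    for role, expected in MODEL_IDS.items()
    if token_to_id[TOKENIZER_CONFIG[role]] != expected
]
print(mismatches)
```

An empty list means the tokenizer and the model config agree on the ids for `<s>`, `<pad>`, and `</s>`.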
vocab.json ADDED
The diff for this file is too large to render. See raw diff