Added Model
This view is limited to 50 files because it contains too many changes.
- README.md +142 -0
- data/lang_phone/L.pt +3 -0
- data/lang_phone/L_disambig.pt +3 -0
- data/lang_phone/Linv.pt +3 -0
- data/lang_phone/lexicon.txt +32 -0
- data/lang_phone/lexicon_disambig.txt +32 -0
- data/lang_phone/tokens.txt +34 -0
- data/lang_phone/words.txt +36 -0
- exp/cpu_jit.pt +3 -0
- exp/decoder_jit_trace-pnnx.pt +3 -0
- exp/decoder_jit_trace.pt +3 -0
- exp/encoder_jit_trace-pnnx.pt +3 -0
- exp/encoder_jit_trace.pt +3 -0
- exp/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
- exp/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
- exp/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
- exp/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model-2023-06-21-09-40-15 +45 -0
- exp/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
- exp/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
- exp/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
- exp/fast_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
- exp/fast_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
- exp/fast_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
- exp/greedy_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
- exp/greedy_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
- exp/greedy_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
- exp/greedy_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model-2023-06-21-09-39-14 +39 -0
- exp/greedy_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
- exp/greedy_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
- exp/greedy_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
- exp/greedy_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
- exp/greedy_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
- exp/greedy_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
- exp/joiner_jit_trace-pnnx.pt +3 -0
- exp/joiner_jit_trace.pt +3 -0
- exp/modified_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
- exp/modified_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
- exp/modified_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
- exp/modified_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model-2023-06-21-09-41-35 +55 -0
- exp/modified_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
- exp/modified_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
- exp/modified_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
- exp/modified_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
- exp/modified_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
- exp/modified_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
- exp/pretrained.pt +3 -0
- exp/streaming/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
- exp/streaming/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
- exp/streaming/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
- exp/streaming/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model-2023-06-21-10-04-38 +136 -0
README.md
CHANGED
@@ -1,3 +1,145 @@
---
language: id
license: apache-2.0
tags:
- icefall
- phoneme-recognition
- automatic-speech-recognition
datasets:
- mozilla-foundation/common_voice_13_0
- indonesian-nlp/librivox-indonesia
- google/fleurs
---

# Pruned Stateless Zipformer RNN-T Streaming ID

Pruned Stateless Zipformer RNN-T Streaming ID is an automatic speech recognition model trained on the following datasets:

- [Common Voice ID](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0)
- [LibriVox Indonesia](https://huggingface.co/datasets/indonesian-nlp/librivox-indonesia)
- [FLEURS ID](https://huggingface.co/datasets/google/fleurs)

Instead of being trained to predict sequences of words, this model was trained to predict sequences of phonemes, e.g. `['p', 'ə', 'r', 'b', 'u', 'a', 't', 'a', 'n', 'ɲ', 'a']`. Therefore, the model's [vocabulary](https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id/blob/main/data/lang_phone/tokens.txt) contains the different IPA phonemes found in [g2p ID](https://github.com/bookbot-kids/g2p_id).

This model was trained using the [icefall](https://github.com/k2-fsa/icefall) framework. All training was done on a Google Cloud Engine VM with a Tesla A100 GPU. All necessary scripts used for training can be found in the [Files and versions](https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id/tree/main) tab, along with the [Training metrics](https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id/tensorboard) logged via TensorBoard.

## Evaluation Results

### Simulated Streaming

```sh
for m in greedy_search fast_beam_search modified_beam_search; do
  ./pruned_transducer_stateless7_streaming/decode.py \
    --epoch 30 \
    --avg 9 \
    --exp-dir ./pruned_transducer_stateless7_streaming/exp \
    --max-duration 600 \
    --decode-chunk-len 32 \
    --decoding-method "$m"
done
```

The model achieves the following phoneme error rates on the different test sets:

| Decoding             | LibriVox | FLEURS | Common Voice |
| -------------------- | :------: | :----: | :----------: |
| Greedy Search        |  4.87%   | 11.45% |    14.97%    |
| Modified Beam Search |  4.71%   | 11.25% |    14.31%    |
| Fast Beam Search     |  4.85%   | 12.55% |    14.89%    |

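The error rates in the table can be reproduced from the edit counts reported in the decoding logs. The snippet below is a minimal sketch of that arithmetic, assuming the usual definition (insertions plus deletions plus substitutions, divided by the number of reference tokens). The function name `error_rate` is illustrative and not part of icefall. Because the model's tokens are phonemes, the figure the logs label `%WER` is the phoneme error rate quoted here.

```python
def error_rate(insertions: int, deletions: int, substitutions: int, ref_len: int) -> float:
    """Return the error rate in percent: (ins + del + sub) / reference length."""
    if ref_len <= 0:
        raise ValueError("ref_len must be positive")
    return 100.0 * (insertions + deletions + substitutions) / ref_len


# Counts taken from the greedy-search LibriVox line in the decode log:
# "%WER 4.87% [1783 / 36594, 317 ins, 868 del, 598 sub ]"
per = error_rate(insertions=317, deletions=868, substitutions=598, ref_len=36594)
print(f"{per:.2f}%")  # 4.87%
```

The same call with the counts from any other `%WER` log line should give the corresponding table cell.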
### Chunk-wise Streaming

```sh
for m in greedy_search fast_beam_search modified_beam_search; do
  ./pruned_transducer_stateless7_streaming/streaming_decode.py \
    --epoch 30 \
    --avg 9 \
    --exp-dir ./pruned_transducer_stateless7_streaming/exp \
    --decoding-method "$m" \
    --decode-chunk-len 32 \
    --num-decode-streams 1500
done
```

The model achieves the following phoneme error rates on the different test sets:

| Decoding             | LibriVox | FLEURS | Common Voice |
| -------------------- | :------: | :----: | :----------: |
| Greedy Search        |  5.12%   | 12.74% |    15.78%    |
| Modified Beam Search |  4.78%   | 11.83% |    14.54%    |
| Fast Beam Search     |  4.81%   | 12.93% |    14.96%    |

## Usage

### Download Pre-trained Model

```sh
cd egs/bookbot/ASR
mkdir tmp
cd tmp
git lfs install
git clone https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id
cd ..  # back to egs/bookbot/ASR, where the ./tmp/... paths used below resolve
```

### Inference

To decode with greedy search, run:

```sh
./pruned_transducer_stateless7_streaming/jit_pretrained.py \
  --nn-model-filename ./tmp/pruned-transducer-stateless7-streaming-id/exp/cpu_jit.pt \
  --lang-dir ./tmp/pruned-transducer-stateless7-streaming-id/data/lang_phone \
  ./tmp/pruned-transducer-stateless7-streaming-id/test_waves/sample1.wav
```

<details>
<summary>Decoding Output</summary>

```
2023-06-21 10:19:18,563 INFO [jit_pretrained.py:217] device: cpu
2023-06-21 10:19:19,231 INFO [lexicon.py:168] Loading pre-compiled tmp/pruned-transducer-stateless7-streaming-id/data/lang_phone/Linv.pt
2023-06-21 10:19:19,232 INFO [jit_pretrained.py:228] Constructing Fbank computer
2023-06-21 10:19:19,233 INFO [jit_pretrained.py:238] Reading sound files: ['./tmp/pruned-transducer-stateless7-streaming-id/test_waves/sample1.wav']
2023-06-21 10:19:19,234 INFO [jit_pretrained.py:244] Decoding started
2023-06-21 10:19:20,090 INFO [jit_pretrained.py:271]
./tmp/pruned-transducer-stateless7-streaming-id/test_waves/sample1.wav:
p u l a ŋ | s ə k o l a h | p i t ə r i | s a ŋ a t | l a p a r


2023-06-21 10:19:20,090 INFO [jit_pretrained.py:273] Decoding Done
```

</details>

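In the decoding output, phonemes are separated by spaces and the `|` token appears to mark word boundaries. The sketch below regroups such a line into per-word phoneme lists. `split_words` is a hypothetical helper written for illustration, not something shipped with icefall or this repository.

```python
def split_words(decoded: str, boundary: str = "|") -> list[list[str]]:
    """Group a space-separated phoneme string into words at each boundary token."""
    words = []
    for chunk in decoded.split(boundary):
        phones = chunk.split()
        if phones:  # skip empty chunks from leading/trailing boundaries
            words.append(phones)
    return words


# Phoneme line copied from the decoding output for sample1.wav.
decoded = "p u l a ŋ | s ə k o l a h | p i t ə r i | s a ŋ a t | l a p a r"
words = split_words(decoded)
print(["".join(w) for w in words])
# ['pulaŋ', 'səkolah', 'pitəri', 'saŋat', 'lapar']
```

Keeping each word as a list of tokens, rather than a joined string, preserves multi-character phonemes such as `dʒ` and `tʃ` as single units.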
## Training procedure

### Install icefall

```sh
git clone https://github.com/bookbot-hive/icefall
cd icefall
export PYTHONPATH="$(pwd):$PYTHONPATH"
```

### Prepare Data

```sh
cd egs/bookbot_id/ASR
./prepare.sh
```

### Train

```sh
export CUDA_VISIBLE_DEVICES="0"
./pruned_transducer_stateless7_streaming/train.py \
  --num-epochs 30 \
  --use-fp16 1 \
  --max-duration 400
```

## Frameworks

- [k2](https://github.com/k2-fsa/k2)
- [icefall](https://github.com/bookbot-hive/icefall)
- [lhotse](https://github.com/bookbot-hive/lhotse)
data/lang_phone/L.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e67299c15c8faa128dd7317d652619b51f28b431cec64fd3b8338daf9762fc4
size 1551
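The `.pt` files in this commit are tracked with Git LFS, so the diff shows a three-line pointer (spec version, SHA-256 object ID, size in bytes) instead of the binary itself. The following is a small sketch of reading such a pointer. `parse_lfs_pointer` is a hypothetical helper for illustration and is not part of Git LFS.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer for data/lang_phone/L.pt as shown in this diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:5e67299c15c8faa128dd7317d652619b51f28b431cec64fd3b8338daf9762fc4
size 1551
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])  # sha256 1551
```

After `git lfs install` and a clone, the pointer is replaced on disk by the real file, whose byte size should match `size`.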
data/lang_phone/L_disambig.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:42d1a58e242b3f7799fffda803fa17ada3112ae71be2556665c910051d25a7d7
size 1715
data/lang_phone/Linv.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:88935261f84d15c344a6adc9ac289b6d58acd18085a6900d5e5124866b5dc0ee
size 1627
data/lang_phone/lexicon.txt
ADDED
@@ -0,0 +1,32 @@
a a
b b
d d
dʒ dʒ
e e
f f
h h
i i
j j
k k
l l
m m
n n
o o
p p
r r
s s
t t
tʃ tʃ
u u
v v
w w
x x
z z
| |
ŋ ŋ
ə ə
ɡ ɡ
ɲ ɲ
ʃ ʃ
ʔ ʔ
<UNK> <UNK>
data/lang_phone/lexicon_disambig.txt
ADDED
@@ -0,0 +1,32 @@
a a
b b
d d
dʒ dʒ
e e
f f
h h
i i
j j
k k
l l
m m
n n
o o
p p
r r
s s
t t
tʃ tʃ
u u
v v
w w
x x
z z
| |
ŋ ŋ
ə ə
ɡ ɡ
ɲ ɲ
ʃ ʃ
ʔ ʔ
<UNK> <UNK>
data/lang_phone/tokens.txt
ADDED
@@ -0,0 +1,34 @@
<eps> 0
ɡ 1
o 2
d 3
ʃ 4
v 5
t 6
<UNK> 7
x 8
r 9
ʔ 10
b 11
s 12
p 13
i 14
dʒ 15
| 16
ə 17
z 18
f 19
n 20
m 21
ɲ 22
tʃ 23
ŋ 24
k 25
j 26
l 27
h 28
w 29
a 30
u 31
e 32
#0 33
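Each line of `tokens.txt` pairs a symbol with its integer ID. The decode logs in this commit report `blank_id: 0` and `unk_id: 7`, which matches `<eps> 0` and `<UNK> 7` in the file. The sketch below loads a short excerpt of the table and maps a sequence of IDs back to phonemes. `load_tokens` is an illustrative helper, not icefall's own loader, and the ID sequence is chosen by hand for the example.

```python
def load_tokens(text: str) -> dict[int, str]:
    """Parse `symbol id` lines (the tokens.txt layout) into an id -> symbol map."""
    table: dict[int, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so that symbols such as "|" are kept intact.
        symbol, index = line.rsplit(maxsplit=1)
        table[int(index)] = symbol
    return table


# Excerpt of data/lang_phone/tokens.txt from this commit.
excerpt = """<eps> 0
<UNK> 7
p 13
| 16
ŋ 24
l 27
a 30
u 31
"""
id2sym = load_tokens(excerpt)

ids = [13, 31, 27, 30, 24]  # hand-picked IDs for illustration
decoded = " ".join(id2sym[i] for i in ids)
print(decoded)  # p u l a ŋ
```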
data/lang_phone/words.txt
ADDED
@@ -0,0 +1,36 @@
<eps> 0
<UNK> 1
a 2
b 3
d 4
dʒ 5
e 6
f 7
h 8
i 9
j 10
k 11
l 12
m 13
n 14
o 15
p 16
r 17
s 18
t 19
tʃ 20
u 21
v 22
w 23
x 24
z 25
| 26
ŋ 27
ə 28
ɡ 29
ɲ 30
ʃ 31
ʔ 32
#0 33
<s> 34
</s> 35
exp/cpu_jit.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1584f55881aead89f3bdd8d7dab007479a61e5cbf4eff83a4b95a68eba2b9160
size 354961726
exp/decoder_jit_trace-pnnx.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:94e7e3bb9002ab8808c9d194a0cea7bb8bf1526f6ca0d8dcf9dcfd52229e4709
size 89773
exp/decoder_jit_trace.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a924947cdac6dd4d74cea0d5976637ed57c01950c543ba77f9417d3e5f35e23
size 89590
exp/encoder_jit_trace-pnnx.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f3e371e7c9fdfb44343e037fbfe7e4e1404a3d8e421ac17ddacbb58e3983a9d
size 278155657
exp/encoder_jit_trace.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d25b36b251e67544a850505a1655f3e26e1f309e43bc51f5ee10a7c510125ed7
size 354193226
exp/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
exp/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
exp/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
exp/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model-2023-06-21-09-40-15
ADDED
@@ -0,0 +1,45 @@
2023-06-21 09:40:15,150 INFO [decode.py:654] Decoding started
2023-06-21 09:40:15,151 INFO [decode.py:660] Device: cuda:0
2023-06-21 09:40:15,152 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
2023-06-21 09:40:15,155 INFO [decode.py:668] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'fast_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 600, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 
'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/fast_beam_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
2023-06-21 09:40:15,155 INFO [decode.py:670] About to create model
2023-06-21 09:40:15,733 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
2023-06-21 09:40:15,737 INFO [decode.py:741] Calculating the averaged model over epoch range from 21 (excluded) to 30
2023-06-21 09:40:19,291 INFO [decode.py:774] Number of model parameters: 69471350
2023-06-21 09:40:19,291 INFO [multidataset.py:122] About to get LibriVox test cuts
2023-06-21 09:40:19,291 INFO [multidataset.py:124] Loading LibriVox in lazy mode
2023-06-21 09:40:19,292 INFO [multidataset.py:133] About to get FLEURS test cuts
2023-06-21 09:40:19,292 INFO [multidataset.py:135] Loading FLEURS in lazy mode
2023-06-21 09:40:19,292 INFO [multidataset.py:144] About to get Common Voice test cuts
2023-06-21 09:40:19,292 INFO [multidataset.py:146] Loading Common Voice in lazy mode
2023-06-21 09:40:22,208 INFO [decode.py:565] batch 0/?, cuts processed until now is 44
2023-06-21 09:40:28,732 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
2023-06-21 09:40:28,779 INFO [utils.py:561] [test-librivox-beam_20.0_max_contexts_8_max_states_64] %WER 4.85% [1773 / 36594, 295 ins, 904 del, 574 sub ]
2023-06-21 09:40:28,860 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
2023-06-21 09:40:28,860 INFO [decode.py:604]
For test-librivox, WER of different settings are:
beam_20.0_max_contexts_8_max_states_64 4.85 best for test-librivox

2023-06-21 09:40:30,839 INFO [decode.py:565] batch 0/?, cuts processed until now is 38
2023-06-21 09:41:00,055 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
2023-06-21 09:41:00,146 INFO [utils.py:561] [test-fleurs-beam_20.0_max_contexts_8_max_states_64] %WER 12.55% [11748 / 93580, 1672 ins, 5414 del, 4662 sub ]
2023-06-21 09:41:00,362 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
2023-06-21 09:41:00,362 INFO [decode.py:604]
For test-fleurs, WER of different settings are:
beam_20.0_max_contexts_8_max_states_64 12.55 best for test-fleurs

2023-06-21 09:41:01,414 INFO [zipformer.py:2441] attn_weights_entropy = tensor([1.1632, 1.0353, 1.2741, 0.9735, 1.1847, 1.2830, 1.1450, 1.0967],
device='cuda:0'), covar=tensor([0.0547, 0.0601, 0.0483, 0.0755, 0.0373, 0.0368, 0.0490, 0.0569],
device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0019, 0.0019, 0.0021, 0.0018, 0.0017, 0.0019, 0.0019],
device='cuda:0'), out_proj_covar=tensor([1.3702e-05, 1.4294e-05, 1.3432e-05, 1.4389e-05, 1.2265e-05, 1.4168e-05,
1.2323e-05, 1.3747e-05], device='cuda:0')
2023-06-21 09:41:02,049 INFO [decode.py:565] batch 0/?, cuts processed until now is 121
2023-06-21 09:41:22,562 INFO [decode.py:565] batch 20/?, cuts processed until now is 2809
2023-06-21 09:41:31,340 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
2023-06-21 09:41:31,464 INFO [utils.py:561] [test-commonvoice-beam_20.0_max_contexts_8_max_states_64] %WER 14.89% [19770 / 132787, 2851 ins, 9210 del, 7709 sub ]
2023-06-21 09:41:31,757 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
2023-06-21 09:41:31,757 INFO [decode.py:604]
For test-commonvoice, WER of different settings are:
beam_20.0_max_contexts_8_max_states_64 14.89 best for test-commonvoice

2023-06-21 09:41:31,758 INFO [decode.py:809] Done!
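Each test set in this log ends with a `%WER` summary line that carries the error count, the reference length and the insertion/deletion/substitution breakdown. The sketch below extracts those fields with a regular expression and checks that the breakdown adds up. The pattern is written against the line layout seen in this log only, and `parse_wer_line` is a hypothetical helper, not an icefall function.

```python
import re

# Matches e.g. "[name] %WER 4.85% [1773 / 36594, 295 ins, 904 del, 574 sub ]"
WER_LINE = re.compile(
    r"\[(?P<name>[^\]]+)\] %WER (?P<rate>[\d.]+)% "
    r"\[(?P<errors>\d+) / (?P<ref_len>\d+), "
    r"(?P<ins>\d+) ins, (?P<dels>\d+) del, (?P<subs>\d+) sub \]"
)


def parse_wer_line(line: str) -> dict:
    """Extract the summary fields from one %WER log line."""
    match = WER_LINE.search(line)
    if match is None:
        raise ValueError("no %WER summary found in line")
    stats = {k: int(match[k]) for k in ("errors", "ref_len", "ins", "dels", "subs")}
    stats["name"] = match["name"]
    stats["rate"] = float(match["rate"])
    return stats


line = (
    "2023-06-21 09:40:28,779 INFO [utils.py:561] "
    "[test-librivox-beam_20.0_max_contexts_8_max_states_64] "
    "%WER 4.85% [1773 / 36594, 295 ins, 904 del, 574 sub ]"
)
stats = parse_wer_line(line)

# The three edit types should sum to the reported error count.
consistent = stats["ins"] + stats["dels"] + stats["subs"] == stats["errors"]
print(stats["name"], stats["rate"], consistent)
```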
exp/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
exp/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
exp/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
exp/fast_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings WER
beam_20.0_max_contexts_8_max_states_64 14.89
exp/fast_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings WER
beam_20.0_max_contexts_8_max_states_64 12.55
exp/fast_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
settings WER
beam_20.0_max_contexts_8_max_states_64 4.85
exp/greedy_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
exp/greedy_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
exp/greedy_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
exp/greedy_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model-2023-06-21-09-39-14
ADDED
@@ -0,0 +1,39 @@
2023-06-21 09:39:14,130 INFO [decode.py:654] Decoding started
2023-06-21 09:39:14,130 INFO [decode.py:660] Device: cuda:0
2023-06-21 09:39:14,131 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
2023-06-21 09:39:14,134 INFO [decode.py:668] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'greedy_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 600, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': 
True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/greedy_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
|
5 |
+
2023-06-21 09:39:14,135 INFO [decode.py:670] About to create model
|
6 |
+
2023-06-21 09:39:14,915 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
|
7 |
+
2023-06-21 09:39:14,921 INFO [decode.py:741] Calculating the averaged model over epoch range from 21 (excluded) to 30
|
8 |
+
2023-06-21 09:39:20,667 INFO [decode.py:774] Number of model parameters: 69471350
|
9 |
+
2023-06-21 09:39:20,668 INFO [multidataset.py:122] About to get LibriVox test cuts
|
10 |
+
2023-06-21 09:39:20,668 INFO [multidataset.py:124] Loading LibriVox in lazy mode
|
11 |
+
2023-06-21 09:39:20,671 INFO [multidataset.py:133] About to get FLEURS test cuts
|
12 |
+
2023-06-21 09:39:20,671 INFO [multidataset.py:135] Loading FLEURS in lazy mode
|
13 |
+
2023-06-21 09:39:20,673 INFO [multidataset.py:144] About to get Common Voice test cuts
|
14 |
+
2023-06-21 09:39:20,673 INFO [multidataset.py:146] Loading Common Voice in lazy mode
|
15 |
+
2023-06-21 09:39:24,965 INFO [decode.py:565] batch 0/?, cuts processed until now is 44
|
16 |
+
2023-06-21 09:39:29,616 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/greedy_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
|
17 |
+
2023-06-21 09:39:29,662 INFO [utils.py:561] [test-librivox-greedy_search] %WER 4.87% [1783 / 36594, 317 ins, 868 del, 598 sub ]
|
18 |
+
2023-06-21 09:39:29,742 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/greedy_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
|
19 |
+
2023-06-21 09:39:29,742 INFO [decode.py:604]
|
20 |
+
For test-librivox, WER of different settings are:
|
21 |
+
greedy_search 4.87 best for test-librivox
|
22 |
+
|
23 |
+
2023-06-21 09:39:31,511 INFO [decode.py:565] batch 0/?, cuts processed until now is 38
|
24 |
+
2023-06-21 09:39:50,011 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/greedy_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
|
25 |
+
2023-06-21 09:39:50,138 INFO [utils.py:561] [test-fleurs-greedy_search] %WER 11.45% [10718 / 93580, 1850 ins, 3733 del, 5135 sub ]
|
26 |
+
2023-06-21 09:39:50,453 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/greedy_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
|
27 |
+
2023-06-21 09:39:50,453 INFO [decode.py:604]
|
28 |
+
For test-fleurs, WER of different settings are:
|
29 |
+
greedy_search 11.45 best for test-fleurs
|
30 |
+
|
31 |
+
2023-06-21 09:39:52,522 INFO [decode.py:565] batch 0/?, cuts processed until now is 121
|
32 |
+
2023-06-21 09:40:11,369 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/greedy_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
|
33 |
+
2023-06-21 09:40:11,489 INFO [utils.py:561] [test-commonvoice-greedy_search] %WER 14.97% [19873 / 132787, 3792 ins, 7589 del, 8492 sub ]
|
34 |
+
2023-06-21 09:40:11,787 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/greedy_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
|
35 |
+
2023-06-21 09:40:11,788 INFO [decode.py:604]
|
36 |
+
For test-commonvoice, WER of different settings are:
|
37 |
+
greedy_search 14.97 best for test-commonvoice
|
38 |
+
|
39 |
+
2023-06-21 09:40:11,788 INFO [decode.py:809] Done!
|
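Note on the `%WER` lines in the log above: each bracketed group reads as total errors over reference words, followed by the insertion, deletion and substitution counts. The sketch below recomputes the three reported percentages from those counts. The function name `wer_percent` and the `results` dictionary are illustrative only and are not taken from `utils.py`.

```python
def wer_percent(ins: int, dels: int, subs: int, ref_words: int) -> float:
    """Word error rate in percent: all edit errors over the reference word count."""
    if ref_words <= 0:
        raise ValueError("ref_words must be positive")
    return 100.0 * (ins + dels + subs) / ref_words


# (ins, del, sub, reference words) copied from the greedy_search log lines.
results = {
    "test-librivox": (317, 868, 598, 36594),
    "test-fleurs": (1850, 3733, 5135, 93580),
    "test-commonvoice": (3792, 7589, 8492, 132787),
}

for name, counts in results.items():
    print(f"{name}: {wer_percent(*counts):.2f}")
```

The printed values should agree with the `wer-summary-*` files added in this commit, since the three error counts sum to the first number in each bracket.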
exp/greedy_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/greedy_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/greedy_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/greedy_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
|
1 |
+
settings WER
|
2 |
+
greedy_search 14.97
|
exp/greedy_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
|
1 |
+
settings WER
|
2 |
+
greedy_search 11.45
|
exp/greedy_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
|
1 |
+
settings WER
|
2 |
+
greedy_search 4.87
|
exp/joiner_jit_trace-pnnx.pt
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:b2772b338d03c7ebea5247337cf50fffb91a7950c351622a320ad4fc38b393ec
|
3 |
+
size 1914564
|
exp/joiner_jit_trace.pt
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:f1ffb093a638ecdd5a015aff5d9c6ae62a7dddc815e18dd46ca19a46976367ce
|
3 |
+
size 1914479
|
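Note on the `.pt` entries: what the diff shows for each model file is a three-line Git LFS pointer (`version`, `oid`, `size`), not the weights themselves. A minimal sketch of reading such a pointer follows, assuming the simple `key value` layout seen above. `parse_lfs_pointer` is a hypothetical helper written for this note, not part of Git LFS or of this repository.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash and size fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from exp/joiner_jit_trace.pt in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f1ffb093a638ecdd5a015aff5d9c6ae62a7dddc815e18dd46ca19a46976367ce
size 1914479
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field is the byte size of the stored object, so `exp/pretrained.pt` at 278176561 bytes is by far the largest of the pointers listed here.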
exp/modified_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/modified_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/modified_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/modified_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model-2023-06-21-09-41-35
ADDED
@@ -0,0 +1,55 @@
|
1 |
+
2023-06-21 09:41:35,276 INFO [decode.py:654] Decoding started
|
2 |
+
2023-06-21 09:41:35,276 INFO [decode.py:660] Device: cuda:0
|
3 |
+
2023-06-21 09:41:35,277 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
|
4 |
+
2023-06-21 09:41:35,280 INFO [decode.py:668] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'modified_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 600, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 
'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/modified_beam_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
|
5 |
+
2023-06-21 09:41:35,281 INFO [decode.py:670] About to create model
|
6 |
+
2023-06-21 09:41:35,838 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
|
7 |
+
2023-06-21 09:41:35,843 INFO [decode.py:741] Calculating the averaged model over epoch range from 21 (excluded) to 30
|
8 |
+
2023-06-21 09:41:39,380 INFO [decode.py:774] Number of model parameters: 69471350
|
9 |
+
2023-06-21 09:41:39,380 INFO [multidataset.py:122] About to get LibriVox test cuts
|
10 |
+
2023-06-21 09:41:39,380 INFO [multidataset.py:124] Loading LibriVox in lazy mode
|
11 |
+
2023-06-21 09:41:39,381 INFO [multidataset.py:133] About to get FLEURS test cuts
|
12 |
+
2023-06-21 09:41:39,381 INFO [multidataset.py:135] Loading FLEURS in lazy mode
|
13 |
+
2023-06-21 09:41:39,381 INFO [multidataset.py:144] About to get Common Voice test cuts
|
14 |
+
2023-06-21 09:41:39,381 INFO [multidataset.py:146] Loading Common Voice in lazy mode
|
15 |
+
2023-06-21 09:41:43,886 INFO [decode.py:565] batch 0/?, cuts processed until now is 44
|
16 |
+
2023-06-21 09:41:46,269 INFO [zipformer.py:2441] attn_weights_entropy = tensor([1.3801, 1.7156, 1.0930, 1.5632, 1.3604, 1.3437, 1.7393, 0.6970],
|
17 |
+
device='cuda:0'), covar=tensor([0.4497, 0.2012, 0.2669, 0.2689, 0.2707, 0.2909, 0.1440, 0.5122],
|
18 |
+
device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0053, 0.0059, 0.0067, 0.0065, 0.0064, 0.0051, 0.0077],
|
19 |
+
device='cuda:0'), out_proj_covar=tensor([5.5637e-05, 3.5992e-05, 4.1115e-05, 4.8266e-05, 4.8700e-05, 4.4501e-05,
|
20 |
+
3.4417e-05, 7.3250e-05], device='cuda:0')
|
21 |
+
2023-06-21 09:42:00,403 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/modified_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
|
22 |
+
2023-06-21 09:42:00,449 INFO [utils.py:561] [test-librivox-beam_size_4] %WER 4.71% [1725 / 36594, 309 ins, 836 del, 580 sub ]
|
23 |
+
2023-06-21 09:42:00,531 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/modified_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
|
24 |
+
2023-06-21 09:42:00,531 INFO [decode.py:604]
|
25 |
+
For test-librivox, WER of different settings are:
|
26 |
+
beam_size_4 4.71 best for test-librivox
|
27 |
+
|
28 |
+
2023-06-21 09:42:01,464 INFO [zipformer.py:2441] attn_weights_entropy = tensor([2.1911, 1.2934, 2.0949, 2.2245, 2.1813, 2.1569, 1.7841, 1.7188],
|
29 |
+
device='cuda:0'), covar=tensor([0.1696, 0.4060, 0.1661, 0.1975, 0.1970, 0.2132, 0.1748, 0.3224],
|
30 |
+
device='cuda:0'), in_proj_covar=tensor([0.0029, 0.0040, 0.0028, 0.0028, 0.0029, 0.0030, 0.0027, 0.0034],
|
31 |
+
device='cuda:0'), out_proj_covar=tensor([1.8266e-05, 3.2097e-05, 1.7461e-05, 1.6755e-05, 1.8651e-05, 1.9838e-05,
|
32 |
+
1.5794e-05, 2.3433e-05], device='cuda:0')
|
33 |
+
2023-06-21 09:42:04,999 INFO [decode.py:565] batch 0/?, cuts processed until now is 38
|
34 |
+
2023-06-21 09:43:09,460 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/modified_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
|
35 |
+
2023-06-21 09:43:09,552 INFO [utils.py:561] [test-fleurs-beam_size_4] %WER 11.25% [10525 / 93580, 1811 ins, 3811 del, 4903 sub ]
|
36 |
+
2023-06-21 09:43:09,853 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/modified_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
|
37 |
+
2023-06-21 09:43:09,853 INFO [decode.py:604]
|
38 |
+
For test-fleurs, WER of different settings are:
|
39 |
+
beam_size_4 11.25 best for test-fleurs
|
40 |
+
|
41 |
+
2023-06-21 09:43:14,023 INFO [decode.py:565] batch 0/?, cuts processed until now is 121
|
42 |
+
2023-06-21 09:43:47,394 INFO [zipformer.py:2441] attn_weights_entropy = tensor([2.5738, 2.5492, 3.0284, 2.4510, 1.3782, 3.0004, 2.8027, 1.4081],
|
43 |
+
device='cuda:0'), covar=tensor([0.1153, 0.1301, 0.0459, 0.0990, 0.4808, 0.0525, 0.0757, 0.4425],
|
44 |
+
device='cuda:0'), in_proj_covar=tensor([0.0071, 0.0071, 0.0055, 0.0070, 0.0106, 0.0057, 0.0058, 0.0105],
|
45 |
+
device='cuda:0'), out_proj_covar=tensor([5.9638e-05, 6.0235e-05, 4.2007e-05, 5.4275e-05, 1.0845e-04, 4.2491e-05,
|
46 |
+
4.5487e-05, 9.8369e-05], device='cuda:0')
|
47 |
+
2023-06-21 09:44:30,935 INFO [decode.py:565] batch 20/?, cuts processed until now is 2809
|
48 |
+
2023-06-21 09:44:57,467 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/modified_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
|
49 |
+
2023-06-21 09:44:57,589 INFO [utils.py:561] [test-commonvoice-beam_size_4] %WER 14.31% [19002 / 132787, 3318 ins, 7575 del, 8109 sub ]
|
50 |
+
2023-06-21 09:44:57,887 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/modified_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
|
51 |
+
2023-06-21 09:44:57,888 INFO [decode.py:604]
|
52 |
+
For test-commonvoice, WER of different settings are:
|
53 |
+
beam_size_4 14.31 best for test-commonvoice
|
54 |
+
|
55 |
+
2023-06-21 09:44:57,888 INFO [decode.py:809] Done!
|
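Note on reading these logs programmatically: the `%WER` lines follow one fixed layout, so the per-test-set numbers can be pulled out with a regular expression. The sketch below assumes exactly the layout seen in this log. The pattern and the name `parse_wer_line` are illustrative and do not come from icefall.

```python
import re

# Matches e.g. "[test-librivox-beam_size_4] %WER 4.71% [1725 / 36594, 309 ins, 836 del, 580 sub ]"
WER_LINE = re.compile(
    r"\[(?P<name>[^\]]+)\] %WER (?P<wer>[\d.]+)% "
    r"\[(?P<errs>\d+) / (?P<ref>\d+), "
    r"(?P<ins>\d+) ins, (?P<dels>\d+) del, (?P<subs>\d+) sub \]"
)


def parse_wer_line(line: str):
    """Return the fields of a %WER log line as a dict, or None if it does not match."""
    match = WER_LINE.search(line)
    if match is None:
        return None
    parsed = {"name": match["name"], "wer": float(match["wer"])}
    for key in ("errs", "ref", "ins", "dels", "subs"):
        parsed[key] = int(match[key])
    return parsed


# Line copied from the modified_beam_search log above.
line = (
    "2023-06-21 09:42:00,449 INFO [utils.py:561] [test-librivox-beam_size_4] "
    "%WER 4.71% [1725 / 36594, 309 ins, 836 del, 580 sub ]"
)
row = parse_wer_line(line)
print(row)
```

Lines that carry no `%WER` field, such as the `attn_weights_entropy` dumps, return `None` and can be skipped when scanning a whole log file.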
exp/modified_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/modified_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/modified_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/modified_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
|
1 |
+
settings WER
|
2 |
+
beam_size_4 14.31
|
exp/modified_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
|
1 |
+
settings WER
|
2 |
+
beam_size_4 11.25
|
exp/modified_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
ADDED
@@ -0,0 +1,2 @@
|
1 |
+
settings WER
|
2 |
+
beam_size_4 4.71
|
exp/pretrained.pt
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d7fb8734cd4c8edd2c360ad93343bbbb755b3195eb27e2871e37dc7be6293a4f
|
3 |
+
size 278176561
|
exp/streaming/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/streaming/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/streaming/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
ADDED
The diff for this file is too large to render.
See raw diff
|
|
exp/streaming/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model-2023-06-21-10-04-38
ADDED
@@ -0,0 +1,136 @@
|
1 |
+
2023-06-21 10:04:38,023 INFO [streaming_decode.py:483] Decoding started
|
2 |
+
2023-06-21 10:04:38,023 INFO [streaming_decode.py:489] Device: cuda:0
|
3 |
+
2023-06-21 10:04:38,024 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
|
4 |
+
2023-06-21 10:04:38,027 INFO [streaming_decode.py:497] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'fast_beam_search', 'num_active_paths': 4, 'beam': 4, 'max_contexts': 4, 'max_states': 32, 'context_size': 2, 'num_decode_streams': 1500, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 
'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
|
5 |
+
2023-06-21 10:04:38,027 INFO [streaming_decode.py:499] About to create model
|
6 |
+
2023-06-21 10:04:38,604 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
|
7 |
+
2023-06-21 10:04:38,608 INFO [streaming_decode.py:566] Calculating the averaged model over epoch range from 21 (excluded) to 30
|
8 |
+
2023-06-21 10:04:42,203 INFO [streaming_decode.py:588] Number of model parameters: 69471350
|
9 |
+
2023-06-21 10:04:42,204 INFO [multidataset.py:122] About to get LibriVox test cuts
|
10 |
+
2023-06-21 10:04:42,204 INFO [multidataset.py:124] Loading LibriVox in lazy mode
|
11 |
+
2023-06-21 10:04:42,204 INFO [multidataset.py:133] About to get FLEURS test cuts
|
12 |
+
2023-06-21 10:04:42,204 INFO [multidataset.py:135] Loading FLEURS in lazy mode
|
13 |
+
2023-06-21 10:04:42,205 INFO [multidataset.py:144] About to get Common Voice test cuts
|
14 |
+
2023-06-21 10:04:42,205 INFO [multidataset.py:146] Loading Common Voice in lazy mode
|
15 |
+
2023-06-21 10:04:42,471 INFO [streaming_decode.py:380] Cuts processed until now is 0.
|
16 |
+
2023-06-21 10:04:42,786 INFO [streaming_decode.py:380] Cuts processed until now is 50.
|
17 |
+
2023-06-21 10:04:43,098 INFO [streaming_decode.py:380] Cuts processed until now is 100.
|
18 |
+
2023-06-21 10:04:43,444 INFO [streaming_decode.py:380] Cuts processed until now is 150.
|
19 |
+
2023-06-21 10:04:43,770 INFO [streaming_decode.py:380] Cuts processed until now is 200.
|
20 |
+
2023-06-21 10:04:44,092 INFO [streaming_decode.py:380] Cuts processed until now is 250.
|
21 |
+
2023-06-21 10:04:44,416 INFO [streaming_decode.py:380] Cuts processed until now is 300.
|
22 |
+
2023-06-21 10:04:44,756 INFO [streaming_decode.py:380] Cuts processed until now is 350.
|
23 |
+
2023-06-21 10:04:45,079 INFO [streaming_decode.py:380] Cuts processed until now is 400.
|
24 |
+
2023-06-21 10:04:45,405 INFO [streaming_decode.py:380] Cuts processed until now is 450.
|
25 |
+
2023-06-21 10:04:45,734 INFO [streaming_decode.py:380] Cuts processed until now is 500.
|
26 |
+
2023-06-21 10:04:46,071 INFO [streaming_decode.py:380] Cuts processed until now is 550.
|
27 |
+
2023-06-21 10:04:46,405 INFO [streaming_decode.py:380] Cuts processed until now is 600.
|
28 |
+
2023-06-21 10:04:57,029 INFO [streaming_decode.py:425] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
|
29 |
+
2023-06-21 10:04:57,063 INFO [utils.py:561] [test-librivox-beam_4_max_contexts_4_max_states_32] %WER 4.81% [1759 / 36594, 280 ins, 892 del, 587 sub ]
|
30 |
+
2023-06-21 10:04:57,144 INFO [streaming_decode.py:436] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
|
31 |
+
2023-06-21 10:04:57,145 INFO [streaming_decode.py:450]
|
32 |
+
For test-librivox, WER of different settings are:
|
33 |
+
beam_4_max_contexts_4_max_states_32 4.81 best for test-librivox
|
34 |
+
|
35 |
+
2023-06-21 10:04:57,149 INFO [streaming_decode.py:380] Cuts processed until now is 0.
|
36 |
+
2023-06-21 10:04:57,332 INFO [streaming_decode.py:380] Cuts processed until now is 50.
|
37 |
+
2023-06-21 10:04:57,494 INFO [streaming_decode.py:380] Cuts processed until now is 100.
|
38 |
+
2023-06-21 10:04:57,663 INFO [streaming_decode.py:380] Cuts processed until now is 150.
|
39 |
+
2023-06-21 10:04:57,833 INFO [streaming_decode.py:380] Cuts processed until now is 200.
|
40 |
+
2023-06-21 10:04:58,000 INFO [streaming_decode.py:380] Cuts processed until now is 250.
|
41 |
+
2023-06-21 10:04:58,161 INFO [streaming_decode.py:380] Cuts processed until now is 300.
|
42 |
+
2023-06-21 10:04:58,323 INFO [streaming_decode.py:380] Cuts processed until now is 350.
|
43 |
+
2023-06-21 10:04:58,488 INFO [streaming_decode.py:380] Cuts processed until now is 400.
|
44 |
+
2023-06-21 10:04:58,656 INFO [streaming_decode.py:380] Cuts processed until now is 450.
|
45 |
+
2023-06-21 10:04:58,819 INFO [streaming_decode.py:380] Cuts processed until now is 500.
|
46 |
+
2023-06-21 10:04:58,993 INFO [streaming_decode.py:380] Cuts processed until now is 550.
|
47 |
+
2023-06-21 10:04:59,176 INFO [streaming_decode.py:380] Cuts processed until now is 600.
|
48 |
+
2023-06-21 10:04:59,364 INFO [streaming_decode.py:380] Cuts processed until now is 650.
|
49 |
+
2023-06-21 10:05:34,495 INFO [streaming_decode.py:425] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
|
50 |
+
2023-06-21 10:05:34,590 INFO [utils.py:561] [test-fleurs-beam_4_max_contexts_4_max_states_32] %WER 12.93% [12100 / 93580, 1706 ins, 5594 del, 4800 sub ]
|
51 |
+
2023-06-21 10:05:34,813 INFO [streaming_decode.py:436] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
|
52 |
+
2023-06-21 10:05:34,814 INFO [streaming_decode.py:450]
|
53 |
+
For test-fleurs, WER of different settings are:
|
54 |
+
beam_4_max_contexts_4_max_states_32 12.93 best for test-fleurs
|
55 |
+
|
56 |
+
2023-06-21 10:05:34,820 INFO [streaming_decode.py:380] Cuts processed until now is 0.
|
57 |
+
2023-06-21 10:05:35,059 INFO [streaming_decode.py:380] Cuts processed until now is 50.
|
58 |
+
2023-06-21 10:05:35,308 INFO [streaming_decode.py:380] Cuts processed until now is 100.
|
59 |
+
2023-06-21 10:05:35,583 INFO [streaming_decode.py:380] Cuts processed until now is 150.
|
60 |
+
2023-06-21 10:05:35,829 INFO [streaming_decode.py:380] Cuts processed until now is 200.
|
61 |
+
2023-06-21 10:05:36,082 INFO [streaming_decode.py:380] Cuts processed until now is 250.
|
62 |
+
2023-06-21 10:05:36,315 INFO [streaming_decode.py:380] Cuts processed until now is 300.
|
63 |
+
2023-06-21 10:05:36,537 INFO [streaming_decode.py:380] Cuts processed until now is 350.
|
64 |
+
2023-06-21 10:05:36,797 INFO [streaming_decode.py:380] Cuts processed until now is 400.
|
65 |
+
2023-06-21 10:05:37,028 INFO [streaming_decode.py:380] Cuts processed until now is 450.
|
66 |
+
2023-06-21 10:05:37,263 INFO [streaming_decode.py:380] Cuts processed until now is 500.
|
67 |
+
2023-06-21 10:05:37,499 INFO [streaming_decode.py:380] Cuts processed until now is 550.
|
68 |
+
2023-06-21 10:05:37,720 INFO [streaming_decode.py:380] Cuts processed until now is 600.
|
69 |
+
2023-06-21 10:05:37,959 INFO [streaming_decode.py:380] Cuts processed until now is 650.
|
70 |
+
2023-06-21 10:05:38,182 INFO [streaming_decode.py:380] Cuts processed until now is 700.
|
71 |
+
2023-06-21 10:05:38,406 INFO [streaming_decode.py:380] Cuts processed until now is 750.
|
72 |
+
2023-06-21 10:05:38,664 INFO [streaming_decode.py:380] Cuts processed until now is 800.
|
73 |
+
2023-06-21 10:05:38,913 INFO [streaming_decode.py:380] Cuts processed until now is 850.
|
74 |
+
2023-06-21 10:05:39,251 INFO [streaming_decode.py:380] Cuts processed until now is 900.
|
75 |
+
2023-06-21 10:05:39,493 INFO [streaming_decode.py:380] Cuts processed until now is 950.
|
76 |
+
2023-06-21 10:05:39,726 INFO [streaming_decode.py:380] Cuts processed until now is 1000.
|
77 |
+
2023-06-21 10:05:39,959 INFO [streaming_decode.py:380] Cuts processed until now is 1050.
|
78 |
+
2023-06-21 10:05:40,192 INFO [streaming_decode.py:380] Cuts processed until now is 1100.
|
79 |
+
2023-06-21 10:05:40,436 INFO [streaming_decode.py:380] Cuts processed until now is 1150.
|
80 |
+
2023-06-21 10:05:40,709 INFO [streaming_decode.py:380] Cuts processed until now is 1200.
|
81 |
+
2023-06-21 10:05:40,959 INFO [streaming_decode.py:380] Cuts processed until now is 1250.
|
82 |
+
2023-06-21 10:05:41,199 INFO [streaming_decode.py:380] Cuts processed until now is 1300.
|
83 |
+
2023-06-21 10:05:41,448 INFO [streaming_decode.py:380] Cuts processed until now is 1350.
|
84 |
+
2023-06-21 10:05:41,697 INFO [streaming_decode.py:380] Cuts processed until now is 1400.
|
85 |
+
2023-06-21 10:05:41,938 INFO [streaming_decode.py:380] Cuts processed until now is 1450.
|
86 |
+
2023-06-21 10:05:51,050 INFO [streaming_decode.py:380] Cuts processed until now is 1500.
|
87 |
+
2023-06-21 10:05:53,941 INFO [streaming_decode.py:380] Cuts processed until now is 1550.
|
88 |
+
2023-06-21 10:05:55,569 INFO [streaming_decode.py:380] Cuts processed until now is 1600.
|
89 |
+
2023-06-21 10:05:55,799 INFO [streaming_decode.py:380] Cuts processed until now is 1650.
|
90 |
+
2023-06-21 10:05:57,493 INFO [streaming_decode.py:380] Cuts processed until now is 1700.
|
91 |
+
2023-06-21 10:05:57,735 INFO [streaming_decode.py:380] Cuts processed until now is 1750.
|
92 |
+
2023-06-21 10:05:57,961 INFO [streaming_decode.py:380] Cuts processed until now is 1800.
|
93 |
+
2023-06-21 10:05:59,694 INFO [streaming_decode.py:380] Cuts processed until now is 1850.
|
94 |
+
2023-06-21 10:05:59,923 INFO [streaming_decode.py:380] Cuts processed until now is 1900.
|
95 |
+
2023-06-21 10:06:00,151 INFO [streaming_decode.py:380] Cuts processed until now is 1950.
|
96 |
+
2023-06-21 10:06:01,771 INFO [streaming_decode.py:380] Cuts processed until now is 2000.
|
97 |
+
2023-06-21 10:06:01,997 INFO [streaming_decode.py:380] Cuts processed until now is 2050.
|
98 |
+
2023-06-21 10:06:02,241 INFO [streaming_decode.py:380] Cuts processed until now is 2100.
|
99 |
+
2023-06-21 10:06:02,465 INFO [streaming_decode.py:380] Cuts processed until now is 2150.
|
100 |
+
2023-06-21 10:06:04,249 INFO [streaming_decode.py:380] Cuts processed until now is 2200.
|
101 |
+
2023-06-21 10:06:04,478 INFO [streaming_decode.py:380] Cuts processed until now is 2250.
|
102 |
+
2023-06-21 10:06:04,710 INFO [streaming_decode.py:380] Cuts processed until now is 2300.
|
103 |
+
2023-06-21 10:06:06,461 INFO [streaming_decode.py:380] Cuts processed until now is 2350.
|
104 |
+
2023-06-21 10:06:06,697 INFO [streaming_decode.py:380] Cuts processed until now is 2400.
|
105 |
+
2023-06-21 10:06:06,931 INFO [streaming_decode.py:380] Cuts processed until now is 2450.
|
106 |
+
2023-06-21 10:06:08,726 INFO [streaming_decode.py:380] Cuts processed until now is 2500.
|
107 |
+
2023-06-21 10:06:08,950 INFO [streaming_decode.py:380] Cuts processed until now is 2550.
|
108 |
+
2023-06-21 10:06:09,187 INFO [streaming_decode.py:380] Cuts processed until now is 2600.
|
109 |
+
2023-06-21 10:06:10,940 INFO [streaming_decode.py:380] Cuts processed until now is 2650.
|
110 |
+
2023-06-21 10:06:11,165 INFO [streaming_decode.py:380] Cuts processed until now is 2700.
|
111 |
+
2023-06-21 10:06:12,942 INFO [streaming_decode.py:380] Cuts processed until now is 2750.
|
112 |
+
2023-06-21 10:06:13,183 INFO [streaming_decode.py:380] Cuts processed until now is 2800.
|
113 |
+
2023-06-21 10:06:14,919 INFO [streaming_decode.py:380] Cuts processed until now is 2850.
|
114 |
+
2023-06-21 10:06:16,667 INFO [streaming_decode.py:380] Cuts processed until now is 2900.
|
115 |
+
2023-06-21 10:06:18,270 INFO [streaming_decode.py:380] Cuts processed until now is 2950.
|
116 |
+
2023-06-21 10:06:19,990 INFO [streaming_decode.py:380] Cuts processed until now is 3000.
|
117 |
+
2023-06-21 10:06:20,222 INFO [streaming_decode.py:380] Cuts processed until now is 3050.
|
118 |
+
2023-06-21 10:06:21,952 INFO [streaming_decode.py:380] Cuts processed until now is 3100.
|
119 |
+
2023-06-21 10:06:22,202 INFO [streaming_decode.py:380] Cuts processed until now is 3150.
|
120 |
+
2023-06-21 10:06:23,959 INFO [streaming_decode.py:380] Cuts processed until now is 3200.
|
121 |
+
2023-06-21 10:06:24,183 INFO [streaming_decode.py:380] Cuts processed until now is 3250.
|
122 |
+
2023-06-21 10:06:25,951 INFO [streaming_decode.py:380] Cuts processed until now is 3300.
|
123 |
+
2023-06-21 10:06:26,203 INFO [streaming_decode.py:380] Cuts processed until now is 3350.
|
124 |
+
2023-06-21 10:06:27,984 INFO [streaming_decode.py:380] Cuts processed until now is 3400.
|
125 |
+
2023-06-21 10:06:28,228 INFO [streaming_decode.py:380] Cuts processed until now is 3450.
|
126 |
+
2023-06-21 10:06:28,468 INFO [streaming_decode.py:380] Cuts processed until now is 3500.
|
127 |
+
2023-06-21 10:06:30,266 INFO [streaming_decode.py:380] Cuts processed until now is 3550.
|
128 |
+
2023-06-21 10:06:30,497 INFO [streaming_decode.py:380] Cuts processed until now is 3600.
|
129 |
+
2023-06-21 10:06:45,693 INFO [streaming_decode.py:425] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
|
130 |
+
2023-06-21 10:06:45,825 INFO [utils.py:561] [test-commonvoice-beam_4_max_contexts_4_max_states_32] %WER 14.96% [19859 / 132787, 3004 ins, 8788 del, 8067 sub ]
|
131 |
+
2023-06-21 10:06:46,126 INFO [streaming_decode.py:436] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
|
132 |
+
2023-06-21 10:06:46,126 INFO [streaming_decode.py:450]
|
133 |
+
For test-commonvoice, WER of different settings are:
|
134 |
+
beam_4_max_contexts_4_max_states_32 14.96 best for test-commonvoice
|
135 |
+
|
136 |
+
2023-06-21 10:06:46,127 INFO [streaming_decode.py:618] Done!
|
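Note on the averaging message: all three logs report "Calculating the averaged model over epoch range from 21 (excluded) to 30" with `'epoch': 30` and `'avg': 9` in the parameter dump. A minimal sketch of that arithmetic follows, assuming the range start is simply `epoch - avg`, which is what the logged numbers suggest. `averaged_epoch_range` is a hypothetical helper and not the code in `decode.py` or `streaming_decode.py`.

```python
def averaged_epoch_range(epoch: int, avg: int) -> tuple:
    """Return (start_excluded, end_included, covered_epochs) for model averaging.

    Assumption: the range starts at epoch - avg (excluded) and ends at epoch (included).
    """
    if avg < 1 or epoch - avg < 0:
        raise ValueError("need avg >= 1 and epoch - avg >= 0")
    start = epoch - avg
    covered = list(range(start + 1, epoch + 1))
    return start, epoch, covered


# Values taken from the parameter dump in the logs: 'epoch': 30, 'avg': 9.
start, end, covered = averaged_epoch_range(epoch=30, avg=9)
print(f"from {start} (excluded) to {end}: epochs {covered[0]}..{covered[-1]}")
```

Under that assumption the average covers epochs 22 through 30, nine epochs in total, which matches the `avg-9` part of every result file name in this commit.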