w11wo committed on Jun 21, 2023

Commit 9a835b2 • 1 Parent(s): b82f66a

Added Model

This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

README.md +142 -0
data/lang_phone/L.pt +3 -0
data/lang_phone/L_disambig.pt +3 -0
data/lang_phone/Linv.pt +3 -0
data/lang_phone/lexicon.txt +32 -0
data/lang_phone/lexicon_disambig.txt +32 -0
data/lang_phone/tokens.txt +34 -0
data/lang_phone/words.txt +36 -0
exp/cpu_jit.pt +3 -0
exp/decoder_jit_trace-pnnx.pt +3 -0
exp/decoder_jit_trace.pt +3 -0
exp/encoder_jit_trace-pnnx.pt +3 -0
exp/encoder_jit_trace.pt +3 -0
exp/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
exp/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
exp/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
exp/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model-2023-06-21-09-40-15 +45 -0
exp/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
exp/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
exp/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +0 -0
exp/fast_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
exp/fast_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
exp/fast_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt +2 -0
exp/greedy_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
exp/greedy_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
exp/greedy_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
exp/greedy_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model-2023-06-21-09-39-14 +39 -0
exp/greedy_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
exp/greedy_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
exp/greedy_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +0 -0
exp/greedy_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
exp/greedy_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
exp/greedy_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt +2 -0
exp/joiner_jit_trace-pnnx.pt +3 -0
exp/joiner_jit_trace.pt +3 -0
exp/modified_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
exp/modified_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
exp/modified_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
exp/modified_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model-2023-06-21-09-41-35 +55 -0
exp/modified_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
exp/modified_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
exp/modified_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +0 -0
exp/modified_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
exp/modified_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
exp/modified_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt +2 -0
exp/pretrained.pt +3 -0
exp/streaming/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
exp/streaming/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
exp/streaming/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt +0 -0
exp/streaming/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model-2023-06-21-10-04-38 +136 -0

README.md CHANGED

@@ -1,3 +1,145 @@
 ---
+language: id
 license: apache-2.0
+tags:
+  - icefall
+  - phoneme-recognition
+  - automatic-speech-recognition
+datasets:
+  - mozilla-foundation/common_voice_13_0
+  - indonesian-nlp/librivox-indonesia
+  - google/fleurs
 ---
+# Pruned Stateless Zipformer RNN-T Streaming ID
+Pruned Stateless Zipformer RNN-T Streaming ID is an automatic speech recognition model trained on the following datasets:
+- [Common Voice ID](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0)
+- [LibriVox Indonesia](https://huggingface.co/datasets/indonesian-nlp/librivox-indonesia)
+- [FLEURS ID](https://huggingface.co/datasets/google/fleurs)
+Instead of being trained to predict sequences of words, this model was trained to predict sequences of phonemes, e.g. `['p', 'ə', 'r', 'b', 'u', 'a', 't', 'a', 'n', 'ɲ', 'a']`. Therefore, the model's [vocabulary](https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id/blob/main/data/lang_phone/tokens.txt) contains the different IPA phonemes found in [g2p ID](https://github.com/bookbot-kids/g2p_id).
+This model was trained using the [icefall](https://github.com/k2-fsa/icefall) framework. All training was done on a Google Cloud Engine VM with a Tesla A100 GPU. All scripts used for training can be found in the [Files and versions](https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id/tree/main) tab, alongside the [Training metrics](https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id/tensorboard) logged via TensorBoard.
+## Evaluation Results
+### Simulated Streaming
+```sh
+for m in greedy_search fast_beam_search modified_beam_search; do
+  ./pruned_transducer_stateless7_streaming/decode.py \
+    --epoch 30 \
+    --avg 9 \
+    --exp-dir ./pruned_transducer_stateless7_streaming/exp \
+    --max-duration 600 \
+    --decode-chunk-len 32 \
+    --decoding-method "$m"
+done
+```
+The model achieves the following phoneme error rates on the different test sets:
+| Decoding             | LibriVox | FLEURS | Common Voice |
+| -------------------- | :------: | :----: | :----------: |
+| Greedy Search        |  4.87%   | 11.45% |    14.97%    |
+| Modified Beam Search |  4.71%   | 11.25% |    14.31%    |
+| Fast Beam Search     |  4.85%   | 12.55% |    14.89%    |
+### Chunk-wise Streaming
+```sh
+for m in greedy_search fast_beam_search modified_beam_search; do
+  ./pruned_transducer_stateless7_streaming/streaming_decode.py \
+    --epoch 30 \
+    --avg 9 \
+    --exp-dir ./pruned_transducer_stateless7_streaming/exp \
+    --decoding-method "$m" \
+    --decode-chunk-len 32 \
+    --num-decode-streams 1500
+done
+```
+The model achieves the following phoneme error rates on the different test sets:
+| Decoding             | LibriVox | FLEURS | Common Voice |
+| -------------------- | :------: | :----: | :----------: |
+| Greedy Search        |  5.12%   | 12.74% |    15.78%    |
+| Modified Beam Search |  4.78%   | 11.83% |    14.54%    |
+| Fast Beam Search     |  4.81%   | 12.93% |    14.96%    |
+## Usage
+### Download Pre-trained Model
+```sh
+cd egs/bookbot/ASR
+mkdir tmp
+cd tmp
+git lfs install
+git clone https://huggingface.co/bookbot/pruned-transducer-stateless7-streaming-id
+```
+### Inference
+To decode with greedy search, run:
+```sh
+./pruned_transducer_stateless7_streaming/jit_pretrained.py \
+  --nn-model-filename ./tmp/pruned-transducer-stateless7-streaming-id/exp/cpu_jit.pt \
+  --lang-dir ./tmp/pruned-transducer-stateless7-streaming-id/data/lang_phone \
+  ./tmp/pruned-transducer-stateless7-streaming-id/test_waves/sample1.wav
+```
+<details>
+<summary>Decoding Output</summary>
+```
+2023-06-21 10:19:18,563 INFO [jit_pretrained.py:217] device: cpu
+2023-06-21 10:19:19,231 INFO [lexicon.py:168] Loading pre-compiled tmp/pruned-transducer-stateless7-streaming-id/data/lang_phone/Linv.pt
+2023-06-21 10:19:19,232 INFO [jit_pretrained.py:228] Constructing Fbank computer
+2023-06-21 10:19:19,233 INFO [jit_pretrained.py:238] Reading sound files: ['./tmp/pruned-transducer-stateless7-streaming-id/test_waves/sample1.wav']
+2023-06-21 10:19:19,234 INFO [jit_pretrained.py:244] Decoding started
+2023-06-21 10:19:20,090 INFO [jit_pretrained.py:271]
+./tmp/pruned-transducer-stateless7-streaming-id/test_waves/sample1.wav:
+p u l a ŋ | s ə k o l a h | p i t ə r i | s a ŋ a t | l a p a r
+2023-06-21 10:19:20,090 INFO [jit_pretrained.py:273] Decoding Done
+```
+</details>
+## Training procedure
+### Install icefall
+```sh
+git clone https://github.com/bookbot-hive/icefall
+cd icefall
+export PYTHONPATH="$(pwd):$PYTHONPATH"
+```
+### Prepare Data
+```sh
+cd egs/bookbot_id/ASR
+./prepare.sh
+```
+### Train
+```sh
+export CUDA_VISIBLE_DEVICES="0"
+./pruned_transducer_stateless7_streaming/train.py \
+  --num-epochs 30 \
+  --use-fp16 1 \
+  --max-duration 400
+```
+## Frameworks
+- [k2](https://github.com/k2-fsa/k2)
+- [icefall](https://github.com/bookbot-hive/icefall)
+- [lhotse](https://github.com/bookbot-hive/lhotse)
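
The sample decoding output in the README above is a flat, space-separated phoneme string with `|` between groups. The following is a minimal sketch of turning that string back into per-word phoneme strings. It assumes that `|` acts as a word-boundary token, which the sample output suggests but the README does not state, and `phonemes_to_words` is a hypothetical helper, not part of icefall.

```python
from typing import List


def phonemes_to_words(decoded: str, boundary: str = "|") -> List[str]:
    """Group a space-separated phoneme string into one string per word.

    Assumption: `boundary` separates words in the model output.
    """
    words = []
    for chunk in decoded.split(boundary):
        phonemes = chunk.split()  # drops surrounding whitespace
        if phonemes:              # skip empty groups, e.g. a trailing boundary
            words.append("".join(phonemes))
    return words


# Output line copied from the "Decoding Output" block in the README.
decoded = "p u l a ŋ | s ə k o l a h | p i t ə r i | s a ŋ a t | l a p a r"
print(phonemes_to_words(decoded))
```

Joining the phonemes without a separator keeps multi-character tokens such as `dʒ` and `tʃ` intact, but the result is still an IPA string, not Indonesian orthography.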

data/lang_phone/L.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5e67299c15c8faa128dd7317d652619b51f28b431cec64fd3b8338daf9762fc4
+size 1551
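
The three added lines are a Git LFS pointer rather than the binary file itself: a spec version, an object ID of the form `sha256:<digest>`, and the size in bytes. As a rough sketch, a pointer like this can be parsed as shown below. `parse_lfs_pointer` is a hypothetical helper, and the code assumes the `+` diff markers have already been stripped.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from data/lang_phone/L.pt in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:5e67299c15c8faa128dd7317d652619b51f28b431cec64fd3b8338daf9762fc4
size 1551
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The same parsing applies to the other `.pt` pointers in this commit. Only the digest and size differ.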

data/lang_phone/L_disambig.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:42d1a58e242b3f7799fffda803fa17ada3112ae71be2556665c910051d25a7d7
+size 1715

data/lang_phone/Linv.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:88935261f84d15c344a6adc9ac289b6d58acd18085a6900d5e5124866b5dc0ee
+size 1627

data/lang_phone/lexicon.txt ADDED

	@@ -0,0 +1,32 @@

+a  a
+b  b
+d  d
+dʒ  dʒ
+e  e
+f  f
+h  h
+i  i
+j  j
+k  k
+l  l
+m  m
+n  n
+o  o
+p  p
+r  r
+s  s
+t  t
+tʃ  tʃ
+u  u
+v  v
+w  w
+x  x
+z  z
+|  |
+ŋ  ŋ
+ə  ə
+ɡ  ɡ
+ɲ  ɲ
+ʃ  ʃ
+ʔ  ʔ
+<UNK>  <UNK>

data/lang_phone/lexicon_disambig.txt ADDED

	@@ -0,0 +1,32 @@

+a a
+b b
+d d
+dʒ dʒ
+e e
+f f
+h h
+i i
+j j
+k k
+l l
+m m
+n n
+o o
+p p
+r r
+s s
+t t
+tʃ tʃ
+u u
+v v
+w w
+x x
+z z
+| |
+ŋ ŋ
+ə ə
+ɡ ɡ
+ɲ ɲ
+ʃ ʃ
+ʔ ʔ
+<UNK> <UNK>

data/lang_phone/tokens.txt ADDED

	@@ -0,0 +1,34 @@

+<eps> 0
+ɡ 1
+o 2
+d 3
+ʃ 4
+v 5
+t 6
+<UNK> 7
+x 8
+r 9
+ʔ 10
+b 11
+s 12
+p 13
+i 14
+dʒ 15
+| 16
+ə 17
+z 18
+f 19
+n 20
+m 21
+ɲ 22
+tʃ 23
+ŋ 24
+k 25
+j 26
+l 27
+h 28
+w 29
+a 30
+u 31
+e 32
+#0 33
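
Each line of `tokens.txt` maps one symbol to an integer ID. The sketch below loads that table and encodes a phoneme string with it. It assumes that symbols starting with `#` are disambiguation symbols that do not count towards the model vocabulary, which is consistent with the decode logs in this commit reporting `'vocab_size': 33` and `'unk_id': 7`. `load_tokens` and `encode` are hypothetical helpers, not icefall functions.

```python
# Contents of data/lang_phone/tokens.txt from this commit, without "+" markers.
TOKENS_TXT = """\
<eps> 0
ɡ 1
o 2
d 3
ʃ 4
v 5
t 6
<UNK> 7
x 8
r 9
ʔ 10
b 11
s 12
p 13
i 14
dʒ 15
| 16
ə 17
z 18
f 19
n 20
m 21
ɲ 22
tʃ 23
ŋ 24
k 25
j 26
l 27
h 28
w 29
a 30
u 31
e 32
#0 33
"""


def load_tokens(text: str) -> dict:
    """Map each symbol to its integer ID."""
    token2id = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        token, idx = line.rsplit(maxsplit=1)
        token2id[token] = int(idx)
    return token2id


def encode(phonemes: str, token2id: dict) -> list:
    """Convert space-separated phonemes to IDs, falling back to <UNK>."""
    unk = token2id["<UNK>"]
    return [token2id.get(p, unk) for p in phonemes.split()]


token2id = load_tokens(TOKENS_TXT)
# Assumption: "#"-prefixed symbols are disambiguation symbols, not model outputs.
vocab = {t: i for t, i in token2id.items() if not t.startswith("#")}
print(len(vocab), encode("p u l a ŋ", token2id))
```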

data/lang_phone/words.txt ADDED

	@@ -0,0 +1,36 @@

+<eps> 0
+<UNK> 1
+a 2
+b 3
+d 4
+dʒ 5
+e 6
+f 7
+h 8
+i 9
+j 10
+k 11
+l 12
+m 13
+n 14
+o 15
+p 16
+r 17
+s 18
+t 19
+tʃ 20
+u 21
+v 22
+w 23
+x 24
+z 25
+| 26
+ŋ 27
+ə 28
+ɡ 29
+ɲ 30
+ʃ 31
+ʔ 32
+#0 33
+<s> 34
+</s> 35

exp/cpu_jit.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1584f55881aead89f3bdd8d7dab007479a61e5cbf4eff83a4b95a68eba2b9160
+size 354961726

exp/decoder_jit_trace-pnnx.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:94e7e3bb9002ab8808c9d194a0cea7bb8bf1526f6ca0d8dcf9dcfd52229e4709
+size 89773

exp/decoder_jit_trace.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1a924947cdac6dd4d74cea0d5976637ed57c01950c543ba77f9417d3e5f35e23
+size 89590

exp/encoder_jit_trace-pnnx.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3f3e371e7c9fdfb44343e037fbfe7e4e1404a3d8e421ac17ddacbb58e3983a9d
+size 278155657

exp/encoder_jit_trace.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d25b36b251e67544a850505a1655f3e26e1f309e43bc51f5ee10a7c510125ed7
+size 354193226

exp/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model-2023-06-21-09-40-15 ADDED

	@@ -0,0 +1,45 @@

+2023-06-21 09:40:15,150 INFO [decode.py:654] Decoding started
+2023-06-21 09:40:15,151 INFO [decode.py:660] Device: cuda:0
+2023-06-21 09:40:15,152 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+2023-06-21 09:40:15,155 INFO [decode.py:668] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'fast_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 600, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 
'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/fast_beam_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
+2023-06-21 09:40:15,155 INFO [decode.py:670] About to create model
+2023-06-21 09:40:15,733 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+2023-06-21 09:40:15,737 INFO [decode.py:741] Calculating the averaged model over epoch range from 21 (excluded) to 30
+2023-06-21 09:40:19,291 INFO [decode.py:774] Number of model parameters: 69471350
+2023-06-21 09:40:19,291 INFO [multidataset.py:122] About to get LibriVox test cuts
+2023-06-21 09:40:19,291 INFO [multidataset.py:124] Loading LibriVox in lazy mode
+2023-06-21 09:40:19,292 INFO [multidataset.py:133] About to get FLEURS test cuts
+2023-06-21 09:40:19,292 INFO [multidataset.py:135] Loading FLEURS in lazy mode
+2023-06-21 09:40:19,292 INFO [multidataset.py:144] About to get Common Voice test cuts
+2023-06-21 09:40:19,292 INFO [multidataset.py:146] Loading Common Voice in lazy mode
+2023-06-21 09:40:22,208 INFO [decode.py:565] batch 0/?, cuts processed until now is 44
+2023-06-21 09:40:28,732 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+2023-06-21 09:40:28,779 INFO [utils.py:561] [test-librivox-beam_20.0_max_contexts_8_max_states_64] %WER 4.85% [1773 / 36594, 295 ins, 904 del, 574 sub ]
+2023-06-21 09:40:28,860 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+2023-06-21 09:40:28,860 INFO [decode.py:604]
+For test-librivox, WER of different settings are:
+beam_20.0_max_contexts_8_max_states_64	4.85	best for test-librivox
+2023-06-21 09:40:30,839 INFO [decode.py:565] batch 0/?, cuts processed until now is 38
+2023-06-21 09:41:00,055 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+2023-06-21 09:41:00,146 INFO [utils.py:561] [test-fleurs-beam_20.0_max_contexts_8_max_states_64] %WER 12.55% [11748 / 93580, 1672 ins, 5414 del, 4662 sub ]
+2023-06-21 09:41:00,362 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+2023-06-21 09:41:00,362 INFO [decode.py:604]
+For test-fleurs, WER of different settings are:
+beam_20.0_max_contexts_8_max_states_64	12.55	best for test-fleurs
+2023-06-21 09:41:01,414 INFO [zipformer.py:2441] attn_weights_entropy = tensor([1.1632, 1.0353, 1.2741, 0.9735, 1.1847, 1.2830, 1.1450, 1.0967],
+       device='cuda:0'), covar=tensor([0.0547, 0.0601, 0.0483, 0.0755, 0.0373, 0.0368, 0.0490, 0.0569],
+       device='cuda:0'), in_proj_covar=tensor([0.0018, 0.0019, 0.0019, 0.0021, 0.0018, 0.0017, 0.0019, 0.0019],
+       device='cuda:0'), out_proj_covar=tensor([1.3702e-05, 1.4294e-05, 1.3432e-05, 1.4389e-05, 1.2265e-05, 1.4168e-05,
+        1.2323e-05, 1.3747e-05], device='cuda:0')
+2023-06-21 09:41:02,049 INFO [decode.py:565] batch 0/?, cuts processed until now is 121
+2023-06-21 09:41:22,562 INFO [decode.py:565] batch 20/?, cuts processed until now is 2809
+2023-06-21 09:41:31,340 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+2023-06-21 09:41:31,464 INFO [utils.py:561] [test-commonvoice-beam_20.0_max_contexts_8_max_states_64] %WER 14.89% [19770 / 132787, 2851 ins, 9210 del, 7709 sub ]
+2023-06-21 09:41:31,757 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt
+2023-06-21 09:41:31,757 INFO [decode.py:604]
+For test-commonvoice, WER of different settings are:
+beam_20.0_max_contexts_8_max_states_64	14.89	best for test-commonvoice
+2023-06-21 09:41:31,758 INFO [decode.py:809] Done!
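
Each `%WER` line in the log above reports the error rate together with the raw counts `[errors / reference tokens, insertions, deletions, substitutions]`. The sketch below extracts those fields and recomputes the rate as (ins + del + sub) / reference tokens. The regular expression is written against the line layout seen in this log only, and `parse_wer_line` is a hypothetical helper, so other icefall versions may need adjustments. Since the model emits phonemes, the value labelled WER corresponds to the phoneme error rates quoted in the README.

```python
import re

# Pattern matched against the "%WER" lines as they appear in this log.
WER_PATTERN = re.compile(
    r"\[(?P<name>[^\]]+)\] %WER (?P<wer>\d+(?:\.\d+)?)% "
    r"\[(?P<errors>\d+) / (?P<ref>\d+), (?P<ins>\d+) ins, "
    r"(?P<dels>\d+) del, (?P<sub>\d+) sub \]"
)


def parse_wer_line(line: str):
    """Return the counts from a %WER log line, or None if it does not match."""
    m = WER_PATTERN.search(line)
    if m is None:
        return None
    ins, dels, sub = int(m["ins"]), int(m["dels"]), int(m["sub"])
    ref = int(m["ref"])
    return {
        "name": m["name"],
        "reported": float(m["wer"]),
        "errors": int(m["errors"]),
        "ref": ref,
        "ins": ins,
        "dels": dels,
        "sub": sub,
        # Error rate in percent, recomputed from the raw counts.
        "recomputed": 100.0 * (ins + dels + sub) / ref,
    }


# Line copied from the fast_beam_search log above (LibriVox test set).
line = (
    "2023-06-21 09:40:28,779 INFO [utils.py:561] "
    "[test-librivox-beam_20.0_max_contexts_8_max_states_64] "
    "%WER 4.85% [1773 / 36594, 295 ins, 904 del, 574 sub ]"
)
stats = parse_wer_line(line)
print(stats["name"], stats["reported"], stats["recomputed"])
```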

exp/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/fast_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED

	@@ -0,0 +1,2 @@


+settings WER
+beam_20.0_max_contexts_8_max_states_64 14.89

exp/fast_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED

	@@ -0,0 +1,2 @@


+settings WER
+beam_20.0_max_contexts_8_max_states_64 12.55

exp/fast_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64-use-averaged-model.txt ADDED

	@@ -0,0 +1,2 @@


+settings WER
+beam_20.0_max_contexts_8_max_states_64 4.85

exp/greedy_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/greedy_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/greedy_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/greedy_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model-2023-06-21-09-39-14 ADDED

	@@ -0,0 +1,39 @@

+2023-06-21 09:39:14,130 INFO [decode.py:654] Decoding started
+2023-06-21 09:39:14,130 INFO [decode.py:660] Device: cuda:0
+2023-06-21 09:39:14,131 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+2023-06-21 09:39:14,134 INFO [decode.py:668] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'greedy_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 600, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': 
True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/greedy_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
+2023-06-21 09:39:14,135 INFO [decode.py:670] About to create model
+2023-06-21 09:39:14,915 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+2023-06-21 09:39:14,921 INFO [decode.py:741] Calculating the averaged model over epoch range from 21 (excluded) to 30
+2023-06-21 09:39:20,667 INFO [decode.py:774] Number of model parameters: 69471350
+2023-06-21 09:39:20,668 INFO [multidataset.py:122] About to get LibriVox test cuts
+2023-06-21 09:39:20,668 INFO [multidataset.py:124] Loading LibriVox in lazy mode
+2023-06-21 09:39:20,671 INFO [multidataset.py:133] About to get FLEURS test cuts
+2023-06-21 09:39:20,671 INFO [multidataset.py:135] Loading FLEURS in lazy mode
+2023-06-21 09:39:20,673 INFO [multidataset.py:144] About to get Common Voice test cuts
+2023-06-21 09:39:20,673 INFO [multidataset.py:146] Loading Common Voice in lazy mode
+2023-06-21 09:39:24,965 INFO [decode.py:565] batch 0/?, cuts processed until now is 44
+2023-06-21 09:39:29,616 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/greedy_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+2023-06-21 09:39:29,662 INFO [utils.py:561] [test-librivox-greedy_search] %WER 4.87% [1783 / 36594, 317 ins, 868 del, 598 sub ]
+2023-06-21 09:39:29,742 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/greedy_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+2023-06-21 09:39:29,742 INFO [decode.py:604]
+For test-librivox, WER of different settings are:
+greedy_search	4.87	best for test-librivox
+2023-06-21 09:39:31,511 INFO [decode.py:565] batch 0/?, cuts processed until now is 38
+2023-06-21 09:39:50,011 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/greedy_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+2023-06-21 09:39:50,138 INFO [utils.py:561] [test-fleurs-greedy_search] %WER 11.45% [10718 / 93580, 1850 ins, 3733 del, 5135 sub ]
+2023-06-21 09:39:50,453 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/greedy_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+2023-06-21 09:39:50,453 INFO [decode.py:604]
+For test-fleurs, WER of different settings are:
+greedy_search	11.45	best for test-fleurs
+2023-06-21 09:39:52,522 INFO [decode.py:565] batch 0/?, cuts processed until now is 121
+2023-06-21 09:40:11,369 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/greedy_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+2023-06-21 09:40:11,489 INFO [utils.py:561] [test-commonvoice-greedy_search] %WER 14.97% [19873 / 132787, 3792 ins, 7589 del, 8492 sub ]
+2023-06-21 09:40:11,787 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/greedy_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt
+2023-06-21 09:40:11,788 INFO [decode.py:604]
+For test-commonvoice, WER of different settings are:
+greedy_search	14.97	best for test-commonvoice
+2023-06-21 09:40:11,788 INFO [decode.py:809] Done!
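
The log above reports "Calculating the averaged model over epoch range from 21 (excluded) to 30" for `--epoch 30 --avg 9`. The arithmetic behind that message can be sketched as follows. `averaged_epoch_range` is a hypothetical helper that only reproduces the epoch bookkeeping implied by the log. It does not load or average any checkpoints, and the input validation is an assumption rather than icefall's own check.

```python
from typing import List, Tuple


def averaged_epoch_range(epoch: int, avg: int) -> Tuple[int, List[int]]:
    """Return (excluded start epoch, list of epochs covered by the average)."""
    start = epoch - avg  # excluded lower bound, e.g. 30 - 9 = 21
    if avg < 1 or start < 0:
        # Assumed validation for this sketch only.
        raise ValueError(f"invalid combination: epoch={epoch}, avg={avg}")
    return start, list(range(start + 1, epoch + 1))


start, epochs = averaged_epoch_range(epoch=30, avg=9)
print(f"from {start} (excluded) to {epochs[-1]}: {epochs}")
```

With these values the average spans epochs 22 through 30, which is nine epochs and matches `'avg': 9` in the logged parameters.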

exp/greedy_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/greedy_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/greedy_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/greedy_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED

	@@ -0,0 +1,2 @@


+settings WER
+greedy_search 14.97

exp/greedy_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED

	@@ -0,0 +1,2 @@


+settings WER
+greedy_search 11.45

exp/greedy_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-context-2-max-sym-per-frame-1-use-averaged-model.txt ADDED

	@@ -0,0 +1,2 @@


+settings WER
+greedy_search 4.87

exp/joiner_jit_trace-pnnx.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b2772b338d03c7ebea5247337cf50fffb91a7950c351622a320ad4fc38b393ec
+size 1914564

exp/joiner_jit_trace.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f1ffb093a638ecdd5a015aff5d9c6ae62a7dddc815e18dd46ca19a46976367ce
+size 1914479

exp/modified_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/modified_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/modified_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED

The diff for this file is too large to render.

exp/modified_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model-2023-06-21-09-41-35 ADDED

	@@ -0,0 +1,55 @@

+2023-06-21 09:41:35,276 INFO [decode.py:654] Decoding started
+2023-06-21 09:41:35,276 INFO [decode.py:660] Device: cuda:0
+2023-06-21 09:41:35,277 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+2023-06-21 09:41:35,280 INFO [decode.py:668] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'modified_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 600, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 
'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/modified_beam_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
+2023-06-21 09:41:35,281 INFO [decode.py:670] About to create model
+2023-06-21 09:41:35,838 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+2023-06-21 09:41:35,843 INFO [decode.py:741] Calculating the averaged model over epoch range from 21 (excluded) to 30
+2023-06-21 09:41:39,380 INFO [decode.py:774] Number of model parameters: 69471350
+2023-06-21 09:41:39,380 INFO [multidataset.py:122] About to get LibriVox test cuts
+2023-06-21 09:41:39,380 INFO [multidataset.py:124] Loading LibriVox in lazy mode
+2023-06-21 09:41:39,381 INFO [multidataset.py:133] About to get FLEURS test cuts
+2023-06-21 09:41:39,381 INFO [multidataset.py:135] Loading FLEURS in lazy mode
+2023-06-21 09:41:39,381 INFO [multidataset.py:144] About to get Common Voice test cuts
+2023-06-21 09:41:39,381 INFO [multidataset.py:146] Loading Common Voice in lazy mode
+2023-06-21 09:41:43,886 INFO [decode.py:565] batch 0/?, cuts processed until now is 44
+2023-06-21 09:41:46,269 INFO [zipformer.py:2441] attn_weights_entropy = tensor([1.3801, 1.7156, 1.0930, 1.5632, 1.3604, 1.3437, 1.7393, 0.6970],
+       device='cuda:0'), covar=tensor([0.4497, 0.2012, 0.2669, 0.2689, 0.2707, 0.2909, 0.1440, 0.5122],
+       device='cuda:0'), in_proj_covar=tensor([0.0074, 0.0053, 0.0059, 0.0067, 0.0065, 0.0064, 0.0051, 0.0077],
+       device='cuda:0'), out_proj_covar=tensor([5.5637e-05, 3.5992e-05, 4.1115e-05, 4.8266e-05, 4.8700e-05, 4.4501e-05,
+        3.4417e-05, 7.3250e-05], device='cuda:0')
+2023-06-21 09:42:00,403 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/modified_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+2023-06-21 09:42:00,449 INFO [utils.py:561] [test-librivox-beam_size_4] %WER 4.71% [1725 / 36594, 309 ins, 836 del, 580 sub ]
+2023-06-21 09:42:00,531 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/modified_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+2023-06-21 09:42:00,531 INFO [decode.py:604]
+For test-librivox, WER of different settings are:
+beam_size_4	4.71	best for test-librivox
+2023-06-21 09:42:01,464 INFO [zipformer.py:2441] attn_weights_entropy = tensor([2.1911, 1.2934, 2.0949, 2.2245, 2.1813, 2.1569, 1.7841, 1.7188],
+       device='cuda:0'), covar=tensor([0.1696, 0.4060, 0.1661, 0.1975, 0.1970, 0.2132, 0.1748, 0.3224],
+       device='cuda:0'), in_proj_covar=tensor([0.0029, 0.0040, 0.0028, 0.0028, 0.0029, 0.0030, 0.0027, 0.0034],
+       device='cuda:0'), out_proj_covar=tensor([1.8266e-05, 3.2097e-05, 1.7461e-05, 1.6755e-05, 1.8651e-05, 1.9838e-05,
+        1.5794e-05, 2.3433e-05], device='cuda:0')
+2023-06-21 09:42:04,999 INFO [decode.py:565] batch 0/?, cuts processed until now is 38
+2023-06-21 09:43:09,460 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/modified_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+2023-06-21 09:43:09,552 INFO [utils.py:561] [test-fleurs-beam_size_4] %WER 11.25% [10525 / 93580, 1811 ins, 3811 del, 4903 sub ]
+2023-06-21 09:43:09,853 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/modified_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+2023-06-21 09:43:09,853 INFO [decode.py:604]
+For test-fleurs, WER of different settings are:
+beam_size_4	11.25	best for test-fleurs
+2023-06-21 09:43:14,023 INFO [decode.py:565] batch 0/?, cuts processed until now is 121
+2023-06-21 09:43:47,394 INFO [zipformer.py:2441] attn_weights_entropy = tensor([2.5738, 2.5492, 3.0284, 2.4510, 1.3782, 3.0004, 2.8027, 1.4081],
+       device='cuda:0'), covar=tensor([0.1153, 0.1301, 0.0459, 0.0990, 0.4808, 0.0525, 0.0757, 0.4425],
+       device='cuda:0'), in_proj_covar=tensor([0.0071, 0.0071, 0.0055, 0.0070, 0.0106, 0.0057, 0.0058, 0.0105],
+       device='cuda:0'), out_proj_covar=tensor([5.9638e-05, 6.0235e-05, 4.2007e-05, 5.4275e-05, 1.0845e-04, 4.2491e-05,
+        4.5487e-05, 9.8369e-05], device='cuda:0')
+2023-06-21 09:44:30,935 INFO [decode.py:565] batch 20/?, cuts processed until now is 2809
+2023-06-21 09:44:57,467 INFO [decode.py:579] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/modified_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+2023-06-21 09:44:57,589 INFO [utils.py:561] [test-commonvoice-beam_size_4] %WER 14.31% [19002 / 132787, 3318 ins, 7575 del, 8109 sub ]
+2023-06-21 09:44:57,887 INFO [decode.py:590] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/modified_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt
+2023-06-21 09:44:57,888 INFO [decode.py:604]
+For test-commonvoice, WER of different settings are:
+beam_size_4	14.31	best for test-commonvoice
+2023-06-21 09:44:57,888 INFO [decode.py:809] Done!
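
Each `%WER` line in the log above gives a percentage followed by `[errors / reference words, ins, del, sub]`. As a sanity check, the sketch below recomputes the three percentages from the bracketed counts, assuming the usual definition WER = (insertions + deletions + substitutions) / reference words. The helper name `wer_percent` is illustrative and is not part of icefall.

```python
def wer_percent(ins: int, dels: int, subs: int, ref_words: int) -> float:
    """Word error rate in percent, rounded to two decimals like the log."""
    return round(100.0 * (ins + dels + subs) / ref_words, 2)


# Counts copied from the modified_beam_search (beam size 4) log lines.
counts = {
    "test-librivox": (309, 836, 580, 36594),
    "test-fleurs": (1811, 3811, 4903, 93580),
    "test-commonvoice": (3318, 7575, 8109, 132787),
}

results = {name: wer_percent(*c) for name, c in counts.items()}

for name, wer in results.items():
    print(f"{name}: {wer:.2f}")
```

The recomputed values agree with the logged 4.71, 11.25 and 14.31, so the error totals (1725, 10525 and 19002) are the sums of the three error types.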

exp/modified_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED

The diff for this file is too large to render. See raw diff

exp/modified_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED

The diff for this file is too large to render. See raw diff

exp/modified_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED

The diff for this file is too large to render. See raw diff

exp/modified_beam_search/wer-summary-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED

	@@ -0,0 +1,2 @@


+settings WER
+beam_size_4 14.31

exp/modified_beam_search/wer-summary-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED

	@@ -0,0 +1,2 @@


+settings WER
+beam_size_4 11.25

exp/modified_beam_search/wer-summary-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-modified_beam_search-beam-size-4-use-averaged-model.txt ADDED

	@@ -0,0 +1,2 @@


+settings WER
+beam_size_4 4.71

exp/pretrained.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d7fb8734cd4c8edd2c360ad93343bbbb755b3195eb27e2871e37dc7be6293a4f
+size 278176561

exp/streaming/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED

The diff for this file is too large to render. See raw diff

exp/streaming/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED

The diff for this file is too large to render. See raw diff

exp/streaming/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt ADDED

The diff for this file is too large to render. See raw diff

exp/streaming/fast_beam_search/log-decode-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model-2023-06-21-10-04-38 ADDED

	@@ -0,0 +1,136 @@

+2023-06-21 10:04:38,023 INFO [streaming_decode.py:483] Decoding started
+2023-06-21 10:04:38,023 INFO [streaming_decode.py:489] Device: cuda:0
+2023-06-21 10:04:38,024 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
+2023-06-21 10:04:38,027 INFO [streaming_decode.py:497] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9426c9f730820d291f5dcb06be337662595fa7b4', 'k2-git-date': 'Sun Feb 5 17:35:01 2023', 'lhotse-version': '1.15.0.dev+git.00d3e36.clean', 'torch-version': '1.13.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': 'd3f5d01-dirty', 'icefall-git-date': 'Wed May 31 04:15:45 2023', 'icefall-path': '/root/icefall', 'k2-path': '/usr/local/lib/python3.10/dist-packages/k2/__init__.py', 'lhotse-path': '/root/lhotse/lhotse/__init__.py', 'hostname': 'bookbot-k2', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 9, 'use_averaged_model': True, 'exp_dir': PosixPath('pruned_transducer_stateless7_streaming/exp'), 'lang_dir': 'data/lang_phone', 'decoding_method': 'fast_beam_search', 'num_active_paths': 4, 'beam': 4, 'max_contexts': 4, 'max_states': 32, 'context_size': 2, 'num_decode_streams': 1500, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 
'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'res_dir': PosixPath('pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search'), 'suffix': 'epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model', 'blank_id': 0, 'unk_id': 7, 'vocab_size': 33}
+2023-06-21 10:04:38,027 INFO [streaming_decode.py:499] About to create model
+2023-06-21 10:04:38,604 INFO [zipformer.py:405] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
+2023-06-21 10:04:38,608 INFO [streaming_decode.py:566] Calculating the averaged model over epoch range from 21 (excluded) to 30
+2023-06-21 10:04:42,203 INFO [streaming_decode.py:588] Number of model parameters: 69471350
+2023-06-21 10:04:42,204 INFO [multidataset.py:122] About to get LibriVox test cuts
+2023-06-21 10:04:42,204 INFO [multidataset.py:124] Loading LibriVox in lazy mode
+2023-06-21 10:04:42,204 INFO [multidataset.py:133] About to get FLEURS test cuts
+2023-06-21 10:04:42,204 INFO [multidataset.py:135] Loading FLEURS in lazy mode
+2023-06-21 10:04:42,205 INFO [multidataset.py:144] About to get Common Voice test cuts
+2023-06-21 10:04:42,205 INFO [multidataset.py:146] Loading Common Voice in lazy mode
+2023-06-21 10:04:42,471 INFO [streaming_decode.py:380] Cuts processed until now is 0.
+2023-06-21 10:04:42,786 INFO [streaming_decode.py:380] Cuts processed until now is 50.
+2023-06-21 10:04:43,098 INFO [streaming_decode.py:380] Cuts processed until now is 100.
+2023-06-21 10:04:43,444 INFO [streaming_decode.py:380] Cuts processed until now is 150.
+2023-06-21 10:04:43,770 INFO [streaming_decode.py:380] Cuts processed until now is 200.
+2023-06-21 10:04:44,092 INFO [streaming_decode.py:380] Cuts processed until now is 250.
+2023-06-21 10:04:44,416 INFO [streaming_decode.py:380] Cuts processed until now is 300.
+2023-06-21 10:04:44,756 INFO [streaming_decode.py:380] Cuts processed until now is 350.
+2023-06-21 10:04:45,079 INFO [streaming_decode.py:380] Cuts processed until now is 400.
+2023-06-21 10:04:45,405 INFO [streaming_decode.py:380] Cuts processed until now is 450.
+2023-06-21 10:04:45,734 INFO [streaming_decode.py:380] Cuts processed until now is 500.
+2023-06-21 10:04:46,071 INFO [streaming_decode.py:380] Cuts processed until now is 550.
+2023-06-21 10:04:46,405 INFO [streaming_decode.py:380] Cuts processed until now is 600.
+2023-06-21 10:04:57,029 INFO [streaming_decode.py:425] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/recogs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+2023-06-21 10:04:57,063 INFO [utils.py:561] [test-librivox-beam_4_max_contexts_4_max_states_32] %WER 4.81% [1759 / 36594, 280 ins, 892 del, 587 sub ]
+2023-06-21 10:04:57,144 INFO [streaming_decode.py:436] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/errs-test-librivox-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+2023-06-21 10:04:57,145 INFO [streaming_decode.py:450]
+For test-librivox, WER of different settings are:
+beam_4_max_contexts_4_max_states_32	4.81	best for test-librivox
+2023-06-21 10:04:57,149 INFO [streaming_decode.py:380] Cuts processed until now is 0.
+2023-06-21 10:04:57,332 INFO [streaming_decode.py:380] Cuts processed until now is 50.
+2023-06-21 10:04:57,494 INFO [streaming_decode.py:380] Cuts processed until now is 100.
+2023-06-21 10:04:57,663 INFO [streaming_decode.py:380] Cuts processed until now is 150.
+2023-06-21 10:04:57,833 INFO [streaming_decode.py:380] Cuts processed until now is 200.
+2023-06-21 10:04:58,000 INFO [streaming_decode.py:380] Cuts processed until now is 250.
+2023-06-21 10:04:58,161 INFO [streaming_decode.py:380] Cuts processed until now is 300.
+2023-06-21 10:04:58,323 INFO [streaming_decode.py:380] Cuts processed until now is 350.
+2023-06-21 10:04:58,488 INFO [streaming_decode.py:380] Cuts processed until now is 400.
+2023-06-21 10:04:58,656 INFO [streaming_decode.py:380] Cuts processed until now is 450.
+2023-06-21 10:04:58,819 INFO [streaming_decode.py:380] Cuts processed until now is 500.
+2023-06-21 10:04:58,993 INFO [streaming_decode.py:380] Cuts processed until now is 550.
+2023-06-21 10:04:59,176 INFO [streaming_decode.py:380] Cuts processed until now is 600.
+2023-06-21 10:04:59,364 INFO [streaming_decode.py:380] Cuts processed until now is 650.
+2023-06-21 10:05:34,495 INFO [streaming_decode.py:425] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/recogs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+2023-06-21 10:05:34,590 INFO [utils.py:561] [test-fleurs-beam_4_max_contexts_4_max_states_32] %WER 12.93% [12100 / 93580, 1706 ins, 5594 del, 4800 sub ]
+2023-06-21 10:05:34,813 INFO [streaming_decode.py:436] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/errs-test-fleurs-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+2023-06-21 10:05:34,814 INFO [streaming_decode.py:450]
+For test-fleurs, WER of different settings are:
+beam_4_max_contexts_4_max_states_32	12.93	best for test-fleurs
+2023-06-21 10:05:34,820 INFO [streaming_decode.py:380] Cuts processed until now is 0.
+2023-06-21 10:05:35,059 INFO [streaming_decode.py:380] Cuts processed until now is 50.
+2023-06-21 10:05:35,308 INFO [streaming_decode.py:380] Cuts processed until now is 100.
+2023-06-21 10:05:35,583 INFO [streaming_decode.py:380] Cuts processed until now is 150.
+2023-06-21 10:05:35,829 INFO [streaming_decode.py:380] Cuts processed until now is 200.
+2023-06-21 10:05:36,082 INFO [streaming_decode.py:380] Cuts processed until now is 250.
+2023-06-21 10:05:36,315 INFO [streaming_decode.py:380] Cuts processed until now is 300.
+2023-06-21 10:05:36,537 INFO [streaming_decode.py:380] Cuts processed until now is 350.
+2023-06-21 10:05:36,797 INFO [streaming_decode.py:380] Cuts processed until now is 400.
+2023-06-21 10:05:37,028 INFO [streaming_decode.py:380] Cuts processed until now is 450.
+2023-06-21 10:05:37,263 INFO [streaming_decode.py:380] Cuts processed until now is 500.
+2023-06-21 10:05:37,499 INFO [streaming_decode.py:380] Cuts processed until now is 550.
+2023-06-21 10:05:37,720 INFO [streaming_decode.py:380] Cuts processed until now is 600.
+2023-06-21 10:05:37,959 INFO [streaming_decode.py:380] Cuts processed until now is 650.
+2023-06-21 10:05:38,182 INFO [streaming_decode.py:380] Cuts processed until now is 700.
+2023-06-21 10:05:38,406 INFO [streaming_decode.py:380] Cuts processed until now is 750.
+2023-06-21 10:05:38,664 INFO [streaming_decode.py:380] Cuts processed until now is 800.
+2023-06-21 10:05:38,913 INFO [streaming_decode.py:380] Cuts processed until now is 850.
+2023-06-21 10:05:39,251 INFO [streaming_decode.py:380] Cuts processed until now is 900.
+2023-06-21 10:05:39,493 INFO [streaming_decode.py:380] Cuts processed until now is 950.
+2023-06-21 10:05:39,726 INFO [streaming_decode.py:380] Cuts processed until now is 1000.
+2023-06-21 10:05:39,959 INFO [streaming_decode.py:380] Cuts processed until now is 1050.
+2023-06-21 10:05:40,192 INFO [streaming_decode.py:380] Cuts processed until now is 1100.
+2023-06-21 10:05:40,436 INFO [streaming_decode.py:380] Cuts processed until now is 1150.
+2023-06-21 10:05:40,709 INFO [streaming_decode.py:380] Cuts processed until now is 1200.
+2023-06-21 10:05:40,959 INFO [streaming_decode.py:380] Cuts processed until now is 1250.
+2023-06-21 10:05:41,199 INFO [streaming_decode.py:380] Cuts processed until now is 1300.
+2023-06-21 10:05:41,448 INFO [streaming_decode.py:380] Cuts processed until now is 1350.
+2023-06-21 10:05:41,697 INFO [streaming_decode.py:380] Cuts processed until now is 1400.
+2023-06-21 10:05:41,938 INFO [streaming_decode.py:380] Cuts processed until now is 1450.
+2023-06-21 10:05:51,050 INFO [streaming_decode.py:380] Cuts processed until now is 1500.
+2023-06-21 10:05:53,941 INFO [streaming_decode.py:380] Cuts processed until now is 1550.
+2023-06-21 10:05:55,569 INFO [streaming_decode.py:380] Cuts processed until now is 1600.
+2023-06-21 10:05:55,799 INFO [streaming_decode.py:380] Cuts processed until now is 1650.
+2023-06-21 10:05:57,493 INFO [streaming_decode.py:380] Cuts processed until now is 1700.
+2023-06-21 10:05:57,735 INFO [streaming_decode.py:380] Cuts processed until now is 1750.
+2023-06-21 10:05:57,961 INFO [streaming_decode.py:380] Cuts processed until now is 1800.
+2023-06-21 10:05:59,694 INFO [streaming_decode.py:380] Cuts processed until now is 1850.
+2023-06-21 10:05:59,923 INFO [streaming_decode.py:380] Cuts processed until now is 1900.
+2023-06-21 10:06:00,151 INFO [streaming_decode.py:380] Cuts processed until now is 1950.
+2023-06-21 10:06:01,771 INFO [streaming_decode.py:380] Cuts processed until now is 2000.
+2023-06-21 10:06:01,997 INFO [streaming_decode.py:380] Cuts processed until now is 2050.
+2023-06-21 10:06:02,241 INFO [streaming_decode.py:380] Cuts processed until now is 2100.
+2023-06-21 10:06:02,465 INFO [streaming_decode.py:380] Cuts processed until now is 2150.
+2023-06-21 10:06:04,249 INFO [streaming_decode.py:380] Cuts processed until now is 2200.
+2023-06-21 10:06:04,478 INFO [streaming_decode.py:380] Cuts processed until now is 2250.
+2023-06-21 10:06:04,710 INFO [streaming_decode.py:380] Cuts processed until now is 2300.
+2023-06-21 10:06:06,461 INFO [streaming_decode.py:380] Cuts processed until now is 2350.
+2023-06-21 10:06:06,697 INFO [streaming_decode.py:380] Cuts processed until now is 2400.
+2023-06-21 10:06:06,931 INFO [streaming_decode.py:380] Cuts processed until now is 2450.
+2023-06-21 10:06:08,726 INFO [streaming_decode.py:380] Cuts processed until now is 2500.
+2023-06-21 10:06:08,950 INFO [streaming_decode.py:380] Cuts processed until now is 2550.
+2023-06-21 10:06:09,187 INFO [streaming_decode.py:380] Cuts processed until now is 2600.
+2023-06-21 10:06:10,940 INFO [streaming_decode.py:380] Cuts processed until now is 2650.
+2023-06-21 10:06:11,165 INFO [streaming_decode.py:380] Cuts processed until now is 2700.
+2023-06-21 10:06:12,942 INFO [streaming_decode.py:380] Cuts processed until now is 2750.
+2023-06-21 10:06:13,183 INFO [streaming_decode.py:380] Cuts processed until now is 2800.
+2023-06-21 10:06:14,919 INFO [streaming_decode.py:380] Cuts processed until now is 2850.
+2023-06-21 10:06:16,667 INFO [streaming_decode.py:380] Cuts processed until now is 2900.
+2023-06-21 10:06:18,270 INFO [streaming_decode.py:380] Cuts processed until now is 2950.
+2023-06-21 10:06:19,990 INFO [streaming_decode.py:380] Cuts processed until now is 3000.
+2023-06-21 10:06:20,222 INFO [streaming_decode.py:380] Cuts processed until now is 3050.
+2023-06-21 10:06:21,952 INFO [streaming_decode.py:380] Cuts processed until now is 3100.
+2023-06-21 10:06:22,202 INFO [streaming_decode.py:380] Cuts processed until now is 3150.
+2023-06-21 10:06:23,959 INFO [streaming_decode.py:380] Cuts processed until now is 3200.
+2023-06-21 10:06:24,183 INFO [streaming_decode.py:380] Cuts processed until now is 3250.
+2023-06-21 10:06:25,951 INFO [streaming_decode.py:380] Cuts processed until now is 3300.
+2023-06-21 10:06:26,203 INFO [streaming_decode.py:380] Cuts processed until now is 3350.
+2023-06-21 10:06:27,984 INFO [streaming_decode.py:380] Cuts processed until now is 3400.
+2023-06-21 10:06:28,228 INFO [streaming_decode.py:380] Cuts processed until now is 3450.
+2023-06-21 10:06:28,468 INFO [streaming_decode.py:380] Cuts processed until now is 3500.
+2023-06-21 10:06:30,266 INFO [streaming_decode.py:380] Cuts processed until now is 3550.
+2023-06-21 10:06:30,497 INFO [streaming_decode.py:380] Cuts processed until now is 3600.
+2023-06-21 10:06:45,693 INFO [streaming_decode.py:425] The transcripts are stored in pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/recogs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+2023-06-21 10:06:45,825 INFO [utils.py:561] [test-commonvoice-beam_4_max_contexts_4_max_states_32] %WER 14.96% [19859 / 132787, 3004 ins, 8788 del, 8067 sub ]
+2023-06-21 10:06:46,126 INFO [streaming_decode.py:436] Wrote detailed error stats to pruned_transducer_stateless7_streaming/exp/streaming/fast_beam_search/errs-test-commonvoice-epoch-30-avg-9-streaming-chunk-size-32-beam-4-max-contexts-4-max-states-32-use-averaged-model.txt
+2023-06-21 10:06:46,126 INFO [streaming_decode.py:450]
+For test-commonvoice, WER of different settings are:
+beam_4_max_contexts_4_max_states_32	14.96	best for test-commonvoice
+2023-06-21 10:06:46,127 INFO [streaming_decode.py:618] Done!
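
To collect results from a decode log like the one above without reading it by hand, the `%WER` lines can be extracted with a regular expression. The following is a minimal sketch that assumes the line layout seen in this log, including the space before the closing bracket. The names `WER_LINE` and `parse_wer_lines` are hypothetical and are not part of icefall.

```python
import re

# Matches e.g. "[test-x-beam_4] %WER 4.81% [1759 / 36594, 280 ins, 892 del, 587 sub ]"
WER_LINE = re.compile(
    r"\[(?P<name>[^\]]+)\] %WER (?P<wer>[\d.]+)% "
    r"\[(?P<errs>\d+) / (?P<ref>\d+), "
    r"(?P<ins>\d+) ins, (?P<dels>\d+) del, (?P<subs>\d+) sub \]"
)


def parse_wer_lines(log_text: str) -> list[dict]:
    """Return one record per %WER line found in the log text."""
    records = []
    for match in WER_LINE.finditer(log_text):
        rec = {"name": match["name"], "wer": float(match["wer"])}
        for key in ("errs", "ref", "ins", "dels", "subs"):
            rec[key] = int(match[key])
        records.append(rec)
    return records


# Two lines copied from the streaming fast_beam_search log above.
sample = (
    "2023-06-21 10:04:57,063 INFO [utils.py:561] "
    "[test-librivox-beam_4_max_contexts_4_max_states_32] "
    "%WER 4.81% [1759 / 36594, 280 ins, 892 del, 587 sub ]\n"
    "2023-06-21 10:05:34,590 INFO [utils.py:561] "
    "[test-fleurs-beam_4_max_contexts_4_max_states_32] "
    "%WER 12.93% [12100 / 93580, 1706 ins, 5594 del, 4800 sub ]\n"
)

parsed = parse_wer_lines(sample)
for rec in parsed:
    print(rec["name"], rec["wer"], rec["errs"], rec["ref"])
```

The `[utils.py:561]` source tag is skipped because the pattern only matches a bracketed name that is immediately followed by ` %WER`.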