Issue: Incomplete transcription
Inference was run from the web UI with default settings.
Chosen audio file: raven_poe_64kb.wav
Inference result text field from the beginning, by model (I can't attach the full response JSON files; HF doesn't allow .json attachments):
tiny.en:
This is a Libra Fox recording. All Libra Fox recordings are\n in the public domain. For more information, please visit\n Librabox.org.\n Today's reading The Raven by Edgar Allan Poe, read by Chris\n Scoring.\n Once upon a midnight dewerely, ...
small.en:
Once upon a midnight dreary, ...
medium.en:
Once upon a midnight dreary, ...
The small.en and medium.en results are missing this text:
This is a Libra Fox recording. All Libra Fox recordings are\n in the public domain. For more information, please visit\n Librabox.org.\n Today's reading The Raven by Edgar Allan Poe, read by Chris\n Scoring.\n
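For completeness, the gap between the two result fields can be checked mechanically. The sketch below uses abbreviated versions of the strings quoted above; `missing_prefix` and `probe_words` are illustrative helpers, not part of whisperfile. An exact substring match won't work because the models disagree on individual words ("dewerely" vs. "dreary"), so the probe uses only the first few words of the shorter output:

```python
# Sketch: verify that the small/medium result is the tiny result minus a
# leading segment. Minor word-level differences between models prevent an
# exact substring match, so probe with the opening words only.
def missing_prefix(full_text: str, partial_text: str, probe_words: int = 4):
    """Return the text preceding partial_text's opening words inside
    full_text, or None if those words never occur in full_text."""
    probe = " ".join(partial_text.split()[:probe_words])
    idx = full_text.find(probe)
    return None if idx == -1 else full_text[:idx]

# Abbreviated result strings from the outputs quoted above.
tiny = "This is a Libra Fox recording. ... Once upon a midnight dewerely, ..."
medium = "Once upon a midnight dreary, ..."

print(repr(missing_prefix(tiny, medium)))
# -> 'This is a Libra Fox recording. ... '
```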
Windows 10 Pro
Version 22H2
OS build 19045.4170
CUDA: 11.8
CUDA added to environment variables
CUDA bin directory added to PATH
All files downloaded from this repo.
Whisperfiles renamed to add the .exe extension.
Whisperfiles are on a non-system drive/directory.
Whisperfiles run via a Windows shortcut with the same args for all tested versions: --port 55556 --gpu nvidia
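For anyone wanting to reproduce this without the web UI, the request the UI sends can be built by hand. The sketch below assumes the server exposes whisper.cpp's `/inference` endpoint with `file` and `response_format` form fields and listens on port 55556 (the port from the shortcut args); verify both against your build before relying on them:

```python
# Sketch: build an HTTP request that submits a WAV file to a running
# whisperfile server, mirroring what the web UI sends. Endpoint and field
# names follow the whisper.cpp server API -- treat them as assumptions.
import uuid

def build_inference_request(host: str, port: int, wav_path: str):
    """Return (url, body, headers) for a multipart POST to /inference."""
    url = f"http://{host}:{port}/inference"
    boundary = uuid.uuid4().hex
    with open(wav_path, "rb") as f:
        audio = f.read()
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode() + audio + (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="response_format"\r\n\r\n'
        "json\r\n"
        f"--{boundary}--\r\n"
    ).encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return url, body, headers

# To send it (requires the server to be running):
#   import urllib.request, json
#   url, body, headers = build_inference_request("127.0.0.1", 55556,
#                                                "raven_poe_64kb.wav")
#   req = urllib.request.Request(url, data=body, headers=headers)
#   print(json.load(urllib.request.urlopen(req))["text"])
```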
Cmd output for whisper-tiny.en.llamafile.exe --port 55556 --gpu nvidia:
whisper_init_from_file_with_params_no_state: loading model from '/zip/ggml-tiny.en.bin'
import_cuda_impl: initializing gpu module...
link_cuda_dso: note: dynamically linking /C/Users/Admin/.llamafile/v/0.8.13/ggml-cuda.dll
ggml_cuda_link: welcome to CUDA SDK with tinyBLAS
link_cuda_dso: GPU support loaded
whisper_init_with_params_no_state: cuda gpu = 1
whisper_init_with_params_no_state: metal gpu = 0
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 1 (tiny)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2070, compute capability 7.5, VMM: yes
whisper_model_load: CUDA0 total size = 77.11 MB
whisper_model_load: model size = 77.11 MB
whisper_backend_init_gpu: using CUDA backend
whisper_init_state: kv self size = 9.44 MB
whisper_init_state: kv cross size = 9.44 MB
whisper_init_state: kv pad size = 2.36 MB
whisper_init_state: compute buffer (conv) = 13.45 MB
whisper_init_state: compute buffer (encode) = 85.79 MB
whisper_init_state: compute buffer (cross) = 4.14 MB
whisper_init_state: compute buffer (decode) = 98.22 MB
whisper server listening at http://127.0.0.1:55556
Received request: raven_poe_64kb.wav
Successfully loaded raven_poe_64kb.wav
system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0
operator(): processing 'raven_poe_64kb.wav' (9135752 samples, 571.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
Running whisper.cpp inference on raven_poe_64kb.wav
Cmd output for whisper-medium.en.llamafile.exe --port 55556 --gpu nvidia:
whisper_init_from_file_with_params_no_state: loading model from '/zip/ggml-medium.en.bin'
import_cuda_impl: initializing gpu module...
link_cuda_dso: note: dynamically linking /C/Users/Admin/.llamafile/v/0.8.13/ggml-cuda.dll
ggml_cuda_link: welcome to CUDA SDK with tinyBLAS
link_cuda_dso: GPU support loaded
whisper_init_with_params_no_state: cuda gpu = 1
whisper_init_with_params_no_state: metal gpu = 0
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2070, compute capability 7.5, VMM: yes
whisper_model_load: CUDA0 total size = 1533.14 MB
whisper_model_load: model size = 1533.14 MB
whisper_backend_init_gpu: using CUDA backend
whisper_init_state: kv self size = 150.99 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: compute buffer (conv) = 28.81 MB
whisper_init_state: compute buffer (encode) = 594.35 MB
whisper_init_state: compute buffer (cross) = 7.98 MB
whisper_init_state: compute buffer (decode) = 144.97 MB
whisper server listening at http://127.0.0.1:55556
Received request: raven_poe_64kb.wav
Successfully loaded raven_poe_64kb.wav
system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0
operator(): processing 'raven_poe_64kb.wav' (9135752 samples, 571.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
Running whisper.cpp inference on raven_poe_64kb.wav
I was able to reproduce the behavior described above by using the models directly with whisper.cpp.
Since it has nothing to do with Whisperfile, I am closing the issue.