Spaces:
Running
Running
File size: 4,682 Bytes
4334ff7 3df8e1c e5f24dc 4bcf269 e5f24dc ca6b54c e5f24dc 1aadf57 60aa1db 556fa8d 09f6ba2 e5f24dc 0148cf2 45e3c33 e5f24dc 45e3c33 de948f8 daf4d68 2eba6ab e5f24dc f4a1542 b23876a 0148cf2 ca09634 e5f24dc 1768ffc 9cb7c71 9f35f64 0685ed9 0148cf2 8ebf733 e5ccfcb 8ebf733 e5ccfcb 8ebf733 e5ccfcb 8ebf733 e5ccfcb 8ebf733 330bb57 556fa8d ebf321b 556fa8d 10a154c 556fa8d ebf321b |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 |
# Whisper-WebUI
A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper). You can use it as an Easy Subtitle Generator!
![Whisper WebUI](https://github.com/jhj0517/Whsiper-WebUI/blob/master/screenshot.png)
## Notebook
If you wish to try this on Colab, you can do it in [here](https://colab.research.google.com/github/jhj0517/Whisper-WebUI/blob/master/notebook/whisper-webui.ipynb)!
# Feature
- Generate subtitles from various sources, including :
- Files
- Youtube
- Microphone
- Currently supported subtitle formats :
- SRT
- WebVTT
- txt ( only text file without timeline )
- Speech to Text Translation
- From other languages to English. ( This is Whisper's end-to-end speech-to-text translation feature )
- Text to Text Translation
- Translate subtitle files using Facebook NLLB models
- Translate subtitle files using DeepL API
# Installation and Running
- ## On Windows OS
### Prerequisite
To run this WebUI, you need to have `git`, `python` version 3.8 ~ 3.10, `CUDA` version above 12.0 and `FFmpeg`.
Please follow the links below to install the necessary software:
- CUDA : [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)
- git : [https://git-scm.com/downloads](https://git-scm.com/downloads)
- python : [https://www.python.org/downloads/](https://www.python.org/downloads/) **( If your python version is too new, torch will not install properly.)**
- FFmpeg : [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)
After installing FFmpeg, **make sure to add the `FFmpeg/bin` folder to your system PATH!**
### Automatic Installation
If you have satisfied the prerequisites listed above, you are now ready to start Whisper-WebUI.
1. Run `Install.bat` from Windows Explorer as a regular, non-administrator user.
2. After installation, run the `start-webui.bat`.
3. Open your web browser and go to `http://localhost:7860`
( If you're running another Web-UI, it will be hosted on a different port , such as `localhost:7861`, `localhost:7862`, and so on )
And you can also run the project with command line arguments if you like by running `user-start-webui.bat`, see [wiki](https://github.com/jhj0517/Whisper-WebUI/wiki/Command-Line-Arguments) for a guide to arguments.
- ## Docker ( On Other OS )
1. Build the image
```sh
docker build -t whisper-webui:latest .
```
2. Run the container with commands
- For bash :
```sh
docker run --gpus all -d \
-v /path/to/models:/Whisper-WebUI/models \
-v /path/to/outputs:/Whisper-WebUI/outputs \
-p 7860:7860 \
-it \
whisper-webui:latest --server_name 0.0.0.0 --server_port 7860
```
- For PowerShell:
```shell
docker run --gpus all -d `
-v /path/to/models:/Whisper-WebUI/models `
-v /path/to/outputs:/Whisper-WebUI/outputs `
-p 7860:7860 `
-it `
whisper-webui:latest --server_name 0.0.0.0 --server_port 7860
```
# VRAM Usages
This project is integrated with [faster-whisper](https://github.com/guillaumekln/faster-whisper) by default for better VRAM usage and transcription speed.
According to faster-whisper, the efficiency of the optimized whisper model is as follows:
| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
|-------------------|-----------|-----------|-------|-----------------|-----------------|
| openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
| faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |
If you want to use the original Open AI whisper implementation instead of optimized whisper, you can set the command line argument `--disable_faster_whisper` to `True`. See the [wiki](https://github.com/jhj0517/Whisper-WebUI/wiki/Command-Line-Arguments) for more information.
## Available models
This is Whisper's original VRAM usage table for models.
| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
| tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~32x |
| base | 74 M | `base.en` | `base` | ~1 GB | ~16x |
| small | 244 M | `small.en` | `small` | ~2 GB | ~6x |
| medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
| large | 1550 M | N/A | `large` | ~10 GB | 1x |
`.en` models are for English only, and the cool thing is that you can use the `Translate to English` option from the "large" models!
|