New Whisper implementation optimized for speaker diarization
This looks interesting, via @philschmid:
“If you are using Whisper for transcription, listen⁉️👂 We created an optimized Whisper with Speaker Diarization for @huggingface Inference Endpoints 🤗 We created a reference implementation that optimizes Whisper with Flash Attention and Speculative Decoding and combines it with Diarization for speaker separations! 🤯
TL;DR:
🏎️ Ultra faster inference due to flash attention & speculative decoding
✅ Leverages the Custom Handler feature of Hugging Face Inference Endpoints
⚡️ Takes 4.15s to transcribe 60s audio for Whisper Large on 1x A10G GPU
🔬 Combines Whisper with Pyannote's diarization model
🌐 Fully customizable and adjustable to specific use cases
🔓 Open-source for easy deployment”
Blog post: https://huggingface.co/blog/asr-diarization
Python code: https://huggingface.co/sergeipetrov/asrdiarization-handler/blob/main/handler.py
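To make the "combines Whisper with diarization" point concrete, here is a minimal sketch of the general idea: give each transcribed chunk the speaker whose diarization turn overlaps it the most. This is not the code from the linked handler; the `Segment` class, the `assign_speakers` function, and the sample timestamps are all made up for illustration, and the real implementation may align the two outputs differently.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    label: str    # speaker id (diarization) or text (transcription)


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time range shared by two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(chunks, turns):
    """Pair each transcript chunk with the speaker turn it overlaps most.

    Hypothetical helper: chunks that overlap no turn get "UNKNOWN".
    """
    labeled = []
    for chunk in chunks:
        best = max(turns, key=lambda t: overlap(chunk, t), default=None)
        if best is None or overlap(chunk, best) == 0.0:
            speaker = "UNKNOWN"
        else:
            speaker = best.label
        labeled.append((speaker, chunk.label))
    return labeled


# Made-up example data, not real model output.
turns = [
    Segment(0.0, 4.0, "SPEAKER_00"),
    Segment(4.0, 9.0, "SPEAKER_01"),
]
chunks = [
    Segment(0.2, 3.5, "Hello, thanks for joining."),
    Segment(3.8, 8.7, "Happy to be here."),
    Segment(10.0, 11.0, "(inaudible)"),
]

result = assign_speakers(chunks, turns)
for speaker, text in result:
    print(f"{speaker}: {text}")
```

For the journalism use case, output shaped like this (speaker label plus text) is what you would want for interview transcripts, though the speaker labels are anonymous ids and still need to be mapped to real names by hand.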
I immediately thought about potential use cases in journalism when I read this! I'd be curious to know if anyone has tried it.