---
title: README
emoji: 🚀
colorFrom: yellow
colorTo: green
sdk: static
pinned: false
---

[**pyannote.audio**](https://github.com/pyannote/pyannote-audio) is an open-source toolkit for speaker diarization. 

Pretrained pipelines reach state-of-the-art performance on most academic benchmarks.

Using it in production?   
Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.

| Benchmark              | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [pyannoteAI](https://www.pyannote.ai) |
| ---------------------- | ------ | ------ | --------- |
| [AISHELL-4](https://arxiv.org/abs/2104.03603)              |  14.1  |  12.2  | 11.2      |
| [AliMeeting](https://www.openslr.org/119/) (channel 1) |  27.4  |  24.4  | 19.3      |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM)              |  18.9  |  18.8  | 15.8      |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM)              |  27.1  |  22.4  | 19.3      |
| [AVA-AVD](https://arxiv.org/abs/2111.14448)                |  66.3  |  50.0  | 44.8      |
| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1))      |  31.6  |  28.4  | 19.8      |
| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477))        |  26.9  |  21.7  | 16.8      |
| [Earnings21](https://github.com/revdotcom/speech-datasets)   | 17.0 | 9.4 | 9.1 |
| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.)           |  61.5  |  51.2  | 44.0      |
| [MSDWild](https://github.com/X-LANCE/MSDWILD)                |  32.8  |  25.3  | 19.8      |
| [RAMC](https://www.openslr.org/123/)                   |  22.5  |  22.2  | 11.1      |
| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2)        |   8.2  |   7.8  |  7.6      |
| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3)     |  11.2  |  11.3  |  9.8      |

[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %, lower is better)
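
For readers unfamiliar with the metric, here is a minimal sketch of how a diarization error rate is assembled from its components: false alarm, missed detection, and speaker confusion, divided by the total duration of reference speech. The function name and the durations are hypothetical and chosen only for illustration. The sketch is not how the numbers in the table were produced, and it does not use the pyannote.metrics API.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return the diarization error rate in percent.

    All arguments are durations in seconds:
      false_alarm       non-speech labeled as speech
      missed_detection  speech that was not detected
      confusion         speech attributed to the wrong speaker
      total             total duration of speech in the reference
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


# Hypothetical durations for a 10-minute reference (illustration only).
der = diarization_error_rate(
    false_alarm=12.0,
    missed_detection=30.0,
    confusion=18.0,
    total=600.0,
)
print(f"DER = {der:.1f}%")
```

Because the error terms are summed before dividing by the reference duration, the rate can exceed 100% when a system produces a lot of false alarms.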

On high-end NVIDIA hardware:

* [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) takes around 1m30s to process 1h of audio
* [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) takes around 1m20s to process 1h of audio
* On-premise [pyannoteAI](https://www.pyannote.ai) takes less than 30s to process 1h of audio
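
The timings above can be restated as real-time factors (processing time divided by audio duration). The sketch below does only that arithmetic: the helper name is hypothetical, and the 30s figure for pyannoteAI is treated as an upper bound because the text says "less than 30s".

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


AUDIO_SECONDS = 3600.0  # 1h of audio, as in the list above

# Processing times in seconds, taken from the list above.
timings = {
    "v2.1": 90.0,                               # around 1m30s
    "v3.1": 80.0,                               # around 1m20s
    "pyannoteAI on-premise (upper bound)": 30.0,  # less than 30s
}

for name, seconds in timings.items():
    rtf = real_time_factor(seconds, AUDIO_SECONDS)
    speedup = 1.0 / rtf
    print(f"{name}: RTF = {rtf:.4f} ({speedup:.0f}x faster than real time)")
```
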