japanese-wav2vec2-base-rs35kh
This model is a wav2vec 2.0 Base fine-tuned on the large-scale Japanese ASR corpus ReazonSpeech v2.0.
Usage
You can use this model through the transformers library:
import librosa
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

# Load the model in bfloat16 with FlashAttention-2 (needs a CUDA GPU and the flash-attn package)
model = Wav2Vec2ForCTC.from_pretrained(
    "reazon-research/japanese-wav2vec2-base-rs35kh",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
processor = AutoProcessor.from_pretrained("reazon-research/japanese-wav2vec2-base-rs35kh")

audio_filepath = "path/to/audio.wav"  # placeholder: replace with the path to your audio file
audio, _ = librosa.load(audio_filepath, sr=16_000)  # load and resample to 16 kHz
audio = np.pad(audio, pad_width=int(0.5 * 16_000))  # Padding 0.5 s of silence on each side is recommended before inference
input_values = processor(
    audio,
    return_tensors="pt",
    sampling_rate=16_000,
).input_values.to("cuda").to(torch.bfloat16)

with torch.inference_mode():
    logits = model(input_values).logits.cpu()
predicted_ids = torch.argmax(logits, dim=-1)[0]  # most likely token per frame
transcription = processor.decode(predicted_ids, skip_special_tokens=True)
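The last two steps above take the per-frame argmax and hand it to the processor for decoding. As a rough illustration of what greedy CTC decoding does with such a sequence, the sketch below merges consecutive repeated ids and then drops the blank token. The function name `ctc_greedy_collapse` and the blank id `0` are assumptions made for this example; the real blank id depends on the model's vocabulary, and `processor.decode` additionally maps ids back to characters.

```python
from itertools import groupby


def ctc_greedy_collapse(ids, blank_id=0):
    """Collapse frame-level ids as greedy CTC decoding does.

    blank_id=0 is a hypothetical value chosen for illustration only.
    """
    # groupby merges runs of identical consecutive ids into one entry;
    # the filter then removes the blank token that separates repeats.
    return [token for token, _ in groupby(ids) if token != blank_id]


# Frame-level ids: a blank between the two 5s keeps them as separate tokens.
frame_ids = [0, 5, 5, 0, 5, 7, 7, 0, 0]
print(ctc_greedy_collapse(frame_ids))  # [5, 5, 7]
```

Because repeats are merged before blanks are removed, a doubled character survives only when the model emits a blank between its two occurrences.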
Test Results
We report the Character Error Rate (CER) of our model and of other wav2vec2-based models.
Model | #Parameters⬇ | AVERAGE⬇ | JSUT-BASIC5000⬇ | Common Voice⬇ | TEDxJP-10K⬇ |
---|---|---|---|---|---|
reazon-research/japanese-wav2vec2-base-rs35kh | 96.7M | 20.40% | 13.22% | 23.76% | 24.23% |
Ivydata/wav2vec2-large-xlsr-53-japanese | 318M | 24.23% | 13.83% | 18.15% | 40.72% |
jonatasgrosman/wav2vec2-large-xlsr-53-japanese | 317M | 31.82% | 4.25% | 40.58% | 50.63% |
vumichien/wav2vec2-large-xlsr-japanese | 318M | 39.87% | 4.21% | 53.29% | 62.12% |
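For readers unfamiliar with the metric in the table above, CER is commonly defined as the character-level edit distance between the reference and the hypothesis, divided by the reference length. The sketch below follows that standard definition; the helper names `levenshtein` and `cer` are introduced here for illustration, and any text normalization applied in the reported evaluation is not stated in this card, so this may not reproduce the numbers exactly.

```python
def levenshtein(ref, hyp):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting the first i reference characters
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Two of five characters differ, so the CER is 2 / 5.
print(cer("こんにちは", "こんばんは"))  # 0.4
```

Note that CER can exceed 100% when the hypothesis contains many insertions, since the denominator is the reference length only.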
We also report the CER for long-form speech.
Model | #Parameters⬇ | JSUT-BOOK⬇ |
---|---|---|
reazon-research/japanese-wav2vec2-base-rs35kh | 96.7M | 82.84% |
Ivydata/wav2vec2-large-xlsr-53-japanese | 318M | 65.60% |
jonatasgrosman/wav2vec2-large-xlsr-53-japanese | 317M | 46.20% |
vumichien/wav2vec2-large-xlsr-japanese | 318M | 46.52% |
Citation
@misc{reazon-research-japanese-wav2vec2-base-rs35kh,
title={japanese-wav2vec2-base-rs35kh},
author={Sasaki, Yuta},
url = {https://huggingface.co/reazon-research/japanese-wav2vec2-base-rs35kh},
year = {2024}
}
License