---
library_name: transformers
datasets:
- reazon-research/reazonspeech
- joujiboi/japanese-anime-speech
language:
- ja
- en
metrics:
- cer
pipeline_tag: automatic-speech-recognition
---

# Model Card for Visual-novel-transcriptor

![image](./cover_image.jpeg)

<!-- Generated using cagliostrolab/animagine-xl-3.0 -->
<!--Prompt: 1girl, black long hair, suit, headphone, write down, upper body, indoor, night, masterpiece, best quality -->


Fine-tuned ASR model based on [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2).

This model aims to transcribe Japanese audio, especially visual novel dialogue.
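Written Japanese does not separate words with spaces, so ASR quality for it is commonly measured with character error rate (CER), the metric listed in this card's metadata. CER is the character-level edit distance (substitutions, insertions, deletions) between a reference transcript and the model's output, divided by the number of characters in the reference. The card does not say how its CER was computed (for example, whether punctuation was normalized first), so the following is only a minimal, dependency-free sketch that compares raw strings; `levenshtein` and `cer` are illustrative names, not functions from this repository.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character edits that turn `ref` into `hyp`."""
    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (or match for free)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Two of five characters differ (に→ば, ち→ん), so CER is 2 / 5.
print(cer("こんにちは", "こんばんは"))  # 0.4
```

Since no normalization is applied here, differences in punctuation or full-width versus half-width characters count as errors; a real evaluation would usually normalize both strings the same way before scoring.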

# WaifuModel Collections 

- [TTS](https://huggingface.co/spow12/visual_novel_tts)
- [Chat](https://huggingface.co/spow12/ChatWaifu_v1.2.1)
- [ASR](https://huggingface.co/spow12/Visual-novel-transcriptor)

# Unified Demo

[WaifuAssistant](https://github.com/yw0nam/WaifuAssistant)

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** spow12 (yw_nam)
- **Shared by:** spow12 (yw_nam)
- **Model type:** Seq2Seq
- **Language(s) (NLP):** Japanese
- **Finetuned from model:** [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2)


## Uses

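```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import librosa
import torch

# Use a GPU when available; the model also runs (more slowly) on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained('spow12/Visual-novel-transcriptor', language="ja", task="transcribe")
model = AutoModelForSpeechSeq2Seq.from_pretrained('spow12/Visual-novel-transcriptor').to(device)
# Force Japanese transcription instead of language detection or translation.
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="ja", task="transcribe")

# Placeholder path: replace with the audio file you want to transcribe.
wav_path = "sample.wav"

# Whisper models expect 16 kHz mono audio.
data, _ = librosa.load(wav_path, sr=16000)
input_features = processor(data, sampling_rate=16000, return_tensors="pt").input_features.to(device)
with torch.no_grad():
    predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```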
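The Whisper feature extractor pads or truncates every input to a 30-second window, so the snippet above only transcribes the first 30 seconds of a longer file. One simple workaround is to cut the 16 kHz waveform into consecutive 30-second pieces before feature extraction. The helper below is a hypothetical sketch (`split_into_chunks` is not part of this repository or of transformers), and fixed cuts can split a word at a chunk boundary; the overlapping chunking offered by the `chunk_length_s` argument of the transformers `pipeline` may give cleaner results.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000  # matches sr=16000 in the example above
WINDOW_SECONDS = 30   # Whisper's fixed input window


def split_into_chunks(
    audio: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    window_seconds: int = WINDOW_SECONDS,
) -> List[Sequence[float]]:
    """Split a 1-D waveform into consecutive, non-overlapping windows.

    Works on plain lists and on NumPy arrays such as the one returned by
    librosa.load; the last chunk may be shorter than the window.
    """
    window = sample_rate * window_seconds
    if window <= 0:
        raise ValueError("sample_rate and window_seconds must be positive")
    return [audio[start:start + window] for start in range(0, len(audio), window)]


# Example: 65 seconds of silence at 16 kHz -> chunks of 30 s, 30 s and 5 s.
silence = [0.0] * (SAMPLE_RATE * 65)
chunks = split_into_chunks(silence)
print([len(c) / SAMPLE_RATE for c in chunks])  # [30.0, 30.0, 5.0]
```

The resulting list can be passed to `processor(...)` in place of `data` to build a batch; `model.generate` and `processor.batch_decode` then return one string per chunk, which can be joined in order.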

## Bias, Risks, and Limitations

This model was trained on Japanese speech data that includes visual novels, which contain NSFW content.


## Use & Credit

This model is currently available for non-commercial use only. Also, since I'm not well versed in licensing, I ask that you use it responsibly.

By sharing this model, I hope to contribute to the research efforts of our community (the open-source community and anime fans).


## Citation

```bibtex
@misc {Visual-novel-transcriptor,
    author       = { YoungWoo Nam },
    title        = { Visual-novel-transcriptor },
    year         = 2024,
    url          = { https://huggingface.co/spow12/Visual-novel-transcriptor },
    publisher    = { Hugging Face }
}
```