	---
	library_name: transformers
	datasets:
	- reazon-research/reazonspeech
	- joujiboi/japanese-anime-speech
	language:
	- ja
	- en
	metrics:
	- cer
	pipeline_tag: automatic-speech-recognition
	---

	# Model Card for Visual-novel-transcriptor

	![image](./cover_image.jpeg)

	<!-- Generated using cagliostrolab/animagine-xl-3.0 -->
	<!--Prompt: 1girl, black long hair, suit, headphone, write down, upper body, indoor, night, masterpiece, best quality -->


	A fine-tuned ASR model based on [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2).

	This model aims to transcribe Japanese audio, especially speech from visual novels.


	## Model Details

	### Model Description

	<!-- Provide a longer summary of what this model is. -->

	This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

	- Developed by: spow12 (yw_nam)
	- Shared by: spow12 (yw_nam)
	- Model type: Seq2Seq
	- Language(s) (NLP): Japanese
	- Finetuned from model: [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2)


	## Uses

	```python
	from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
	import librosa

	# Path to the audio file to transcribe (placeholder; replace with your own file).
	wav_path = "sample.wav"

	# Load the processor and model, then move the model to the GPU.
	processor = AutoProcessor.from_pretrained('spow12/Visual-novel-transcriptor', language="ja", task="transcribe")
	model = AutoModelForSpeechSeq2Seq.from_pretrained('spow12/Visual-novel-transcriptor').cuda()
	# Force Japanese transcription during decoding.
	model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="ja", task="transcribe")

	# Load the audio resampled to 16 kHz, matching the sampling rate passed to the processor.
	data, _ = librosa.load(wav_path, sr=16000)
	input_features = processor(data, sampling_rate=16000, return_tensors="pt").input_features.cuda()
	predicted_ids = model.generate(input_features)
	transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
	print(transcription[0])
	```
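
The metadata of this card lists CER (character error rate) as the evaluation metric. The card does not say how CER was computed, so the following is only a minimal sketch of one common definition: the character-level Levenshtein distance between a reference and a transcription, divided by the reference length. The function names `edit_distance` and `cer` and the sample strings are illustrative assumptions, not part of this model or of any library.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from ref
                curr[j - 1] + 1,     # insert a character from hyp
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical reference and transcription, for illustration only.
    print(cer("こんにちは", "こんばんは"))
```

A score like this could be computed on `transcription[0]` from the snippet above against a ground-truth line. Whether punctuation and whitespace are stripped before scoring is a choice this card does not specify, so the sketch compares the strings as given.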

	## Bias, Risks, and Limitations

	This model was trained on Japanese datasets, including visual novel audio that contains NSFW content.


	## Use & Credit

	This model is currently available for non-commercial use only. Since I am not well versed in licensing, I ask that you use it responsibly.

	By sharing this model, I hope to contribute to the research efforts of our community (the open-source community and anime fans).


	## Citation

	```bibtex
	@misc {Visual-novel-transcriptor,
	author = { {YoungWoo Nam} },
	title = { Visual-novel-transcriptor },
	year = 2024,
	url = { https://huggingface.co/spow12/Visual-novel-transcriptor },
	publisher = { Hugging Face }
	}
	```