How to make semantic data from wav files?
Hi, I am Edy, and I want to make a Korean tokenizer for Bark voice cloning.
I wonder how to make semantic data from wav source files in Spanish.
I'd appreciate any help.
Hey! You first need to gather Korean books in plain-text format: quite literally, a bunch of books in Korean, all as .txt files with no formatting.
If I remember correctly, this project, https://github.com/C0untFloyd/bark-gui, should already be able to create the files needed to create the tokenizer straight from the GUI.
So yeah, you need a bunch of books in Korean. I'm not sure how the symbol encoding would work, sorry about that.
If the GUI can indeed create and train now, it should do everything on its own. The process goes like this: the semantic data is created automatically from the books you input. The program picks random lines, saves each one to its own file, and creates wav files by itself; then, from both the saved lines and the generated wavs, it generates the semantic data.
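In case it helps to picture the first step, here is a rough Python sketch of "pick random lines and save each to its own file". This is not bark-gui's actual code; the function name `sample_lines`, the folder layout, and the `min_chars` filter are all made up for illustration. The wav generation and semantic extraction steps are left out because they depend on the Bark tooling itself.

```python
import random
from pathlib import Path


def sample_lines(book_dir, out_dir, n_lines, seed=0, min_chars=10):
    """Pick random lines from every .txt book in book_dir and save each
    line to its own numbered file in out_dir.

    Hypothetical helper for illustration only, not part of bark-gui.
    """
    lines = []
    for book in sorted(Path(book_dir).glob("*.txt")):
        # Books are assumed to be UTF-8 plain text, which covers Korean.
        for line in book.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank or very short lines (the threshold is an assumption).
            if len(line) >= min_chars:
                lines.append(line)

    # Seeded sampling so the same books always give the same selection.
    rng = random.Random(seed)
    chosen = rng.sample(lines, min(n_lines, len(lines)))

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(chosen):
        path = out / f"{i:06d}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

You would call it as something like `sample_lines("books", "prompts", 1000)`, then feed each saved line to Bark to get the matching wav, and those text/wav pairs are what the semantic data gets built from.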
I'm sorry I can't explain it better.