Training

this is a guide for local training using difftrainer on windows

requirements

in order to train locally, you need:

a GPU with cuda cores
at least 6 gb of vram*
a decent amount of hard drive space
a DiffTrainer-compatible** version of cuda toolkit

*you might be able to get away with less. lower the batch size if your vram is this low
**difftrainer compatible versions of cuda toolkit are: 11.8, 12.1, 12.4, 12.6, 12.8, 12.9
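
if you're not sure which cuda toolkit you have, here is a rough python sketch that checks it against the list above. it assumes you paste in the text that "nvcc --version" prints in the command prompt, and that the text contains a "release X.Y" part. the function names are made up for this example and are not part of difftrainer.

```python
import re

# the versions listed in the footnote above
SUPPORTED_CUDA = {"11.8", "12.1", "12.4", "12.6", "12.8", "12.9"}


def cuda_release(nvcc_output):
    """pull the "major.minor" release number out of `nvcc --version` text.

    returns None if no "release X.Y" part is found.
    """
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def is_supported(nvcc_output):
    """True if the release found in the text is on the supported list."""
    return cuda_release(nvcc_output) in SUPPORTED_CUDA
```

to use it, run "nvcc --version" in the command prompt, copy the output, and pass it to is_supported() as a string.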

once you have cuda toolkit installed, you can move on to difftrainer installation

difftrainer installation

this section assumes you have never used python before. the download link below also contains a guide for installation if you already have python

difftrainer can be downloaded through this link

the first thing you want to do is extract the folder you've just downloaded into somewhere where your computer will not step in and exercise needless authority over you when you are the goddamn administrator and you can do as you please. ensure that you also have administrator permissions. for me, the C:/Users/VIPchan folder is where i installed it so that windows wouldn't try to screw me over!!! AGAIN!!!
take a deep breath and run the conda_installer.bat file

if you don't know how, right-click the file and then click "Open"

i don't recall what exactly happens from here, but just let it do its thing
once it finishes, run the "run_gui.bat" file the same way you did with the first one
you should now see this:

from here, you will click "Update tools." it is now going to be downloading and installing stuff for you. give it time. you can watch the progress in the command prompt window. it will tell you when it's done with the sound of a soft-voiced girl saying "setup complete :D"

now difftrainer is fully set up!

difftrainer usage

data preparation tab

back up your files before you do ANYTHING below!
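
if you want the backup to be quick and repeatable, here is a minimal python sketch that copies a whole folder to a timestamped copy. the function name and the naming scheme for the backup are just assumptions for this example. copying the folder by hand in file explorer works just as well.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up(folder, backup_root):
    """copy `folder` into `backup_root` under a timestamped name.

    returns the path of the new copy. the originals are not touched.
    """
    folder = Path(folder)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{folder.name}-backup-{stamp}"
    # copytree creates the target folder (and any missing parents) itself
    shutil.copytree(folder, target)
    return target
```

keep the backup folder somewhere outside the folder you are copying, so the copy doesn't end up inside itself.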

you should now open the data preparation tab. this is what it looks like:

now, you should move your .wav and .lab files into the DiffTrainer-main/raw_data/diffsinger_db/[your diffsinger's name here] folder. they should look like this:
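
before preparing the data, it can help to make sure nothing is missing. this sketch assumes every .wav is supposed to have a .lab with the same file name (that pairing is my assumption, the guide above only says to move both kinds of file in). the function name is made up for this example.

```python
from pathlib import Path


def unpaired_files(folder):
    """return two sorted lists of file names (without extension):
    .wav files that have no .lab, and .lab files that have no .wav.
    """
    folder = Path(folder)
    wavs = {p.stem for p in folder.glob("*.wav")}
    labs = {p.stem for p in folder.glob("*.lab")}
    return sorted(wavs - labs), sorted(labs - wavs)
```

if both lists come back empty, every recording has a matching label file.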

now, in the data preparation tab, select the raw data folder. here, you want to select "diffsinger_db". then, you will select SOME to estimate midi, processing power permitting. then, you will click "prepare data"

once it finishes, the command prompt will say "data segmentation complete!" from here, you can move on to the configuration tab

configuration tab

the configuration tab looks like this:

click "select formatted data folder." i personally leave it the same as the raw data folder, so i select diffsinger_db again. now, it will display your diffsinger here:

real quick, before you continue, you'll want to make a custom dictionary:

compress your original .lab files into a .zip
go to mae blythe's phoneme extractor, upload your .zip file, and process it. it will output a custom dictionary for you
find DiffTrainer-main/Diffsinger/dictionaries/ja-phonemes.txt and open it
delete the contents of the file, then copy and paste what the website output in its place
remove the "pau" phoneme from the dictionary
save the file and close notepad
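
the zip and "pau" steps above can also be done with a short python sketch. two loud assumptions here: the website is fine with a .zip that has the .lab files sitting directly inside it, and the dictionary it outputs has one entry per line with the phoneme name as the first thing on the line. i have not checked either against the site, so look at your own output first. both function names are made up for this example.

```python
import zipfile
from pathlib import Path


def zip_labs(folder, zip_path):
    """put every .lab file in `folder` into a new .zip.

    returns how many files were added.
    """
    labs = sorted(Path(folder).glob("*.lab"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for lab in labs:
            # arcname keeps only the file name, not the full folder path
            archive.write(lab, arcname=lab.name)
    return len(labs)


def drop_pau(dictionary_text):
    """remove every line whose first token is exactly "pau".

    ASSUMPTION: one dictionary entry per line, phoneme name first.
    """
    kept = [
        line
        for line in dictionary_text.splitlines()
        if line.split()[:1] != ["pau"]
    ]
    return "\n".join(kept) + "\n"
```

drop_pau() only works on text, so you would still paste the result into ja-phonemes.txt yourself.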

now, in the dropdown under "language," select "ja." you will then select "edit language settings." click on any other dictionaries that are displayed by difftrainer langloader, and select "remove dictionary." the only one you want remaining is the "ja" dictionary. next, open "merged.yaml", located in the same folder as "ja-phonemes.txt". delete everything written below the "merged_phoneme_groups:" line, and then save the file. after this is done and your langloader looks like the screenshot below, select "save and return to configuration." if you have not seen the langloader, it may be hiding under another window.
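
the merged.yaml edit is easy to get wrong by hand, so here is a small python sketch of the same thing done on the file's text. it keeps everything up to and including the "merged_phoneme_groups:" line and drops whatever comes after it. it treats the file as plain text instead of parsing the yaml, and the function name is made up for this example.

```python
def clear_merged_groups(yaml_text, key="merged_phoneme_groups:"):
    """keep everything up to and including the `key` line, drop the rest.

    if the key line is never found, the text comes back unchanged
    (apart from ending in exactly one newline).
    """
    kept = []
    for line in yaml_text.splitlines():
        kept.append(line)
        if line.strip().startswith(key):
            break
    return "\n".join(kept) + "\n"
```

you would read merged.yaml into a string, pass it through this, and write the result back. back up the file first.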

now that the dictionary is set, in the configuration tab, you will select the type "acoustic." you will then select "kitchen sink" from the "select configuration" drop-down menu. depending on what your machine can handle, it is also recommended to adjust the batch size here. i personally use a batch size of 12, but if you have a lower end machine, you may want to set it lower than the default, which is 9. you will now click "save configuration."

preprocess and train tab

this is the preprocess and train tab:

the first thing you will do is select configuration. select the "acoustic.yaml" file. you will now create checkpoint folders. i like to make 2, one for acoustic, and one for variance. i named mine "auc" and "var". now you can select your acoustic checkpoint folder. just in case, also click "use tensor cores." now you can click "preprocess data" and let it run until it finishes. you will know it's done when the difftrainer window starts responding again and the command prompt isn't doing anything.

and now you click train and let it go off for however long it takes. i typically do it before bed or before going out somewhere. let it train for as long as you'd like. in general, the more steps, the higher the quality of your diffsinger. the limit is 100000 steps.

viewing graphs and checking progress

if you'd like to view graphs, listen to audio, etc. to check the progress it makes as it trains, launch tensorboard. to launch tensorboard, you must:

open miniconda
type "conda activate difftrainerB" and press enter
type "tensorboard --logdir=" and then click and drag the "lightning_logs" folder into the miniconda window. it should put the file path to the lightning_logs folder there. press enter
it will now tell you where you can access tensorboard in your browser. for me, it is "http://localhost:6006/". for you, it may be different. each time difftrainer saves a checkpoint, the site will update. you must refresh the page to see these updates.
DO NOT close the miniconda window if you want to continue viewing tensorboard. i typically leave it open until i'm done training
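
if you would rather start tensorboard from python than type it out, the steps above can be sketched like this. it assumes you run it with the python inside the already activated conda environment, so that the "tensorboard" command can be found. the function names are made up for this example. only the "tensorboard --logdir=" command itself comes from the steps above.

```python
import subprocess


def tensorboard_command(logdir):
    """build the tensorboard call as a list of arguments.

    passing a list means a folder path with spaces in it
    does not need extra quotes.
    """
    return ["tensorboard", f"--logdir={logdir}"]


def launch_tensorboard(logdir):
    """start tensorboard and wait. stop it with ctrl+c.

    just like the miniconda window, this has to stay open
    for as long as you want to keep viewing tensorboard.
    """
    subprocess.run(tensorboard_command(logdir), check=True)
```

you would call launch_tensorboard() with the full path to your lightning_logs folder.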

configuration tab (again)

now that you're done training acoustic, it's time to train variance. to do so, go back to the configuration tab and select "Type: Variance." you will now select a new checkpoint folder. for me, the folder is "var". do not save it in the same folder as the acoustic checkpoints, or they will be overwritten. now click "save configuration."

for me, the langloader sometimes decides to break at this point and i have to close and reopen difftrainer. ensure that the exact same settings are selected as when you did the configs for acoustic earlier, minus the changes made in the paragraph above.

preprocess and train tab (again)

you are now on the preprocess and train tab again. you will select configuration again, but this time it will be "variance.yaml". your checkpoint folder will also be "var" instead of "auc". you can now preprocess your data and train again just as you did before, just with the lightning_logs in the "var" folder. you can open tensorboard in the same way as above as well. just keep in mind that there is no audio to listen to when training variance.

export singer (basic) tab

now that you are done training, it is time to export your singer:

click the acoustic button, then the "select checkpoint folder" button, and select your acoustic checkpoint folder
click "export ONNX" and wait for it to finish exporting. your command prompt will tell you when
click the variance button, then the "select checkpoint folder" button, select your variance checkpoint folder, and export ONNX again
once it finishes, click the "select acoustic checkpoint folder" button and select your acoustic checkpoint folder
click the "select variance checkpoint folder" button and select your variance checkpoint folder
type in your singer's name with no spaces or special characters. the diffsinger i am making is named "tsukuyomi sena," but for the purposes of naming it here, i just put "sena"
select where you want your diffsinger to be saved. put it in your openutau "singers" folder
click "prepare for openutau"

you are now done using difftrainer.
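
if you're not sure whether a singer name counts as having "no spaces or special characters," here is a tiny python sketch of one way to check. it assumes that means only plain english letters, numbers, and underscores. that exact rule is my guess, not something difftrainer documents, and the function name is made up for this example.

```python
import re


def is_safe_singer_name(name):
    """True if `name` is made up only of a-z, A-Z, 0-9 and underscores.

    ASSUMPTION: this is what "no spaces or special characters" means.
    """
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None
```

with this rule, "sena" passes and "tsukuyomi sena" does not, because of the space.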