Gather 30-60 total minutes of dry (no effects) and monophonic (one note at a time) vocals.
No reverb, delay, chorus, or instrumentals.
No harmonies, layering, double-tracking, or stereo effects.
No variation in vocal style, e.g. just singing or just rapping, but not both.
Avoid: stereo, reverb, delay
Aim for: mono, clean tone, low noise
Getting your file(s) ready.
Export your files with no silence and consistent volume as a 16-bit lossless audio file (.wav preferred).
Before: silence, inconsistent volume levels
After: truncated silence, consistent volume
Once you’ve compiled your vocals, the next step is to prepare your files for training:
Remove any extra silence (we recommend doing this automatically with Audacity)
Export as true mono (rather than stereo with equal L + R channels)
Export as 16-bit .wav (there is no length requirement per file: one 15-minute file or fifteen 1-minute files both work)
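If you would rather script the mono export than do it by hand, the sketch below shows one way to turn a stereo 16-bit WAV into a true one-channel file using only the Python standard library. The function names `stereo_to_mono` and `export_true_mono` are illustrative, not part of any Kits.AI tool, and the code assumes 16-bit PCM input on a little-endian machine.

```python
import array
import wave


def stereo_to_mono(frames: bytes) -> bytes:
    """Average interleaved 16-bit L/R samples into a single mono channel."""
    samples = array.array("h")  # signed 16-bit; assumes a little-endian host
    samples.frombytes(frames)
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)),
    )
    return mono.tobytes()


def export_true_mono(src_path: str, dst_path: str) -> None:
    """Read a 16-bit WAV and write it back as a one-channel 16-bit WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM input")
        channels = src.getnchannels()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    if channels == 2:
        frames = stereo_to_mono(frames)
    elif channels != 1:
        raise ValueError("expected mono or stereo input")
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)   # true mono: one channel in the file header
        dst.setsampwidth(2)   # 16-bit samples
        dst.setframerate(rate)
        dst.writeframes(frames)
```

Calling `export_true_mono("take_stereo.wav", "take_mono.wav")` (hypothetical file names) would write a file whose header reports a single channel, which is the difference between true mono and a stereo file with identical left and right channels.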
How to convert to mono and remove silence with Audacity
Use the Kits.AI Vocal Separator tool to isolate vocals for your dataset.
To isolate vocals from a song, simply upload a file into the Kits.AI Vocal Separator tool. This is an easy way to create your own dataset.
Advanced dataset techniques.
Pre-process your audio for higher quality.
Your audio can be:
Cleaned up with subtractive EQ to reduce muddy or harsh frequencies in the recording
Subtly pitch-corrected (slow attack, moderate strength), unless it's a key part of the vocal style
De-essed to reduce any harsh sibilance
Compressed lightly to even out dynamic range and reduce peaks (~4-5 dB of gain reduction at most)
Boosted with additive EQ to fit the style of the vocal
Limited to a peak of -6 dB, with overall levels between -12 and -6 dB
High- and low-pass filtered to remove frequencies below 40–100 Hz and above 20 kHz
Phase re-balanced
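To check the peak-level target above outside of a DAW, a rough sketch like the following can measure the peak of 16-bit samples in dBFS and apply a single constant gain so the loudest sample lands at -6 dBFS. Note that this is plain peak normalization, not a limiter: it does not reshape dynamics. The names `peak_dbfs` and `scale_to_peak` are illustrative.

```python
import math

FULL_SCALE = 32768  # magnitude of the largest 16-bit sample value


def peak_dbfs(samples):
    """Return the peak level of 16-bit samples in dBFS (0 dBFS = full scale)."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence has no finite level
    return 20 * math.log10(peak / FULL_SCALE)


def scale_to_peak(samples, target_dbfs=-6.0):
    """Apply one constant gain so the loudest sample sits at target_dbfs."""
    current = peak_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # nothing to scale
    gain = 10 ** ((target_dbfs - current) / 20)
    # Clamp to the 16-bit range in case the gain is positive
    return [max(-32768, min(32767, round(s * gain))) for s in samples]
```

A sample at half of full scale measures about -6 dBFS, so `peak_dbfs([16384])` gives a quick sanity check of the scale.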
Record your own vocals.
Recording vocals for your model? Here are some configurations to get you started:
Use a quality mic with a wide frequency range (40 Hz–20 kHz)
Set your recording sample rate to 48 kHz and file type to lossless (.wav, .aiff, .flac)
Limit breath sounds and try to capture a clean tone (avoid plosives; place the mic off-axis and/or use a pop filter if singing in a breathy style)
Avoid room reflections (record in a room with soft surfaces like carpet and furniture to absorb sound, place microphones away from walls, move closer and reduce your input gain)
Monitor your recording volume and avoid exceeding -6 dBFS. Try to keep your levels between -12 and -6 dBFS.
Export your audio as true mono (rather than stereo with equal L + R channels)
Avoid any hard cuts in the audio (add a short fade-out to avoid the pops that come from cutting audio before or after a zero crossing)
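The fade-out in the last tip can be sketched in a few lines. The example below applies a linear ramp to the end of a list of samples so the clip finishes at exactly zero instead of being cut mid-waveform. The function name `apply_fade_out` and the 10 ms default are assumptions for illustration; the guide only asks for a "short" fade.

```python
def apply_fade_out(samples, sample_rate, fade_ms=10.0):
    """Linearly ramp the final fade_ms milliseconds down to silence."""
    out = list(samples)
    # Number of samples covered by the fade, capped at the clip length
    fade_len = min(len(out), int(sample_rate * fade_ms / 1000))
    start = len(out) - fade_len
    for i in range(fade_len):
        # Gain falls step by step and reaches exactly 0.0 on the last sample
        gain = (fade_len - 1 - i) / fade_len
        out[start + i] = int(out[start + i] * gain)
    return out
```

Because the last sample is always zero, the clip ends on silence regardless of where the cut fell in the waveform.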
Content
The more variety, the better.
It is best to have examples covering your entire range: chest, mix, and falsetto; large and small intervals; gritty and clean notes; and so on. The more variety, the better.
You can sing the same lyrics in different keys, a couple of songs from your repertoire, originals, etc. The audio can be in multiple files or in one single take, as long as the singing time adds up to 10–15 minutes.
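When a dataset is spread over many files, it can help to add up the running time before uploading. The following is a minimal sketch using Python's standard `wave` module; `wav_seconds` and `total_minutes` are illustrative names, and the code only handles uncompressed .wav files.

```python
import wave


def wav_seconds(path):
    """Return the duration of one WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_minutes(paths):
    """Return the combined duration of several WAV files in minutes."""
    return sum(wav_seconds(p) for p in paths) / 60
```

For example, `total_minutes(["take1.wav", "take2.wav"])` (hypothetical file names) returns the combined length, which you can compare against the target total for your dataset. Keep in mind that this counts silence too, so measure after truncating silence.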
Techniques
How to convert to True Mono
Use the free Audacity program to convert stereo files to true mono.
How to remove silence
Use the free Audacity program to quickly remove silence from an acapella.
(Copy the settings in this video, but feel free to experiment. Choose a threshold between -20 dB and -40 dB, depending on the noise level of your acapella.)
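To illustrate what the threshold controls, here is a simplified sketch of threshold-based silence truncation. It is not Audacity's algorithm: it just splits 16-bit samples into short windows, treats any window whose peak is below the threshold as silent, and shortens long silent stretches to a few windows. The function name and the `window_ms` and `keep_windows` defaults are assumptions chosen for illustration.

```python
def truncate_silence(samples, sample_rate, threshold_db=-30.0,
                     window_ms=20, keep_windows=5):
    """Shorten long quiet stretches; keep at most keep_windows quiet windows."""
    # Convert the dB threshold to a linear 16-bit amplitude
    limit = 32768 * 10 ** (threshold_db / 20)
    window = max(1, int(sample_rate * window_ms / 1000))
    out = []
    quiet_run = 0  # consecutive quiet windows seen so far
    for start in range(0, len(samples), window):
        block = samples[start:start + window]
        if max(abs(s) for s in block) < limit:
            quiet_run += 1
            if quiet_run > keep_windows:
                continue  # drop the remainder of a long silent stretch
        else:
            quiet_run = 0
        out.extend(block)
    return out
```

A threshold closer to -20 dB treats more of the signal as silence, which suits noisier recordings; a threshold closer to -40 dB only removes very quiet passages.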
FAQ
Q: What do I do if I see an error?
A: If you see an error during upload, contact us through our bug form!