AI Singing Voice Training and Customization
Published on
November 6, 2024
One of my main roles at Kits is ensuring our royalty-free models are trained on solid datasets that don’t just sound good but are inspiring to work with. Some parts of this process are purely technical, while others lean into creative choices that shape the model’s character. Today, I’m breaking down how to optimize your own training data and make some intentional creative decisions to add unique personality to your voice models.
Over the past few weeks, my articles have covered my process for creating some of our more character-based voices and the unique techniques I used. Whether it was singing through a guitar amp for my Male Overdrive Rock model or using a ribbon microphone to capture one of my studio monitors for Vintage Female Jazz, the ways to create a standout dataset are truly endless.
The Foundation
A solid foundation is the most crucial part of creating any voice model. Regardless of any special attributes I might want to add, I always start with a clean vocal capture. This means removing background noise (air conditioners, fridge hum, whatever’s lurking) that can degrade your model’s sound and create issues down the line. Let’s say you recorded a great 30-minute dataset, but on playback, you hear a low hum that was barely noticeable in the room. Been there! I’ve lost myself in a take, only to later catch an amp buzzing like mad or the heater running in the background. Check out our guide on how to record high-quality vocals yourself if you're starting from scratch.
A tool like iZotope RX makes it easy to fix consistent hums and buzzes. Just open RX’s Spectral De-noise module, select a section of your audio with only the background noise, hit “Learn,” and play the audio. RX will analyze and automatically adjust its noise reduction. You may want to fine-tune it further by adjusting the Threshold and Reduction faders, but RX simplifies removing those pesky artifacts.
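If you’re not sure whether a take has picked up mains hum, a quick numeric check can back up your ears before you open a repair tool. The sketch below is not how RX works internally; it is a minimal example that uses the Goertzel algorithm to estimate how loud the 50 Hz and 60 Hz mains frequencies are in a block of samples. The function names and the -60 dB threshold are my own assumptions for illustration, and the samples are assumed to be floats between -1.0 and 1.0.

```python
import math

def tone_level_db(samples, sample_rate, freq_hz):
    """Estimate the level of one frequency, in dB relative to full scale,
    using the Goertzel algorithm. Samples are floats in [-1.0, 1.0]."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate)       # nearest DFT bin to freq_hz
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    # Squared magnitude of the DFT bin; clamp tiny negative rounding errors.
    power = max(s1 * s1 + s2 * s2 - coeff * s1 * s2, 0.0)
    amplitude = 2.0 * math.sqrt(power) / n     # sine amplitude at that bin
    return 20.0 * math.log10(max(amplitude, 1e-12))

def find_hum(samples, sample_rate, threshold_db=-60.0):
    """Return the mains frequencies (50/60 Hz) that sit above the
    threshold. The threshold is a hypothetical starting point."""
    return [f for f in (50.0, 60.0)
            if tone_level_db(samples, sample_rate, f) > threshold_db]
```

Run it on a few seconds of "silence" from your recording (the room with nobody singing). If `find_hum` returns anything, that section is a good candidate for learning a noise profile.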
Gain Level Matters
Setting a proper gain level is also key. When creating models, I aim for a consistent level of around -12 dB, with peaks no higher than -6 dB. This keeps the audio dynamic while giving the training process a healthy signal level to work with. I often see submissions that are either far too quiet or clipping in the red. Digital clipping doesn’t give you that pleasant saturation you might want in a rock vocal–it’s just harsh, and machine learning algorithms aren’t fans either.
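If you’d rather check levels with numbers than by watching a meter, here is a minimal sketch. It assumes the audio has already been loaded as floats between -1.0 and 1.0, and it reads "level" as RMS measured in dB relative to full scale, which is my assumption; your DAW’s meter may measure something slightly different. The function names and the 3 dB tolerance are hypothetical.

```python
import math

def peak_db(samples):
    """Highest absolute sample value, in dB relative to full scale."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")

def rms_db(samples):
    """Average (RMS) level, in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def check_levels(samples, target_db=-12.0, peak_ceiling_db=-6.0, tolerance_db=3.0):
    """Compare a take against the targets from the article.
    tolerance_db is an assumed margin, not a Kits requirement."""
    peak = peak_db(samples)
    rms = rms_db(samples)
    return {
        "peak_db": peak,
        "rms_db": rms,
        "peaks_ok": peak <= peak_ceiling_db,
        "level_ok": abs(rms - target_db) <= tolerance_db,
    }
```

A take that reports `peaks_ok` as false is running too hot; pull the gain down and re-record rather than trying to rescue clipped audio afterwards.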
Creating Character
Though a clean, solid dataset is usually the best base, since it lets you shape the sound later in your DAW, sometimes it’s fun to bake some character directly into your training data. Any effect applied to the audio you upload will carry over into your model–no DAW magic needed later. This can be perfect for content creators wanting access to a specific vocal vibe, like a radio or walkie-talkie effect that emphasizes the high-mid frequencies and adds a bit of grit. Apply this to your entire dataset, and you’ve got a go-to model that instantly sounds like it’s coming through a radio.
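To make the radio idea concrete, here is a rough sketch of one way to get there: roll off the lows and highs so only the mids remain, then push the result into soft clipping for grit. This is not the chain I used on any Kits model; the cutoff frequencies, the drive amount, and the function names are all assumptions to tweak by ear. Samples are assumed to be floats between -1.0 and 1.0.

```python
import math

def one_pole_lowpass(samples, sample_rate, cutoff_hz):
    """Gentle (6 dB/octave) low-pass filter."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y = 0.0
    out = []
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

def one_pole_highpass(samples, sample_rate, cutoff_hz):
    """High-pass built by subtracting the low-passed signal from the input."""
    low = one_pole_lowpass(samples, sample_rate, cutoff_hz)
    return [x - l for x, l in zip(samples, low)]

def radio_effect(samples, sample_rate, low_cut_hz=500.0, high_cut_hz=3000.0, drive=4.0):
    """Band-limit the signal, then soft-clip it with tanh for grit.
    All default values are illustrative guesses."""
    band = one_pole_highpass(samples, sample_rate, low_cut_hz)
    band = one_pole_lowpass(band, sample_rate, high_cut_hz)
    return [math.tanh(drive * x) for x in band]
```

Because `tanh` never exceeds 1.0 in magnitude, the output cannot clip digitally no matter how hard you drive it, which keeps the grit on the pleasant side of harsh.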
Or maybe it’s time to dust off that old distortion pedal in the corner! Running your dataset through it can add a whole new level of vocal character.
I often like to run vocals through a guitar amp–cranking the overdrive and adjusting it to taste. Why not blast through your Marshall half-stack and see how long it takes before your neighbors call the cops!
However, maybe you’d rather avoid the noise complaint and try one of these little battery-powered Marshalls instead. (Side note: these tiny amps are studio gold–don’t sleep on them!)
Another trick? A wah pedal. Keeping a wah “cocked” at certain points can produce a wide range of filtered effects. No need to get fancy here; a standard Dunlop CryBaby works great.
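If you don’t have a pedal handy, a parked wah can be roughly approximated in software as a resonant band-pass filter left at one frequency. The sketch below uses the widely published "Audio EQ Cookbook" band-pass biquad; it is an approximation of the general idea, not a model of a CryBaby, and the function name, center frequency, and Q value are assumptions. Samples are assumed to be floats between -1.0 and 1.0.

```python
import math

def cocked_wah(samples, sample_rate, center_hz=1000.0, q=5.0):
    """Fixed resonant band-pass filter (biquad, unity gain at center_hz).
    Raise or lower center_hz to 'park' the wah somewhere else."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    # Band-pass coefficients; the middle feed-forward term (b1) is zero.
    b0, b2 = alpha, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

A higher `q` gives a narrower, more nasal peak; a lower one sounds closer to a gentle tone control.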
And for an authentic lo-fi vibe without the reel-to-reel tape deck, try a cassette recorder. This one features a built-in mic and USB 2.0 port. Using the built-in mic to record from your speaker onto cassette can produce a beautifully degraded, warm sound. I may need to grab one of these myself–perfect for experimenting!
Conclusion
At the end of the day, making music should be fun, and for me, that means pushing boundaries and finding new sounds. Don’t worry if your first upload attempt doesn’t land the way you want–every take is part of the process, informing your next move. Kits.AI is here to help you create something inspiring and unique. So go for it–the sky’s the limit!
-SK
Sam Kearney is a producer, composer, and sound designer based in Evergreen, Colorado.