
FAQ for Voice Cloning

Some quick tips for a more effective Voice Cloning experience

Updated over 2 weeks ago
  • The AI needs lots of source material to make a good clone - 2 to 3 hours of recording is best.

  • Use a script that matches the emotional style you want the voice clone to have - e.g. if the voice clone will be used for radio advertisements and you want it to have an exaggerated commercial style, you could use a script that combines lots of radio ads. You could get ChatGPT to write this, or ask Audiostack for help if you need it.

  • Read the script in the style you want the voice clone to have, e.g. an exaggerated commercial voice or a calm, steady voice.

  • Building on the point above - if you want a commercial voice for advertisements, be extremely exaggerated in the recording. The AI will tone it down. Otherwise there is a risk that the clone sounds more “flat” than you want it to be.

  • Record in a quiet environment to minimise background noise. Avoid recording in an echoey room with lots of hard surfaces.

  • Enunciate words clearly and maintain a consistent volume and distance from the microphone.

  • Position the microphone close to the speaker's mouth (but not so close that the audio clips).

  • Use a pop filter. This is a small mesh screen you can place in front of the microphone to reduce the risk of certain sounds (such as "p" and "b") causing clipping.

  • If your microphone or interface has controllable input levels, aim to set your levels so that you're staying around the top end of the "green" range.

  • Use a decent microphone for better sound quality (condenser microphones generally pick up greater detail for vocal recordings), or consider renting a studio environment for an hour if you're on a budget.

  • Remove breathing sounds as much as possible from the audio file before sending it to Audiostack.

  • Do not apply any audio effects (e.g. compression) to the file. Leave it completely raw.

  • Send the file to Audiostack in .wav format, exported at the highest quality setting available (this gives the largest file size) to ensure good audio quality.
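
If you want to sanity-check a recording against the tips above before sending it, a short script can help. The following is a minimal sketch using only Python's standard library, not an Audiostack tool. The function name `inspect_wav`, the 2-hour threshold, and the rule that a 16-bit sample at full scale counts as clipping are all assumptions made for illustration.

```python
import array
import sys
import wave

MIN_HOURS = 2.0        # assumption: lower end of the 2 to 3 hour guideline
CHUNK_FRAMES = 65536   # read in chunks so long recordings do not fill memory
FULL_SCALE_16 = 32767  # assumption: a 16-bit sample at full scale means clipping


def inspect_wav(path):
    """Report basic facts about a WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()       # bytes per sample
        channels = wf.getnchannels()
        n_frames = wf.getnframes()

        peak = None                     # only measured for 16-bit audio here
        if width == 2:
            peak = 0
            while True:
                data = wf.readframes(CHUNK_FRAMES)
                if not data:
                    break
                samples = array.array("h")
                samples.frombytes(data)
                if sys.byteorder == "big":
                    samples.byteswap()  # WAV samples are little-endian
                if samples:
                    peak = max(peak, max(samples), -min(samples))

    hours = n_frames / rate / 3600
    return {
        "hours": hours,
        "sample_rate": rate,
        "bit_depth": width * 8,
        "channels": channels,
        "peak": peak,
        "clipped": peak is not None and peak >= FULL_SCALE_16,
        "long_enough": hours >= MIN_HOURS,
    }
```

For example, `print(inspect_wav("recording.wav"))` (with `recording.wav` standing in for your own file) shows the length in hours and whether any sample reached full scale. A `clipped` value of `True` suggests re-recording with lower input levels or a little more distance from the microphone.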
