This Is Not My Real Voice: Eleven Labs Premium Voice Clones For Podcasting
This is not my real voice. It's a robot.
Call WorkHacker Chief Strategist Rob Garner at 469.347.4090, or email [email protected] for more details about how we can help your business.
www.workhacker.com
--- FULL TRANSCRIPT BELOW---
Thanks for listening. Today I want to direct this episode toward all of you who have spoken with me before, and have actually heard my voice in person, or maybe on a phone call, or in a Google Meet or Zoom call.
I've been conducting an experiment over the last two months with Eleven Labs voices. It wasn't a secret per se, but the surprising reactions I received warranted this explanatory episode.
The voice you are listening to right now is not me; it is an Eleven Labs premium voice clone. You are now, in effect, listening to a robot. Your ears want to believe it is my actual voice reading this narrative, but it is not. The last sentence was synthetic.
This sentence is also synthetic. And the remaining audio is synthetic. In fact, ten of the twelve previous episodes utilized this voice clone, though the ideas, thoughts, and words were all mine.
I wrote every single word you are hearing now. While I started with the clone, you can expect to hear more of my real voice in future episodes. The episodes interviewing Bruce Clay, Viktor Grant, and Bob Heyman were all recorded live, as you can plainly tell when compared to these narrative-styled episodes.
I will leave it to you to judge the quality of this audio. Throughout this experiment, I have been quite surprised at how many people did not detect that this was voice cloning technology at all. I had incorrectly assumed that most people would be able to detect the clone, but this was overwhelmingly not the case.
These are people who know me very well, some who speak with me almost daily, or several times a month. There were some who thought I had done overdubs, due to slight changes in the timbre from paragraph to paragraph.
But one thing is for sure: if you did not know this was a synthetic voice before this episode started playing, you certainly do now, and all of the potential audio defects are now exposed.
It will become easier for you to recognize, not just with my voice, but with many other voices. It is an acquired detection skill that I think helps us think more critically when we are either knowingly or unknowingly consuming synthetic media.
But as the technology gets better, it will require a more discerning ear, until we potentially get to the point that it can't be detected at all, only suspected.
If you are wondering how the premium voice cloning technology works, Eleven Labs requests up to two hours of sample voice recording. This can be a single file, or multiple files.
Once the files are uploaded, it takes them about four to six hours to render the premium clone.
They had me read a full chapter of The Great Gatsby, and also one from Jane Eyre. I also read some business-focused content, all for a total of approximately 90 minutes of audio. The better your recording setup is, the more accurate your voice clone will turn out.
I have created voices for my clients using different types of cloning. The results vary greatly. For a premium Eleven Labs account, only one custom premium voice clone is allowed. The Instant Voice Clone feature requires a shorter audio example, and can be rendered in minutes. I have had some Instant Voice Clones do a good job, replicating a permitted client's voice to about 80-85% accuracy.
In other cases, the Instant Voice Clone does not sound like the sample voice at all, but can create original and usable voices nonetheless. The Instant Voice Clone is not nearly as expressive or accurate as the premium clone.
There are also many other intricacies in creating and rendering voice clones for content.
Speech Synthesis Markup Language (SSML) can be used to fine-tune the output.
There are also tools for pronunciations and inflections.
It is also quite a strange feeling to hear yourself say words that were never spoken. Like many people, I am very cautious about the future of artificial intelligence, and I am very concerned about its potential to be misused.
But years ago I decided to continue to adapt, not just professionally, but to better understand this technology and the new world into which we are headed, whether we like it or not.
It alleviates unnecessary fears, and provides more focus on how to navigate the increasingly complex world we are being pushed into.
The technological powers-that-be have long followed a mantra that may or may not be the best thing for society: If a technology can be done, it will be done. While most of us have no control or say in these developments, the next best thing one can do is to be as acutely aware of its capabilities as possible.
Perhaps the most jarring thing about this entire process is that it forces a change in how we must perceive reality across digital spaces. Not just in voice-enabled spaces, but every digital space. It becomes clear that if a voice can be convincingly cloned, any voice we are speaking with, even one that sounds like someone known to us, must be verified.
I will continue to iteratively use my cloned voice for future podcast episodes. And I will also continue to use voice cloning and design to produce high-quality podcasts for my clients. Synthetic voices have been an invaluable tool for getting a channel warmed up for real human-hosted podcasts. And when the content is good and voices are rendered to a high standard of quality, the audience doesn't mind, and sometimes prefers it.
For an example of a very successful synthetic podcast, check out Arnold Schwarzenegger's long-running show, designed to scale his knowledge and health acumen to a wide audience.
He is straight and upfront that synthetics are being utilized.
I will also be producing more live interviews with other top experts in the field. And the music performed by the WorkHacker Orchestra in the intro and outro - that was recorded live, with real humans improvising musically in real time, including me.
I would like to extend my sincere thanks for listening this far - these words and sentiments are real, even if the voice delivering them is not.