I like to keep things simple, straightforward, you know. Treating you, dear reader, as an intelligent being who can put two and two together. After all, we are two humans engaged in an act of communication, me writing, you reading. And as humans, we understand there’s enough of a shared experience between us for this communication to even be worthwhile. There’s an underlying assumption that you will pick up what I am putting down.

But what if we weren’t humans in this exchange? What if this entire article were written by a sophisticated – or in my case, not so sophisticated – AI for the purpose of being read by another AI? What if we enter a world where all business, all decision-making, is handled by machines? I don’t know if I’m excited or scared at the thought, but I do know we’re a long way from that reality (we’re not that far actually).

In the meantime, we have seen tremendous technological advancements in our Localization industry that have provided value vis-a-vis efficacy, efficiency, and perhaps greatest of all, cost reduction. Decision-makers have more options for multimedia elements. These options can help improve the impact of their instructional design and also make translation easier should it be needed down the line. 


One such advancement in Localization has been the rise of text-to-speech technology. You need a voice-over, maybe some narration, you call upon a voice actor. But that may not be the best solution for your needs. And now you have a choice. A choice between the humans and the machines :evil laugh:.

Now, I have to admit I’m going out on a limb here and assuming the choice for audio in your eLearning course won’t have consequences as dramatic as John Connor traveling back in time to save the world from Skynet, but there are still some things to keep in mind when choosing between human voice-over and text-to-speech. Before we dive deeper into those, let’s consider the role audio plays in eLearning content in the first place.

“The VARK of time bends toward eLearning”

Granted, that’s not quite the more widely known saying, but it could be a new one. VARK, of course, is the model of learning that identifies four types of learners by the mode through which they learn best. Visual. Auditory. Reading/Writing. Kinesthetic.

Regardless of whether or not we strictly adhere to the VARK model of learning, we do understand that each of these learning methods appeals to one or more of our senses. And incorporating more of a learner’s senses will create a more immersive experience for them. A more immersive experience leads to a more engaged learner. Engagement equals retention. Retention equals true learning.

So it’s safe to say that the audio elements included in a learning module can either help or hinder the overall experience, and the attention paid to the quality of that audio will help ensure the desired return on investment.


Now that you’ve decided to move your eLearning from the era of silent films to one of wondrous sound and voices and intonation and inflection, you have a decision to make. A real person. A real machine.

And In This Corner…

When it comes to true connection, nothing beats the tried and true, the warmth, the organic human voice that wraps itself snugly around the listener with the comfort of a family quilt. People connect with people. Using a voice actor for your eLearning courses will simply produce the best result. But “the best” comes at a price, and it could be a price you aren’t able or willing to pay. And the price isn’t just the money either.

Human voice actors are human. That means they have their own lives, families, obligations, and schedules. They may not be available when you need them. If you don’t already have relationships with voice actors, you have to find them, put out a casting call of sorts, and spend time vetting and rejecting. They may need a recording studio, and that introduces more cost to you. The lack of voice talent in particular languages means the price will be higher (supply and demand, of course). All of these “problems” are certainly solvable; it’s just a matter of what you can invest and how important that truly authentic human voice really is to your courses.

And Their Opponent…

Text. To. Speech. It’s pretty impressive what machines are doing now. We’ve got smartphones, smart homes, and pretty soon smart clones. I don’t even know what a smart clone is but that sentence flowed. But really, the point is that machines are capable of performing a lot of complex tasks, and their ability marches along in one direction. Onward and upward.

But even through all that advancement, the voice itself, still cold. The emotion, disconnected. Our ears, finely tuned to the natural acoustics of a fellow human, can tell immediately. This is an imposter. A fake. A robot. And maybe that matters. Maybe that little bit of disconnection is enough to eschew this option altogether.

Or maybe it doesn’t. Maybe just having a voice is enough. Maybe machine voices are so ubiquitous in people’s lives now that one more isn’t really detrimental to anything. Siri. Alexa. Even TikTok has a native text-to-speech AI in its video editing templates – with 8 voices to choose from, some of them even sing your words in a digitized ballad of sorts. 

It’s important to keep in mind that a machine’s output is only as good as its input. And right now even the very best text-to-speech programs will have trouble with some languages, accents, pronunciations, and the various aspects that make up the “richness” in a human voice. Not to mention, the more technology you incorporate, the more engineers and staff you will need who understand how to work with it. Just something else to consider.

And Still… And New…

I can hear Bruce Buffer screaming now. But which popular phrase is it? Is the human voice retaining its position as champion or has the text-to-speech technology come far enough to score the victory? These are questions ultimately answered by your top priorities. 

Of course, there is something to be said for the quality of a professionally trained human voice actor. All things being equal, I’m sure this is still the preferred direction to go. The trouble is that all the things – the variables – are very often NOT equal. Budget and time are the two biggest, and those could definitely have you looking for other solutions. The great thing is that there IS an alternative, and that alternative is getting better by the day.


Whatever gap there is right now between human voice-over and text-to-speech technology will only shrink over time. There isn’t much room left for improvement on the human side. Outside of continuing developments in recording and production equipment, a human does what a human does. The AI technology in text-to-speech, on the other hand, is a different animal entirely.

Computer voices continue to sound more human. The vocabulary continues to get more robust. What we can program a machine to do seems to be bordering on the infinite. This means at some point, distinguishing between a human voice and a machine-generated one may be nearly impossible. 

In 2016, Adobe previewed a prototype called VoCo that was dubbed “Photoshop for voices.” Given enough recorded speech from a person, the app can basically imitate that person saying anything, even things they never said. There are some scary implications there, similar to video deepfakes.

Not to be left out in the cold, Google and Microsoft are testing the waters with their own software. The Google product in particular, called WaveNet, is a neural network trained on recordings of real speech, and it sounds like it.

See, I think there will always be a place for the human element. In fact, the colder and more digitized things get, there may even be a nostalgic push to keep humans involved. Whatever the case, I don’t foresee an eLearning landscape where human voice-overs no longer exist. However, I can see it being similar to online shopping versus in-person retail. The digital option becomes the much more common, even standard, choice, but the in-person, human option is still there for those who find it more appealing.

The machines will rise… they’ve already risen!