Global eLearning was pleased to attend the recent LocWorld41 conference. Attendees are typically responsible for communicating across the boundaries of language and culture in the global marketplace.

I was invited to present on the topic of Text to Speech for Localization at LocWorld41 in San Jose, California. The presentation was titled: Robot Voices in Multimedia, Adventures in Text to Speech [TTS]. Below is a recap of the biggest takeaways from this discussion.

The primary takeaway for those interested in eLearning is that we can now reduce the time and cost of internal training using multimedia Text to Speech (TTS); in other words, by applying text to speech to localization initiatives. Previously, translation and localization specialists, along with their learning and development customers, had to weigh the cost of including audio and video against limited budgets and time constraints. Adding voiceover from a speaker of a specific language or dialect can be difficult or expensive. Text to speech for localization now makes these tasks practical.

With the advent of this technology, we can now localize and translate content faster and, often, at lower cost. Voiceovers have traditionally been recorded by voiceover artists, often at a higher cost; much of that work can now be handled by technology and engineers. This approach scales much more efficiently than recording with a voice artist.

During the presentation, someone in the audience asked, “Is that the end of the voiceover artist?” The short answer is: No, not at all. These roles will likely shift so that voiceover and local artists apply their expertise where more concise translations can provide more value. The new paradigm is known as “Human in the Loop.” This process lets the machines do the dirty work.

For example, I’m not even typing this portion of the blog; I’m dictating it to a speech recognition program. The editing step is still required, and I have to make a correction here and there, but at the end of the day, my speech rate is much faster than my typing rate.

The same applies to translation, localization, and voiceover. We really don’t need to have people recording “click next” or looking for a stock phrase like “thank you”. Technology is now at a stage where we can put our efforts into refinement and the fit-and-finish work that brings the final product to a high standard, in less time. Yes, the words have to sound right, and the “you” in “thank you” might be formal, male, female, plural, or singular. That’s the part where you ask someone with subject matter expertise to review and correct things. Even so, the revision process is two or three times as fast as full creation.

The presentation on Text to Speech for Localization also covered the huge advancements made in the last few years. Text to speech languished for almost 50 years. It still has a parallel existence in talking elevators, phone menus, and other things that are just not that interesting. However, technology companies like Google, Amazon, and Microsoft have started opening up opportunities for new and better things, not just speaking phone agents or robots that can answer our questions about the weather. These technologies are providing a level of service that we never expected from a talking machine, and the capabilities are now available in many languages. Adoption over the next two to five years will be tremendous, making TTS commonplace.

At Global eLearning, we are constantly developing comprehensive tools and techniques that enable TTS in dozens of languages, at reduced cost and with faster turnaround times. This is an exciting time to be a leader in the field! Do you need help translating or localizing content for a global audience? Do you want to use the latest technologies to lower costs and speed up the localization and translation process? Global eLearning is your dedicated resource for learning and development localization! Contact Global eLearning today for a free consultation!

LocWorld41: Text to Speech for Localization

Gilbert Segura