With 2020 having been a tumultuous year for everyone, 2021 is a good time to reflect on which technologies have improved or changed. Speech synthesis and video manipulation technologies have improved dramatically: the Deep Fakes of 2019 are nowhere near as good as the ones in 2021.
It’s good to be aware that many people are concerned about the potential for disinformation and media manipulation when they encounter “Deep Fakes”. The rise of Synthetic Media is something that all content creators and eLearning developers should take note of, not because it’s a “scary trend” but because understanding it points the way to better content.
The example above from CNN, which came out in early March 2021, is the latest in a series of synthetic media from the Deep Voodoo Studio, which was set up in 2020 by the creators of South Park to make parody and satirical movies. The techniques they use combine a classic voice impersonator [a real person], an actor as a stand-in and a sophisticated AI-generated “face-algorithm” of their intended subject. In all cases the subjects are well-known celebrities or politicians who have appeared in movies and on TV for hundreds if not thousands of hours.
The Artificial Intelligence technique is best explained as an algorithm that takes in a significant amount of data, learns how to build a model of that voice or image and then generates a replica. This Deep Learning is the “Deep” part of the fakes. In 2020, the learning step shifted to a slightly better technology, itself a form of Deep Learning: Generative Adversarial Networks, or GANs for short.
The tools and technology for making these updated models have become more accessible, and the amount of data needed has decreased. GANs are more sophisticated because they don’t replay a memorized output; they create a new one. This ability to create makes the AI process adaptable and its output realistic. The advance allows for a different type of media, with some practical applications beyond parody.
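To make the “create, don’t memorize” idea a little more concrete, here is a minimal sketch of the adversarial setup behind a GAN, written in plain Python. It is a toy under loud assumptions: the “real data” is just numbers drawn around 4.0, the generator is a straight line, the discriminator is a single logistic unit, and every name in it (`train_toy_gan`, `sigmoid`) is made up for illustration. Production deepfake tools use large neural networks and are far more involved; the only point here is that the generator never sees the real data directly and learns solely from the discriminator’s verdict.

```python
import math
import random


def sigmoid(s: float) -> float:
    """Numerically stable logistic function, output in (0, 1)."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 2000, lr: float = 0.01, seed: int = 0):
    """Alternate discriminator and generator updates on 1-D data.

    Returns the generator parameters (a, b) and discriminator parameters (w, c).
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: x_fake = a * z + b
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        x_real = rng.gauss(4.0, 1.0)  # assumed "real" data: noise around 4.0
        z = rng.gauss(0.0, 1.0)       # random input the generator turns into a sample
        x_fake = a * z + b

        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w -= lr * ((d_real - 1.0) * x_real + d_fake * x_fake)
        c -= lr * ((d_real - 1.0) + d_fake)

        # Generator step: minimise -log D(fake), i.e. try to fool the discriminator.
        d_fake = sigmoid(w * x_fake + c)
        a -= lr * ((d_fake - 1.0) * w * z)
        b -= lr * ((d_fake - 1.0) * w)
    return (a, b), (w, c)


if __name__ == "__main__":
    (a, b), (w, c) = train_toy_gan()
    print(f"generator: x = {a:.2f} * z + {b:.2f}")
    print(f"discriminator: D(x) = sigmoid({w:.2f} * x + {c:.2f})")
```

Adversarial training like this is known to oscillate rather than settle neatly, so the printed numbers should be read as a demonstration of the feedback loop, not as a polished result.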
Another way to see this is the concept of “style transfer”. It’s like the format painter in Microsoft Word or Excel: it copies over the look and feel but not the content. With style transfer you can “fake” a voice or face. There’s a popular YouTube channel called Ctrl Shift Face that has older versions of this approach in a parody setting. These efforts mostly place well-known actors in absurd situations or replace every character with the same actor. The results are somewhat crude and chock full of artifacts and blurred edges, but that was 2019.
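The format painter idea can be sketched in a few lines of Python. This is only an analogy under stated assumptions: real style transfer works on the internal features of a neural network, while this toy works on a plain list of numbers, and the function name `transfer_style` is invented for this post. What it shows is the split between “content” (the ups and downs of the original sequence) and “style” (the overall level and spread borrowed from another source).

```python
from statistics import mean, pstdev
from typing import List


def transfer_style(content: List[float], style: List[float]) -> List[float]:
    """Re-scale `content` so its mean and spread match those of `style`.

    The shape of `content` (which values are high, which are low) is kept;
    only the overall statistics are borrowed from `style`.
    """
    c_mean, c_std = mean(content), pstdev(content)
    s_mean, s_std = mean(style), pstdev(style)
    if c_std == 0:
        # Flat content has no shape to preserve; place it at the style's mean.
        return [s_mean for _ in content]
    return [(x - c_mean) / c_std * s_std + s_mean for x in content]


if __name__ == "__main__":
    quiet_melody = [3.0, 1.0, 2.0]  # the "content": high, low, middle
    loud_voice = [0.0, 100.0]       # the "style": a much wider range
    print(transfer_style(quiet_melody, loud_voice))
```

The output keeps the high, low, middle pattern of the first list but lives in the range of the second, which is the same bargain a face or voice swap makes at a much larger scale.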
Today the tools are still a niche product, but they offer a glimpse into how an old video with great content but outdated visuals could be updated. Or maybe the inverse: great visuals but bad content, turned into something new. There will soon be a way to make audio sound like someone else. This goes beyond text to speech, which is a plain synthetic voice: you can now make it sound like anyone, and the voices keep the emotion and manner of speaking of a natural person, except it’s not a person at all.
One such company is Respeecher. They are primarily focused on movies, where their technology allows retakes or audio to be re-recorded when the actor is not available. While this technology is currently only being used in AAA movie productions [think The Mandalorian], one day it will be available in a more consumer-friendly form.
It goes without saying that the ethical responsibility that comes with cloning the voices and faces of other people is tremendous; the capability it gives content creators is just as immense. These companies are very reluctant to sell an open version due to the potential for abuse. But not all uses are problematic. Recent examples include restoring old archival footage that was not usable into something “normal sounding”.
While we wait for the broader availability of these technologies, we are working on providing the supply chain of voices, talent and process to enable the use of synthetic media. Today we can add TTS to any eLearning project [not just those in Articulate], without being limited to Articulate’s built-in voices. We can produce synthetic recreations in over 100 languages and, most importantly, have real native-speaking editors make sure that there are no mistakes.
In summary, the creation of synthetic media is becoming more commonplace, and it’s not just used for nefarious purposes; it can be put to creative use today. The main focus in the news and other media outlets has been the potential dangers of this technology, which are valid concerns. However, we can also look at this phenomenon and imagine new ways of making content or extending our current toolset.
For further reading, here are two places to start learning more about this topic:
A university group working on Deepfakes, The Deepfake Lab out of Milan, Italy, has useful examples. LinkedIn Learning also has a course on Deepfakes that goes into detail and is being offered for free.
Gilbert Segura is the CTO at Global eLearning. Learn more about Gilbert!