AI Is Making Audio-Editing And Transcription Part Of Its Bold Step Forward

5th January, 2018

Radio has been a longstanding media institution across countries, continents, and cultures. The media form rewards the essence of storytelling by putting it front and center, and the result has been a surprisingly noble ability to withstand changing tides.

Creating a Cultural Narrative

As our culture in general moves towards something faster, more efficient and entirely image-based, audio storytelling has managed to not only endure but thrive. With 67 million monthly listeners, podcasts are more popular than ever, jumping up 11% from this time last year. Digital reporting, like the work done over at National Public Radio (NPR), has seen enormous growth, up 26% and 43% over commercial radio in the mornings and afternoons respectively. And audiobooks, once a form destined for the outer reaches of the literary audience, are now their own industry, with listener growth of 40% each year and a peak of 1.6 billion listeners in 2016 alone.

All of which is to say that the form shows no signs of slowing down, making it the perfect industry to push into new technological terrain. With audio media growing, the tools used to make it need to take that same leap forward. Descript, a new startup assembled by Groupon founder Andrew Mason, is one example of a company that’s pushing the form and merging it with existing technology at the same time.

Automated Transcription Technology

Software like Dragon Dictation set out to automate transcription, taking an existing audio file and turning the recording into a text document. Descript builds on similar technology, but almost in reverse: the transcription, presented as a Word document, becomes the way into the audio file. Editing the document simultaneously edits the audio itself, so if you cut a sentence, that spoken line is cut from the file. Using AI, the technology would also mimic the voice in the recording, so that words added to the document can be added to the audio file as well.
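To make the idea of editing audio through its transcript concrete, here is a minimal Python sketch. It is not Descript's implementation; it assumes the transcription step has already produced a start and end time for each word, and the names Word, keep_ranges and cut_audio are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Word:
    """One transcribed word with its position in the recording (hypothetical type)."""
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def keep_ranges(words: Sequence[Word], deleted: Set[int]) -> List[Tuple[float, float]]:
    """Return the time ranges of audio that survive once the words at the
    given indices have been deleted from the transcript."""
    ranges: List[Tuple[float, float]] = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if ranges and i > 0 and (i - 1) not in deleted:
            # The previous word was kept too, so extend the current range.
            # This preserves the natural pause between neighbouring words.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # First kept word, or first kept word after a deletion.
            ranges.append((word.start, word.end))
    return ranges


def cut_audio(samples: Sequence[int], sample_rate: int,
              ranges: Sequence[Tuple[float, float]]) -> List[int]:
    """Concatenate the samples that fall inside the kept time ranges."""
    edited: List[int] = []
    for start, end in ranges:
        first = int(round(start * sample_rate))
        last = int(round(end * sample_rate))
        edited.extend(samples[first:last])
    return edited


# Example: remove the filler word "um" from a three-second recording.
transcript = [
    Word("hello", 0.0, 0.5),
    Word("there", 0.6, 1.0),
    Word("um", 1.2, 1.5),
    Word("world", 1.8, 2.4),
]
kept = keep_ranges(transcript, deleted={2})
audio = list(range(30))  # stand-in for real samples at 10 samples per second
edited_audio = cut_audio(audio, sample_rate=10, ranges=kept)
```

Deleting a word in the text simply removes its time range from the audio, which is why a text edit and an audio edit can be the same action. Adding new words is a harder problem, since it requires synthesizing speech in the speaker's voice, and is not covered by this sketch.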

In many ways, this is about a single company’s potential for innovation, but it’s also a sign that audio media might be one of the first testing sites for AI-produced media. Work driven by advancements in artificial intelligence is still in an early testing phase, to say the least, but given the lack of a visual component, audio media might be the perfect entry point for a technology still in its infancy. Technology like this has already seen some pushback, as with the voice-mimicking software Lyrebird, which many have accused of feeding the wave of fake news that seems to be overtaking the media ecosystem.

That Montreal-based AI startup is, by design, more indirectly manipulative, only able to mimic the voice of a real person. The technology can shift the cadence in order to emphasize or alter the form of a sentence. It can build upon a recording, but the voice must belong to an actual person, which opens the door to insidious uses: for every cameo from a deceased actor, there is a widely circulating fake video of a sitting politician.

A New Era for AI

Still, the topsy-turvy way the technology is being received is less a warning sign than proof that we are in the developmental stage of an entirely new era.

Technology like Descript has the potential to make the burgeoning podcast industry even more efficient, especially as programs like the New York Times’ The Daily lead the charge on a same-day turnaround, turning the news into a digestible podcast five days a week. The speed at which work is being turned around (and in the current media climate, speed is less a feature than a demand) means that there is more than enough space in the form to allow for innovations of Descript’s scale. When it comes to audio media, AI technology might just get the last word.
