OpenAI’s Voice Engine: Sing Me (or Anyone Else) a Song! But Maybe Not…

Arush Sharma
2 min read · Apr 8, 2024

OpenAI just dropped a new trick: Voice Engine! This little guy can listen to 15 seconds of your dulcet (or not-so-dulcet) tones and then become your vocal doppelganger.

Imagine Morgan Freeman narrating your grocery list, or your dog finally telling you what squirrels are really saying. While this tech has the potential to be the ultimate karaoke partner or a boon for the vocally challenged, there are a few bumps on the road to impersonation paradise.

Sing Like a Star (or at Least Like A. R. Rahman)

OpenAI’s blog post paints a rosy picture: textbooks coming alive in your kid’s favorite voice, people who can’t speak getting their voice back, and even movie trailers narrated by your sleep paralysis demon (although that last one might be a personal preference). Content creators can translate videos with the original voice intact, making everyone from ASMR YouTubers to used car salesmen sound globally sophisticated.

But Can It Sing Despacito Like Luis Fonsi?

OpenAI knows things can get dicey. Imagine political ads narrated by a voice that sounds suspiciously like your favorite celebrity. Deepfakes, anyone? OpenAI is taking precautions: its partners have to pinky swear not to impersonate anyone without consent, and to get explicit permission from the original speaker before any vocal cords get digitally borrowed. They’ve even sprinkled some digital watermarks on the generated audio, like a secret sauce for AI-cooked content.

Hold on Now, Gotta Think This Through

OpenAI’s playing it smart by hitting pause on a wide release. They’re calling for a group hug with policymakers, researchers, and anyone who doesn’t want the world to descend into a cacophony of vocal impersonations. Here’s where things get interesting: maybe our reliance on voice verification for things like bank accounts needs a rethink. And how about some laws protecting our voices from being turned into digital marionettes?

The Future Sounds…Uncertain

OpenAI deserves a pat on the back for being transparent. This tech is powerful, and figuring out how to use it responsibly is like teaching a parrot not to swear: possible, but it requires constant vigilance. One thing’s for sure: the world of AI is getting more interesting by the day.

Let’s just hope the soundtrack isn’t all political robocalls.

Reference: https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
