Artificial Intelligence Generates Humans’ Faces Based on Their Voices
by Meilan Solly, Smithsonian.com
A new neural network developed by researchers at the Massachusetts Institute of Technology is capable of constructing a rough approximation of an individual’s face based solely on a snippet of their speech, reports a paper published on the pre-print server arXiv.
The team trained the artificial intelligence tool—a machine learning algorithm programmed to “think” much like the human brain—with the help of millions of online clips capturing more than 100,000 different speakers. Dubbed Speech2Face, the neural network used this dataset to determine links between vocal cues and specific facial features; as the scientists write in the study, age, gender, the shape of one’s mouth, lip size, bone structure, language, accent, speaking speed and pronunciation all factor into the mechanics of speech.
According to Gizmodo’s Melanie Ehrenkranz, Speech2Face draws on these associations between appearance and speech to generate photorealistic renderings of faces shown front-on with neutral expressions. Although the images are too generic to identify any specific person, the majority of them accurately pinpoint a speaker’s gender, race and age.
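The core idea described above—encoding a voice clip into a compact embedding and decoding a face representation from it—can be sketched very roughly in code. This is a toy illustration only, not the authors' actual architecture: the real Speech2Face system uses deep convolutional networks trained on the video dataset, whereas the linear maps, dimensions, and names below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- stand-ins for the trained deep networks.
SPECTROGRAM_BINS = 64   # frequency bins of a short speech clip
EMBED_DIM = 16          # shared voice/face embedding size
FACE_DIM = 32           # size of a toy "face feature" vector

# Toy stand-ins for the voice encoder and face decoder: fixed random
# linear maps rather than learned neural networks.
voice_encoder = rng.normal(size=(SPECTROGRAM_BINS, EMBED_DIM))
face_decoder = rng.normal(size=(EMBED_DIM, FACE_DIM))

def speech_to_face(spectrogram: np.ndarray) -> np.ndarray:
    """Map a (time, freq) spectrogram to a toy face-feature vector."""
    # Pool over time, then project into the shared embedding space.
    embedding = spectrogram.mean(axis=0) @ voice_encoder
    # Decode the embedding into face features.
    return embedding @ face_decoder

# A fake 100-frame speech clip in place of a real recording.
clip = rng.normal(size=(100, SPECTROGRAM_BINS))
face_features = speech_to_face(clip)
print(face_features.shape)  # (32,)
```

In the real system, the decoded face features would feed a separate face-reconstruction model that renders the front-facing, neutral-expression image the article describes; here the output is just a numeric vector standing in for that representation.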