Google’s Translatotron is an end-to-end model that mimics human voices
Google AI today shared details about Translatotron, an experimental AI system that translates a person’s speech directly into another language while keeping the sound of the original speaker’s voice in the synthesized translation.
Traditional speech translation systems are cascades: automatic speech recognition converts speech to text, machine translation translates that text, and text-to-speech synthesizes the translated speech. Translatotron instead is a single end-to-end model. Researchers said it can complete translations faster and with fewer complications than traditional cascaded systems, such as errors that compound between the recognition and translation stages.
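To make the cascade concrete, here is a minimal sketch of the three-stage pipeline as function composition. The stage functions (`toy_asr`, `toy_mt`, `toy_tts`) and the `make_cascade` helper are hypothetical stand-ins, not Google’s code; real systems would plug trained models into each slot. The point the sketch illustrates is structural: the text-to-speech stage receives only text, so nothing about the original speaker’s voice reaches it.

```python
from typing import Callable, Dict, List

# Hypothetical representation: speech as a list of raw float samples.
Audio = List[float]


def make_cascade(
    asr: Callable[[Audio], str],
    mt: Callable[[str], str],
    tts: Callable[[str], Audio],
) -> Callable[[Audio], Audio]:
    """Chain ASR -> MT -> TTS; each stage sees only the previous stage's output."""

    def translate(audio: Audio) -> Audio:
        source_text = asr(audio)       # speech -> source-language text
        target_text = mt(source_text)  # source text -> target-language text
        # TTS sees only text: the original speaker's voice is not passed along.
        return tts(target_text)

    return translate


# Toy stand-ins for real models (hypothetical, for illustration only).
def toy_asr(audio: Audio) -> str:
    return "hola mundo"  # pretend recognition result, ignores the input


LEXICON: Dict[str, str] = {"hola": "hello", "mundo": "world"}


def toy_mt(text: str) -> str:
    # Word-by-word dictionary lookup; unknown words pass through unchanged.
    return " ".join(LEXICON.get(word, word) for word in text.split())


def toy_tts(text: str) -> Audio:
    return [float(ord(ch)) for ch in text]  # stand-in "waveform"


translate = make_cascade(toy_asr, toy_mt, toy_tts)
```

An end-to-end model like Translatotron replaces all three stages with a single mapping from source audio to target audio, so there is no intermediate text bottleneck, and any recognition mistake in `toy_asr` here would flow straight into `toy_mt` unchecked.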
“To the best of our knowledge, Translatotron is the first end-to-end model that can directly translate speech from one language into speech in another language. It is also able to retain the source speaker’s voice in the translated speech,” a blog post on the subject reads.
On BLEU, a standard metric of machine translation quality, the experimental Translatotron scored lower than conventional cascade systems.
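BLEU scores a translation by how many of its word n-grams (typically up to 4-grams) also appear in a reference translation, with a penalty for outputs that are too short. The sketch below is a minimal, unsmoothed, single-reference sentence-level version written for illustration; standard BLEU is computed over a whole corpus, and common toolkits add tokenization and smoothing. It is not the exact evaluation setup used for Translatotron.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-token sequence in the list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU for one candidate against one reference (0.0 to 1.0)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by how often it occurs in the reference,
        # so repeating a correct word does not inflate the score.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        # Geometric mean of the n-gram precisions, accumulated in log space.
        log_precision_sum += math.log(matched / total) / max_n

    # Brevity penalty: candidates shorter than the reference are penalized.
    c, r = len(cand), len(ref)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)
```

For example, `sentence_bleu("the cat is on", "the cat is on the mat")` matches every n-gram but is shorter than the reference, so only the brevity penalty applies and the score is `exp(-0.5)`, about 0.61. BLEU is often reported on a 0 to 100 scale, which is this value multiplied by 100.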
End-to-end models for speech translation trace back to a paper by French researchers accepted at NeurIPS in 2016.
To make Translatotron capable of end-to-end translation, researchers trained a sequence-to-sequence model that takes spectrograms as input. A speaker encoder network captures the characteristics of the speaker’s voice, and multitask learning trains the model to also predict the words spoken in the source and target languages.
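A spectrogram is the model’s view of audio: the signal is cut into short overlapping frames, each frame is windowed, and a Fourier transform gives the energy at each frequency. The minimal sketch below computes a plain linear-magnitude spectrogram with a naive DFT so it needs only the standard library; speech models, Translatotron included, typically use log-mel features instead, and the function names and frame settings here are hypothetical choices for illustration.

```python
import cmath
import math
from typing import List


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n, used to taper each frame's edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Return one row per frame, each holding |DFT| for bins 0..frame_len // 2."""
    window = hann(frame_len)
    frames: List[List[float]] = []
    # Slide a frame_len window across the signal in steps of hop samples.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        bins: List[float] = []
        # Naive O(n^2) DFT; keep only non-negative frequencies for a real signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i, x in enumerate(frame)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames
```

As a usage example, a 1,000 Hz sine sampled at 8,000 Hz and analyzed with 64-sample frames (125 Hz per bin) peaks at bin 8 in every frame. In practice a library FFT would replace the inner loop; the naive DFT is kept only so the sketch is self-contained.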
Translatotron is spelled out in more detail in a paper published today titled “Direct speech-to-speech translation with a sequence-to-sequence model.”
The release of Translatotron comes a month after Google introduced SpecAugment, a data augmentation technique for speech recognition that borrows from computer vision: it treats spectrograms as images and warps and masks them so models learn to recognize words more robustly.
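SpecAugment’s masking step can be sketched in a few lines: zero out a random band of consecutive frequency channels and a random run of consecutive time steps. This is a minimal sketch assuming a spectrogram stored as a list of time steps, each a list of frequency values; it applies one frequency mask and one time mask and omits time warping, multiple masks, and any cap on mask size. The function name and parameters are hypothetical.

```python
import random
from typing import List

# Assumed layout: spectrogram[time_step][frequency_bin].
Spectrogram = List[List[float]]


def spec_augment_mask(
    spec: Spectrogram,
    max_freq_width: int,
    max_time_width: int,
    rng: random.Random,
    mask_value: float = 0.0,
) -> Spectrogram:
    """Return a copy of spec with one frequency band and one time span masked."""
    num_steps = len(spec)
    num_bins = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency mask: width f drawn from [0, max_freq_width], start f0 so the band fits.
    f = rng.randint(0, min(max_freq_width, num_bins))
    f0 = rng.randint(0, num_bins - f)
    for row in out:
        for k in range(f0, f0 + f):
            row[k] = mask_value

    # Time mask: width t drawn from [0, max_time_width], start t0 so the span fits.
    t = rng.randint(0, min(max_time_width, num_steps))
    t0 = rng.randint(0, num_steps - t)
    for step in range(t0, t0 + t):
        out[step] = [mask_value] * num_bins

    return out
```

Because the masks are drawn fresh each time a training example is used, the model sees many partially hidden versions of the same utterance, which is the augmentation effect. Passing a seeded `random.Random` keeps runs reproducible.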
Translatotron could be applied to features like Google Assistant’s Interpreter Mode, which debuted on Google Home speakers in January. Interpreter Mode can listen and provide speech-to-speech translation in 27 languages. Companies like Google and Microsoft are also using their language translation chops as a way to win over iOS users.
Translatotron is the latest advance in machine translation and language processing from Google.
Last week at Google’s I/O developer conference, Google shared that it shrank its recurrent neural networks and language understanding models for on-device machine learning on smartphones, making Google Assistant up to 10 times faster. Google also introduced translation in Google Lens, so your phone’s camera can translate text in more than 100 languages.