Date: 2020-04-02

Google frequently explains how it uses different machine learning models to improve the performance of its applications. It did so recently when it explained how Soli's radar works, and now it explains how the new audio quality improvements in Google Duo work.

It has achieved this with a new model called WaveNetEQ, a generative system based on DeepMind technology that is able to fill in lost packets in speech waveforms. Let's see how Google achieves such a feat.

This is how Google improves the audio quality in Duo through a generative model

Google notes that when data packets are sent over the internet, errors in receiving them are common.

Google says that when a call is transmitted over the internet, its packets suffer certain quality problems. These problems are caused by excessive jitter or network delays, and in the worst-affected calls more than 8% of the total audio may be lost.

You have probably heard tinny or robotic voices on a video call at some point. The main cause is the degradation that occurs while packets are being sent and received: if several are lost, the quality drops.
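To get a feel for what a loss rate of around 8% means in practice, here is a minimal sketch that simulates a one-minute call. The 20 ms packet duration, the independent random losses, and the function names are all assumptions made for illustration; the article does not describe how Duo packetizes audio, and real networks tend to lose packets in bursts.

```python
import random

PACKET_MS = 20  # assumed packet duration; not stated in the article


def simulate_call(duration_s, loss_rate, seed=0):
    """Return one boolean per packet: True if it arrived, False if lost."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    n_packets = int(duration_s * 1000 / PACKET_MS)
    return [rng.random() >= loss_rate for _ in range(n_packets)]


def longest_gap_ms(received):
    """Length in ms of the longest run of consecutive lost packets."""
    longest = run = 0
    for ok in received:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    return longest * PACKET_MS


received = simulate_call(duration_s=60, loss_rate=0.08)
lost = received.count(False)
print(f"{lost} of {len(received)} packets lost "
      f"({100 * lost / len(received):.1f}% of the audio)")
print(f"longest gap: {longest_gap_ms(received)} ms")
```

Even at this rate, the gaps are scattered throughout the call, so the listener hears many short interruptions rather than one long one. Those short gaps are what a concealment system has to fill.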

To ensure that communication works correctly in real time, Google has created WaveNetEQ, a PLC (packet loss concealment) system trained on a large speech dataset. What does this model do? The explanation is quite technical and complex, so let's summarize it as simply as possible.

Google's system analyzes the audio packets that do arrive to predict how the speech is likely to continue. Once this information has been processed by neural networks, the missing part of the waveform is filled in.

WaveNetEQ is a generative model that can synthesize speech waveforms even when parts of the signal have been lost. The robotic or metallic sound you may have heard on a poor-quality video call appears when many packets are missing (often because they arrive too late to be played) and the audio cannot be reproduced faithfully.

Using neural networks, Google is able to keep the signal continuous in real time, minimizing the loss of quality. In essence, thanks to the speech data it was trained on, the model "guesses" what the speaker was about to say and completes the waveform with the missing fragments.

In blue, the original audio waveform. In orange, the segments that Google's model predicts.
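The idea of continuing a waveform from what came before can be sketched with a much simpler predictor. The code below is not WaveNetEQ: it swaps the neural network for a two-coefficient linear recurrence that is fitted to the audio preceding each gap, and every predicted sample is fed back in to predict the next one. The function names, the 16 kHz sample rate, and the 20 ms packet size are assumptions for illustration. This toy predictor only works well on a steady pure tone, which is exactly why real speech needs a trained generative model.

```python
import math

SAMPLE_RATE = 16000  # assumed sample rate
PACKET_LEN = 320     # 20 ms at 16 kHz, an assumed packet size


def fit_coefficient(history):
    """Least-squares fit of c in the recurrence x[n] = c*x[n-1] - x[n-2]."""
    num = sum((history[i] + history[i - 2]) * history[i - 1]
              for i in range(2, len(history)))
    den = sum(history[i - 1] ** 2 for i in range(2, len(history)))
    return num / den if den else 0.0


def conceal(samples, received, packet_len):
    """Fill lost packets by extrapolating from the audio before each gap.

    Assumes len(samples) == len(received) * packet_len.
    """
    out = list(samples)
    for p, ok in enumerate(received):
        if ok:
            continue
        start = p * packet_len
        history = out[max(0, start - 4 * packet_len):start]
        if len(history) < 3:
            # Nothing to learn from yet: fall back to silence.
            out[start:start + packet_len] = [0.0] * packet_len
            continue
        c = fit_coefficient(history)
        for n in range(start, start + packet_len):
            # Each new sample is predicted from the two before it,
            # including samples that were themselves predicted.
            out[n] = c * out[n - 1] - out[n - 2]
    return out


# A 200 Hz test tone split into 10 packets, two of which are lost.
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
        for n in range(10 * PACKET_LEN)]
received = [True] * 10
received[4] = received[7] = False

damaged = [s if received[n // PACKET_LEN] else 0.0
           for n, s in enumerate(tone)]
repaired = conceal(damaged, received, PACKET_LEN)

worst = max(abs(a - b) for a, b in zip(tone, repaired))
print(f"worst-case error after concealment: {worst:.2e}")
```

On this test tone the repaired signal is practically indistinguishable from the original, because a sine wave follows the fitted recurrence exactly. Speech does not, so a fixed rule like this one quickly drifts away from what the speaker actually said.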

These improvements are already rolling out in Google Duo on the Google Pixel 4, and the company says the model will reach other devices soon, so this will be an improvement to the application itself, not only to specific devices.

