Understanding TV dialogue better – with MPEG-H Dialog+ Technology

© Fraunhofer IIS

Most TV stations are used to their audience complaining about hard-to-understand dialogue – be it in films, documentaries, sports coverage, or even the news. The problem is not easy to solve: first, the loudness difference between background sound and dialogue is a creative decision made individually for every piece of content; second, the “perfect” dialogue loudness is a matter of personal preference.

The evolution of AI-based technologies and object-based audio (OBA), however, has enabled the creation of technologies such as MPEG-H Dialog+ by Fraunhofer IIS. The technology uses Deep Neural Networks (DNN) to automatically identify the dialogue in existing content, separate it from the background sounds, and remix it with a lowered background level. With OBA, users can even adapt the dialogue level on their device to meet their personal requirements.

Recently, Fraunhofer IIS joined forces with German public broadcaster WDR and Telos Alliance to develop a professional workflow and bring MPEG-H Dialog+ into use. Fraunhofer IIS conducted field tests over DVB and the VoD platform “ARD Mediathek” to refine requirements and production workflows. The results were then fed into the product development of the Telos Alliance Minnetonka AudioTools Server Dialog+ module. The software has now been implemented as part of an automatic workflow – from archive to transcoding farm – in the WDR production infrastructure.

MPEG-H Dialog+ contains a deep neural network that performs dialogue separation. The training data consists largely of real-world broadcast content, most of it provided by WDR and other ARD broadcasters. Dialog+ combines dialogue separation with a unique automatic remixing algorithm in which a global and a time-varying background attenuation can be combined. Global background attenuation lowers the relative level of the estimated background component by the same specified amount over the entire signal. This can be beneficial for users who prefer to always lower the background signal. For others, it might not be the optimal solution: attenuating the background while the dialogue is not active does not improve speech intelligibility, yet it can damage mood, atmosphere, and sounds of narrative importance. The time-varying attenuation addresses this by lowering the background level only when the dialogue signal is active, and only as much as is necessary to reach the desired level.
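The interplay of global and time-varying background attenuation can be illustrated with a minimal sketch. It is not the Dialog+ algorithm, whose internals are not described here; it only assumes that a separation stage has already delivered per-frame level estimates (in dB) for dialogue and background, plus a flag marking frames where dialogue is active. All function names, parameter names, and default values (`target_diff_db`, `max_dynamic_att_db`, and so on) are hypothetical and chosen for illustration.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def background_gains_db(dialogue_db, background_db, dialogue_active,
                        global_att_db=0.0, target_diff_db=15.0,
                        max_dynamic_att_db=12.0):
    """Return one background gain (in dB, <= 0) per frame.

    Hypothetical parameters, not taken from Dialog+:
      global_att_db      constant attenuation applied to every frame
      target_diff_db     desired dialogue-minus-background level difference
      max_dynamic_att_db upper bound on the extra, time-varying attenuation
    """
    gains = []
    for d, b, active in zip(dialogue_db, background_db, dialogue_active):
        # Global part: same attenuation over the entire signal.
        gain = -global_att_db
        if active:
            # Time-varying part: only while dialogue is active, and only
            # as much as is needed to reach the target level difference.
            current_diff = d - (b + gain)
            shortfall = target_diff_db - current_diff
            extra = min(max(shortfall, 0.0), max_dynamic_att_db)
            gain -= extra
        gains.append(gain)
    return gains


def remix_frame(dialogue_samples, background_samples, gain_db):
    """Mix one frame: dialogue unchanged, background scaled by gain_db."""
    g = db_to_linear(gain_db)
    return [d + g * b for d, b in zip(dialogue_samples, background_samples)]


# Three example frames: loud background under dialogue, quiet background
# under dialogue, and a dialogue-free passage.
gains = background_gains_db(
    dialogue_db=[-20.0, -20.0, -60.0],
    background_db=[-25.0, -40.0, -25.0],
    dialogue_active=[True, True, False],
    global_att_db=3.0,
)
print(gains)
```

In this example only the first frame receives extra attenuation: the second already exceeds the target difference, and the third contains no dialogue, so both keep just the global 3 dB reduction. A production system would additionally smooth the gain over time to avoid audible pumping, which this sketch omits.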

Thanks to the implementation of MPEG-H Dialog+, which is called “Klare Sprache” in the ARD Mediathek, the VoD platform now provides a higher degree of accessibility. Additional benefits of the automated audio processing workflows include:

  • Automated, cost saving, and scalable workflow approach
  • State-of-the-art quality of the dialogue separation algorithm
  • Dynamic remixing algorithm which only affects the background level when dialogue is present. This prevents unwanted changes to the mix and helps to preserve the artistic intent as much as possible.
  • Set of presets customized for different use cases. This way, the content provider can apply processing optimized, for example, for documentaries, music films, and sports content.

MPEG-H Dialog+ is part of the MPEG-H Audio production software, providing all features of an OBA system like advanced user interactivity and personalization. This makes the use of MPEG-H Dialog+ a future-proof decision for broadcasters and content producers as MPEG-H Audio is one of the most advanced Next Generation Audio systems on the market. It has already been chosen as a TV audio standard by countries such as Brazil and South Korea.




Mandy Garcia

Fraunhofer IIS
