MPEG-H Audio

© Fraunhofer IIS

MPEG-H Audio: Personalized and Immersive Sound for Broadcast and Streaming

The MPEG-H Audio System enables immersive and interactive sound that can be personalized. Listen to a 3D audio experience that matches your preferences. With MPEG-H Audio, you can choose from presets or create your own settings. For example, you can switch between languages or adjust the volume of a sports commentator. The system also provides enhanced accessibility features such as dialogue enhancement and audio description. These features complement the pristine 3D sound and flexibility that consumers expect from quality services today.

With MPEG-H Audio, next-generation audio is already a reality: the system has been on the air 24/7 in South Korea since the UHDTV system was launched there in May 2017. In addition, it has recently been selected as the only mandatory audio codec of Brazil’s new TV ecosystem, TV 3.0.

The system is also used for music streaming, as Sony's immersive 360 Reality Audio format is based on MPEG-H Audio. The immersive music tracks, available from providers such as Amazon Music, Deezer, and TIDAL, can be played on mobile devices (with headphones), soundbars, smart speakers, and in the car.


Understanding TV dialogue better – with MPEG-H Dialog+ technology

Most TV stations are quite used to their audience complaining about hard-to-understand dialogue – be it in films, documentaries, sports coverage, or even the news. The matter is not an easy one to solve: firstly, because the loudness difference between background sound and dialogue is a unique decision made by creators for every piece of content, and secondly, because the “perfect” dialogue loudness is a matter of personal preference.

The evolution of AI-based technologies and object-based audio (OBA), however, has enabled the creation of technologies such as MPEG-H Dialog+ by Fraunhofer IIS. The technology uses deep neural networks (DNNs) to automatically identify the dialogue in existing content, separate it from the background sounds, and remix it with a lowered background level. Using OBA, users can even adapt the dialogue level on their device to meet their personal requirements.

MPEG-H Dialog+ contains a deep neural network that performs dialogue separation using training data derived mostly from real-world broadcast content. Dialog+ combines dialogue separation with a unique automatic remixing algorithm, where a global and a time-varying background attenuation can be combined.
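
To make the remixing idea concrete, the following is a minimal sketch only, not the Dialog+ algorithm itself, whose details are not given here. It assumes the separation step has already produced two sample sequences of equal length (dialogue and background) and that the global and time-varying attenuations are expressed in decibels and simply added together. The function names `db_to_gain` and `remix`, the parameter names, and the default value are hypothetical.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def remix(dialogue, background, global_atten_db=6.0, extra_atten_db=None):
    """Mix separated dialogue with an attenuated background (illustrative only).

    dialogue, background: equal-length sequences of samples (assumed to come
        from a separation step that is not shown here).
    global_atten_db: constant background attenuation in dB (hypothetical default).
    extra_atten_db: optional per-sample additional attenuation in dB, standing
        in for the time-varying component; defaults to no extra attenuation.
    """
    if len(dialogue) != len(background):
        raise ValueError("dialogue and background must have the same length")
    if extra_atten_db is None:
        extra_atten_db = [0.0] * len(dialogue)
    if len(extra_atten_db) != len(dialogue):
        raise ValueError("extra_atten_db must match the signal length")

    mixed = []
    for d, b, extra in zip(dialogue, background, extra_atten_db):
        # Assumption: global and time-varying attenuation add up in dB.
        gain = db_to_gain(-(global_atten_db + extra))
        mixed.append(d + gain * b)
    return mixed
```

In this sketch, a larger value in `extra_atten_db` lowers the background further only at those samples, for example while someone is speaking, whereas `global_atten_db` lowers it throughout. A real system would typically work on audio frames and smooth gain changes over time, which is omitted here.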

MPEG-H Dialog+ is part of the MPEG-H Audio production software, providing all features of an OBA system like advanced user interactivity and personalization. This makes the use of MPEG-H Dialog+ a future-proof decision for broadcasters and content producers as MPEG-H Audio is one of the most advanced Next Generation Audio systems on the market. It has already been chosen as a TV audio standard by countries such as Brazil and South Korea.

Angela Raguse

Communications Digital Media

Fraunhofer Business Area Digital Media
Am Wolfsmantel 33
91058 Erlangen, Germany

Phone +49 9131 776-5105
