TV and Radio Archives Searched in a Flash
Need to find a specific original quote in a radio or television archive? Until now, this has been difficult. Smart speech technology from Fraunhofer transcribes every broadcast and returns the matching broadcasts and time codes in mere seconds.
Where and when did Joe Biden utter his now-famous words “Build Back Better”? Finding these and similar original recordings from radio or video is tedious work for journalists and editors: it is estimated that only ten percent of broadcasters’ archive material includes detailed, manually inserted annotations, and only if the archivist considered them important at the time. For all other material, only the title information is available, which reveals little about the specific content.
Find the right original sound clip in just one click ...
The Fraunhofer IAIS Audio Mining system makes this procedure much easier. “If a radio or television recording is archived, our tool uses deep learning to transcribe any spoken language into text,” explains Dr. Christoph Schmidt, Head of Business Unit “Speech Technologies” at Fraunhofer IAIS. “Every broadcast is thus available as a text file in which individual search terms can be found in fractions of a second. The time markers in the broadcast are also stored for each word, so you can mark the desired position in the text and cut out the audio snippet you are looking for.”
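The idea of a transcript that keeps a time marker for every word can be illustrated with a small Python sketch. It is not the Fraunhofer IAIS implementation: the `Word` record, the `find_phrase` helper and the sample timestamps are all hypothetical, chosen only to show how a phrase match in the text can be turned into start and end times in the audio.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Word:
    """One transcribed word with its position in the broadcast (hypothetical format)."""
    text: str
    start: float  # seconds from the start of the broadcast
    end: float


def normalize(token: str) -> str:
    """Lower-case a token and drop punctuation so that 'Better.' matches 'better'."""
    return re.sub(r"[^\w]", "", token).lower()


def find_phrase(words: list[Word], phrase: str) -> list[tuple[float, float]]:
    """Return a (start, end) time span for every place the phrase is spoken."""
    target = [t for t in (normalize(p) for p in phrase.split()) if t]
    if not target:
        return []
    tokens = [normalize(w.text) for w in words]
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # The span runs from the first matched word to the last one.
            hits.append((words[i].start, words[i + n - 1].end))
    return hits


# Invented sample data, not taken from a real broadcast.
transcript = [
    Word("We", 12.0, 12.2),
    Word("will", 12.2, 12.4),
    Word("build", 12.4, 12.8),
    Word("back", 12.8, 13.1),
    Word("better.", 13.1, 13.6),
]

print(find_phrase(transcript, "Build Back Better"))
```

Each returned span could then be passed to an audio tool to cut out the snippet. A linear scan like this is only sensible for a single transcript; a large archive would presumably need a proper search index.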
For editors this means that if, for example, they are searching for the original sound clip of Joe Biden’s statement, they can enter the right words in the search field of the user interface and receive a list of all broadcasts, including the exact time at which the excerpt can be heard. The system automatically segments broadcasts by speaker; the researchers call this “speaker clustering.” Here, the people speaking in a broadcast are numbered consecutively: once you have listened briefly to find out which speaker belongs to which number, you can choose to listen only to the answers of the person being interviewed, for example.
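How numbered speakers make it possible to listen to only one person can be sketched as follows. The `Segment` record, the helper functions and the sample broadcast are assumptions made for illustration; the article does not describe the data format the system actually produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A stretch of speech attributed to one numbered speaker (hypothetical format)."""
    speaker: int  # consecutive number assigned by speaker clustering
    start: float  # seconds from the start of the broadcast
    end: float
    text: str


def playlist(segments: list[Segment], speaker: int) -> list[tuple[float, float]]:
    """Return the time spans in which the chosen speaker is talking."""
    return [(s.start, s.end) for s in segments if s.speaker == speaker]


def talk_time(segments: list[Segment], speaker: int) -> float:
    """Total number of seconds the chosen speaker is talking."""
    return sum(s.end - s.start for s in segments if s.speaker == speaker)


# Invented interview: speaker 1 is the host, speaker 2 the guest.
broadcast = [
    Segment(1, 0.0, 6.5, "Welcome to the programme."),
    Segment(2, 6.5, 21.0, "Thank you for having me."),
    Segment(1, 21.0, 25.0, "What happens next?"),
    Segment(2, 25.0, 48.0, "We expect a decision soon."),
]

print(playlist(broadcast, 2))
print(talk_time(broadcast, 2))
```

In this sketch the speaker numbers carry no identity; a listener still has to work out that speaker 2 is the interviewee before filtering on that number.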
With the speaker recognition function, the scientists even go one step further: the system recognizes the exact speaker, for example a certain politician. It is therefore able to answer more complex queries such as “Joe Biden’s statements on the Coronavirus disease”, or can jump to the contribution of a specific person in a talk show with a click. This involves teaching the computer how different people sound. With speech snippets of one to two seconds this is still quite difficult, but given a talk time of 30 seconds, speaker recognition already works very reliably.
“This is a great way to search through huge archives,” says Schmidt. A second usage scenario involves interviews or other recordings that are transcribed live, which facilitates the production of programs. The Fraunhofer IAIS live recognition system is already in use at the Saxon state parliament; other state parliaments in Germany and Austria have already shown interest.
Use in regional parliaments has also shown that the system is robust against dialectal language. In the medium term, it is also conceivable that the tool could be used in real time for automatic subtitling of television or video programs. While this already works quite well for the news, further research is needed to reliably recognize all words from speakers with strong dialects or accents, or rare technical terms such as those used in astrophysics.
Already in use at ARD and ZDF
The Fraunhofer IAIS Audio Mining tool analyzes 2000 hours of audio and video material daily for Germany’s ARD public broadcasting stations. It is also in use at the broadcaster ZDF. The system is currently mainly used in the context of archives. “It is conceivable that, in the medium term, stations will not only use it for their archives, but also for their media libraries and automatic subtitling and for working with raw material in the editorial offices,” Schmidt explains.