Many valuable data treasures are hidden in media archives, but extracting them is still difficult: manually annotating text, images, and audio and video files with content-related metadata is laborious, time-consuming, and feasible for only a limited number of media files.
Now, however, the Fraunhofer IAIS Mining Platform is capable of analyzing almost unlimited quantities of multimedia documents with high precision and flexibility. The automatically generated metadata makes valuable content far more accessible to journalists and other media creators, and also enables searching by topic.
A modular and extensible system
The Mining Platform has a modular structure, allowing almost any metadata extraction process to be integrated. Components for named entity recognition, keyword extraction, topic modeling, and smart keyword assignment are already integrated for analyzing text documents. Spoken material can be automatically transcribed with the high-performance Fraunhofer IAIS AudioMining solution. Furthermore, visual metadata, such as detected faces, can be extracted from images and videos.
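The modular design described above can be sketched as a common analyzer interface plus a registry of pluggable components. This is a minimal illustration only: the class names, the registry, and the toy keyword heuristic are assumptions for this sketch, not the platform's actual API.

```python
from abc import ABC, abstractmethod


class Analyzer(ABC):
    """Common interface every metadata-extraction component implements."""

    @abstractmethod
    def analyze(self, document: bytes) -> dict:
        """Return content-related metadata for one media document."""


class KeywordExtractor(Analyzer):
    """Stand-in for a real keyword-extraction service."""

    def analyze(self, document: bytes) -> dict:
        words = document.decode("utf-8", errors="ignore").lower().split()
        # Toy heuristic: the most frequent long words stand in for real keywords.
        freq = {}
        for w in words:
            if len(w) > 6:
                freq[w] = freq.get(w, 0) + 1
        top = sorted(freq, key=freq.get, reverse=True)[:3]
        return {"keywords": top}


# A registry lets new extraction processes be plugged in without
# changing the pipeline that drives them.
ANALYZERS: dict[str, Analyzer] = {"keywords": KeywordExtractor()}


def run_all(document: bytes) -> dict:
    """Run every registered analyzer on one document and merge the metadata."""
    return {name: a.analyze(document) for name, a in ANALYZERS.items()}
```

Integrating a third-party process would then amount to implementing the interface and adding one registry entry.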
The Fraunhofer team is currently developing and integrating additional analysis services, such as cut and scene detection, key frame extraction, and object recognition. Processes developed by third-party providers can also be easily integrated on request. It is also possible to train customer-specific models for the existing analysis services, thus replacing the ready-made models: an advantage that no other system has offered so far.
Step-by-step analyses thanks to workflow components
The individual steps involved in analyzing a media file are controlled by a workflow component. This has several benefits: failed analysis steps can be repeated, and the analyses to be carried out can be prioritized. This is helpful when certain media data takes priority over other data, for example in the case of live events. The workflow component can also be used to model more complex processing sequences, for example when the output of one analysis service serves as the input for subsequent services.
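The retry and prioritization behavior described above can be sketched with a priority queue. The class and parameter names here are invented for illustration and do not reflect the platform's actual workflow component:

```python
import heapq


class Workflow:
    """Toy workflow runner: prioritized tasks with automatic retries."""

    def __init__(self, max_retries: int = 2):
        self.queue = []  # entries are (priority, sequence, task); lower runs first
        self.counter = 0  # tie-breaker so equal-priority tasks keep FIFO order
        self.max_retries = max_retries

    def submit(self, task, priority: int = 10):
        heapq.heappush(self.queue, (priority, self.counter, task))
        self.counter += 1

    def run(self):
        results = []
        while self.queue:
            _, _, task = heapq.heappop(self.queue)
            for attempt in range(self.max_retries + 1):
                try:
                    results.append(task())
                    break
                except Exception:
                    if attempt == self.max_retries:
                        results.append(None)  # give up after retrying
        return results


wf = Workflow()
wf.submit(lambda: "face metadata", priority=5)
wf.submit(lambda: "live transcript", priority=1)  # live event: runs first
```

Chained processing, where one service's output feeds the next, would correspond to a submitted task that reads the result of an earlier step before running.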
For example, a video can be analyzed using facial recognition, while audio mining transcribes the speech on the audio track and even distinguishes the individual speakers. The resulting metadata (the people recognized in the image and the speakers recognized in the audio) can then be linked together. The system can, say, detect when a person is visible on screen while also talking. In a further downstream step, the transcript can be analyzed using text mining processes. The system can thus determine not only when a given person can be seen or heard in a video, but also which topics, people, places, or institutions are mentioned.
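Linking face and speaker metadata of this kind boils down to intersecting time segments. The following sketch uses invented example data and field layouts; the platform's real metadata schema is not described in the article:

```python
def overlaps(a, b):
    """Two (start, end) intervals in seconds overlap if neither ends before the other starts."""
    return a[0] < b[1] and b[0] < a[1]


def link_face_and_speech(face_segments, speech_segments):
    """Pair a recognized face with a recognized speaker whenever their
    time segments intersect, i.e. the person is on screen while talking."""
    linked = []
    for person, f in face_segments:
        for speaker, s in speech_segments:
            if person == speaker and overlaps(f, s):
                # Keep only the span where the person is both seen and heard.
                linked.append((person, (max(f[0], s[0]), min(f[1], s[1]))))
    return linked


faces = [("Anna", (0.0, 12.0)), ("Ben", (12.0, 20.0))]
speech = [("Anna", (3.0, 9.0)), ("Ben", (21.0, 30.0))]
# Anna is visible and speaking from second 3 to 9; Ben talks off screen.
```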
To allow the Mining Platform to scale with the quantity of media data to be processed, it is built on a microservices architecture and runs in a Kubernetes cluster. The number of instances of each analysis service can thus be adapted to the required workload.
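As a rough illustration, scaling one analysis service in Kubernetes could look like the following Deployment fragment. The service name, container image, and replica count are invented for this sketch and are not taken from the actual platform:

```yaml
# Illustrative sketch only: names and values are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: audiomining-service
spec:
  replicas: 4            # adapt the number of instances to the workload
  selector:
    matchLabels:
      app: audiomining-service
  template:
    metadata:
      labels:
        app: audiomining-service
    spec:
      containers:
        - name: audiomining
          image: registry.example.com/mining/audiomining:latest
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```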
Running the Mining Platform in a Kubernetes cluster also means it can be operated in a private cloud or directly on premises, so digital sovereignty is guaranteed at all times. Thanks to its REST interfaces, the Mining Platform can be easily integrated into other broadcast systems.
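A REST integration of the kind mentioned above might look like this from a client's perspective. The endpoint path, payload fields, and host are assumptions for illustration; the platform's actual API is not documented in the article:

```python
import json
import urllib.request


def build_job_request(base_url: str, media_url: str, analyses: list):
    """Build the HTTP request for submitting one media file for analysis.
    The /jobs path and payload fields are hypothetical."""
    payload = json.dumps({"media": media_url, "analyses": analyses}).encode()
    return urllib.request.Request(
        f"{base_url}/jobs",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_media(base_url: str, media_url: str, analyses: list):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_job_request(base_url, media_url, analyses)) as resp:
        return json.load(resp)
```

A broadcast system would call `submit_media(...)` with the URL of a media file and the list of analyses to run, then poll or subscribe for the generated metadata.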
Cross-media searching: the Mining Platform successfully developed with and deployed by ARD
The Mining Platform is being developed in a strategic partnership with Germany’s public broadcasting network, ARD, where it is being used for applications such as a cross-media search system.