Video-based classes and seminars are still a far cry from the real deal. With novel technology developed by Fraunhofer HHI, professors will soon be transported directly into dorm rooms – virtually, that is. And they will even be able to interact with students.
So, the professor will be standing in the middle of the student’s room, teaching the class? It may sound like an exclusive private tutoring session, but this approach stands to become a reality for students throughout the entire semester. It is all possible thanks to technology developed by the Fraunhofer Heinrich Hertz Institute, or Fraunhofer HHI. “The student wears a mixed reality headset,” explains Dr. Cornelius Hellge, head of group at Fraunhofer HHI. “That way, they see both their room and the professor, who appears to be standing in the middle of the space, looking at the student and explaining the material.” To bring this vision to life, HHI is working on novel techniques for generating and streaming interactive virtual humans.
Interactive virtual humans
The first step in creating this type of visualization is to produce a volumetric recording of the professor. “To do this, we use 32 cameras arranged in stereo pairs, with 20 centimeters between the cameras of each pair,” explains Ingo Feldmann, head of group at Fraunhofer HHI. The stereo pairs are distributed evenly throughout the recording space, and the recordings are processed into three-dimensional meshes, creating a volumetric video stream. New hybrid animation techniques, which combine elements of classical computer graphics with AI-based methods, make it possible to animate the volumetric representation of the person. Students can not only walk around the professor; they can also maintain “eye contact” at all times. “To enable this additional level of immersion, the volumetric data is dynamically animated so that the professor’s gaze follows the student,” explains Wieland Morgenstern, researcher in the Computer Vision & Graphics group at HHI. New AI-based animation techniques will also allow for the animation of gestures, facial expressions and speech that were not part of the original recording. In the coming months, the avatar’s lips will be animated to match the spoken text as well, and within the next year, animation of the entire body could follow, allowing the teacher to move dynamically around the space.
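At its core, the gaze-following behavior Morgenstern describes means re-aiming the avatar’s head at the viewer’s position every frame. The short Python sketch below illustrates that geometry under simple assumptions; the `gaze_angles` helper, the coordinate convention and the example numbers are illustrative, not part of HHI’s actual animation rig.

```python
import math

def gaze_angles(head_pos, viewer_pos):
    """Yaw and pitch (in degrees) that aim the avatar's head at the viewer.

    Hypothetical helper for illustration only. Assumed coordinate frame:
    x to the right, y up, z along the avatar's default gaze direction.
    """
    dx = viewer_pos[0] - head_pos[0]
    dy = viewer_pos[1] - head_pos[1]
    dz = viewer_pos[2] - head_pos[2]
    yaw = math.degrees(math.atan2(dx, dz))          # turn left/right
    pitch = math.degrees(math.atan2(dy, math.hypot(dx, dz)))  # tilt up/down
    return yaw, pitch

# A viewer whose headset reports a position 2 m in front of the avatar's
# head and 0.4 m below it (all values invented for the example):
yaw, pitch = gaze_angles(head_pos=(0.0, 1.7, 0.0), viewer_pos=(0.0, 1.3, 2.0))
# yaw is 0 (viewer straight ahead); pitch is slightly negative (looking down)
```

In a real system these angles would be clamped to anatomically plausible limits and smoothed over time, so the head turns naturally rather than snapping toward the viewer.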
Rendering in the edge cloud
Achieving such a feat of animation requires considerable computing power – more than a smartphone can handle – so the rendering takes place in the edge cloud, on a computer located near the end user. And before the data can be transmitted to the headset, it must be compressed: the raw data, several gigabytes in size, is far too large to stream. “We render the volumetric data in the cloud and send it to the headset via 5G as a 2D video stream with standard formats and bitrates,” says Dr. Hellge. Simultaneously, the headset sends data about the direction of the user’s gaze back to the edge cloud, which positions the render camera to follow the user’s movement and turns the avatar’s head to match. Of course, this process has to be fast. “It’s a whole new ballpark. Other streaming services are still working with latencies of around a second. We need less than 60 ms,” clarifies Dr. Hellge. This technological leap was made possible by combining the Web Real-Time Communication standard, or WebRTC, which is normally used for video conferencing, with an extremely fast video encoder, and by optimizing every component of the streaming chain. Finally, the mixed reality headset ensures that the 2D image is seamlessly integrated into its 3D surroundings, so the professor is not seen giving a lecture while standing in the middle of an armchair.
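To see why the sub-60 ms target is demanding, it helps to think of the pose-to-frame round trip as a budget split across stages: uploading the headset pose, rendering in the edge cloud, encoding, downloading the video frame, and display. The toy breakdown below is a hypothetical illustration; the stage names and millisecond values are assumptions, not measurements from HHI’s pipeline.

```python
LATENCY_BUDGET_MS = 60.0  # end-to-end target quoted in the article

def round_trip_ms(upload_ms, render_ms, encode_ms, download_ms, display_ms):
    """Sum the stages of one pose-to-frame round trip.

    Illustrative model only: real pipelines overlap stages and add
    jitter, so a plain sum is an optimistic simplification.
    """
    return upload_ms + render_ms + encode_ms + download_ms + display_ms

# Invented example timings for one frame over a 5G link:
total = round_trip_ms(upload_ms=10, render_ms=15, encode_ms=8,
                      download_ms=12, display_ms=10)
within_budget = total <= LATENCY_BUDGET_MS  # 55 ms total → True
```

With a second of latency, as in conventional streaming, the rendered view would lag a full head turn behind the user; keeping the sum of all stages under 60 ms is what makes the avatar feel anchored in the room.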