Opera Co-creation: Video Adaptation and Streaming Challenges

Anderson Simiscuka, Mohammed Amine Togou, Noel O’Connor and Gabriel-Miro Muntean

Dublin City University – Ireland

Immersive Player video selection list (under development), 2020. Based on the ImAc Player [1].

One of the main aspects of the TRACTION project is the use of technology for the co-creation and consumption of opera content. The project is taking place at a time when, due to the Coronavirus pandemic, more and more people are subscribing to video streaming platforms and watching content online. As reported by the BBC [2], viewership of streaming services in the United Kingdom was 71% higher in 2020 than in the previous year. It is reasonable to assume that a similar increase has occurred in the rest of the world, with people spending more time at home.

Although revenue for platforms such as YouTube and Netflix is on the rise, the increase in online traffic poses challenges for networks. Both Google's YouTube and Netflix were forced to limit streaming quality [3] this year in order to ease pressure on network providers, especially during the initial lockdowns. This reduction in video quality was possible only because of an important feature of these platforms: adaptive streaming.

Adaptive streaming is the ability of a media player to select the video stream that best suits the user's current network conditions. This is possible because videos are converted into various resolutions once they have been uploaded to a platform such as YouTube. That is how Google and Netflix were able to serve lower resolutions to users (since the videos were already available in different encodings) and reduce the burden on networks, as streaming lower-resolution video consumes less data than streaming high-definition video.
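To make the idea concrete, the sketch below shows one simple way a player might pick an encoding from a pre-encoded set of versions (a "bitrate ladder"). It assumes a throughput-based rule with a safety margin, plus an optional cap that stands in for a platform-imposed quality limit. The ladder values and the names `Rendition`, `LADDER` and `select_rendition` are hypothetical; real players, including those of YouTube and Netflix, use more elaborate, buffer-aware algorithms.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rendition:
    """One pre-encoded version of the same video (hypothetical values)."""
    label: str
    bitrate_kbps: int


# Hypothetical bitrate ladder, ordered from lowest to highest quality.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1_000),
    Rendition("720p", 2_500),
    Rendition("1080p", 5_000),
]


def select_rendition(throughput_kbps: float,
                     ladder: Sequence[Rendition] = LADDER,
                     safety: float = 0.8,
                     cap_kbps: Optional[float] = None) -> Rendition:
    """Return the highest rendition that fits the available budget.

    The budget is a fraction (`safety`) of the measured throughput,
    optionally lowered further by a cap such as a platform-wide limit.
    """
    budget = throughput_kbps * safety
    if cap_kbps is not None:
        budget = min(budget, cap_kbps)
    fitting = [r for r in ladder if r.bitrate_kbps <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest-bitrate rendition.
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(fitting, key=lambda r: r.bitrate_kbps)


if __name__ == "__main__":
    print(select_rendition(4_000).label)                   # 720p
    print(select_rendition(10_000, cap_kbps=1_500).label)  # 480p
```

With a measured throughput of 4,000 kbps the budget is 3,200 kbps, so the 720p version is chosen; with a 1,500 kbps cap, even a fast connection receives 480p, which mirrors how a provider-side limit lowers quality without re-encoding anything.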

One of the main technologies behind adaptive streaming is ISO/IEC MPEG Dynamic Adaptive Streaming over HTTP, commonly known as MPEG-DASH. The MPEG-DASH standard has two main components: the Media Presentation and the Media Presentation Description (MPD). A Media Presentation is a structured collection of encoded media that runs from start to finish as a sequence of one or more periods; each period contains adaptation sets, each adaptation set contains alternative representations (for example, different resolutions or bitrates), and each representation is made up of segments. The MPD is an eXtensible Markup Language (XML) document that identifies the various content components and the locations of all alternative segments, and describes the relationships between them.
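The hierarchy described above (period, adaptation set, representation) can be seen by walking a small MPD. The following sketch uses only Python's standard `xml.etree.ElementTree`; the manifest is a hand-written, heavily simplified example (file names, IDs and bandwidth values are hypothetical), not a complete or validated MPD, and `list_representations` is an illustrative helper rather than part of any DASH library.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical MPD: one period, a video adaptation set with two
# alternative representations, and an audio adaptation set with one.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT30S" minBufferTime="PT2S"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011">
  <Period id="0">
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v-480p" bandwidth="1000000" width="854" height="480">
        <BaseURL>video_480p.mp4</BaseURL>
      </Representation>
      <Representation id="v-1080p" bandwidth="5000000" width="1920" height="1080">
        <BaseURL>video_1080p.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" lang="en">
      <Representation id="a-128k" bandwidth="128000">
        <BaseURL>audio_128k.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""

# MPD elements live in the DASH XML namespace.
NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}


def list_representations(mpd_text: str) -> list:
    """Flatten Period > AdaptationSet > Representation into a list of dicts."""
    root = ET.fromstring(mpd_text)
    result = []
    for period in root.findall("dash:Period", NS):
        for aset in period.findall("dash:AdaptationSet", NS):
            for rep in aset.findall("dash:Representation", NS):
                base = rep.find("dash:BaseURL", NS)
                result.append({
                    "period": period.get("id"),
                    "mime": aset.get("mimeType"),
                    "id": rep.get("id"),
                    "bandwidth": int(rep.get("bandwidth")),
                    "url": base.text if base is not None else None,
                })
    return result


if __name__ == "__main__":
    for r in list_representations(MPD_XML):
        print(r["mime"], r["id"], r["bandwidth"], r["url"])
```

A player would use a list like this to switch between the two video representations while keeping the audio representation unchanged.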

Why are adaptive streaming and MPEG-DASH important to TRACTION? TRACTION requires the development of collaborative multimedia players that will be used by communities in various locations, including areas with limited Internet bandwidth. It is also important that content from multiple locations can be processed and, when needed, merged with minimal latency or delay.

Multi-source multimedia players must support, for instance, streaming multiple pre-recorded and live performances by artists playing different instruments and merging these videos into a single experience, even when the content is hosted in different locations. Other video elements that can be played simultaneously with the main video stream include users' video feedback, commentators, and sign language interpreters.

Adaptive multi-source delivery systems consider user feedback, among other metrics, in order to adapt the content delivered from multiple servers in the cloud. A receiver buffer helps the player synchronise the content, and the challenge is to present all content at a similar quality (e.g. in a multi-window player, it is undesirable to have one window in poor quality and another in good quality). Adaptation can also be performed by the servers, taking into account user feedback, the type of device consuming the data, and network conditions.
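One simple way to approach the "similar quality" requirement is to choose a single quality level for every window and lower it for all sources together. The sketch below assumes that each source offers quality levels that are comparable across sources and that the player has one shared bandwidth estimate; the uniform-level rule, the function names and the example bitrates are illustrative assumptions, not the scheme used in TRACTION.

```python
from typing import Dict, List


def common_quality_level(ladders: Dict[str, List[int]], available_kbps: float) -> int:
    """Pick one quality level to use for every source.

    `ladders` maps each source to its bitrates (kbps), ordered from the
    lowest to the highest quality level. Returns the highest level whose
    combined bitrate fits `available_kbps`, or 0 if none does.
    """
    # Only levels offered by every source can be shared by all windows.
    top = min(len(rates) for rates in ladders.values()) - 1
    for level in range(top, -1, -1):
        total = sum(rates[level] for rates in ladders.values())
        if total <= available_kbps:
            return level
    # Even the lowest level does not fit; fall back to it anyway.
    return 0


def allocate(ladders: Dict[str, List[int]], available_kbps: float) -> Dict[str, int]:
    """Return the bitrate each source should stream at the shared level."""
    level = common_quality_level(ladders, available_kbps)
    return {name: rates[level] for name, rates in ladders.items()}


if __name__ == "__main__":
    # Hypothetical sources: two musicians and a sign language interpreter.
    ladders = {
        "violin": [400, 1000, 2500],
        "cello": [400, 1000, 2500],
        "interpreter": [200, 500, 1200],
    }
    print(allocate(ladders, 3000))
```

With 3,000 kbps available, the top level would need 6,200 kbps in total, so every window drops to the middle level together instead of one window staying sharp while another degrades.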

Another important feature of TRACTION is support for immersive experiences, including videos recorded in 360° and virtual reality (VR) applications. Delivering 360° video brings additional research and development challenges, as the files are large and VR headsets require very low streaming latency to prevent user dizziness.

The design of adaptation schemes specifically for improving the delivery of immersive video is one of the research directions within the project. Image quality, especially in the context of opera content, can be adjusted to improve the user experience, and adaptation based on resolution and region of interest can improve the perceived quality of the video. Region-of-interest-based adaptive schemes, for instance, perform adaptation at the level of regions within video frames, based on user interest obtained by tracking eye movements. When network conditions require it, such a scheme reduces the quality of the regions the viewer is least interested in. Regions the viewer is most interested in are either left unchanged or adjusted only slightly, resulting in high overall end-user perceived quality.
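The region-of-interest idea can be sketched as a greedy allocation under a few stated assumptions: each region has an interest score (for example, the share of eye-tracking fixations that fall in it), all regions share one hypothetical per-level bitrate table, and the least interesting regions are lowered first until the total fits the available bandwidth. The function name, region names and numbers are illustrative only.

```python
from typing import Dict, List


def allocate_region_quality(interest: Dict[str, float],
                            level_kbps: List[int],
                            budget_kbps: float) -> Dict[str, int]:
    """Assign a quality level to each region of the frame.

    Every region starts at the highest level; while the total exceeds the
    budget, the least interesting region that can still be lowered drops
    one level. `level_kbps` lists the bitrate of one region at each level,
    from lowest to highest quality.
    """
    top = len(level_kbps) - 1
    levels = {region: top for region in interest}
    # Visit regions from least to most interesting.
    order = sorted(interest, key=interest.get)

    def total() -> float:
        return sum(level_kbps[lvl] for lvl in levels.values())

    while total() > budget_kbps:
        for region in order:
            if levels[region] > 0:
                levels[region] -= 1
                break
        else:
            break  # every region is already at the lowest level
    return levels


if __name__ == "__main__":
    # Hypothetical interest scores derived from eye tracking.
    interest = {"stage": 0.7, "orchestra_pit": 0.2, "audience": 0.1}
    print(allocate_region_quality(interest, [100, 300, 800], 1200))
```

With a 1,200 kbps budget the stage stays at the top level, the orchestra pit drops one level and the audience drops to the lowest level, which matches the behaviour described above: the most watched region is left unchanged.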

360° VR videos and their underlying 3D geometry can also be divided into spatially partitioned segments (tiles) in 3D space, which are adapted with higher or lower priority according to the regions users are most likely to look at. Colour can also be improved. For instance, 360° content recorded in theatres also captures the audience, a region that is normally dark and of little interest to viewers, so its quality can be reduced. This decreases the overall video size and improves performance. The colour quality of the stage, on the other hand, can be improved by increasing brightness and contrast. These techniques can be incorporated into the technologies used in TRACTION, such as the player under development shown in the images below, to increase user-perceived quality.
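Tile prioritisation can be illustrated with a minimal sketch that makes several simplifying assumptions: the 360° frame is split only horizontally into equal yaw columns (pitch is ignored), the viewer's current viewing direction stands in for a prediction of where they are likely to look, and the field-of-view and margin values are arbitrary. The function name `tile_priorities` is hypothetical.

```python
from typing import List


def tile_priorities(num_cols: int, view_yaw_deg: float,
                    fov_deg: float = 100.0, margin_deg: float = 30.0) -> List[str]:
    """Split 360° of yaw into `num_cols` equal tiles and rate each one.

    A tile is 'high' if its centre lies inside the horizontal field of
    view, 'medium' if it lies within an extra margin around it, and
    'low' otherwise (e.g. tiles behind the viewer).
    """
    width = 360.0 / num_cols
    ratings = []
    for i in range(num_cols):
        centre = (i + 0.5) * width
        # Shortest angular distance between tile centre and view direction, in [0, 180].
        diff = abs((centre - view_yaw_deg + 180.0) % 360.0 - 180.0)
        if diff <= fov_deg / 2:
            ratings.append("high")
        elif diff <= fov_deg / 2 + margin_deg:
            ratings.append("medium")
        else:
            ratings.append("low")
    return ratings


if __name__ == "__main__":
    print(tile_priorities(8, view_yaw_deg=0.0))
```

With eight tiles and the viewer facing yaw 0°, the two tiles on either side of that direction are rated high, the next two medium, and the four tiles behind the viewer low; a player could then request those low-priority tiles at a reduced bitrate.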

Screenshot of a 360° video in medium and high resolutions. Footage from This Hostel Life by the Irish National Opera, Dublin, 2019. Provided by Virtual Reality Ireland.

The design of applications and solutions that support a variety of devices requires the development of novel adaptive algorithms and schemes based on device, user, and network requirements. TRACTION is a great opportunity to create and implement these algorithms and schemes, which must allow multiple concurrent users with a variety of devices, located in areas with limited Internet bandwidth, to access and produce higher-quality content even in constrained environments, during these challenging times.

Acknowledgements: This work was supported by the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement no. 870610 for the TRACTION project. The support of the Science Foundation Ireland (SFI) Research Centres Programme Grant Numbers 12/RC/2289_P2 (Insight) and 16/SP/3804 (ENABLE) is also acknowledged.

[1] https://www.imac-project.eu/

[2] https://www.bbc.com/news/entertainment-arts-53637305

[3] https://www.siliconrepublic.com/comms/youtube-stream-quality-march-2020-covid19