Content Adaptation, Personalisation and Fine-grained Retrieval:

Applying AI to Support Engagement with and Reuse of Archival

Content at Scale

Rasa Bocyte

and Johan Oomen

Netherlands Institute for Sound and Vision, Hilversum, The Netherlands

Keywords: Reuse, Video Summarisation, Content Adaptation, Personalisation, Multimedia Annotation, Retrieval.

Abstract: Recent technological advances in the distribution of audiovisual content have opened up many opportunities

for media archives to fulfil their outward-facing ambitions and easily reach large audiences with their content.

This paper reports on the initial results of the ReTV research project that aims to develop novel approaches

for the reuse of audiovisual collections. It addresses the reuse of archival collections from three perspectives:

content holders (broadcasters and media archives) who want to adapt audiovisual content for distribution on

social media, end-users who have switched from linear television to online platforms to consume audiovisual

content and creatives in the media industry who seek audiovisual content that could be used in new

productions. The paper presents three uses cases that demonstrate how AI-based video analysis technologies

can facilitate these reuse scenarios through video content adaptation, personalisation and fine-grained retrieval.

1 INTRODUCTION

Audiovisual archives across the globe are looking

after massive amounts of digitised and born-digital

content. Together, they represent a vital part of our

collective memory. Archiving is vitally proactive - it

combines acts of selection and preservation with

community engagement, storytelling and production

(Kaufmann, 2018). Ever changing audience

expectations underscore the need for organisations to

define a comprehensive access strategy, including an

assessment of channels used to research out to their

users. Recent advances in the distribution of

audiovisual content have opened up many

opportunities for archives to fulfil these outward-

facing ambitions. Syndicating content over various

platforms, using artificial intelligence to extract

knowledge and tracking online use to better

understand audiences are just a few examples of

innovations that will help organisations to maximise

their impact. Designing resilient, inclusive, outward-

facing audiovisual archives requires experimentation

and collaboration between memory institutions,

multiple scientific disciplines and the creative

industries.

https://orcid.org/0000-0003-1254-8869

https://orcid.org/0000-0003-1750-6801

ReTV (https://retv-project.eu/) is a pan-European

research action that brings computer scientists,

broadcasters, interactive TV companies, audio-visual

archives from across Europe. It develops and

evaluates technology that provides deep insights into

the interactions with digital content and re-purposing

of media across channels. The resulting Trans-Vector

Platform (TVP) built by ReTV empowers content

holders (such as broadcasters and archives) to

continuously measure and predict the success of

content in terms of reach and audience engagement

across distribution channels (vectors), allowing them

to optimise decision making processes. Specifically

for archives, the TVP supports novel approaches for

reuse of their collections using video content

adaptation, personalisation and fine-grained retrieval.

This paper presents the uses cases that are central to

the work in ReTV and demonstrate how AI-based

video analysis technologies could be employed by

broadcasters and media archives to support the reuse

and repurposing of their audiovisual collections.

506

Bocyte, R. and Oomen, J.

Content Adaptation, Personalisation and Fine-grained Retrieval: Applying AI to Support Engagement with and Reuse of Archival Content at Scale.

DOI: 10.5220/0009188505060511

In Proceedings of the 12th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2020) - Volume 1, pages 506-511

ISBN: 978-989-758-395-7; ISSN: 2184-433X

2 REUSE OF AUDIOVISUAL

CONTENT: THREE

PERSPECTIVES

The emergence of digital technologies has radically

changed the way how audiovisual content is being

created, distributed and consumed. Video-on-

Demand (VOD), Web-TV and social media platforms

are increasingly becoming the primary source for

access to audiovisual content and are introducing new

forms of media consumption that are not dictated by

TV guide schedules or limited to passive

consumption from one’s living room (Mikos, 2016).

VOD platforms embrace and encourage binge-

watching behaviour where audiences spend

prolonged time watching consecutive episodes of a

television programme of their choice and

continuously receive recommendations for more

similar content (Jenner, 2017). Social media

platforms are characterized by mechanisms for liking,

sharing, contributing and (co-)creating that are meant

to stimulate high-level of interaction with content and

with other users (Dolan, Conduit, Fahy, Goodman,

2016). These new consumer behaviours demand

audiovisual content that caters to niche audiences

defined by their individual interests and viewing

habits.

This changing media landscape creates a market

opportunity for audiovisual archives. Audiovisual

archives look after a wealth of content that is

primarily used by professional media professionals

and researchers but it is not easily accessible to the

general public. While many cultural organisations

create online portals for their online collections

(EUscreen, Europeana), additional outward-facing

efforts are needed to ensure that this content reaches

audiences who might not know about its existence.

By capitalising on their digitised and born-digital

collections and making them more accessible,

cultural organisations can fulfill consumer demand

for content online and bring unprecedented visibility

to their audiovisual holdings.

ReTV is developing three use cases that showcase

possibilities for the reuse of archival collections in

digital contexts from three perspectives:

 video Content Adaptation and Publication on

Social Media for content owners (broadcasters

and media archives) who want to distribute their

content with high impact;

 Personalised Video Delivery for end-users who

have switched from linear television to online

platforms and personal devices to consume

audiovisual content that fits their personal

interests;

 Generous Interfaces for Retrieval of Video

Segments for creatives in the media industry

who seek audiovisual content that could be

reused in new productions.

2.1 Concept: ReTV Trans-Vector

Platform

To support various uses for content repurposing and

reuse, ReTV is building a system of modular

technical components for ingestion, analysis and

enrichment of audiovisual content and various data

sources. All the components communicate via APIs

and can be used in different configurations to support

various use cases and build different frontend

applications for content repurposing. The backbone

of TVP is a module for AI-based video analysis that

includes segmentation of video content into scenes,

shots and subshots, annotation with semantic

concepts, brand detection and video summarisation.

These modules are supported by other TVP

components, including crawling of online news

sources (websites, social media), metadata analysis,

temporal annotation, and content recommendation

and scheduling services. Together, they ensure that

repurposed archival content reaches audiences on

appropriate channels at optimal time.

Figure 1: TVP architecture as a modular design of

microservices. It indicates the division of tasks between the

research partners building the TVP: MOD (Modul

University), WLT (WebLyzard), CERTH (Centre for

Research and Technology Hellas) and Genistat.

Content Adaptation, Personalisation and Fine-grained Retrieval: Applying AI to Support Engagement with and Reuse of Archival Content at

Scale

507

This paper will focus on the AI-based video

analysis components within the TVP that facilitate

archival content reuse in the three ReTV use cases.

3 USE CASES

The following section outlines three uses cases that

make use of AI-based video analysis technologies to

repurpose archival audiovisual content - video

content adaptation, personalised video content

delivery and retrieval of video segments through

generous interfaces. Together, they highlight how

recent advances in AI can be leveraged by

audiovisual archives to support new modes of reuse.

3.1 Video Content Adaptation for

Online Publication

The consumption of linear television broadcasting is

radically different from the way television content is

consumed on social media. Instead of watching full-

length programmes, audiences on social media

platforms are used to watching short, often muted

videos with intertitles, that allow them to quickly

survey the essence of the story. This dictates that the

format of archival broadcaster content needs to be

adapted before publication online if it is to have high

impact on online audiences. Since different social

media channels have different requirements in terms

of optimal length and format, such adaptation

currently requires good understanding of the content

and lengthy manual editing process to select video

segments that attract viewer attention.

For this purpose, ReTV developed a tool that

automatically summarises full-length videos into

short clips that convey the narrative of the entire

video. The tool is built around video analysis and

summarisation services that shorten a video into a

selection of shots that portray key moments in the

story (Apostolidis, Metsai, Adamantidou, Mezaris,

Patras, 2019). Since summarisation does not take into

account audio elements (the selected shots might be

cut in the middle of the sentence), the summarised

videos are muted. The tool also provides creative

editing functions, allowing users to add overlaying

images, text, audio or subtitle track as well as edit the

sequence of shots in the video.

To perform the first round of evaluations with

professionals from media archives and broadcasting

organisations, the tool was tested with a selected

number of archival newsreel content from the

Netherlands Institute for Sound and Vision collection

(accessible via https://openbeelden.nl/.en). All videos

were between 2-5 minutes long and were summarised

into 20-30 second clips.

Users indicated that automatic summarisation

would significantly reduce the efforts needed to

manually edit videos before publishing them online

and would encourage their organisations to share

more content on social media. The evaluation results

imply that although the summarised videos accurately

conveyed the narrative of the original video, the loss

of audio track was seen as a negative trade-off.

Testers suggested that audio analysis could be

performed to provide suggestions for overlaying text

and subtitles as well as descriptions accompanying

videos. This is particularly pertinent in cases where

subtitles are not available. Testers also expressed that

they would like to manually control certain editor

parameters that would determine the outcomes of

video summarisation (e.g. adjust the length of shots

in the summary).

Our future work will focus on further adapting

video summaries to suit various content genres,

publication channels and various audiences, e.g.

creating different length summaries for different

social media platforms, making different versions of

the same video that target different audiences,

perform summarisation for multiple videos.

Additionally, ReTV will introduce additional

components for audio analysis and text editing that

would complement visual analysis services. To

further evaluate the quality of video summarisations,

ReTV will perform tests with consumers and monitor

their engagement with summarised content.

3.2 Personalised Video Delivery

Further building on the idea that AI can adapt

broadcaster collections to different online publication

channels, the second use case explores how

audiovisual content could be customised to a single

person. The concept of personalising user experience

is already established in the broadcasting and media

industries - over-the-top (OTT) platforms like

Netflix, video-on-demand and streaming platforms

like YouTube have adopted systems that track

viewing patterns and match them with available

content to make content recommendations for each

individual user, in this way keeping users engaged for

prolonged time and returning to consume more

content (Covington, Adams, Sargin, 2016; Lund, Ng,

2018). It is harder for media archives to achieve the

same effect since their online collection portals are

less concerned with entertainment and more with

presenting digital collections in a contextualised,

informative and educational form. Therefore, to

ARTIDIGH 2020 - Special Session on Artiﬁcial Intelligence and Digital Heritage: Challenges and Opportunities

508

benefit from personalisation, media archives need

alternative channels to distribute their content.

This use case investigates how archives can

personalise their audiovisual collections using AI-

driven video analysis and distribute content to

audiences using digital communication channels. The

fine-grained video analysis creates affordances for a

more precise matching between user’s profile and

video content - retrieving relevant content for the user

based on scene- and shot-level analysis rather than

recommending content based on semantic

relationships between video titles. This opens

possibilities to match individual segments of archival

content to particular users. Additionally, AI-driven

video summarisation can be employed to give users

an initial overview of recommended content and help

them decide whether they are interested in watching

it.

To determine what kind of personalised content

users would like to receive from a media archive and

on what criteria such personalisation should be based,

ReTV developed a chatbot that delivers archival

video content. Chatbot was selected as the most

appropriate channel for distribution since it allows us

to collect accurate data for building personal profiles.

A mixture of implicit and explicit feedback is seen as

the most appropriate method to set up user profiles,

i.e. monitoring viewing habits as well as asking

explicit questions about personal preferences

(Hobson and Kompatsiaris, 2006). The bi-directional

communication with users through the chatbot allows

us to obtain immediate implicit feedback as well as

monitor viewing patterns.

Once users sign up to use the chatbot, they are

immediately directed to set up their profile where

they can choose topics they are interested in. For this

first iteration of the ReTV chatbot, a small set of

videos from the Netherlands Institute for Sound and

Vision collection was selected and grouped under five

high-level categories - Sports, Animals, Fashion,

Dutch Life and Culture, History. These high-level

categories are similar to what users would find when

browsing through archival collections online; here,

they serve as a starting point for understanding what

kind of categorisation - if any - would support content

personalisation for individual users.

The chatbot delivers content that corresponds to

the user’s selected topics on a regular basis. First,

users receive a 1-2 second GIF preview of a video

which is created using the video summarisation

service - here the GIF summary serves as a teaser to

enable users to choose whether they are interested in

the content. Next, users can choose to watch the

extended summary, see the full-length video or go to

see the item in the online collection operated by the

archive.

Figure 2: Screenshot of the ReTV chatbot.

Results from evaluation with consumers indicated

that users are mostly interested in receiving archival

content personalised to their particular context - e.g.

news stories about their hometown, historic footage

of their favourite football team, programmes

broadcasted on this day in history. High-level

categorisation of content does not accommodate such

personalisation as user interests defy strict boundaries

and require much more fluid categorisation based on

granular annotations. This suggests that the next

iteration of the chatbot should focus on giving users

more room to indicate their specific interests, instead

of limiting their choices to a number of predefined

categories dictated by the available content. AI-

driven video analysis would then be used to identify

video segments that correspond to those interests and

compile those segments into one video. This method

would provide more accurate personalisation for the

users and would help media archives utilise media

assets that might not be relevant for larger audiences.

One of the challenges for the future research is to

find a way to personalise content in a way that allows

users to discover new things rather than consuming

content about topics they are already familiar with.

Content Adaptation, Personalisation and Fine-grained Retrieval: Applying AI to Support Engagement with and Reuse of Archival Content at

Scale

509

3.3 Generous Interfaces for Retrieval

of Video Segments

The final use cases approaches reuse of broadcaster

collections from the perspective of creatives working

in the media industry - producers, journalists,

documentary makers, amateur video content makers

creators, etc. The most common point of interaction

between these creators and archival collections is the

Media Asset Management (MAM) systems. The

interface of these systems is often closed off - the

content is hidden behind the search bar which means

that the discovery of content is highly depended on

one’s tacit knowledge. In addition, MAMs provide

access to the full item whereas creators are interested

in much more fine-grained access that allows them to

identify and retrieve specific segments of content

(Huurnink, Hollink, Van Den Heuvel, De Rijke

2010). Lack of granular metadata also means that the

search is often restrained by item titles and high-level

descriptions. To address these shortcomings, fine-

grained annotations created with AI can be employed

to facilitate the discovery of relevant content for

repurposing in new productions.

This third use case builds on the concept of

generous interfaces - harnessing data visualisations to

showcase the abundance and richness of digital

collections (Whitelaw, 2015). Many cultural heritage

organisations have adopted this for their online

collections as a way to help users navigate through a

large number of items available and find often

unexpected links between them. Equally, generous

interfaces work on an individual item level where

they provide various facets to inspect the digital

object (e.g. providing a breakdown of a colour

scheme in a painting).

While most of the current generous interfaces

work with image-based collections, the application of

this concept to time-based media would be

exceedingly beneficial as it would allow users to

retrieve, inspect and compare different segments of

the same item or a number of items at a glance.

The foundation for ReTV’s generous interface is

the visual analysis service. Video segmentation into

individual shots allows us to provide a breakdown of

all concepts detected on a shot level. It is also possible

to perform concept-based search and retrieve all shots

from a video that contain a specific concept. Such

fine-grained functions increase the possibilities of

locating more unexpected video segments, especially

for content that would not be discovered otherwise

due to the lack of descriptive metadata.

Figure 3: Interface of the visual analysis service with scene

and shot segmentation.

Figure 4: Interface of the visual analysis service showing

the retrieval of shots that depict a selected concept.

For creators using MAMs, a generous interface

based on visual analysis of time-based media could

provide significant support for conducting “image

search” - finding relevant video segments that could

be reused to illustrate narratives in their productions.

It creates affordances for a more serendipitous

content retrieval, one that can guide and inspire new

creative narratives (Sauer, 2017). In particular, it

would make archival systems more accessible to

amateur creators who do not have pre-existing

knowledge about broadcaster collections. To provide

an even more comprehensive analysis of audiovisual

content, such generous interfaces could combine a

number of AI-powered services, e.g. concept

detection together with face recognition and speech

analysis.

4 CONCLUSIONS

The initial evaluation of the three ReTV use cases

presented in this paper points to the wide-reaching

impact that reuse and distribution of archival

ARTIDIGH 2020 - Special Session on Artiﬁcial Intelligence and Digital Heritage: Challenges and Opportunities

510

collections can have on media archives and various

user groups who engage with audiovisual heritage.

ReTV is a three year project and has just past its mid-

term review. In the next period, and as the TVP is

developed further, more user testing will be

conducted. Also, active consultations are planned

with stakeholders from the heritage sector and

creative industries. Both the user testing and the

consultations inform the further development and

eventual deployment of the final system.

With automated video content adaptation and

publication on social media platforms, media archives

can highlight the value of audiovisual heritage in

contemporary contexts, specifically related to social

media. Delivery of customised videos help users to

develop more personalised and long-lasting

relationships with archival collections. And finally,

the fine-grained access to the archive through

generous interfaces welcomes professional and

amateur creators to discover the creative potential of

audiovisual heritage.

ACKNOWLEDGEMENTS

This work was supported by the EUs Horizon 2020

research and innovation programme under grant

agreement H2020-780656 ReTV.

REFERENCES

Apostolidis, E., Metsai, A., Adamantidou, E., Mezaris, V.,

& Patras, I., 2019. A Stepwise, Label-based Approach

for Improving the Adversarial Training in

Unsupervised Video Summarization. 1st Int. Workshop

on AI for Smart TV Content Production, Access and

Delivery (AI4TV'19) at ACM Multimedia 2019.

http://doi.org/10.1145/3347449.3357482

Covington, P., Adams, J., & Sargin, E., 2016. Deep Neural

Networks for YouTube Recommendations.

Proceedings of the 10th ACM Conference on

Recommender Systems, 191–198. https://doi.org/

10.1145/2959100.2959190

Dolan, R., Conduit, J., Fahy, J., & Goodman, S., 2016.

Social Media Engagement Behaviour: a Uses and

Gratifications Perspective. Journal of Strategic

Marketing, 24(3-4), 261–277. https://doi.org/10.1080

/0965254X.2015.1095222

Hobson, P., & Kompatsiaris, Y., 2006. Advances in

semantic multimedia analysis for personalised content

access. 2006 IEEE International Symposium on

Circuits and Systems (ISCAS). https://doi.org/

10.1109/ISCAS.2006.1693027

Huurnink, B., Hollink, L., Van Den Heuvel, W., & De

Rijke, M., 2010. Search behavior of media

professionals at an audiovisual archive: A transaction

log analysis. Journal of the American Society for

Information Science and Technology, 61(6), 1180–

1197. https://doi.org/10.1002/asi.21327

Jenner, M., 2017. Binge-watching: Video-on-demand,

quality TV and mainstreaming fandom. International

Journal of Cultural Studies, 20(3), 304–320. https://

doi.org/10.1177/1367877915606485

Kaufman, P. B., 2018. Towards a New Audiovisual Think

Tank for Audiovisual Archivists and Cultural Heritage

Professionals. Netherlands Institute for Sound and

Vision. https://doi:10.18146/2018thinktank01

Lund, J., & Ng, Y., 2018. Movie Recommendations Using

the Deep Learning Approach. 2018 IEEE International

Conference on Information Reuse and Integration

(IRI), 47–54. https://doi.org/10.1109/IRI.2018.00015

Mikos, L., 2016. Digital Media Platforms and the Use of

TV Content: Binge Watching and Video-on-Demand in

Germany. Media and Communication, 4(3), 154–161.

https://doi.org/10.17645/mac.v4i3.542

Sauer, S., 2017. Audiovisual Narrative Creation and

Creative Retrieval: How Searching for a Story Shapes

the Story. Journal of Science and Technology of The

Arts, 9(2), 37–46. https://doi.org/10.7559/citarj.v9i2.

241

Whitelaw, M., 2015. Generous Interfaces for Digital Cultural

Collections. Digital Humanities Quarterly, 9(1).

http://www.digitalhumanities.org/dhq/vol/9/1/000205/0

00205.html

Content Adaptation, Personalisation and Fine-grained Retrieval: Applying AI to Support Engagement with and Reuse of Archival Content at

Scale

511