Video Summarization through Total Variation, Deep Semi-supervised Autoencoder and Clustering Algorithms

Eden Pereira da Silva, Eliaquim Ramos, Leandro Tavares da Silva, Jaime Cardoso, Gilson Giraldi


Video summarization is an important tool given the amount of video data to analyze. Techniques in this area aim to yield a concise and useful visual abstraction of a video's contents. Hence, in this paper we present a new summarization algorithm, based on image features, which is composed of the following steps: (i) process the query video using the cosine similarity metric and total variation smoothing to identify classes in the query sequence; (ii) with this result, build a labeled training set of frames; (iii) generate the unlabeled training set, composed of samples from the video database; (iv) train a deep semi-supervised autoencoder; (v) compute K-means for each video separately, in the encoder space, with the number of clusters set as a percentage of the video size; (vi) select key-frames in the K-means clusters to define the summaries. In this methodology, the query video incorporates prior knowledge into the whole process through the obtained labeled data. Step (iii) aims to include unknown patterns useful for the summarization process. We evaluate the methodology using videos from the OPV video database and compare the performance of our algorithm with that of VSum. The results indicate that the pipeline succeeded in the summarization task, achieving an F-score superior to that of VSum.
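Steps (i)–(ii) and (vi) of the pipeline can be sketched in NumPy. This is an illustrative sketch only: the function names, the smoothed-TV gradient-descent solver, the similarity threshold, and the nearest-to-centroid key-frame rule are assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

def cosine_similarity_signal(frames):
    """Cosine similarity between consecutive frame feature vectors."""
    a, b = frames[:-1], frames[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)

def tv_smooth(y, lam=0.1, step=0.05, iters=300, eps=1e-2):
    """Denoise a 1-D signal by gradient descent on the smoothed-TV energy
    0.5*||x - y||^2 + lam * sum_i sqrt((x_{i+1} - x_i)^2 + eps).
    (A simple stand-in for an exact TV solver.)"""
    x = y.copy()
    for _ in range(iters):
        d = np.diff(x)
        w = d / np.sqrt(d * d + eps)   # derivative of the smoothed |d|
        grad_tv = np.zeros_like(x)
        grad_tv[:-1] -= w
        grad_tv[1:] += w
        x -= step * ((x - y) + lam * grad_tv)
    return x

def segment_labels(smoothed, threshold):
    """Start a new class wherever the smoothed similarity drops below
    the threshold (assumed segmentation rule)."""
    labels = np.zeros(len(smoothed) + 1, dtype=int)
    labels[1:] = np.cumsum(smoothed < threshold)
    return labels

def select_keyframes(codes, centroids):
    """For each cluster centroid, pick the index of the nearest encoded
    frame as the key-frame (one common selection rule)."""
    d = np.linalg.norm(codes[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(d, axis=0)
```

For example, a sequence of five nearly identical frames followed by five frames of a different shot produces a sharp dip in the similarity signal; after smoothing, thresholding that dip yields the two classes used to label the query frames.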

