Multi-document Arabic Text Summarization based on Thematic Annotation

Amina Merniz, Anja Chaibi, Henda Ben Ghézala

2021

Abstract

Text summarization reduces one or more documents by keeping only their key, significant sentences. It is a long-standing topic in natural language processing research and has improved over the years thanks to a considerable number of methods and studies in this area. This paper proposes an approach to Arabic multi-document text summarization. Its originality is that the summary is based on thematic annotation: input documents are analyzed and segmented by topic using LDA (Latent Dirichlet Allocation). The segments of each topic are then represented by a separate graph, to address the redundancy problem in multi-document summarization. In the last step, the proposed approach applies a modified PageRank algorithm that uses the cosine similarity measure as the edge weight. Vertices with high scores are considered essential and therefore make up the final summary. To evaluate summarization systems, researchers have developed several metrics, divided into three categories: automatic, semi-automatic, and manual. This study uses automatic evaluation methods for text summarization, mainly the ROUGE measures (ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4).
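The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the sentences of one topic have already been grouped (the LDA segmentation is not shown), uses plain whitespace tokenization and raw term counts, and applies a standard TextRank-style weighted PageRank. The names `cosine`, `rank_sentences`, and `summarize`, the damping factor 0.85, and the iteration count are all illustrative assumptions.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score the sentences of one topic with a cosine-weighted PageRank.

    Assumption: whitespace tokenization and raw counts stand in for
    whatever Arabic preprocessing the paper actually uses.
    """
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(s.split()) for s in sentences]
    # Edge weight between two distinct sentences is their cosine similarity.
    w = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each vertex receives score from its neighbours in proportion
        # to the weight of the connecting edge.
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In this sketch, one graph would be built per topic by calling `summarize` on each topic's sentences separately and concatenating the results; how the paper combines the per-topic graphs into the final summary is not stated in the abstract.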

Paper Citation


in Harvard Style

Merniz A., Chaibi A. and Ben Ghézala H. (2021). Multi-document Arabic Text Summarization based on Thematic Annotation. In Proceedings of the 16th International Conference on Software Technologies - Volume 1: ICSOFT, ISBN 978-989-758-523-4, pages 639-644. DOI: 10.5220/0010557906390644


in Bibtex Style

@conference{icsoft21,
author={Amina Merniz and Anja Chaibi and Henda Ben Ghézala},
title={Multi-document Arabic Text Summarization based on Thematic Annotation},
booktitle={Proceedings of the 16th International Conference on Software Technologies - Volume 1: ICSOFT},
year={2021},
pages={639-644},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010557906390644},
isbn={978-989-758-523-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Conference on Software Technologies - Volume 1: ICSOFT
TI - Multi-document Arabic Text Summarization based on Thematic Annotation
SN - 978-989-758-523-4
AU - Merniz A.
AU - Chaibi A.
AU - Ben Ghézala H.
PY - 2021
SP - 639
EP - 644
DO - 10.5220/0010557906390644