Authors: Aigerim Mussina 1 ; Sanzhar Aubakirov 1 and Paulo Trigo 2

Affiliations: 1 Department of Computer Science, Al-Farabi Kazakh National University, Almaty, Kazakhstan ; 2 Instituto Superior de Engenharia de Lisboa, Biosystems and Integrative Sciences Institute / Agent and Systems Modeling, Lisbon, Portugal

Keyword(s): Summarization, Automatic Extraction, Key-words, N-gram, TextRank.

Related Ontology Subjects/Areas/Topics: Business Analytics ; Data Engineering ; Data Management and Quality ; Text Analytics

Abstract: This paper presents a comparative perspective on automatic text summarization algorithms. The main contribution is an implementation of well-known algorithms and a comparison of different summarization techniques on corpora of news articles parsed from the web. The work compares three summarization techniques based on the TextRank algorithm, namely General TextRank, BM25, and LongestCommonSubstring. For the experiments, we used corpora of news articles written in Russian and Kazakh. We implemented and experimented with well-known algorithms, but evaluated them differently from previous work on summary evaluation. We propose a summary evaluation method based on keywords extracted from the corpora. We describe the application of statistical information, show the results of the summarization processes, and compare them.
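
To make the techniques named in the abstract more concrete, the following is a minimal sketch of TextRank-style sentence ranking with a pluggable similarity function, here a longest-common-substring measure, followed by a simple keyword-coverage score. It is an illustration under stated assumptions, not the authors' implementation: the function names (`lcs_similarity`, `textrank`, `summarize`, `keyword_coverage`), the normalisation by the shorter sentence, the damping factor of 0.85, and the coverage formula are all hypothetical choices, since the abstract does not specify them.

```python
from difflib import SequenceMatcher


def lcs_similarity(a: str, b: str) -> float:
    """Longest common substring length, normalised by the shorter sentence.

    The normalisation is an assumption made for this sketch.
    """
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / min(len(a), len(b))


def textrank(sentences, similarity, damping=0.85, iterations=50):
    """Score sentences by power iteration over a weighted similarity graph."""
    n = len(sentences)
    # Edge weights between every pair of distinct sentences.
    weights = [
        [0.0 if i == j else similarity(sentences[i], sentences[j]) for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives a share of its neighbours' scores,
        # proportional to the edge weight.
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, similarity, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences, similarity)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def keyword_coverage(summary, keywords):
    """Fraction of keywords that appear in the summary (hypothetical metric)."""
    if not keywords:
        return 0.0
    text = " ".join(summary).lower()
    return sum(1 for kw in keywords if kw.lower() in text) / len(keywords)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on the rug",
        "quantum physics is hard",
    ]
    summary = summarize(docs, lcs_similarity, k=2)
    print(summary)
    print(keyword_coverage(summary, ["cat", "physics"]))
```

Because the similarity function is passed in as an argument, a BM25-based or word-overlap measure could be substituted for `lcs_similarity` without changing the ranking code, which mirrors the kind of comparison the abstract describes.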

CC BY-NC-ND 4.0

Paper citation in several formats:
Mussina, A.; Aubakirov, S. and Trigo, P. (2018). Automatic Document Summarization based on Statistical Information. In Proceedings of the 7th International Conference on Data Science, Technology and Applications - DATA; ISBN 978-989-758-318-6; ISSN 2184-285X, SciTePress, pages 71-76. DOI: 10.5220/0006888400710076

@conference{data18,
author={Aigerim Mussina and Sanzhar Aubakirov and Paulo Trigo},
title={Automatic Document Summarization based on Statistical Information},
booktitle={Proceedings of the 7th International Conference on Data Science, Technology and Applications - DATA},
year={2018},
pages={71-76},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006888400710076},
isbn={978-989-758-318-6},
issn={2184-285X},
}

TY - CONF

JO - Proceedings of the 7th International Conference on Data Science, Technology and Applications - DATA
TI - Automatic Document Summarization based on Statistical Information
SN - 978-989-758-318-6
IS - 2184-285X
AU - Mussina, A.
AU - Aubakirov, S.
AU - Trigo, P.
PY - 2018
SP - 71
EP - 76
DO - 10.5220/0006888400710076
PB - SciTePress