Authors: Aigerim Mussina (1), Sanzhar Aubakirov (1) and Paulo Trigo (2)

Affiliations: (1) Department of Computer Science, Al-Farabi Kazakh National University, Almaty, Kazakhstan; (2) Instituto Superior de Engenharia de Lisboa, Biosystems and Integrative Sciences Institute / Agent and Systems Modeling, Lisbon, Portugal

ISBN: 978-989-758-318-6

Keyword(s): Summarization, Automatic Extraction, Key-words, N-gram, TextRank.

Related Ontology Subjects/Areas/Topics: Business Analytics ; Data Engineering ; Data Management and Quality ; Text Analytics

Abstract: This paper presents a comparative perspective on automatic text summarization algorithms. The main contribution is the implementation of well-known algorithms and a comparison of different summarization techniques on corpora of news articles parsed from the web. The work compares three summarization techniques based on the TextRank algorithm: General TextRank, BM25, and LongestCommonSubstring. The experiments use corpora of news articles written in Russian and Kazakh. Although the implemented algorithms are well known, we evaluate them differently from previous work on summary evaluation: we propose an evaluation method based on keywords extracted from the corpora. We describe the application of statistical information, show the results of the summarization processes, and compare them.
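The abstract names TextRank with interchangeable sentence-similarity measures but gives no implementation details. The following is a minimal sketch of that general idea, not the authors' implementation: sentences are graph nodes, a similarity function supplies the edge weights, and a weighted PageRank iteration scores each sentence. The function names (`overlap_similarity`, `lcs_similarity`, `textrank`, `summarize`), the character-level normalisation of the longest common substring, and the damping and iteration values are all assumptions chosen for illustration. BM25 weighting and the keyword-based evaluation are not shown.

```python
import math
from difflib import SequenceMatcher


def overlap_similarity(a, b):
    """Word-overlap similarity as in the classic TextRank formulation."""
    wa, wb = a.lower().split(), b.lower().split()
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    if denom == 0:  # both sentences contain a single word
        return 0.0
    return len(set(wa) & set(wb)) / denom


def lcs_similarity(a, b):
    """Longest common substring length over the shorter sentence length.

    The normalisation is an assumption made for this sketch.
    """
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / min(len(a), len(b))


def textrank(sentences, similarity, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over the similarity graph."""
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, similarity, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences, similarity)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, lcs_similarity)` instead of `summarize(sentences, overlap_similarity)` swaps the edge weighting without touching the ranking step, which is the kind of comparison the abstract describes.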

License: CC BY-NC-ND 4.0

Paper citation in several formats:
Mussina, A.; Aubakirov, S. and Trigo, P. (2018). Automatic Document Summarization based on Statistical Information. In Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-318-6, pages 71-76. DOI: 10.5220/0006888400710076

@conference{data18,
author={Aigerim Mussina and Sanzhar Aubakirov and Paulo Trigo},
title={Automatic Document Summarization based on Statistical Information},
booktitle={Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2018},
pages={71-76},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006888400710076},
isbn={978-989-758-318-6},
}

TY - CONF

JO - Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Automatic Document Summarization based on Statistical Information
SN - 978-989-758-318-6
AU - Mussina, A.
AU - Aubakirov, S.
AU - Trigo, P.
PY - 2018
SP - 71
EP - 76
DO - 10.5220/0006888400710076
