Sentence Compression on Domains with Restricted Labeled Data Availability

Felipe Soares, Ticiana Coelho da Silva, Jose F. de Macêdo

2020

Abstract

Most of the information available on the Web remains unstructured, i.e., text documents such as articles, news, blog posts, product reviews, and forum discussions, among others. Given the huge amount of textual content continuously produced on the Web, it is challenging for users to read and consume every document. Text summarization refers to the technique of shortening long pieces of text. The intention is to create a coherent and fluent summary containing only the main points of the document. Sentence compression can improve text summarization by removing redundant information while preserving the grammaticality and the important content of the original sentences. In this paper, we propose a sentence compression neural network model that achieved promising results compared to other neural network-based models, even when trained with smaller amounts of data. Rather than training the model only with the words from the training set, the proposed model was trained with different features extracted from the texts. This improves the ability of the model to decide whether or not to retain each word in the compressed sentence.
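The per-word retain-or-drop decision described in the abstract can be pictured as binary labeling over tokens. The following is only a minimal sketch under stated assumptions, not the authors' model: the function names `token_features` and `apply_labels`, the specific features, and the example sentence with its labels are all hypothetical, since the abstract does not say which features or architecture were used.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hypothetical per-token features (assumption: the abstract does not list the real ones)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_digit": tok.isdigit(),
        # Position of the token in the sentence, scaled to [0, 1].
        "relative_position": i / max(len(tokens) - 1, 1),
    }


def apply_labels(tokens: List[str], keep: List[int]) -> str:
    """Build the compressed sentence from binary keep (1) / delete (0) labels."""
    if len(tokens) != len(keep):
        raise ValueError("one label per token is required")
    return " ".join(t for t, k in zip(tokens, keep) if k == 1)


# Illustrative input: in a trained system the labels would come from the model.
tokens = "The committee , which met on Tuesday , approved the budget".split()
keep = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]

features = [token_features(tokens, i) for i in range(len(tokens))]
compressed = apply_labels(tokens, keep)
print(compressed)
```

In this framing, a model would receive one feature record per token and predict the corresponding label; the compressed sentence is simply the retained tokens in their original order.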

Paper Citation


in Harvard Style

Soares F., Coelho da Silva T. and de Macêdo J. F. (2020). Sentence Compression on Domains with Restricted Labeled Data Availability. In Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-395-7, pages 130-140. DOI: 10.5220/0008958301300140


in Bibtex Style

@conference{icaart20,
author={Felipe Soares and Ticiana Coelho da Silva and Jose F. de Macêdo},
title={Sentence Compression on Domains with Restricted Labeled Data Availability},
booktitle={Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2020},
pages={130-140},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008958301300140},
isbn={978-989-758-395-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Sentence Compression on Domains with Restricted Labeled Data Availability
SN - 978-989-758-395-7
AU - Soares F.
AU - Coelho da Silva T.
AU - de Macêdo J. F.
PY - 2020
SP - 130
EP - 140
DO - 10.5220/0008958301300140