Concept Extraction with Convolutional Neural Networks

Andreas Waldis; Luca Mazzola; Michael Kaufmann

Topics: Data Science; Datamining; Neural Network Applications; Semi-Structured and Unstructured Data; Text Analytics

In Proceedings of the 7th International Conference on Data Science, Technology and Applications - DATA, Volume 1, pages 118-129, 2018, Porto, Portugal

Authors: Andreas Waldis; Luca Mazzola; Michael Kaufmann

Affiliation: Lucerne University of Applied Sciences, School of Information Technology, 6343 Rotkreuz, Switzerland

Keyword(s): Natural Language Processing, Concept Extraction, Convolutional Neural Network.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Biomedical Engineering ; Business Analytics ; Data Engineering ; Data Management and Quality ; Data Mining ; Databases and Information Systems Integration ; Datamining ; Enterprise Information Systems ; Health Information Systems ; Semi-Structured and Unstructured Data ; Sensor Networks ; Signal Processing ; Soft Computing ; Text Analytics

Abstract: For knowledge management purposes, it would be useful to classify and tag documents automatically based on their content. Concept extraction is one way of achieving this, using statistical or semantic methods. Whereas index-based keyphrase extraction can extract relevant concepts for documents, the inverse document index grows exponentially with the number of words that candidate concepts can have. To address this issue, the present work trains convolutional neural networks (CNNs) containing vertical and horizontal filters to decide whether an N-gram (i.e., a consecutive sequence of N characters or words) is a concept or not, from a training set with labeled examples. The classification training signal is derived from the Wikipedia corpus, knowing that an N-gram certainly represents a concept if a corresponding Wikipedia page title exists. The CNN input feature is the vector representation of each word, derived from a word embedding model; the output is the probability that an N-gram represents a concept. Multiple configurations for vertical and horizontal filters were analyzed and configured through a hyper-parameterization process. The results demonstrated precision of between 60 and 80 percent on average. This precision decreased drastically as N increased. However, combined with a TF-IDF based relevance ranking, the top five N-gram concepts calculated for Wikipedia articles showed a high precision of 94%, similar to part-of-speech (POS) tagging for concept recognition combined with TF-IDF, but with a much better recall for higher N. The CNN seems to prefer longer sequences of N-grams as identified concepts, and can also correctly identify sequences of words normally ignored by other methods. Furthermore, in contrast to POS filtering, the CNN method does not rely on predefined rules, and could thus provide language-independent concept extraction.
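The labeling and ranking steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`ngrams`, `label_ngrams`, `tf_idf`, `top_concepts`), the toy title set, and the plain TF-IDF weighting are all assumptions made for illustration, and a simple title lookup stands in for the trained CNN, whose architecture the abstract does not specify in detail.

```python
import math
from collections import Counter


def ngrams(tokens, max_n):
    """Yield every run of 1..max_n consecutive tokens as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def label_ngrams(tokens, titles, max_n=3):
    """Label an N-gram 1 if it matches a known page title, else 0.

    `titles` is a hypothetical stand-in for the set of Wikipedia page titles.
    """
    return {g: int(" ".join(g) in titles) for g in ngrams(tokens, max_n)}


def tf_idf(gram, doc_grams, corpus_grams):
    """Plain TF-IDF (assumed variant): relative frequency * log(N / df)."""
    tf = doc_grams[gram] / sum(doc_grams.values())
    df = sum(1 for d in corpus_grams if gram in d)
    return tf * math.log(len(corpus_grams) / df)


def top_concepts(doc_idx, corpus_tokens, is_concept, k=5, max_n=3):
    """Keep N-grams accepted by `is_concept`, rank them by TF-IDF, return top k."""
    corpus_grams = [Counter(ngrams(t, max_n)) for t in corpus_tokens]
    doc = corpus_grams[doc_idx]
    scored = [(tf_idf(g, doc, corpus_grams), g) for g in doc if is_concept(g)]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [" ".join(g) for _, g in scored[:k]]


# Toy data, invented for this example.
titles = {"neural network", "concept"}
corpus = [
    "a neural network extracts a concept".split(),
    "a concept is an idea".split(),
]

labels = label_ngrams(corpus[0], titles)

# A title lookup plays the role of the classifier here; in the paper this
# decision comes from a CNN over word embeddings.
top = top_concepts(0, corpus, lambda g: " ".join(g) in titles)
print(labels[("neural", "network")], top)
```

In this sketch, "concept" appears in both toy documents and therefore receives a TF-IDF score of zero, so it ranks below "neural network", which is the kind of relevance filtering the abstract attributes to the TF-IDF step.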

CC BY-NC-ND 4.0

Paper citation in several formats:

Waldis, A., Mazzola, L. and Kaufmann, M. (2018). Concept Extraction with Convolutional Neural Networks. In Proceedings of the 7th International Conference on Data Science, Technology and Applications - DATA; ISBN 978-989-758-318-6; ISSN 2184-285X, SciTePress, pages 118-129. DOI: 10.5220/0006901201180129

@conference{data18,
author={Andreas Waldis and Luca Mazzola and Michael Kaufmann},
title={Concept Extraction with Convolutional Neural Networks},
booktitle={Proceedings of the 7th International Conference on Data Science, Technology and Applications - DATA},
year={2018},
pages={118-129},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006901201180129},
isbn={978-989-758-318-6},
issn={2184-285X},
}

TY - CONF
JO - Proceedings of the 7th International Conference on Data Science, Technology and Applications - DATA
TI - Concept Extraction with Convolutional Neural Networks
SN - 978-989-758-318-6
IS - 2184-285X
AU - Waldis, A.
AU - Mazzola, L.
AU - Kaufmann, M.
PY - 2018
SP - 118
EP - 129
DO - 10.5220/0006901201180129
PB - SciTePress