CONCORDIA: COmputing semaNtic sentenCes for fRench Clinical Documents sImilArity

Khadim Dramé, Gorgoumack Sambe, Gayo Diallo

2021

Abstract

Detecting similar sentences or paragraphs is a key issue when dealing with text duplication. This is particularly the case in the clinical domain, for instance when identifying multiple occurrences of the same event. Due to a lack of resources, this task is especially challenging for French clinical documents. In this paper, we introduce CONCORDIA, an approach based on supervised machine learning algorithms for computing the semantic similarity between sentences in French clinical texts. After briefly reviewing various semantic textual similarity measures reported in the literature, we describe the approach, which relies on Random Forest, Multilayer Perceptron and Linear Regression algorithms to build supervised models. These models are then used to determine the degree of semantic similarity between clinical sentences. CONCORDIA is evaluated using the classical Spearman correlation and EDRM evaluation metrics on the standard benchmarks provided in the context of the DEFT 2020 text mining challenge. According to the official DEFT 2020 challenge results, the CONCORDIA Multilayer Perceptron based model achieves the best performance among all participating systems, reaching an EDRM of 0.8217.
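As an illustration of the Spearman correlation metric mentioned above, the following minimal sketch computes the rank correlation between gold-standard and predicted similarity scores. It is not the authors' evaluation code: the function names (`average_ranks`, `pearson`, `spearman`) and the score values are hypothetical, chosen only for the example, and the EDRM metric is not reproduced here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(gold, predicted):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return pearson(average_ranks(gold), average_ranks(predicted))


# Hypothetical scores for six sentence pairs (illustrative values only).
gold = [0, 1, 2, 3, 4, 5]
predicted = [0.4, 0.9, 2.5, 2.1, 4.2, 4.8]
rho = spearman(gold, predicted)
print(f"Spearman correlation: {rho:.4f}")
```

Because the metric depends only on ranks, a model is rewarded for ordering sentence pairs correctly even when its raw scores are offset or rescaled relative to the gold annotations.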


Paper Citation


in Harvard Style

Dramé K., Sambe G. and Diallo G. (2021). CONCORDIA: COmputing semaNtic sentenCes for fRench Clinical Documents sImilArity. In Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-536-4, pages 77-83. DOI: 10.5220/0010687500003058


in Bibtex Style

@conference{webist21,
author={Khadim Dramé and Gorgoumack Sambe and Gayo Diallo},
title={CONCORDIA: COmputing semaNtic sentenCes for fRench Clinical Documents sImilArity},
booktitle={Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2021},
pages={77-83},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010687500003058},
isbn={978-989-758-536-4},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - CONCORDIA: COmputing semaNtic sentenCes for fRench Clinical Documents sImilArity
SN - 978-989-758-536-4
AU - Dramé K.
AU - Sambe G.
AU - Diallo G.
PY - 2021
SP - 77
EP - 83
DO - 10.5220/0010687500003058