Enhancing Cross-lingual Semantic Annotations using Deep Network Sentence Embeddings

Ying-Chi Lin, Phillip Hoffmann, Erhard Rahm

2021

Abstract

Annotating documents using concepts of ontologies enhances data quality and interoperability. Such semantic annotations also facilitate the comparison of multiple studies and even of cross-lingual results. The FDA therefore requires that all submitted medical forms be annotated. In this work, we aim to annotate medical forms in German. These standardized forms are used in health care practice and biomedical research and are translated/adapted to various languages. We focus on annotations that cover the whole question in the form, as required by the FDA. We need to map these non-English questions to English concepts, as many of these concepts do not exist in other languages. Due to the process of translation and adaptation, the non-English forms deviate syntactically from the original forms. As a result, conventional string matching methods produce low-quality annotations. To address this, we propose a new approach that incorporates semantics into the mapping procedure. By utilizing sentence embeddings generated by deep networks in the cross-lingual annotation process, we achieve a recall of 84.62%, an improvement of 134% over conventional string matching. Likewise, we achieve improvements of 51% in precision and 65% in F-measure.
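The abstract does not detail the matching step. As a rough illustration only, the sketch below assumes that each German question and each English concept label has already been encoded into a shared cross-lingual embedding space by some sentence encoder (not shown). It then ranks concepts by cosine similarity to the question. The `threshold` and `top_k` values, the concept IDs and the toy vectors are all hypothetical and are not the authors' settings or data. The block also computes precision, recall and F-measure, the metrics reported above.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def annotate(question_vec: Vector,
             concept_vecs: Dict[str, Vector],
             threshold: float = 0.7,   # hypothetical cut-off, not from the paper
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Return up to top_k (concept_id, score) pairs whose cosine similarity
    to the question embedding is at least `threshold`, best match first."""
    scored = [(cid, cosine(question_vec, vec)) for cid, vec in concept_vecs.items()]
    scored = [(cid, s) for cid, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (harmonic mean) of predicted vs. gold annotations."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real sentence embeddings are much larger.
    question = [0.9, 0.1, 0.0]  # e.g. an embedded German question about pain
    concepts = {
        "CONCEPT_PAIN": [1.0, 0.0, 0.0],   # hypothetical concept IDs
        "CONCEPT_SLEEP": [0.0, 1.0, 0.0],
        "CONCEPT_MOOD": [0.1, 0.2, 0.9],
    }
    print(annotate(question, concepts))
    print(precision_recall_f1({"A", "B"}, {"A", "C"}))
```

With these toy vectors, only `CONCEPT_PAIN` (similarity ≈ 0.994) clears the threshold. Embedding similarity, unlike string matching, does not require the German wording to overlap lexically with the English concept label, which is the motivation stated in the abstract.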

Paper Citation


in Harvard Style

Lin Y., Hoffmann P. and Rahm E. (2021). Enhancing Cross-lingual Semantic Annotations using Deep Network Sentence Embeddings. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF; ISBN 978-989-758-490-9, SciTePress, pages 188-199. DOI: 10.5220/0010256801880199


in Bibtex Style

@conference{healthinf21,
author={Ying-Chi Lin and Phillip Hoffmann and Erhard Rahm},
title={Enhancing Cross-lingual Semantic Annotations using Deep Network Sentence Embeddings},
booktitle={Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF},
year={2021},
pages={188-199},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010256801880199},
isbn={978-989-758-490-9},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF
TI - Enhancing Cross-lingual Semantic Annotations using Deep Network Sentence Embeddings
SN - 978-989-758-490-9
AU - Lin Y.
AU - Hoffmann P.
AU - Rahm E.
PY - 2021
SP - 188
EP - 199
DO - 10.5220/0010256801880199
PB - SciTePress