Authors: Soumia Lilia Berrahou 1; Patrice Buche 1; Juliette Dibie-Barthelemy 2 and Mathieu Roche 3

Affiliations: 1 Univ. Montpellier 2, INRA – UMR IATE and INRIA – GraphIK 2, France; 2 INRA - Mét@risk & AgroParisTech, France; 3 Univ. Montpellier 2, France

ISBN: 978-989-8565-75-4

Keyword(s): Information Retrieval, Unit of Measure Extraction, Ontological and Terminological Resource, Machine Learning.

Abstract: A large amount of quantitative data from experimental results is reported in scientific documents as free text. Each quantitative result is characterized by a numerical value, often followed by a unit of measure. Automatically extracting quantitative data is a painstaking process because units are written in many different ways within documents. In this paper, we focus on the extraction and identification of variant units in order to iteratively enrich the terminological part of an Ontological and Terminological Resource (OTR) and, ultimately, to allow the extraction of quantitative data. Unit extraction involves two main steps. Because we work on unstructured documents, units are buried in textual information. In the first step, our method addresses the crucial and time-consuming task of locating units using supervised learning methods. Once the units have been located in the text, the second step extracts and identifies candidate units in order to enrich the OTR. The extracted candidates are compared to units already known in the OTR using a new string distance measure to validate whether or not they are relevant variants. We have carried out conclusive experiments with our two-step method on a set of more than 35,000 sentences.
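The second step described in the abstract, comparing an extracted candidate against units already known in the OTR, can be illustrated with a minimal sketch. The abstract does not define the paper's new string distance measure, so the sketch substitutes the classic Levenshtein edit distance as a stand-in. The function names, the `max_ratio` threshold, and the sample unit strings are all hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (a stand-in, not the paper's measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_known_unit(candidate, known_units, max_ratio=0.34):
    """Return the known unit nearest to the candidate, or None if too far.

    max_ratio is an assumed threshold on the length-normalized distance.
    """
    best, best_score = None, None
    for unit in known_units:
        longest = max(len(candidate), len(unit), 1)
        score = edit_distance(candidate, unit) / longest
        if best_score is None or score < best_score:
            best, best_score = unit, score
    if best_score is not None and best_score <= max_ratio:
        return best
    return None


# Hypothetical OTR entries, for illustration only.
KNOWN_UNITS = ["cm3 m-2 day-1", "g m-2 day-1", "kPa"]

print(closest_known_unit("cm3.m-2.day-1", KNOWN_UNITS))  # prints: cm3 m-2 day-1
```

Here a candidate that differs from a known unit only in its separators is accepted as a variant, while an unrelated string falls above the threshold and is rejected. A real system would tune the threshold on annotated data.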

Paper citation in several formats:
Berrahou, S.; Buche, P.; Dibie-Barthelemy, J. and Roche, M. (2013). How to Extract Unit of Measure in Scientific Documents? In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: SSTM, (IC3K 2013) ISBN 978-989-8565-75-4, pages 249-256. DOI: 10.5220/0004666302490256

@conference{sstm13,
author={Soumia Lilia Berrahou and Patrice Buche and Juliette Dibie{-}Barthelemy and Mathieu Roche},
title={How to Extract Unit of Measure in Scientific Documents?},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: SSTM, (IC3K 2013)},
year={2013},
pages={249--256},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004666302490256},
isbn={978-989-8565-75-4},
}

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval and the International Conference on Knowledge Management and Information Sharing - Volume 1: SSTM, (IC3K 2013)
TI - How to Extract Unit of Measure in Scientific Documents?
SN - 978-989-8565-75-4
AU - Berrahou, S.
AU - Buche, P.
AU - Dibie-Barthelemy, J.
AU - Roche, M.
PY - 2013
SP - 249
EP - 256
DO - 10.5220/0004666302490256
