
Authors: Totok Suhardijanto; Zahroh Nuriah and Setiawati Darmojuwono

Affiliation: Universitas Indonesia, Indonesia

Keyword(s): corpus, compound, compositionality, machine annotation.

Abstract: This paper presents our progress in building an automatic recognition system for compound words in Bahasa Indonesia. Our goal is to develop a system that can distinguish significant multiword expressions from other, insignificant groups of words. For instance, rumah tangga ‘household’ should be considered a significant cluster of words, whereas rumah kayu ‘wooden house’ should not. It is not easy to differentiate a compound word from an ordinary phrase in Bahasa Indonesia because there are no specific phonological markers such as accent in German or Dutch. Orthographical markers are not always present either: rumah tangga is written with a space, while kacamata ‘glasses’ is not. In this paper, we compare and analyze the results of machine and human annotation. The automatic annotation system is built with a statistical machine learning algorithm called conditional random fields. Data for the annotation task were collected from newspaper and magazine articles. In this analysis, a mixed-method approach was applied to reveal the differences between human and machine annotation. The results showed that the machine reached an accuracy of 69% and still showed several error patterns in compound word recognition tasks. Human annotation is trivial due to personal annotator backgrounds.
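The comparison between human and machine annotation described in the abstract can be illustrated with a minimal sketch. It assumes a B/I/O token labelling of compound spans and a token-level accuracy measure; the abstract does not state which encoding or accuracy definition the authors used, and the function names `bio_tags` and `token_accuracy` as well as the example sentence are hypothetical.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], compounds: List[Tuple[int, int]]) -> List[str]:
    """Label each token B (compound start), I (inside compound) or O (outside).

    `compounds` holds (start, end) token spans, end exclusive.
    This encoding is an assumption, not taken from the paper.
    """
    tags = ["O"] * len(tokens)
    for start, end in compounds:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Share of tokens whose predicted tag matches the reference tag."""
    if len(gold) != len(pred):
        raise ValueError("tag sequences must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical sentence built from the abstract's two examples.
tokens = ["rumah", "tangga", "dan", "rumah", "kayu"]
# Human annotation: only "rumah tangga" is a compound.
human = bio_tags(tokens, [(0, 2)])
# Machine annotation (made up): also marks "rumah kayu" as a compound.
machine = bio_tags(tokens, [(0, 2), (3, 5)])

print(human)
print(machine)
print(token_accuracy(human, machine))
```

Here the machine over-generates one compound, so two of the five tokens disagree with the human reference. Span-level precision and recall would penalize the same error differently, so the choice of measure matters when reading a single accuracy figure.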

CC BY-NC-ND 4.0


Paper citation in several formats:
Suhardijanto, T.; Nuriah, Z. and Darmojuwono, S. (2018). Teaching Machines to Recognize Idiomatic Expressions - A Comparative Analysis of Compound Word Recognition Results between Human and Machine Annotation. In The Tenth Conference on Applied Linguistics and The Second English Language Teaching and Technology Conference in collaboration with The First International Conference on Language, Literature, Culture, and Education - CONAPLIN and ICOLLITE; ISBN 978-989-758-332-2; ISSN 2184-3376, SciTePress, pages 376-380. DOI: 10.5220/0007167603760380

@conference{conaplin-and-icollite18,
author={Totok Suhardijanto and Zahroh Nuriah and Setiawati Darmojuwono},
title={Teaching Machines to Recognize Idiomatic Expressions - A Comparative Analysis of Compound Word Recognition Results between Human and Machine Annotation},
booktitle={The Tenth Conference on Applied Linguistics and The Second English Language Teaching and Technology Conference in collaboration with The First International Conference on Language, Literature, Culture, and Education - CONAPLIN and ICOLLITE},
year={2018},
pages={376--380},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007167603760380},
isbn={978-989-758-332-2},
issn={2184-3376},
}

TY  - CONF
JO  - The Tenth Conference on Applied Linguistics and The Second English Language Teaching and Technology Conference in collaboration with The First International Conference on Language, Literature, Culture, and Education - CONAPLIN and ICOLLITE
TI  - Teaching Machines to Recognize Idiomatic Expressions - A Comparative Analysis of Compound Word Recognition Results between Human and Machine Annotation
SN  - 978-989-758-332-2
IS  - 2184-3376
AU  - Suhardijanto, T.
AU  - Nuriah, Z.
AU  - Darmojuwono, S.
PY  - 2018
SP  - 376
EP  - 380
DO  - 10.5220/0007167603760380
PB  - SciTePress