Authors:
Kazuko Takahashi (1); Hirofumi Taki (2); Shunsuke Tanabe (3) and Wei Li (4)
Affiliations:
(1) Keiai University, Japan; (2) Hosei University, Japan; (3) Waseda University, Japan; (4) Tokyo Institute of Technology, China
Keyword(s):
Automatic Coding System, Answers to Open-Ended Question, Occupation and Industry Coding, Natural Language Processing, Machine Learning, Confidence Level.
Related Ontology Subjects/Areas/Topics:
Applications; Applications and Case-studies; Artificial Intelligence; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Natural Language Processing; Pattern Recognition; Symbolic Systems
Abstract:
We develop a new automatic coding system that attaches a three-grade confidence level to codes from each of the national and international standard code sets, for answers to open-ended questions about respondents’ occupation and industry in social surveys, including a national census. This “occupation and industry coding” is a necessary task for statistical processing. However, it is labor-intensive and time-consuming, and results become inconsistent when the coders are not coding experts. Previous research has produced various automatic coding systems, but they are incomplete and generally unfriendly to non-developer users. Our new system uses SVMs (Support Vector Machines) to assign three candidate codes to each answer, and attaches a three-grade confidence level to the first-ranked predicted code, derived from classification scores, to support a manual check of the results. The system is now open to the public through the website of the Social Science Japan Data Archive (SSJDA). Once a submitted data file that follows the specified format is approved, users can obtain code files for up to four kinds of codes, each with a three-grade confidence level. In this paper, we describe our system and evaluate it.
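
The ranking-and-grading step described in the abstract can be illustrated with a minimal sketch. It assumes that each answer already has one classification score per candidate code (for example, SVM decision values) and that the grade is derived from the margin between the top two scores. The function names, the thresholds, the grade labels A/B/C, and the example codes and scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical margin thresholds; the paper's actual grading rule is not given here.
HIGH_MARGIN = 1.0
LOW_MARGIN = 0.3


def top_candidates(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k codes with the highest classification scores, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def confidence_grade(ranked: List[Tuple[str, float]]) -> str:
    """Grade the first-ranked code by its score margin over the second-ranked code."""
    if len(ranked) < 2:
        # Only one candidate, so there is no competing code to compare against.
        return "A"
    margin = ranked[0][1] - ranked[1][1]
    if margin >= HIGH_MARGIN:
        return "A"  # clear winner: manual check is least urgent
    if margin >= LOW_MARGIN:
        return "B"  # moderate separation
    return "C"      # close call: a manual check is most useful


# Made-up per-code scores for a single answer (illustrative values only).
scores = {"554": 1.8, "563": 0.4, "609": -0.2, "701": -1.1}
ranked = top_candidates(scores)
grade = confidence_grade(ranked)
print([code for code, _ in ranked], grade)
```

In this sketch a coder would see the three candidate codes together with the grade for the first one, and could prioritize manual review of answers graded C.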