MULTI-LABELED PATENT DOCUMENT CLASSIFICATION USING TECHNICAL TERM THESAURUS

Yoshimi Suzuki, Fumiyo Fukumoto

2011

Abstract

This paper presents a method for patent document classification that uses an expanded technical term thesaurus. For classifying structured documents such as patent documents, structural information is very useful. However, if documents are divided by their applicant tags, the number of words in each tag is limited. For example, the ‘Title of invention’ tag is very important for patent document classification, yet it contains very few words. To deal with this problem, we employ two methods: one classifies applicant tags into semantic tags, and the other expands words using an expanded technical term thesaurus. To expand the thesaurus, our system integrates technical terms into it by using patent documents. The classification results showed that the method using the expanded thesaurus performed better than the method without a thesaurus. Although our method is very simple, its performance is comparable to that of other methods. These results suggest that the thesaurus, together with our method for expanding it, can be useful for patent document classification.
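The abstract does not give implementation details. What follows is only a minimal Python sketch of the two ideas it names, followed by a multi-label decision:

1. Mapping applicant tags to coarser semantic tags.
2. Expanding the few words in a short tag with thesaurus neighbours.

The following are all hypothetical stand-ins chosen for illustration, not the authors' method:

- the tag-to-semantic-tag table (only ‘Title of invention’ comes from the text above);
- the toy `THESAURUS`;
- the 0.5 weight for expanded terms;
- whitespace tokenization (Japanese patent text would need a morphological analyzer);
- the cosine-to-label-prototype classifier with a 0.2 threshold.

```python
import math
from collections import Counter

# Hypothetical mapping from applicant tags to semantic tags (illustrative only).
SEMANTIC_TAG = {
    "Title of invention": "title",
    "Technical field": "field",
    "Problem to be solved": "purpose",
    "Means for solving the problem": "method",
}

# Hypothetical toy thesaurus: term -> related technical terms.
THESAURUS = {
    "battery": ["cell", "accumulator"],
    "electrode": ["anode", "cathode"],
}


def expand(tokens, thesaurus, weight=0.5):
    """Bag of words in which thesaurus neighbours get a reduced (assumed) weight."""
    bag = Counter()
    for tok in tokens:
        bag[tok] += 1.0
        for related in thesaurus.get(tok, []):
            bag[related] += weight
    return bag


def featurize(doc, thesaurus):
    """Prefix each expanded term with its semantic tag so fields stay distinct."""
    features = Counter()
    for tag, text in doc.items():
        sem = SEMANTIC_TAG.get(tag, "other")
        # Whitespace tokenization is a simplifying assumption.
        for term, w in expand(text.lower().split(), thesaurus).items():
            features[f"{sem}:{term}"] += w
    return features


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def assign_labels(features, prototypes, threshold=0.2):
    """Multi-label decision: keep every label whose prototype is similar enough."""
    return sorted(label for label, proto in prototypes.items()
                  if cosine(features, proto) >= threshold)


if __name__ == "__main__":
    doc = {"Title of invention": "Battery electrode"}
    feats = featurize(doc, THESAURUS)
    # Hypothetical label prototypes; real ones would be learned from training data.
    protos = {
        "energy_storage": Counter({"title:battery": 1.0, "title:electrode": 1.0}),
        "computing": Counter({"title:computer": 1.0}),
    }
    print(assign_labels(feats, protos))  # ['energy_storage']
```

Prefixing each term with its semantic tag keeps an expanded title word distinct from the same word in another field. Returning every label above the threshold, rather than only the top-scoring one, is what makes the decision multi-label.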



Paper Citation


in Harvard Style

Suzuki Y. and Fukumoto F. (2011). MULTI-LABELED PATENT DOCUMENT CLASSIFICATION USING TECHNICAL TERM THESAURUS. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2011) ISBN 978-989-8425-80-5, pages 425-428. DOI: 10.5220/0003658504250428


in Bibtex Style

@conference{keod11,
author={Yoshimi Suzuki and Fumiyo Fukumoto},
title={MULTI-LABELED PATENT DOCUMENT CLASSIFICATION USING TECHNICAL TERM THESAURUS},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2011)},
year={2011},
pages={425-428},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003658504250428},
isbn={978-989-8425-80-5},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2011)
TI  - MULTI-LABELED PATENT DOCUMENT CLASSIFICATION USING TECHNICAL TERM THESAURUS
SN  - 978-989-8425-80-5
AU  - Suzuki Y.
AU  - Fukumoto F.
PY  - 2011
SP  - 425
EP  - 428
DO  - 10.5220/0003658504250428
ER  - 