Management of Scientific Documents and Visualization of Citation Relationships using Weighted Key Scientific Terms

Hui Wei, Youbing Zhao, Shaopeng Wu, Zhikun Deng, Farzad Parvinzamir, Feng Dong, Enjie Liu, Gordon Clapworthy

Abstract

Effective management and visualization of scientific and research documents can greatly assist researchers by improving their understanding of the relationships (e.g. citations) between documents. This paper presents work on the management and visualization of large corpora of scientific papers to help researchers explore their citation relationships. Term selection and weighting are used to mine citation relationships by identifying the most relevant terms. To this end, we present a variation of the TF-IDF scheme that uses external domain resources as references to calculate term weights in a particular domain; document weighting is taken into account when calculating term weights from a group of citations. A simple hierarchical word weighting method is also presented. The work is supported by an underlying architecture for document management using NoSQL databases and employs a simple visualization interface.
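For orientation, the baseline that the abstract's scheme varies, textbook TF-IDF, can be sketched as below. This is not the paper's method: the abstract does not give the formula for the domain-referenced variant, the document weighting, or the hierarchical word weighting, so none of those is reproduced here. The function name `tf_idf` and the toy token lists are assumptions made purely for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using textbook TF-IDF.

    `docs` is a list of token lists. This is the standard baseline only,
    not the domain-referenced variant described in the abstract.
    """
    n_docs = len(docs)

    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        # Weight = relative term frequency * log(N / document frequency).
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: three tiny "documents" that are already tokenized.
docs = [
    ["graph", "database", "query"],
    ["graph", "visualization", "citation"],
    ["term", "weighting", "citation", "citation"],
]
weights = tf_idf(docs)
```

In this baseline a term that occurs in every document receives weight zero, which is one reason a scheme might draw on an external reference corpus instead of relying only on the collection at hand.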


Paper Citation


in Harvard Style

Wei H., Zhao Y., Wu S., Deng Z., Parvinzamir F., Dong F., Liu E. and Clapworthy G. (2016). Management of Scientific Documents and Visualization of Citation Relationships using Weighted Key Scientific Terms. In Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-193-9, pages 135-143. DOI: 10.5220/0005981501350143


in Bibtex Style

@conference{data16,
author={Hui Wei and Youbing Zhao and Shaopeng Wu and Zhikun Deng and Farzad Parvinzamir and Feng Dong and Enjie Liu and Gordon Clapworthy},
title={Management of Scientific Documents and Visualization of Citation Relationships using Weighted Key Scientific Terms},
booktitle={Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2016},
pages={135-143},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005981501350143},
isbn={978-989-758-193-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Management of Scientific Documents and Visualization of Citation Relationships using Weighted Key Scientific Terms
SN - 978-989-758-193-9
AU - Wei H.
AU - Zhao Y.
AU - Wu S.
AU - Deng Z.
AU - Parvinzamir F.
AU - Dong F.
AU - Liu E.
AU - Clapworthy G.
PY - 2016
SP - 135
EP - 143
DO - 10.5220/0005981501350143