Structuring Documents from Short Texts - The Enel SpA Case Study

Silvia Calegari, Matteo Dominoni

2015

Abstract

Structured documents are now commonly marked up using XML, the W3C standard that gives meaning to a document's stored content by defining its logical structure. This logical structure can be exploited to provide focused access to structured documents. For instance, in XML Information Retrieval, the logical structure is used to retrieve the most relevant fragments within documents as answers to queries, instead of whole documents. A problem arises when the logical structure of a document cannot be defined automatically using the methodologies presented in the literature. This position paper addresses that situation and presents a possible solution adopted at the Enel SpA energy company.



Paper Citation


in Harvard Style

Calegari S. and Dominoni M. (2015). Structuring Documents from Short Texts - The Enel SpA Case Study. In Proceedings of 4th International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-103-8, pages 63-68. DOI: 10.5220/0005498800630068


in BibTeX Style

@conference{data15,
author={Silvia Calegari and Matteo Dominoni},
title={Structuring Documents from Short Texts - The Enel SpA Case Study},
booktitle={Proceedings of 4th International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2015},
pages={63-68},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005498800630068},
isbn={978-989-758-103-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of 4th International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Structuring Documents from Short Texts - The Enel SpA Case Study
SN - 978-989-758-103-8
AU - Calegari S.
AU - Dominoni M.
PY - 2015
SP - 63
EP - 68
DO - 10.5220/0005498800630068