A Method for Web Content Extraction and Analysis in the Tourism Domain

Ermelinda Oro, Massimo Ruffolo

Abstract

Big data generated across the web is becoming increasingly important for producing insights that help to understand real-world phenomena and to make smarter decisions. Tourism is one of the leading growth sectors; therefore, methods and technologies that simplify and enhance the gathering, processing, and analysis of web content are becoming more and more important in this application area. In this paper, we present a web content analytics method that automates and simplifies content extraction and acquisition from many different web sources, such as newspapers and social networks, accelerates content cleaning, analysis, and annotation, and speeds up insight generation through visual exploration of analysis results. We also describe an application to a real-world use case: the analysis of the touristic impact of the Italian Open tennis tournament. The results show that our method makes the analysis of news and social media posts easier, more agile, and more effective.
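The abstract names the processing stages (cleaning, annotation, analysis) without describing how they are implemented. As a rough illustration only, the sketch below shows one simple way such stages could be chained over social media posts. The keyword lexicon, the topic labels, the function names (`clean_text`, `annotate`, `analyze`), and the use of plain keyword matching are all hypothetical assumptions for illustration; they are not the authors' method, which the abstract does not specify.

```python
import html
import re
from collections import Counter

# Hypothetical topic lexicon: categories and keywords are illustrative
# assumptions, not taken from the paper.
TOPIC_KEYWORDS = {
    "tennis": {"tennis", "match", "internazionali", "final"},
    "accommodation": {"hotel", "b&b", "apartment", "booking"},
    "food": {"restaurant", "pizza", "dinner", "food"},
}

TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")


def clean_text(raw: str) -> str:
    """Cleaning stage: strip HTML tags, decode entities, drop URLs,
    lowercase, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)    # remove tags before decoding entities
    text = html.unescape(text)     # e.g. "&amp;" -> "&"
    text = URL_RE.sub(" ", text)
    return " ".join(text.lower().split())


def annotate(text: str) -> set[str]:
    """Annotation stage: label the text with every topic whose
    keywords appear among its tokens."""
    tokens = set(re.findall(r"[\w&]+", text))
    return {topic for topic, words in TOPIC_KEYWORDS.items() if tokens & words}


def analyze(posts: list[str]) -> Counter:
    """Analysis stage: clean and annotate each post, then count
    how many posts mention each topic."""
    counts = Counter()
    for raw in posts:
        counts.update(annotate(clean_text(raw)))
    return counts


if __name__ == "__main__":
    posts = [
        "<p>Great final at the Internazionali!</p>",
        "Our hotel near Foro Italico was full https://example.com",
        "Pizza after the match &amp; a walk",
    ]
    print(analyze(posts))
```

For these three sample posts, the counts are 2 for `tennis`, 1 for `accommodation`, and 1 for `food`; per-topic counts of this kind are the sort of aggregate a visual exploration step could then chart. Keyword matching is chosen here only because it is short and transparent; a real system would more likely use a lexical resource or a trained classifier.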



Paper Citation


in BibTeX Style

@conference{iceis17,
author={Ermelinda Oro and Massimo Ruffolo},
title={A Method for Web Content Extraction and Analysis in the Tourism Domain},
booktitle={Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2017},
pages={365-370},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006371103650370},
isbn={978-989-758-247-9},
}


in Harvard Style

Oro E. and Ruffolo M. (2017). A Method for Web Content Extraction and Analysis in the Tourism Domain. In Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-247-9, pages 365-370. DOI: 10.5220/0006371103650370


in EndNote Style

TY - CONF
JO - Proceedings of the 19th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - A Method for Web Content Extraction and Analysis in the Tourism Domain
SN - 978-989-758-247-9
AU - Oro E.
AU - Ruffolo M.
PY - 2017
SP - 365
EP - 370
DO - 10.5220/0006371103650370