Company Mention Detection for Large Scale Text Mining

Rebecca J. Passonneau, Tifara Ramelson, Boyi Xie

2014

Abstract

Large-scale text mining aimed at actionable prediction must contend with noisy information in documents and with interdependencies between the NLP techniques applied and the data representation. This paper presents an initial investigation of how improved company mention detection, using named entity recognition and coreference, affects financial analytics. Coverage of company mention detection improves dramatically. Improvement in stock price prediction varies, depending on the data representation.



Paper Citation


in Harvard Style

Passonneau R. J., Ramelson T. and Xie B. (2014). Company Mention Detection for Large Scale Text Mining. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014) ISBN 978-989-758-048-2, pages 512-520. DOI: 10.5220/0005174405120520


in Bibtex Style

@conference{sstm14,
author={Rebecca J. Passonneau and Tifara Ramelson and Boyi Xie},
title={Company Mention Detection for Large Scale Text Mining},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014)},
year={2014},
pages={512-520},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005174405120520},
isbn={978-989-758-048-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: SSTM, (IC3K 2014)
TI - Company Mention Detection for Large Scale Text Mining
SN - 978-989-758-048-2
AU - Passonneau R. J.
AU - Ramelson T.
AU - Xie B.
PY - 2014
SP - 512
EP - 520
DO - 10.5220/0005174405120520