Real-Time Data Harvesting Method for Czech Twitter

Pavel Král, Václav Rajtmajer


This paper deals with the automatic analysis of Czech social media. The main goal is to propose an approach for harvesting interesting Czech-language messages from Twitter at a high download rate. The method uses user lists to discover potentially interesting tweets to download. It is motivated by two observations: only about 20% of Twitter users post informative messages, whereas the remaining 80% do not, and the "important" users can be identified through user lists. The experimental results show that the proposed method is very efficient: it harvests about six times more data than the other approaches. The approach is intended to be integrated into an experimental system for the Czech News Agency that monitors the current data flow on Twitter, downloads messages in real time, analyzes them, and extracts relevant events.
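The abstract does not spell out how user lists are turned into a set of "important" users, so the following is only a minimal sketch under stated assumptions: it treats the number of lists a user appears in as a proxy for importance and keeps the top fraction of users. The function names, the toy data, and the use of 20% as a cut-off are illustrative assumptions, not the authors' procedure, and no Twitter API calls are made.

```python
from collections import Counter
from math import ceil


def count_list_memberships(user_lists):
    """Count, for each user, the number of lists that include them."""
    counts = Counter()
    for members in user_lists.values():
        counts.update(set(members))  # a user counts once per list
    return counts


def select_important_users(user_lists, fraction=0.2):
    """Return the top `fraction` of users, ranked by list memberships.

    Assumption: list membership count stands in for "importance".
    """
    counts = count_list_memberships(user_lists)
    if not counts:
        return []
    keep = max(1, ceil(len(counts) * fraction))
    # Sort by descending count, then by name for a deterministic order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [user for user, _ in ranked[:keep]]


# Hypothetical toy data: list name -> member handles.
example_lists = {
    "czech-news": ["ctk_cz", "reporter_a", "reporter_b"],
    "politics": ["ctk_cz", "reporter_a", "analyst_c"],
    "sport": ["ctk_cz", "fan_d"],
}
important = select_important_users(example_lists, fraction=0.2)
```

In a harvesting pipeline, a set selected this way would determine whose timelines are downloaded first; how the lists themselves are collected is outside the scope of this sketch.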



Paper Citation

in Harvard Style

Král P. and Rajtmajer V. (2017). Real-Time Data Harvesting Method for Czech Twitter. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-220-2, pages 259-265. DOI: 10.5220/0006212402590265
