A TALE OF TWO (SIMILAR) CITIES - Inferring City Similarity through Geo-spatial Query Log Analysis

Rohan Seth, Michele Covell, Deepak Ravichandran, D. Sivakumar, Shumeet Baluja

Abstract

Understanding the backgrounds and interests of the people who consume a piece of content, such as a news story, video, or music, is vital for the content producer as well as for the advertisers who rely on that content as a channel on which to advertise. We extend traditional search-engine query log analysis, which has primarily concentrated on analyzing single queries or users, or small groups of them, to examine the complete query stream of very large groups of users – the inhabitants of 13,377 cities across the United States. Query logs can be a good representation of the interests of a city’s inhabitants and a useful characterization of the city itself. Further, we demonstrate how query logs can be used effectively to gather city-level statistics sufficient for providing insights into the similarities and differences between cities. Cities found to be similar through query analysis correspond well to the similar cities identified by other large-scale, time-consuming direct-measurement studies, such as those undertaken by the Census Bureau.



Paper Citation


in Harvard Style

Seth R., Covell M., Ravichandran D., Sivakumar D. and Baluja S. (2011). A TALE OF TWO (SIMILAR) CITIES - Inferring City Similarity through Geo-spatial Query Log Analysis. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 171-181. DOI: 10.5220/0003641501790189


in Bibtex Style

@conference{kdir11,
author={Rohan Seth and Michele Covell and Deepak Ravichandran and D. Sivakumar and Shumeet Baluja},
title={A TALE OF TWO (SIMILAR) CITIES - Inferring City Similarity through Geo-spatial Query Log Analysis},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={171-181},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003641501790189},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - A TALE OF TWO (SIMILAR) CITIES - Inferring City Similarity through Geo-spatial Query Log Analysis
SN - 978-989-8425-79-9
AU - Seth R.
AU - Covell M.
AU - Ravichandran D.
AU - Sivakumar D.
AU - Baluja S.
PY - 2011
SP - 171
EP - 181
DO - 10.5220/0003641501790189