MINING THE WEB FOR MEDICAL HYPOTHESES - A Proof-of-Concept System

Diana MacLean, Margo Seltzer

Abstract

As the prevalence of blogs, discussion forums, and online news services continues to grow, so too does the portion of this Web content that relates to health and medicine. We propose that everyday, medically oriented Web content is a valuable and viable data source for medical hypothesis generation and testing, despite its noise. In this paper, we present a proof-of-concept system supporting this notion. We construct a corpus of news articles relating to the drugs Vioxx, Naproxen, and Ibuprofen that were published between 1998 and 2002. Using this corpus, we show that there was a significant link between Vioxx and the concept “Myocardial Infarction” well before the drug was withdrawn from the market in 2004. Indeed, within the Vioxx-related content, the concept ranks in the top 3.3% in terms of importance. Moreover, the term occurs significantly more frequently in the Vioxx-related content than in the Naproxen and Ibuprofen control literatures.
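
The abstract does not name the importance measure or the significance test that was used, so the following is only a minimal sketch of the kind of comparison it describes. It assumes each article has already been mapped to a list of concept names, ranks a concept by document frequency, and compares its frequency in a target corpus against a control corpus with a two-proportion z-test. The function names, the choice of test, and the toy documents are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import erfc, sqrt


def document_frequency(docs):
    """Count how many documents mention each concept at least once."""
    counts = Counter()
    for concepts in docs:
        counts.update(set(concepts))
    return counts


def rank_percentile(counts, concept):
    """Percentage of distinct concepts whose document frequency is at
    least that of `concept` (smaller means more prominent)."""
    target = counts[concept]
    at_or_above = sum(1 for c in counts.values() if c >= target)
    return 100.0 * at_or_above / len(counts)


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    This is an assumed stand-in; the paper may have used another test.
    """
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both proportions are all-0 or all-1: no detectable difference.
        return 0.0, 1.0
    z = (hits_a / n_a - hits_b / n_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Toy, made-up documents for illustration only (not data from the paper).
target_docs = [
    ["Myocardial Infarction", "Arthritis"],
    ["Myocardial Infarction", "Pain"],
    ["Arthritis", "Pain"],
    ["Myocardial Infarction"],
]
control_docs = [
    ["Arthritis", "Pain"],
    ["Pain"],
    ["Myocardial Infarction", "Pain"],
    ["Arthritis"],
]

concept = "Myocardial Infarction"
target_counts = document_frequency(target_docs)
control_counts = document_frequency(control_docs)

percentile = rank_percentile(target_counts, concept)
z, p_value = two_proportion_z_test(
    target_counts[concept], len(target_docs),
    control_counts[concept], len(control_docs),
)
print(f"rank percentile in target corpus: {percentile:.1f}%")
print(f"z = {z:.3f}, two-sided p = {p_value:.3f}")
```

With only four documents per corpus the toy comparison cannot reach significance; a real corpus would contain many more articles, and a concept extractor would supply the concept lists.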



Paper Citation


in Harvard Style

MacLean D. and Seltzer M. (2011). MINING THE WEB FOR MEDICAL HYPOTHESES - A Proof-of-Concept System. In Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2011) ISBN 978-989-8425-34-8, pages 303-308. DOI: 10.5220/0003166403030308


in BibTeX Style

@conference{healthinf11,
author={Diana MacLean and Margo Seltzer},
title={MINING THE WEB FOR MEDICAL HYPOTHESES - A Proof-of-Concept System},
booktitle={Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2011)},
year={2011},
pages={303-308},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003166403030308},
isbn={978-989-8425-34-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Health Informatics - Volume 1: HEALTHINF, (BIOSTEC 2011)
TI - MINING THE WEB FOR MEDICAL HYPOTHESES - A Proof-of-Concept System
SN - 978-989-8425-34-8
AU - MacLean D.
AU - Seltzer M.
PY - 2011
SP - 303
EP - 308
DO - 10.5220/0003166403030308