Explorative Analysis of Heterogeneous, Unstructured, and Uncertain Data - A Computer Science Perspective on Biodiversity Research

C. Beckstein, S. Böcker, M. Bogdan, H. Bruehlheide, H. M. Bücker, J. Denzler, P. Dittrich, I. Grosse, A. Hinneburg, B. König-Ries, F. Löffler, M. Marz, M. Müller-Hannemann, M. Winter, W. Zimmermann

2014

Abstract

We outline a blueprint for the development of new computer science approaches to the management and analysis of big data problems in biodiversity science. Such problems are characterized by a combination of different data sources, each of which exhibits at least one of the typical characteristics of big data (volume, variety, velocity, or veracity). For these problems, we envision a solution that addresses each aspect of integrating data sources and the algorithms for their analysis on one of the following three layers. At the data layer, there are various archives of heterogeneous, unstructured, and uncertain data. At the functional layer, the data are analyzed for each archive individually. At the meta-layer, multiple functional archives are combined for complex analysis.


Paper Citation


in Harvard Style

Beckstein C., Böcker S., Bogdan M., Bruehlheide H., Bücker H. M., Denzler J., Dittrich P., Grosse I., Hinneburg A., König-Ries B., Löffler F., Marz M., Müller-Hannemann M., Winter M. and Zimmermann W. (2014). Explorative Analysis of Heterogeneous, Unstructured, and Uncertain Data - A Computer Science Perspective on Biodiversity Research. In Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-035-2, pages 251-257. DOI: 10.5220/0005098402510257


in BibTeX Style

@conference{data14,
author={C. Beckstein and S. Böcker and M. Bogdan and H. Bruehlheide and H. M. Bücker and J. Denzler and P. Dittrich and I. Grosse and A. Hinneburg and B. König-Ries and F. Löffler and M. Marz and M. Müller-Hannemann and M. Winter and W. Zimmermann},
title={Explorative Analysis of Heterogeneous, Unstructured, and Uncertain Data - A Computer Science Perspective on Biodiversity Research},
booktitle={Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA},
year={2014},
pages={251-257},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005098402510257},
isbn={978-989-758-035-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Explorative Analysis of Heterogeneous, Unstructured, and Uncertain Data - A Computer Science Perspective on Biodiversity Research
SN - 978-989-758-035-2
AU - Beckstein C.
AU - Böcker S.
AU - Bogdan M.
AU - Bruehlheide H.
AU - Bücker H. M.
AU - Denzler J.
AU - Dittrich P.
AU - Grosse I.
AU - Hinneburg A.
AU - König-Ries B.
AU - Löffler F.
AU - Marz M.
AU - Müller-Hannemann M.
AU - Winter M.
AU - Zimmermann W.
PY - 2014
SP - 251
EP - 257
DO - 10.5220/0005098402510257