Authors:
Tanzila Islam 1; Chyon Hae Kim 1; Hiroyoshi Iwata 2; Hiroyuki Shimono 3; Akio Kimura 1; Hein Zaw 4; Chitra Raghavan 4; Hei Leung 4 and Rakesh Kumar Singh 5
Affiliations:
1 Department of Systems Innovation Engineering, Graduate School of Science and Engineering, Iwate University, Morioka, Iwate, Japan
2 Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
3 Crop Science Laboratory, Faculty of Agriculture, Iwate University, Morioka, Japan
4 International Rice Research Institute (IRRI), Laguna, Philippines
5 International Center for Biosaline Agriculture (ICBA), Dubai, U.A.E.
Keyword(s):
Genome-wide DNA Polymorphisms, Stacked Autoencoder, Deep Neural Network, Separate Stacking Model, Genome Compression, Missing Value Imputation.
Abstract:
Imputing missing values and compressing genome-wide DNA polymorphism data are challenging tasks in genomic data analysis. Missing data is a lack of information in a dataset, and it directly affects the performance of data analysis. The aim of this work is to develop a deep learning model, named Autoencoder Genome Imputation and Compression (AGIC), that imputes missing values and compresses genome-wide polymorphism data using a separated neural network model to reduce computational time. The research takes on the construction of an autoencoder-based model for genomic analysis; in other words, it is a fusion of agricultural and information sciences. Moreover, no prior work is known on missing value imputation and genome-wide polymorphism data compression using a Separated Stacking Autoencoder Model. The main contributions are: (1) missing value imputation for genome-wide polymorphism data and (2) compression of genome-wide polymorphism data from rice DNA. To demonstrate the use of the AGIC model, real genome-wide polymorphism data from a rice MAGIC population are used.
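The general idea of imputing and compressing with separate autoencoders can be illustrated with a minimal NumPy sketch. This is not the AGIC implementation: the genotype coding (-1/0/1 with NaN for missing), the split of markers into equal blocks, the single hidden layer, and the function names `train_masked_autoencoder` and `impute_and_compress` are all assumptions made for illustration. Each block of markers gets its own small autoencoder, the reconstruction loss is computed on observed entries only, the hidden activations serve as the compressed representation, and the reconstruction fills the missing entries.

```python
import numpy as np

def train_masked_autoencoder(X, mask, code_dim, epochs=300, lr=0.05, seed=0):
    """Train a one-hidden-layer autoencoder; the loss ignores masked-out entries.

    X    : (n_samples, n_markers) genotypes, missing entries already set to 0
    mask : same shape, 1.0 where observed and 0.0 where missing
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, code_dim))
    W_dec = rng.normal(0.0, 0.1, (code_dim, d))
    for _ in range(epochs):
        H = np.tanh(X @ W_enc)                 # compressed code
        err = (H @ W_dec - X) * mask           # error on observed entries only
        grad_dec = H.T @ err / n
        grad_enc = X.T @ ((err @ W_dec.T) * (1.0 - H ** 2)) / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec

def impute_and_compress(G, n_blocks, code_dim):
    """Fit one autoencoder per marker block (hypothetical 'separated' setup)."""
    mask = (~np.isnan(G)).astype(float)
    X = np.nan_to_num(G, nan=0.0)              # placeholder value for missing calls
    codes, recon = [], []
    for cols in np.array_split(np.arange(G.shape[1]), n_blocks):
        W_enc, W_dec = train_masked_autoencoder(X[:, cols], mask[:, cols], code_dim)
        H = np.tanh(X[:, cols] @ W_enc)
        codes.append(H)
        recon.append(H @ W_dec)
    recon = np.hstack(recon)
    imputed = np.where(mask == 1.0, X, recon)  # keep observed calls, fill the rest
    return np.hstack(codes), imputed

# Synthetic data only: 40 lines x 12 markers with roughly 10% missing calls.
rng = np.random.default_rng(1)
G = rng.choice([-1.0, 0.0, 1.0], size=(40, 12))
G[rng.random(G.shape) < 0.1] = np.nan
codes, imputed = impute_and_compress(G, n_blocks=3, code_dim=2)
```

Because each block is trained independently, the per-block models are small and could in principle be trained in parallel, which is one plausible reading of how a separated model reduces computational time. The random data above carries no linkage structure, so it only exercises the code path and says nothing about imputation accuracy.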