Authors: D.-T. Phan 1 ; P. Leray 1 and C. Sinoquet 2

Affiliations: 1 Polytech/ University of Nantes, France ; 2 Faculty of Sciences and University of Nantes, France

ISBN: 978-989-758-070-3

Keyword(s): Linkage Disequilibrium, Genome-wide Association Study, Multilocus Association Study, Data Dimension Reduction, Probabilistic Graphical Model, Bayesian Network.

Related Ontology Subjects/Areas/Topics: Bioinformatics ; Biomedical Engineering ; Biostatistics and Stochastic Models ; Data Mining and Machine Learning

Abstract: Association genetics, and in particular genome-wide association studies (GWASs), aim to elucidate the etiology of complex genetic diseases. In the domain of association genetics, machine learning provides an appealing alternative framework to standard statistical approaches. Pioneering work (Mourad et al., 2011) proposed the forest of latent trees (FLTM) to model genetical data at the genome scale. The FLTM is a hierarchical Bayesian network with latent variables. A key to FLTM construction is the recursive clustering of variables, in a bottom-up subsuming process. In this paper, we study the impact of the choice of the clustering method to be plugged into the FLTM learning algorithm, in a GWAS context. Using a real GWAS data set describing 41400 variables for each of 3004 controls and 2005 individuals affected by Crohn’s disease, we compare the influence of three clustering methods. Data dimension reduction and the ability to split or group putative causal SNPs in agreement with the underlying biological reality are analyzed. To assess the risk of missing significant association results through subsumption, we also compare the methods through the corresponding FLTM-driven GWASs. In the GWAS context and in this framework, the choice of the clustering method does not impact the satisfactory performance of the downstream application, both in power and in detection of false positive associations.

Paper citation in several formats:
Phan, D.; Leray, P. and Sinoquet, C. (2015). Modeling Genetical Data with Forests of Latent Trees for Applications in Association Genetics at a Large Scale - Which Clustering Method should Be Chosen? In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015) ISBN 978-989-758-070-3, pages 5-16. DOI: 10.5220/0005179800050016

@conference{bioinformatics15,
author={D.{-}T. Phan and P. Leray and C. Sinoquet},
title={Modeling Genetical Data with Forests of Latent Trees for Applications in Association Genetics at a Large Scale - Which Clustering Method should Be Chosen?},
booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015)},
year={2015},
pages={5--16},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005179800050016},
isbn={978-989-758-070-3},
}

TY - CONF

JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015)
TI - Modeling Genetical Data with Forests of Latent Trees for Applications in Association Genetics at a Large Scale - Which Clustering Method should Be Chosen?
SN - 978-989-758-070-3
AU - Phan, D.
AU - Leray, P.
AU - Sinoquet, C.
PY - 2015
SP - 5
EP - 16
DO - 10.5220/0005179800050016
