New Fitness Functions in Binary Harris Hawks Optimization for Gene Selection in Microarray Datasets

Ruba Khurma, Pedro Castillo, Ahmad Sharieh, Ibrahim Aljarah

Abstract

Gene selection (GS) is a challenging problem in medical applications because microarray datasets contain a large number of genes but only a limited number of patient samples. Selecting the most relevant genes is a necessary pre-processing step for building reliable cancer classification systems. This paper proposes two new fitness functions in Binary Harris Hawks Optimization (BHHO) for GS. The main objective is to select a small number of genes while achieving high classification accuracy. The first fitness function balances classification performance against the number of genes by using a weight that increases linearly throughout the optimization process. The second fitness function is applied across two stages: the first stage optimizes classification performance only, while the second stage also takes the number of genes into consideration. A K-nearest neighbor (K-NN) classifier is used to evaluate the proposed approaches on ten microarray datasets. The results show that the proposed fitness functions achieve better classification results than a fitness function that takes only classification performance into account. They also outperform three other wrapper-based methods in most cases. The second fitness function outperforms the first across most of the datasets in terms of both classification accuracy and the number of genes.
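The two fitness functions described above can be sketched as follows. This is only an illustrative sketch, not the formulation used in the paper: the abstract does not give the exact equations, so the function names, the choice to minimize error rate, the assumption that the growing weight is applied to the gene-ratio term, the weight range (0 to 0.1), and the stage switch point (half of the iterations) are all hypothetical.

```python
def linear_weight(t, max_iter, w_start=0.0, w_end=0.1):
    """Weight that grows linearly from w_start to w_end over the run.

    w_start and w_end are assumed values, not taken from the paper.
    """
    if max_iter <= 1:
        return w_end
    return w_start + (w_end - w_start) * t / (max_iter - 1)


def fitness_linear(error_rate, n_selected, n_total, t, max_iter):
    """First idea: trade off classification error against the fraction
    of selected genes, with a weight that increases linearly with the
    iteration index t. Lower is better (assumed minimization)."""
    w = linear_weight(t, max_iter)
    return (1.0 - w) * error_rate + w * (n_selected / n_total)


def fitness_two_stage(error_rate, n_selected, n_total, t, max_iter,
                      switch=0.5, w=0.1):
    """Second idea: stage one scores classification error only; stage
    two also penalizes the number of selected genes. The switch point
    and the fixed weight w are assumptions."""
    if t < switch * max_iter:
        return error_rate
    return (1.0 - w) * error_rate + w * (n_selected / n_total)


if __name__ == "__main__":
    # Hypothetical candidate: 20% error using 50 of 100 genes.
    for t in (0, 50, 99):
        print(t,
              fitness_linear(0.2, 50, 100, t, 100),
              fitness_two_stage(0.2, 50, 100, t, 100))
```

In a wrapper setting, `error_rate` would come from evaluating a classifier such as K-NN on the gene subset encoded by each binary solution, and `t` would be the current iteration of the optimizer.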
