Improving Classification of Cancer and Mining Biomarkers from Gene Expression Profiles Using Hybrid Optimization Algorithms and Fuzzy Support Vector Machine

Niloofar Yousefi Moteghaed, Keivan Maghooli, Masoud Garshasbi


Gene expression data are characteristically high dimensional, with a small sample size relative
to the number of features and variability inherent in biological processes, both of which make
analysis difficult. Selection of highly discriminative features decreases the computational cost and complexity
of the classifier and improves its reliability in predicting the class of new samples. The present
study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy
support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each
sample in the training phase and to decrease the system's sensitivity to outliers, which improves the
classifier's ability to generalize. A decision‑tree algorithm was applied to the most frequent genes to
develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding
the best parameters for the classifier during the training phase without the need for trial‑and‑error
by the user. The proposed approach was tested on four benchmark gene expression profiles. The
classification accuracy is 100% for the leukemia data, 96.67% for colon cancer, and 98% for
breast cancer. The results show that the
best kernel for training the SVM classifier is the radial basis function. The experimental results
show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most
informative gene subset, and improve classification accuracy using the optimal parameters of the
classifier without user intervention.
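
The abstract does not say how the fuzzy importance of each training sample is computed. As a rough illustration only, the sketch below uses one common choice, which is an assumption here and not necessarily the authors' method: a sample's membership shrinks with its distance from its own class centroid, so likely outliers receive small weights. The function name `fuzzy_memberships` and the parameter `delta` are hypothetical.

```python
import math


def fuzzy_memberships(samples, labels, delta=1e-6):
    """Return one membership in (0, 1] per sample.

    Assumed scheme (not taken from the paper): membership is
    1 - d / (r + delta), where d is the sample's Euclidean distance to
    its class centroid and r is the largest such distance in that class.
    """
    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Centroid of each class: the per-feature mean.
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in by_class.items()
    }

    # Distance of every sample to the centroid of its own class.
    dists = [math.dist(x, centroids[y]) for x, y in zip(samples, labels)]

    # Class radius: the largest distance observed within each class.
    radius = {}
    for d, y in zip(dists, labels):
        radius[y] = max(radius.get(y, 0.0), d)

    # delta keeps the farthest sample's membership strictly positive.
    return [1.0 - d / (radius[y] + delta) for d, y in zip(dists, labels)]


# Toy data: the third sample lies far from the rest of class 0.
toy_samples = [[0.0, 0.0], [0.0, 2.0], [0.0, 10.0], [5.0, 5.0], [5.0, 7.0]]
toy_labels = [0, 0, 0, 1, 1]
weights = fuzzy_memberships(toy_samples, toy_labels)
```

Weights of this kind could then be supplied as per-sample weights when training an SVM, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`, so that low-membership samples contribute less to the penalty term.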

