Biological Sciences Seminars

Dimensionality Reduction and Data Integration Problems: Applications in Life Sciences

by Dr. Minta Thomas (KU Leuven, Belgium)

Tuesday, December 2, 2014 (Asia/Kolkata)
at B-333 (DBS Seminar Room)
Description
High dimensionality and heterogeneity of data pose many challenges in computational biology and chemistry. As life sciences data sets are likely to keep growing in size and complexity, dimensionality reduction and nonlinear techniques are becoming increasingly important. In this decade, data integration has become an active area of research in machine learning, bioinformatics, and chemoinformatics.

 

Several dimensionality reduction and data integration methods are now available for analyzing and classifying biological data. In the first part of the presentation, we concentrate on dimensionality reduction techniques such as GEVD and robust PCA. We first discuss a new mathematical framework, maximum likelihood estimation of generalized eigenvalue decomposition (MLGEVD), which employs a well-known technique relying on a generalization of the singular value decomposition (SVD). We then present the generalized eigenvalue decomposition (GEVD) in terms of the ordinary eigenvalue decomposition (EVD) for the integration of microarray and literature information. Finally, we apply MLGEVD and GEVD to a colon cancer data set to identify differentially expressed genes.
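
As a rough illustration of how a GEVD can be expressed through an ordinary EVD, the sketch below uses the standard Cholesky reduction for a symmetric matrix A and a symmetric positive-definite matrix B. This is a textbook construction and not necessarily the formulation used in the talk; the function name gevd_via_evd and the small matrices are hypothetical.

```python
import numpy as np

def gevd_via_evd(A, B):
    """Solve A v = lambda * B v by reducing it to an ordinary symmetric EVD.

    Assumes A is symmetric and B is symmetric positive definite
    (illustrative assumptions, not taken from the talk).
    """
    L = np.linalg.cholesky(B)                 # B = L L^T
    Linv_A = np.linalg.solve(L, A)            # L^{-1} A
    C = np.linalg.solve(L, Linv_A.T).T        # C = L^{-1} A L^{-T}
    eigvals, Y = np.linalg.eigh(C)            # ordinary EVD: C y = lambda y
    V = np.linalg.solve(L.T, Y)               # map back: v = L^{-T} y
    return eigvals, V

# Hypothetical toy matrices, standing in for two data-derived matrices.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[2.0, 0.0], [0.0, 1.0]])
eigvals, V = gevd_via_evd(A, B)
```

The eigenvalues of C are the generalized eigenvalues of the pair (A, B), and the recovered eigenvectors are B-orthonormal, that is, V^T B V is the identity.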

 

In the second part, we study a data-driven bandwidth selection criterion for kernel PCA (KPCA), a nonlinear dimensionality reduction technique, and then discuss its applications in bioinformatics.
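
To show where the bandwidth enters KPCA, here is a minimal sketch with an RBF kernel. The talk's selection criterion is not described in this abstract, so the sketch substitutes the common median-distance heuristic as a stand-in; the function names and the random data are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(d2, 0.0)                # clip tiny negative round-off

def median_bandwidth(X):
    """Median heuristic: a stand-in, NOT the criterion from the talk."""
    d2 = pairwise_sq_dists(X)
    upper = np.triu_indices(X.shape[0], k=1)  # distinct pairs only
    return np.sqrt(np.median(d2[upper]))

def kernel_pca(X, sigma, n_components):
    """KPCA with the RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    n = X.shape[0]
    K = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    Kc = J @ K @ J                            # kernel centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)
    top = np.argsort(eigvals)[::-1][:n_components]
    lam, alpha = eigvals[top], eigvecs[:, top]
    # Scores of the training points on each component: sqrt(lambda) * alpha
    return alpha * np.sqrt(np.maximum(lam, 0.0)), lam

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                  # hypothetical data: 20 samples
sigma = median_bandwidth(X)
Z, lam = kernel_pca(X, sigma, n_components=2)
```

A very small sigma makes the kernel matrix close to the identity, while a very large one makes all points look alike, which is why a data-driven choice of bandwidth matters.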

In the third part, we theoretically investigate a machine learning approach, the weighted LS-SVM classifier, for integrating two data sources; it offers a single mathematical framework for data integration and classification problems. We design an expression-value-weighted clinical classifier for predicting breast cancer.
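
The idea of handling two data sources inside one LS-SVM can be sketched as follows. The weighting scheme of the talk is not given in this abstract, so the sketch assumes a simple convex combination of two linear kernels; the weight w, the regularization constant gamma, and the toy clinical and expression matrices are all hypothetical.

```python
import numpy as np

def lssvm_train(K, y, gamma):
    """Train an LS-SVM classifier by solving its linear system.

    [ 0   y^T             ] [ b     ]   [ 0 ]
    [ y   Omega + I/gamma ] [ alpha ] = [ 1 ],  Omega_ij = y_i y_j K_ij
    """
    n = len(y)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = y
    M[1:, 0] = y
    M[1:, 1:] = np.outer(y, y) * K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(M, rhs)
    return sol[0], sol[1:]                    # bias b, support values alpha

def lssvm_predict(K_new, y_train, alpha, b):
    """Class labels from sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    return np.sign(K_new @ (alpha * y_train) + b)

# Hypothetical toy data: one clinical variable, two expression values.
X_clin = np.array([[0.0], [0.2], [1.0], [1.2]])
X_expr = np.array([[0.1, 0.0], [0.0, 0.2], [1.0, 0.9], [0.9, 1.1]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w = 0.5                                       # assumed weight between sources
K = w * (X_clin @ X_clin.T) + (1.0 - w) * (X_expr @ X_expr.T)

b, alpha = lssvm_train(K, y, gamma=10.0)
pred = lssvm_predict(K, y, alpha, b)          # predictions on the training set
```

Because training reduces to a single linear system, both sources are integrated and classified in one step once the combined kernel matrix is formed.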

 

Finally, we propose a new machine learning approach for identifying biofilm inhibitors of Salmonella typhimurium and Pseudomonas aeruginosa. In this study, we aim to derive a new chemical descriptor from the connection table of chemical compounds that allows a better distinction between biologically active and inactive compounds.

 

In short, we introduce several algorithms for dimensionality reduction and data integration problems and discuss their applications in life sciences.