Date: Thursday, November 20, 2014
Early detection of diseases such as cancer is widely recognized as key to effective treatment. Classifiers based on clinical information alone usually have low prediction accuracy, and molecular classifiers are expected to be among the most promising tools for improving it. Building such classifiers requires identifying protein/gene biomarkers that are both clinically significant and relevant to biological function. However, although many genomes have been sequenced, almost half of the genes in these genomes lack function information.
To address these issues, we developed network-based machine learning frameworks. For predicting the disease status of a sample, we proposed identifying subnetwork-based biomarkers from a co-expression network and developed a modular-based linear discriminant analysis approach that integrates the ‘essential’ correlation structure among genes into the predictor, rather than either considering all types of correlations (e.g., strong, weak, and noise correlations) or ignoring correlations altogether. As a result, the correlated gene clusters associated with the diagnostic classes of interest can have a potential functional interpretation. For predicting protein functions, we devised an iterative relaxation labeling procedure to find the maximally likely labeling of a protein network. In contrast to traditional methods, which treat gene ontology (GO) terms as a flat structure, we addressed the multi-label, multi-class classification of protein functions by taking into account the inter-correlation of GO terms in a hierarchical structure.
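To make the idea of iterative relaxation labeling on a network concrete, the following is a minimal sketch of the classic relaxation labeling update, not the procedure presented in this talk. It is a single-label simplification: each node holds one probability distribution over labels, and it does not model the GO hierarchy or multi-label outputs. The function name `relaxation_labeling`, the toy four-protein chain, and the compatibility values are all assumptions made for illustration; in a GO setting, the compatibility matrix is one place where relationships between terms could plausibly enter.

```python
import numpy as np

def relaxation_labeling(adjacency, init_probs, compat, clamped, n_iter=50, tol=1e-6):
    """Classic relaxation labeling on a weighted network (illustrative sketch).

    adjacency : (n, n) non-negative edge weights, zero diagonal
    init_probs: (n, k) initial label probabilities, each row sums to 1
    compat    : (k, k) label compatibilities in [-1, 1] (hypothetical values)
    clamped   : (n,) boolean mask of nodes whose labels are known and kept fixed
    """
    adjacency = np.asarray(adjacency, dtype=float)
    init_probs = np.asarray(init_probs, dtype=float)
    # Normalize each node's edge weights so its support stays within [-1, 1].
    degree = adjacency.sum(axis=1, keepdims=True)
    weights = np.divide(adjacency, degree, out=np.zeros_like(adjacency), where=degree > 0)
    probs = init_probs.copy()
    for _ in range(n_iter):
        # support[i, l] = sum_j w_ij * sum_m compat[l, m] * probs[j, m]
        support = weights @ probs @ compat.T
        updated = probs * (1.0 + support)
        row_sums = updated.sum(axis=1, keepdims=True)
        # Renormalize; if every label of a node is fully contradicted, keep its old row.
        safe_sums = np.where(row_sums > 0, row_sums, 1.0)
        updated = np.where(row_sums > 0, updated / safe_sums, probs)
        # Nodes with known labels are reset to their given distribution.
        updated[clamped] = init_probs[clamped]
        converged = np.abs(updated - probs).max() < tol
        probs = updated
        if converged:
            break
    return probs

# Toy chain of four proteins: 0 - 1 - 2 - 3 (assumed data, for illustration only).
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
# Protein 0 is known to have function A, protein 3 function B; 1 and 2 are unknown.
init_probs = np.array([
    [1.0, 0.0],
    [0.5, 0.5],
    [0.5, 0.5],
    [0.0, 1.0],
])
clamped = np.array([True, False, False, True])
# Neighbors sharing a label support each other; differing labels contradict.
compat = np.array([
    [1.0, -1.0],
    [-1.0, 1.0],
])

result = relaxation_labeling(adjacency, init_probs, compat, clamped)
print(result.round(3))
```

In this toy run, each unlabeled protein drifts toward the label of its annotated neighbor, while the annotated proteins keep their given labels. A multi-label, hierarchy-aware variant would need a different probability model per term and is outside the scope of this sketch.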