Fall Seminar Series  September 27, 2007
University of Minnesota
School of Statistics
College of Liberal Arts

Penalized Clustering of Large Scale Functional Data with Multiple Covariates

Ping Ma
Department of Statistics
University of Illinois at Urbana-Champaign

Thursday, September 27, 2007
3:30 PM, 115 Ford Hall
Minneapolis, East Bank Campus
Social at 3:00 PM, 300 Ford Hall


Abstract

Large-scale longitudinal data with repeated measurements over a number of time points arise from many scientific investigations. A typical example is a temporal gene expression study, in which a series of microarray experiments is conducted sequentially during a biological process. At each time point, mRNA expression levels of thousands of genes are measured simultaneously. Collected over time, a gene's "temporal expression profile" gives the scientist some clues about what role this gene might play. Genes with similar profiles are often "co-regulated" or participate in a common and important biological function. Many clustering techniques have thus been applied to reveal the cluster information, which is a crucial first step in deciphering the underlying mechanism.

In addition to the time factor, such longitudinal data often contain many covariates, e.g. replicates at each time point, species in comparative genomics studies, and treatment groups in case-control studies, as well as many factors in factorially designed experiments.
However, very few currently available clustering methods take all these factors into account. Moreover, these methods are computationally expensive for large-scale data.

To overcome these obstacles, we propose a penalized clustering method for large-scale data with multiple covariates using a functional data approach. Simulation studies and real-data examples are presented to investigate the empirical performance of the proposed method.