Abstracts
Mikhail Belkin, Learning Probability Distributions with Eigenfunctions of Convolution Operators
- I will discuss how spectral properties of density-dependent convolution
operators can be used to learn parameters of families of distributions,
particularly mixtures of Gaussians. This analysis
leads to new algorithms for learning such mixtures and sheds new
light on properties of spectral clustering and kernel principal
component analysis.
Joint work with Tao Shi and Bin Yu.
Lawrence Brown, In-Season Prediction of Batting Averages: A Field-test of Basic Empirical Bayes and Bayes Methodologies
- The methodological purpose of this study is to gain experience with a
variety of predictive methods applicable to a wide range of commonly
occurring situations. Several of the methods to be investigated derive from
empirical Bayes and hierarchical Bayes interpretations. Although the general
ideas behind these techniques have been understood for many decades, some of
these methods have only been refined relatively recently in a manner that
promises to more accurately fit data such as that at hand.
These methods will be investigated in the context of predicting baseball batting averages. Batting average is one of the principal performance measures for an individual baseball player. It has a simple numerical structure: the number of successful attempts, "Hits", as a proportion of the total number of qualifying attempts, "At-Bats". This makes it natural to statistically model each player's hits across a season as the outcome of an independent binomial random variable, with a possibly different success probability pi for each player. This is a common data structure in many statistical applications, and so the methodological study here has implications for a wide range of such applications.
One feature of all of the statistical methodologies here is the preliminary use of a particular form of variance stabilizing transformation in order to transform the binomial data problem into a somewhat more familiar structure involving (approximately) Normal random variables with known variances. This transformation technique is also useful in validating the binomial model assumption that is the conceptual basis for all our analyses.
No previous knowledge of baseball is required. But you might want to recall that the Olympic baseball competition is only 2 months and 8 kilometers away.
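The abstract does not say which variance-stabilizing transformation is used. As a minimal sketch, one common choice for binomial counts is the arcsine square-root transform, whose approximate variance depends only on the number of At-Bats. The function names and the small continuity constants below are illustrative assumptions, not details taken from the talk.

```python
import math

def arcsine_transform(hits, at_bats):
    """Arcsine square-root transform of a binomial count.

    Assumption: the constants 1/4 and 1/2 are one common continuity
    adjustment; the abstract does not specify the exact form.
    """
    return math.asin(math.sqrt((hits + 0.25) / (at_bats + 0.5)))

def approx_variance(at_bats):
    """Approximate variance of the transformed value, 1 / (4 * N).

    It depends on N only, not on the unknown success probability.
    """
    return 1.0 / (4.0 * at_bats)

# Two hypothetical players with the same average but different At-Bats.
for hits, at_bats in [(30, 100), (120, 400)]:
    x = arcsine_transform(hits, at_bats)
    print(at_bats, round(x, 4), approx_variance(at_bats))
```

After the transform, each player contributes an approximately Normal observation with known variance, which is the structure the abstract describes.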
An Chen, A Web Mining Based Measurement and Monitoring Model of Urban Mass Panic in Emergency Management
- It is very important to detect mass panic when a city is in an emergency. A traditional approach is to survey the urban population, which costs much time and money. In our opinion, a more suitable and effective approach is a web-based investigation that includes discovery, measurement, and monitoring. A framework including information retrieval, data mining, and area knowledge analysis is presented. Great effort has been put into the urban mass panic measurement model, which uses over 10 indicators covering most relevant websites, including portals, forums, BBS and other interactive websites. Experimental results are discussed at the end of the talk.
Jianqing Fan, Covariance Learning
- High dimensionality comparable to the sample size is a common
feature in portfolio allocation, risk management, genetic networks
and climatology. In this talk, we first use a multi-factor model to
reduce the dimensionality and to estimate the covariance matrix for
portfolio allocation and risk assessment. The impacts of
dimensionality on the estimation of covariance matrix and its
inverse are examined. We identify the situations under which the
factor approach can substantially improve performance and the cases
where the gains are only marginal, in comparison with the sample
covariance matrix. Furthermore, the impacts of the covariance matrix
estimation on portfolio allocation and risk management are studied.
Viable covariance modeling and sparse and robust portfolio
allocations are recommended based on our mathematical results.
In another class of problems, such as genetic networks or climatology, sparsity of the covariance matrix or its inverse arises naturally. We then estimate high-dimensional covariance matrices using the penalized likelihood method to explore the sparsity. New algorithms are proposed. Optimal rates of convergence, sparsistency, and asymptotic normality are established. Our theoretical results are verified by simulation studies and illustrated by several applications.
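To illustrate why a multi-factor model reduces dimensionality, the sketch below builds the covariance matrix implied by a factor model, B Cov(f) B^T plus a diagonal residual term, and compares parameter counts with an unrestricted covariance matrix. The function names, the diagonal-residual assumption, and the toy numbers are assumptions made for illustration; the estimators studied in the talk are not reproduced here.

```python
def factor_covariance(loadings, factor_cov, resid_var):
    """Covariance implied by a factor model: B * Cov(f) * B^T + diag(resid_var).

    loadings   : p x K list of lists (B)
    factor_cov : K x K list of lists
    resid_var  : length-p list of idiosyncratic variances (assumed
                 uncorrelated across assets, a simplifying assumption)
    """
    p, k = len(loadings), len(factor_cov)
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            cov[i][j] = sum(
                loadings[i][a] * factor_cov[a][b] * loadings[j][b]
                for a in range(k)
                for b in range(k)
            )
        cov[i][i] += resid_var[i]
    return cov

def parameter_counts(p, k):
    """Free parameters: unrestricted covariance vs. the factor model."""
    return p * (p + 1) // 2, p * k + k * (k + 1) // 2 + p

B = [[1.0], [0.5], [2.0]]  # three hypothetical assets, one factor
sigma = factor_covariance(B, [[4.0]], [1.0, 1.0, 1.0])
print(sigma)
print(parameter_counts(p=500, k=3))
```

With 500 assets and 3 factors, the factor model has about two thousand free parameters instead of more than a hundred thousand.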
Jerome H. Friedman, Fast Sparse Regression and Classification
- Regularized regression and classification methods fit a linear model to data, based on some loss criterion, subject to a constraint on the coefficient values. As special cases, ridge-regression, the lasso, and subset selection all use squared-error loss with different particular constraint choices. For large problems the general choice of loss/constraint combinations is usually limited by the computation required to obtain the corresponding solution estimates, especially when non convex constraints are used to induce very sparse solutions. A fast algorithm is presented that produces solutions that closely approximate those for any convex loss and a wide variety of convex and non convex constraints, permitting application to very large problems. The benefits of this generality are illustrated by examples.
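As a point of reference for the loss/constraint framing above, here is a minimal sketch of the lasso special case (squared-error loss with an l_1 penalty) solved by plain cyclic coordinate descent. This is a textbook method chosen for illustration, not the fast algorithm presented in the talk; the function names and toy data are assumptions.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b for the starting point b = 0
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            norm_sq = sum(c * c for c in col) / n
            if norm_sq == 0.0:
                continue
            # Correlation of column j with the residual that excludes b[j].
            rho = sum(col[i] * (resid[i] + col[i] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / norm_sq
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                b[j] = new_bj
    return b

# Toy design with two orthogonal columns; the weak second signal is zeroed out.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
coef = lasso_cd(X, y, lam=0.2)
print(coef)
```

Replacing `soft_threshold` with a different shrinkage rule is how other penalties would enter such a scheme; nonconvex penalties need more care than this sketch provides.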
Baogang Hu (Chinese Academy of Sciences, China), Issues about Embedding Prior Information onto Learning Machines - Example on Neural Network Models
- This talk will discuss the maximal use of prior information in machine learning and related issues. Although the critical role of prior information in modeling is widely recognized, a systematic investigation of the subject still seems to be missing. For example, apart from the Bayesian framework or knowledge-based inference, do we need other generic approaches that can integrate any type of prior information? To address this issue, we propose a generalized constraint modeling approach. Using this approach, one can improve neural networks through maximal use of prior information. Examples are given on regression and dynamic process problems. The main objective of this talk is to highlight the issues for a systematic study of the subject from a mathematical framework, rather than from specific applications.
David Madigan, Data Mining Issues in Drug Development
- Data mining methods play an increasingly important role in drug safety. Prior to approval, clinical trial data can provide important insights into potentially unforeseen drug side effects. Following approval, important data sources include spontaneous report databases, claims databases, and medical record systems. Challenging statistical issues arise in all of these contexts. This talk will review the general area focusing especially on flaws in the current standards for assessing drug safety during the approval process and on recent developments in statistical methodology for monitoring post-approval safety in longitudinal medical claims databases.
Iain Johnstone, Approximate Null Distribution for the Largest Latent Root in Multivariate Analysis
- The greatest root distribution lies at the heart of multivariate
analysis. It describes the null hypothesis distribution for the
union-intersection test for many classical problems, including
multiple response linear regression, MANOVA, canonical correlations,
equality of covariance matrices and so on. It is thus at least
potentially relevant to a variety of settings in data mining and
machine learning. However, the exact null distribution is difficult
to calculate and work with, and so the use of extensive tables or
special purpose software has always been necessary.
This talk proposes a simple asymptotic approximation, based on the Tracy-Widom distribution. A fortunate surprise is that the approximation is not solely asymptotic; we also argue that it is reasonably accurate over the entire range of (non-asymptotic) values of the parameters. "Reasonably accurate" means, for example, less than ten percent relative error in the 95th percentile, even when working with two variables and any combination of error and hypothesis degrees of freedom.
Yongdai Kim, Incentive Sparse Estimator
- Lasso is a penalized empirical risk minimization method which yields a sparse solution. It has nice (asymptotic) properties for high dimensional data when the true model is sparse. However, when the predictive variables are highly correlated, a less sparse estimator than the Lasso estimator would be optimal. To make a less sparse estimator, Zou and Hastie (2005) proposed the elastic net, which is a compromise between the lasso and ridge estimators. Also, see Friedman and Popescu (2005). In this talk, we propose a new regularized estimator called the "incentive sparse estimator", which yields a less sparse solution than the Lasso estimator. The proposed estimator has several advantages over the elastic net. First, the incentive sparse estimator extends the elastic net to general loss functions other than the squared error loss. In particular, the incentive sparse estimator does not require post-hoc rescaling as the elastic net does. Second, it can produce more diverse estimators as the regularization parameters vary. For the elastic net, we can say that the estimator lies between the lasso and univariate soft thresholding estimators. However, in certain cases where predictors are highly correlated and the signal-to-noise ratio is low, the simple average of the predictors would be better than the ridge estimator. The proposed method can have the simple average as an extreme estimator. Since the other extreme estimator of the incentive sparse estimator is the lasso estimator, we can say that the proposed estimator lies between the simple average and lasso estimators.
Ann B. Lee, Finding Low-dimensional Structure by Spectral Connectivity Analysis
- For naturally occurring data, the dimension of the given input space is often very large while the data themselves have a low intrinsic dimensionality. Spectral kernel methods are non-linear techniques for transforming data into a coordinate system that efficiently reveals the geometric structure -- in particular, the "connectivity" -- of the data. In this talk, I will focus on one particular technique -- diffusion maps -- but the analysis extends to other techniques as well. I will give examples of various applications of the method in dimensionality reduction, data set parameterization and clustering. I will also present recent results on how spectral kernel methods relate to classical kernel smoothing. (Part of this work is joint with R.R. Coifman, S. Lafon and L. Wasserman.)
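The notion of "connectivity" can be made concrete with the random walk that underlies diffusion maps. The sketch below row-normalises a Gaussian kernel into a Markov transition matrix and takes a few steps of the walk. It stops short of the eigenvector embedding itself, and the kernel form, bandwidth `eps`, and toy points are assumptions made for illustration.

```python
import math

def diffusion_matrix(points, eps):
    """Row-normalised Gaussian kernel: a Markov transition matrix on the data."""
    n = len(points)
    kernel = [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / eps)
            for j in range(n)
        ]
        for i in range(n)
    ]
    return [[k / sum(row) for k in row] for row in kernel]

def diffuse(P, steps):
    """t-step transition probabilities P^t by repeated matrix multiplication."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = [
            [sum(result[i][k] * P[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return result

# Two well-separated hypothetical clusters in the plane.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0)]
P = diffusion_matrix(points, eps=0.5)
P3 = diffuse(P, 3)
within = P3[0][0] + P3[0][1] + P3[0][2]  # mass staying in the first cluster
across = P3[0][3] + P3[0][4]             # mass leaking to the second cluster
print(within, across)
```

Points that are well connected exchange probability mass quickly, while mass leaks between weakly connected groups only slowly; the leading eigenvectors of `P` encode this structure.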
Yoon Lee, Linear Programming for Feature Selection via Methods of Regularization
- We consider statistical procedures for feature selection defined by a family of regularization problems with convex piecewise linear loss functions and penalties of l_1 nature. Many known statistical procedures (e.g. quantile regression and support vector machines with l_1 norm penalty) are subsumed under this category. Computationally, the regularization problems are linear programming (LP) problems indexed by a single parameter, which are known as 'parametric cost LP' or 'parametric right-hand-side LP' in the optimization theory. Exploiting the connection with the LP theory, we lay out general algorithms, namely, the simplex algorithm and its variant for generating regularized solution paths for the feature selection problems. The significance of such algorithms is that they allow a complete exploration of the model space along the paths and provide a broad view of persistent features in the data. The implications of the general path-finding algorithms are outlined for a few statistical procedures, and they are illustrated with numerical examples.
Tze Leung Lai, A Consistent Model Selection Criterion for L2 Boosting in High-dimensional Sparse Linear Models
Hang Li, Learning to Rank - Problem, Challenge, and Opportunity
- Learning to rank is a task that automatically learns a ranking model (function) using training data, such that the model can sort objects according to their degrees of relevance, preference, or importance defined in a specific application. Learning to rank has been receiving keen and growing interest in machine learning, data mining, information retrieval, and other fields in recent years, because of its importance, novelty, and far-reaching implication. In this talk, I will give a survey on this emerging research area and introduce some of the recent work which we have done at Microsoft Research Asia.
Hongzhe Li, Network-constrained Regularization and Variable Selection for Analysis of Genomic Data
- Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a useful supplement to the standard numerical genomic data such as microarray gene expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. In this paper, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network. Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than the commonly used procedures that do not use the pathway structure information. Application to one glioblastoma microarray gene expression dataset identified several subnetworks on several of the KEGG transcriptional pathways that are related to survival from glioblastoma, many of which were supported by the published literature.
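A penalty of the kind described, an L1 term plus a Laplacian smoothness term, can be sketched as follows. The code assumes the unnormalised graph Laplacian, for which beta^T L beta equals the sum of squared coefficient differences over edges; the procedure in the talk may use a differently normalised Laplacian, and the names `lam1` and `lam2` are illustrative.

```python
def network_penalty(beta, edges, lam1, lam2):
    """lam1 * ||beta||_1 + lam2 * beta^T L beta for an unweighted graph.

    Assumption: L is the unnormalised Laplacian, so the quadratic form
    is the sum of (beta_u - beta_v)^2 over the edges (u, v).
    """
    l1_term = sum(abs(b) for b in beta)
    smooth_term = sum((beta[u] - beta[v]) ** 2 for u, v in edges)
    return lam1 * l1_term + lam2 * smooth_term

# Hypothetical pathway: genes 0-1-2 form a chain, gene 3 is isolated.
edges = [(0, 1), (1, 2)]
smooth = network_penalty([1.0, 1.0, 1.0, 0.0], edges, lam1=1.0, lam2=0.5)
rough = network_penalty([1.0, -1.0, 1.0, 0.0], edges, lam1=1.0, lam2=0.5)
print(smooth, rough)
```

Both coefficient vectors have the same L1 norm, but the one that varies smoothly along the chain is penalised less.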
Feng Liang, Local Sliced Inverse Regression
- Dimension reduction for predictive modeling and visualization plays a central role in statistical graphics and computation. Interest in it has been revived recently in the machine learning literature in the context of manifold learning algorithms. In this talk, I'll present a new dimension reduction algorithm for regression/classification models. Our algorithm builds on the sliced inverse regression model and utilizes the local manifold structure of the data. The utility of our algorithm with respect to predictive accuracy as well as exploratory data analysis via visualization will be demonstrated on a variety of simulated and real data.
Lijun Liu, Adaptive Learning Algorithm for Principal Component Analysis with Data Dependent Learning Rate
- Principal component analysis (PCA) is a widely used statistical technique in various applications such as feature extraction in pattern recognition and data compression and coding in signal processing. Neural network approaches for PCA are most applicable to applications with a changing environment, where the learning process has to be repeated in an on-line manner. Since Oja's pioneering finding that a simple linear neuron with a constrained Hebbian learning rule can extract the principal component, there has been increasing interest in the study of connections between PCA and neural networks. We propose a simple adaptive learning algorithm for PCA in this report. Compared to most existing neural network based adaptive learning algorithms, the proposed approach can compute the eigenvector as well as the eigenvalue adaptively while using only the linear output of the single linear neuron. Similar to recursive least squares (RLS) learning, the learning rate is adjusted adaptively according to the input data. Numerical experiments show that the data dependent step size in the proposed algorithm offers significant advantages over a constant learning rate.
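For orientation, the baseline that the abstract builds on, Oja's constrained Hebbian rule for a single linear neuron, can be sketched as below. This version uses a constant learning rate `eta`; the data-dependent rate proposed in the talk is not reproduced, and the running mean of the squared output is only a simple, assumed way to track the leading eigenvalue.

```python
import random

def oja_pca(samples, eta=0.01, w0=(0.5, 0.5)):
    """On-line estimate of the first principal direction with Oja's rule."""
    w = list(w0)
    eigenvalue = 0.0
    for t, x in enumerate(samples, start=1):
        y = sum(wi * xi for wi, xi in zip(w, x))  # linear neuron output
        # Hebbian term y * x with the decay term y^2 * w that keeps |w| near 1.
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
        eigenvalue += (y * y - eigenvalue) / t    # running mean of y^2
    return w, eigenvalue

# Synthetic zero-mean data: variance 1 along the first axis, 0.01 along the second.
rng = random.Random(0)
samples = [(rng.gauss(0.0, 1.0), 0.1 * rng.gauss(0.0, 1.0)) for _ in range(5000)]
w, eigenvalue = oja_pca(samples)
print(w, eigenvalue)
```

The weight vector aligns (up to sign) with the first coordinate axis, the direction of largest variance in this toy data.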
Yufeng Liu, Estimating Spatial Covariance using Penalized Likelihood with Weighted L1 Penalty
- In spatial statistics, estimation of large covariance matrices is of great
importance because of its role in spatial prediction and design. The
traditional approaches typically assume that the spatial process is
stationary and the covariance function takes some well known parametric
form, and estimate the parameters of the covariance functions using
likelihood based methods. In this talk, I will present a nonparametric
approach to estimate the covariance matrix for spatial nonstationary
Gaussian Markov random field models. By exploiting the sparsity structure
in the inverse covariance matrix, we show that a LASSO-type approach
gives improved covariance estimators measured by several criteria.
Both simulated and real examples show that the proposed method performs
competitively.
This is joint work with Zhengyuan Zhu at UNC-CH.
Xiao-ling Lu, Mining E-Commerce Customers' Online Purchasing Behavior
- Nowadays more and more consumers choose e-commerce as one of the channels through which they purchase products. It is quite important for the success of a business to analyze the characteristics and purchasing decisions of its customers. Using the KDD Cup 2000 dataset, this paper studies the demographics and online behaviors of Gazelle.com customers. Basket analysis is applied to examine the associations between products, and predictive models are used to predict consumers' loyalty. The analysis results can help Gazelle.com decide on suitable promotions to increase profits. The methods proposed in the paper are also useful for other e-commerce companies analyzing their customers.
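Basket analysis typically summarises product associations with support, confidence and lift. The sketch below computes these for a handful of made-up baskets; the item names and data are purely hypothetical and are not drawn from the KDD Cup 2000 dataset.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(basket) for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent in basket | antecedent in basket)."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

def lift(baskets, antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(baskets, antecedent, consequent) / support(baskets, consequent)

# Hypothetical transactions over three products A, B and C.
baskets = [{"A", "B"}, {"A"}, {"A", "B", "C"}, {"C"}]
print(support(baskets, {"A", "B"}))
print(confidence(baskets, {"A"}, {"B"}))
print(lift(baskets, {"A"}, {"B"}))
```

A lift above 1 suggests the two products are bought together more often than independence would predict.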
Jianwei Ma, Data Recovery for Compressed Measurement
- Surface metrology is the science of measuring small-scale features on surfaces and characterizing the surface geometry or topography, ranging from nano-scale to micro-scale. In this paper, we propose a compressed measurement for surface metrology by solving a convex optimization problem with sparsity constraints given by the curvelet transform and the wave atom transform, inspired by so-called compressed sensing, developed by mathematicians to deal with underdetermined systems. One can recover and analyze the surfaces from an incomplete measurement, or from far fewer measurements than traditional methods use, without obeying the Shannon sampling theorem, which requires the sampling rate to be at least twice the maximum frequency of the signal. The compressed measurement essentially shifts measurement cost to the computational cost of off-line nonlinear recovery. Unlike traditional direct and indirect measuring methods, the compressed measurement directly senses the geometric and structural features instead of single pixels' information, which suggests a new acquisition protocol, i.e., a potential design for new measurement instruments.
Jinwen Ma, Adaptive Model Selection on Finite Mixture
- In data modeling and analysis, finite mixtures are widely used. However, selecting the number of components in the mixture for a sample data set is still a rather difficult task. To overcome this difficulty, many criteria have been proposed to determine the best number of components or clusters in the sample data. Since the number of components is just a scale of the mixture model, its selection is usually referred to as model selection. Recently, some adaptive model selection learning mechanisms for finite mixture modeling, especially for Gaussian mixture modeling, have been developed such that model selection can be made automatically during parameter learning on the sample data, which provides a new perspective for data modeling and analysis. In this talk, we survey some main results of adaptive model selection on Gaussian mixtures or general finite mixtures. First, we summarize some automated learning algorithms on Gaussian or finite mixtures based on the Bayesian Ying-Yang (BYY) harmony learning principle as well as entropy penalization. We then describe some incremental model selection learning algorithms on Gaussian mixtures. Furthermore, we describe a dynamic model selection learning algorithm on Gaussian mixtures. Finally, we present some typical practical applications of these adaptive model selection learning algorithms.
Ping Ma, Statistical Journey to the Center of the Earth
- At a depth of ~2890 km, the core-mantle boundary (CMB) separates turbulent flow of liquid metals in the outer core from slowly convecting, highly viscous mantle silicates. The CMB marks the most dramatic change in dynamic processes and material properties in our planet, and accurate images of the structure at or near the CMB -- over large areas -- are crucially important for our understanding of present day geodynamical processes and the thermo-chemical structure and history of the mantle and mantle-core system. In addition to mapping the CMB we need to know if other structures exist directly above or below it, what they look like, and what they mean (in terms of physical and chemical material properties and geodynamical processes). Detection, imaging, (multi-scale) characterization, and understanding of structure (e.g., interfaces) in this remote region have been -- and are likely to remain -- a frontier in cross-disciplinary geophysics research. We will discuss the statistical problems and challenges in imaging the CMB through generalized Radon transform.
Shuangge Ma, Variable Selection with Clustering Regularization
- High dimensional data are frequently encountered in economic, biological, and medical studies. Analysis of such data can be challenging because of the high dimensionality and the presence of cluster structure in covariates. Here the cluster structure can be defined scientifically or statistically. We propose a clustering regularized method, which can carry out simultaneous cluster-selection and within-cluster selection. Computational algorithms and statistical properties of the proposed method are investigated. Extensive numerical studies are employed to assess finite sample properties. This study is joint with Dr. Jian Huang, University of Iowa.
Zhi-Ming Ma, Two-layer Statistical Learning
- Based on a recent joint work of Yanyan Lan, Tie-Yan Liu, Tao Qin, Zhiming Ma and Hang Li, in this talk we propose a new framework for statistical learning, in which the training data are composed of two layers. As can be seen from the case studies of learning to rank in Information Retrieval, the two-layer structure of training data is not artificial, but arises from the real world. The challenge is that when dealing with two-layer training data, most of the existing results of statistical learning cannot be directly applied. Thus the theory has to be adapted to the new model, and much remains to be investigated. We shall present some of our results in this direction.
Marina Meila, Consensus finding, exponential models, and infinite rankings
- This talk is concerned with summarizing -- by means of statistical
models -- data that express preferences. Such data are typically a
set of rankings of n items by a panel of experts; the simplest summary
is the "consensus ranking", or the "centroid" of the set of
rankings. Such problems appear in many tasks, ranging from combining
voter preferences to boosting of search engines.
We study the problem in its more general form of estimating a parametric model over permutations, known as the Generalized Mallows (GM) model. I will present an exact estimation algorithm, non-polynomial in theory, but extremely effective in comparison with existing algorithms. From a statistical point of view, we show that the GM model is an exponential family, and introduce the conjugate prior for this model class.
Then we introduce the infinite GM model, corresponding to "rankings" over an infinite set of items, and show that this model is both elegant and of practical significance. Finally, the talk will touch upon the subject of multimodal distributions and clustering.
Joint work with: Bhushan Mandhani, Le Bao, Kapil Phadnis, Arthur Patterson and Jeff Bilmes
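The ingredients named above can be illustrated on a tiny example: the Kendall distance between rankings, a brute-force consensus ranking, and the single-parameter Mallows model, which is a special case of the GM model with one dispersion parameter instead of one per rank. The exhaustive search is feasible only for a handful of items and is not the exact estimation algorithm from the talk; all names and data are illustrative assumptions.

```python
import math
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of item pairs that the two rankings order differently."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
        for a, b in combinations(r1, 2)
    )

def consensus(rankings):
    """Ranking that minimises the total Kendall distance (exhaustive search)."""
    return min(
        permutations(rankings[0]),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )

def mallows_prob(ranking, centre, theta):
    """P(ranking) proportional to exp(-theta * d(ranking, centre))."""
    norm = sum(
        math.exp(-theta * kendall_tau(p, centre)) for p in permutations(centre)
    )
    return math.exp(-theta * kendall_tau(ranking, centre)) / norm

# Three hypothetical experts ranking items a, b, c (best first).
rankings = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
centre = consensus(rankings)
print(centre, mallows_prob(centre, centre, theta=1.0))
```

Under this model, rankings become exponentially less likely as their Kendall distance from the central ranking grows.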
George Michailidis, Semi-supervised Learning with Additive Models
- In this talk, we consider the problem of transductive learning with additive models for data, whose attributes can naturally be partitioned into groups called views. An important case is when some of the views consist of information given in the form of a graph. Further, the response variable can be partitioned into a labeled (observed) set and an unlabeled one. We propose a general iterative algorithm which extends any supervised learner into the semi-supervised setting to fit such models. We also examine the view selection issue through a modified AIC criterion. We illustrate the proposed methodology on both synthetic and real data sets from pharmacology and text analysis.
Wei Pan, Networked Predictors in Penalized Regression with Application to Microarray Data
- We consider penalized linear regression, especially
for "large p, small n" problems, for which the relationships among
predictors are described a priori by a network.
A class of motivating examples is to model a response using
gene expression profiles while accounting for coordinated functioning
of genes in the form of biological pathways or networks.
To incorporate the prior knowledge of networks about predictors,
we propose a grouped penalty based on the L_gamma-norm that
smooths the regression coefficients over a network.
The main feature of the proposed method is its ability to automatically
realize grouped variable selection and recognize grouping effects.
We also discuss the effects of the choices of gamma and weights.
Simulation studies demonstrated superior
finite sample performance of the proposed method as compared to
Lasso (Tibshirani 1996), elastic net (Zou and Hastie 2005)
and a recently proposed network-based method (Li and Li 2008):
our method worked best in variable selection
across all simulation set-ups. For illustration,
the method was applied to a microarray dataset to predict
survival time from diagnosis for
brain cancer patients using gene expression profiles
and a gene network compiled from KEGG pathways.
This is joint work with Banhuai Xie and Xiaotong Shen.
Annie Qu, Selecting Informative Correlation Structure in Multiple Sourced Correlation Data
- In the generalized method of moments approach for longitudinal data analysis, unbiased estimating functions can be constructed to incorporate both the marginal mean and the correlation structure of the data. Increasing the number of parameters in the correlation structure corresponds to increasing the number of estimating functions. Thus, building a correlation model is equivalent to selecting informative estimating functions. This paper proposes a chi-squared test to choose informative unbiased estimating functions. We show that this methodology is useful for identifying which source of correlation it is important to incorporate when there are multiple possible sources of correlation. This is joint work with J. Jack Lee and Bruce Lindsay.
Lei Shi, Outlier Mining in Hierarchical Multilevel Data
- Outlier detection is an important issue in the data mining area. In the current practice of detecting outliers in linear models with unknown covariance structure, the effect of estimating the unknown parameters in the covariance matrix on the test is usually ignored. This paper uses the mean-shift outlier model to detect outliers for multilevel models. We present an approximate test which includes the influence of estimating the parameters in the covariance matrix to study the outlier mining in hierarchical multilevel data. Approximate formulae for outlier detection in estimating both fixed and random parameters under the mean-shift outlier model are derived, and a test for multiple outliers is proposed. These results can be used to detect outlier units at any level. Detection of outlier units related to random parts is also studied. Analysis of an example shows that the proposed method is effective in identifying outliers in multilevel models.
Xingwei Tong, Variable Selection for Panel Count Data via Nonconcave Penalized Estimating Function
- Variable selection is an important issue in all regression analysis, and in this paper we discuss it in the context of regression analysis of panel count data. Panel count data often occur in long-term studies that concern the occurrence rate of a recurrent event, and their analysis has recently attracted a great deal of attention. However, there does not seem to exist any established approach for variable selection with respect to panel count data. For this problem, we adopt the idea behind the nonconcave penalized likelihood approach proposed in Fan and Li (2001) and develop a nonconcave penalized estimating function approach. The proposed approach selects variables and estimates regression coefficients simultaneously, and an algorithm is presented for this process. We show that the proposed approach performs as well as the oracle procedure in that it yields the estimates as if the correct submodel were known. Simulation studies are conducted for assessing the performance of the proposed approach and suggest that it works well for practical situations. An illustrative example from a cancer study is provided.
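The abstract does not name the penalty it uses. As a reference point, the sketch below implements the SCAD penalty, the nonconcave penalty introduced in Fan and Li (2001), together with its derivative, using the conventional default a = 3.7. Whether the talk uses exactly this penalty is an assumption.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic blend, then constant."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0

def scad_derivative(theta, lam, a=3.7):
    """Derivative of the penalty with respect to |theta|, for |theta| > 0."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)

for value in (0.5, 2.0, 10.0):
    print(value, scad_penalty(value, lam=1.0), scad_derivative(value, lam=1.0))
```

Because the derivative vanishes for large coefficients, large effects are not shrunk, which is the property behind oracle-type results of the kind the abstract mentions.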
Hansheng Wang, Kernel based Sliced Regression for Dimension Reduction with Application in Earnings Pattern Mining
- By slicing the region of the response (Li, 1991) and applying local kernel regression (MAVE, Xia, et al, 2002) to each slice, a new dimension reduction method is proposed. Compared with the traditional inverse regression methods, e.g. sliced inverse regression (Li, 1991), the new method is free of the linearity condition (Li, 1991) and enjoys much improved estimation accuracy. Compared with the direct estimation methods (e.g., MAVE), the new method is much more robust against extreme values and can capture the entire central subspace (CS; Cook, 1998) exhaustively. To determine the CS dimension, a consistent cross-validation (CV) criterion is developed. Extensive simulation studies are conducted to demonstrate the usefulness of the proposed method. In addition, the application of our method to earnings pattern mining in the Chinese stock market is demonstrated.
Xing Wang, Properties of Lasso Estimators in Generalized Linear Model
- The generalized linear model has been widely used in traditional statistics for many years, and has recently received extensive study in machine learning due to its advantages in sparse classification and feature selection. In this talk, I will report recent advances on lasso estimators in the generalized linear model, including asymptotic and monotone LASSO conditions. We also compare several competitors of the LASSO for feature selection through simulation experiments. Finally, some of the proposed methods are illustrated with real examples.
Yuhong Yang, Model Combination for Quantile Regression
- Model selection for quantile regression is often a challenging problem. In addition to the well-known general difficulty of model selection uncertainty, when quantiles at multiple probability levels are of interest, typically a single candidate does not serve all of them well simultaneously. In this talk we propose methods to combine quantile estimators. Oracle inequalities show that at each given probability level, the combined estimators automatically perform nearly as well as the best candidate. Simulation and real examples show that the proposed model combination approach often leads to a substantial gain in accuracy under global measures of performance. (The talk is based on joint work with Kejia Shan.)
Limin Yao, The Entire Solution Path for Support Vector Machine in Positive and Unlabeled Classification
- A Support Vector Machine (SVM) aims to find an optimal separating hyper-plane that maximally separates the two classes of training examples (more precisely, one that maximizes the margin between the two classes of examples). The hyper-plane, corresponding to a classifier, is obtained from the solution of a quadratic programming problem that depends on a cost parameter. The choice of the cost parameter can be critical. However, in conventional implementations of SVMs, it is usually supplied by the user or set to a default value. In this paper, we study how the cost parameter determines the hyper-plane. We especially focus on the case of classification using only positive and unlabeled data. We propose an algorithm that can fit the entire solution path by choosing the 'best' cost parameter while training SVM models. We compare the performance of the proposed algorithm with conventional implementations that use default values for the cost parameter on two synthetic data sets and two real-world data sets. Experimental results show that the proposed algorithm can achieve better results when dealing with positive and unlabeled classification.
Jieping Ye, Computational Analysis of Drosophila Gene Expression Pattern Images
- Gene expression in a developing embryo occurs in particular cells (spatial patterns) in a time-specific manner (temporal patterns), which leads to the differentiation of cell fates. Images of a Drosophila melanogaster embryo at a given developmental stage, showing a particular gene expression pattern revealed by a gene-specific probe, can be compared for spatial overlaps. The comparison is fundamentally important to formulating and testing gene interaction hypotheses. In this talk, I will present our recent developments in automatic image annotation and overlapping expression pattern identification using machine learning techniques.
Chengshui Zhang, Graph Based Semi-supervised Learning
- Semi-supervised learning is one of the most important research areas in the machine learning community, and graph based methods have become one of its most active research topics. In this talk we will present some of our recent work on graph based semi-supervised learning (GBSSL), including: (1) linear neighborhood propagation; (2) fast multilevel graph transduction; (3) GBSSL using local and global regularization; (4) electro-magnetic field models for GBSSL. The first three aim at improving traditional GBSSL methods, while the last one provides a new model for GBSSL. Finally, we will summarize these works and give some future research directions.
Cun-Hui Zhang, Information-Theoretical Optimality Of Variable Selection With Minimax Concave Penalty
- We prove that the MC+, a variable selector we proposed earlier, is optimal in the sense that the amount of information it requires for consistent variable selection in the linear regression model is of the same order as the minimum possible under mild conditions on deterministic or random design matrices. A similar result has been recently proved for the LASSO when the design matrix has iid normal entries, but due to the estimation bias, the LASSO does not enjoy this optimality property without two restrictive assumptions. Simulation results are reported to demonstrate the superiority of the MC+ in variable selection and its competitiveness in computational efficiency, compared with the LASSO and SCAD selectors.
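For readers unfamiliar with the penalty named in the title, the following is a minimal sketch of the minimax concave penalty in its standard form, rho(t; lam, gamma) = lam * integral from 0 to |t| of max(0, 1 - x/(gamma*lam)) dx, next to the LASSO penalty lam*|t|. The function names and the parameter values in the example are illustrative assumptions, not taken from the talk.

```python
def mcp_penalty(t: float, lam: float, gamma: float) -> float:
    """Minimax concave penalty (closed form of the integral definition).

    Assumes lam > 0 and gamma > 0. The penalty grows like the LASSO
    penalty near zero and becomes constant once |t| exceeds gamma * lam.
    """
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def lasso_penalty(t: float, lam: float) -> float:
    """LASSO (L1) penalty, which keeps growing linearly in |t|."""
    return lam * abs(t)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the two shapes side by side.
    for t in (0.5, 1.0, 3.0, 10.0):
        print(t, mcp_penalty(t, lam=1.0, gamma=3.0), lasso_penalty(t, lam=1.0))
```

Because the penalty is flat for large coefficients, it does not keep shrinking them, which is one common way to describe how such a penalty avoids the estimation bias the abstract attributes to the LASSO.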
Shichao Zhang, Cost-sensitive Classification with Deficient Labeled Data
- Existing cost-sensitive learning techniques work well when there are sufficient labeled data in the training datasets. Learning cost-sensitive models from datasets with few labeled data and plentiful unlabeled data remains underdeveloped. A great many classification applications face the problem of deficient labeled data, for example medical diagnosis and text classification. This is because labeled data are often very difficult, time consuming or expensive to obtain. To address this challenging issue, in this paper we apply semi-supervised learning techniques to learn cost-sensitive models from datasets with inadequate labeled data. We propose three classification strategies for learning a cost-sensitive classifier from training datasets with both labeled and unlabeled data, based on Expectation Maximization (EM) and self-training. The first method, Direct-EM, uses EM to build a semi-supervised classifier and then directly computes the optimal class label for each test example using the class probabilities produced by the learned model. The second method, CS-EM, modifies EM by incorporating misclassification costs into the probability estimation process. The third method, CS-ST, incorporates misclassification costs into the semi-supervised self-training process. Our experiments show that when using only 50 labeled training examples, CS-EM outperforms the other competing methods on three benchmark text data sets across different cost ratios. We also studied the three strategies under varied data and cost ratio settings after increasing the number of labeled training examples to 100 (about 10% of the total data sets).
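The step of computing an optimal class label from class probabilities can be illustrated with the usual minimum expected cost rule. This is a minimal sketch under assumptions: the function name, the cost-matrix convention (cost[i][j] is the cost of predicting class j when the true class is i), and the numbers are hypothetical, and the abstract does not state that the authors use exactly this rule.

```python
from typing import Sequence


def min_expected_cost_label(
    probs: Sequence[float], cost: Sequence[Sequence[float]]
) -> int:
    """Return the label with the smallest expected misclassification cost.

    probs[i] is the estimated probability that the true class is i.
    cost[i][j] is the (assumed) cost of predicting j when the truth is i.
    """
    n = len(probs)
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n))  # expected cost of predicting j
        for j in range(n)
    ]
    return min(range(n), key=expected.__getitem__)


if __name__ == "__main__":
    probs = [0.7, 0.3]                 # hypothetical class probabilities
    equal_cost = [[0, 1], [1, 0]]      # both kinds of error cost the same
    skewed_cost = [[0, 1], [5, 0]]     # missing class 1 is five times as costly
    print(min_expected_cost_label(probs, equal_cost))
    print(min_expected_cost_label(probs, skewed_cost))
```

With equal costs the rule reduces to picking the most probable class; with skewed costs it can prefer a less probable class whose misclassification is more expensive.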
Tian Zheng, Feature Selection and Classification based on k-Nearest-neighbor Patterns
- High dimensional data, such as gene expression data, have provided vast amounts of information for scientific research and learning. However, in most cases, the information is much diluted by noise from non-informative features. Feature selection has become a necessary step for learning in high dimensions. It is also widely acknowledged that feature selection and classification methods for such data should consider possible interactions among the features, since they may carry stronger signals and reveal important scientific findings (such as gene-gene interactions). In this paper, we develop an information score measuring the information content in a feature subspace, using local neighborhood patterns (k nearest neighbor patterns). A backward elimination algorithm based on random subspaces is carried out to identify the best feature subspaces according to this score. A classifier using the selected subspaces, also based on neighborhood patterns, is further proposed. Through simulation and two real gene expression data examples (breast cancer data and prostate cancer data), our method demonstrates power in identifying patterns that are informative about the class difference, not only in low dimensional feature subspaces but also in some high order interactions. As a result, our method outperforms both SVM and Golub’s weighted voting method in cancer classification. Moreover, our method is completely nonparametric and can be applied to a wide variety of problems, not limited to gene expression data.
Harrison Zhou, Model Selection and Sharp Asymptotic Minimaxity
- We will show that a class of model selection procedures is asymptotically sharp minimax for recovering sparse signals over a wide range of parameter spaces. Connections to Bayesian model selection, the MDL principle and wavelet estimation will be discussed.
Ji Zhu, Partial Correlation Estimation by Joint Sparse Regression Models
- In this talk, we present a computationally efficient approach for selecting non-zero partial correlations under the high-dimension-low-sample-size setting. This method assumes the overall sparsity of the partial correlation matrix and employs sparse regression techniques for model fitting. We illustrate the performance of our method by extensive simulation studies. It is shown that our method performs well in both non-zero partial correlation selection and the identification of hub variables, and also outperforms two existing methods. We then apply our method to a microarray breast cancer data set and identify a set of hub genes which may provide important insights on genetic regulatory networks. Finally, we prove that, under a set of suitable assumptions, the proposed procedure is asymptotically consistent in terms of model selection and parameter estimation.