Yang Mu's Research Interests

Machine Learning

Online least squares optimization using Constrained Stochastic Gradient Descent (CSGD)

This paper solves the online least squares problem by adding an equality constraint, and achieves the fastest convergence rate and the best regret bound among all first-order approaches.
CSGD advantages:
1. Without the strong convexity assumption, CSGD converges at the rate of $O(\log T/T)$, which is close to that of a full second-order approach, while retaining a time complexity of $O(d)$ per iteration.
2. CSGD achieves the $O(\log T)$ regret bound without requiring strong convexity, which is the best regret bound among existing SGD methods.
3. The empirical loss achieved by CSGD is upper bounded by that of SGD at every step.
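
As a rough illustration of the idea of pairing an SGD step with an equality constraint at $O(d)$ cost per iteration, the sketch below runs plain SGD on the squared loss and then projects the weights onto a hyperplane. The specific constraint used here (the fitted model must pass through the running mean of the data seen so far), the step-size schedule, and all function names are assumptions made for illustration; the text above only states that an equality constraint is added, so this should not be read as the paper's exact algorithm.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def constrained_sgd(stream, dim, step=0.3):
    """Online least squares: SGD step, then projection onto a hyperplane.

    Assumed constraint (illustrative only): x_mean . w == y_mean, where the
    means are running averages over the samples seen so far.
    Every operation inside the loop costs O(dim).
    """
    w = [0.0] * dim
    x_mean = [0.0] * dim
    y_mean = 0.0
    for t, (x, y) in enumerate(stream, start=1):
        # Update the running means of inputs and targets.
        x_mean = [m + (xi - m) / t for m, xi in zip(x_mean, x)]
        y_mean += (y - y_mean) / t
        # Plain SGD step on the loss 0.5 * (w . x - y)^2.
        residual = dot(w, x) - y
        eta = step / t ** 0.5
        w = [wi - eta * residual * xi for wi, xi in zip(w, x)]
        # Euclidean projection onto {w : x_mean . w = y_mean}.
        norm_sq = dot(x_mean, x_mean)
        if norm_sq > 0.0:
            gap = (dot(x_mean, w) - y_mean) / norm_sq
            w = [wi - gap * mi for wi, mi in zip(w, x_mean)]
    return w, x_mean, y_mean


# Hypothetical noise-free data generated from known weights.
rng = random.Random(0)
true_w = [1.0, -2.0, 0.5]
data = []
for _ in range(2000):
    x = [rng.random() for _ in true_w]
    data.append((x, dot(true_w, x)))

w, x_mean, y_mean = constrained_sgd(data, dim=3)
print("weights:", w)
print("constraint gap:", dot(x_mean, w) - y_mean)
```

The projection has a closed form, which is why the constrained step stays linear in the dimension $d$ rather than requiring a second-order update.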

The categories of the state-of-the-art distance metric methods and their relationships to the proposed LDDM method.

The ultimate goal of distance metric learning is to incorporate abundant discriminative information so that data samples in the same class stay close and those from different classes stay separated. Local distance metric methods can preserve discriminative information by considering the neighborhood influence. In this paper, we propose a new local discriminative distance metrics (LDDM) algorithm that learns multiple distance metrics, each from a training sample (a focal sample) and the vicinity of that focal sample (focal vicinity), to optimize local compactness and local separability. These locally learned distance metrics are used to build local classifiers, which are aligned in a probabilistic framework via ensemble learning. Theoretical analysis proves the convergence rate bound and the generalization bounds of the local distance metrics and of the final ensemble classifier. We extensively evaluate LDDM using synthetic datasets and large benchmark UCI datasets.

LDDM can:
1. satisfy all local pairwise constraints;
2. handle noisy datasets;
3. handle multimodal distribution problems;
4. have infinite VC-dimension when using any classifier on the metric space.
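
To make the notion of a per-sample (local) metric concrete, the sketch below attaches a hand-picked metric matrix to each focal sample and labels a query by the focal sample that is closest under that sample's own metric. This is only a simplified illustration: the metrics are fixed by hand rather than learned, the nearest-focal rule stands in for the probabilistic ensemble described above, and every name and value is hypothetical.

```python
def local_sq_distance(x, y, metric):
    """Squared distance (x - y)^T M (x - y) under a metric matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    return sum(diff[i] * metric[i][j] * diff[j]
               for i in range(d) for j in range(d))


def nearest_focal_label(query, focal_samples):
    """Return the label of the focal sample closest to the query.

    focal_samples is a list of (point, label, metric) triples; each focal
    sample measures distance with its own metric.
    """
    best = min(focal_samples,
               key=lambda f: local_sq_distance(query, f[0], f[2]))
    return best[1]


# Hypothetical focal samples: A tolerates variation along the second axis,
# B uses the ordinary Euclidean metric.
focal_samples = [
    ((0.0, 0.0), "A", [[1.0, 0.0], [0.0, 0.1]]),
    ((1.0, 0.0), "B", [[1.0, 0.0], [0.0, 1.0]]),
]

# Same focal points, but both with the Euclidean metric, for comparison.
identity = [[1.0, 0.0], [0.0, 1.0]]
euclidean_samples = [(p, label, identity) for p, label, _ in focal_samples]

query = (0.6, 1.0)
print(nearest_focal_label(query, focal_samples))      # local metrics
print(nearest_focal_label(query, euclidean_samples))  # single global metric
```

With these values the two rules disagree on the query, which shows how a local metric can change a neighborhood decision that a single global metric would make differently.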

Computer Vision

Crater detection framework.

We present an integrated framework for the automatic detection of sub-kilometer craters with boosting and transfer learning. The framework contains three key components. First, we utilize mathematical morphology to efficiently identify crater candidates, the regions of an image that can potentially contain craters. Only those regions, which occupy relatively small portions of the original image, are subject to further processing. Second, we extract and select image texture features and combine them with supervised boosting ensemble learning algorithms to accurately classify crater candidates into craters and non-craters. Third, we integrate transfer learning into boosting to enhance detection performance in regions where the surface morphology differs from that characterized by the training set. In addition, we extract biologically inspired features to represent crater images in a manner modeled on human visual perception. These new features achieve better results than traditional edge-based and texture-based features.

Bo Xie, Yang Mu, Dacheng Tao, Kaizhu Huang: m-SNE: Multiview Stochastic Neighbor Embedding. IEEE Transactions on Systems, Man, and Cybernetics, Part B 41(4): 1088-1096 (2011)
Bo Xie, Yang Mu, Dacheng Tao: m-SNE: Multiview Stochastic Neighbor Embedding. ICONIP (1) 2010: 338-346

Multiview Stochastic Neighbor Embedding.

We propose a multiview stochastic neighbor embedding (m-SNE) that systematically integrates heterogeneous features into a unified representation for subsequent processing, based on a probabilistic framework. Compared with conventional strategies, our approach automatically learns a combination coefficient for each view, adapted to that view's contribution to the data embedding. This combination coefficient plays an important role in utilizing the complementary information in multiview data. In addition, our algorithm for learning the combination coefficients converges at a rate of $O(1/k^2)$, which is the optimal rate for first-order methods on smooth problems.
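
The role of the per-view combination coefficients can be sketched as follows, assuming the views are fused at the level of pairwise affinity matrices through a convex combination. That fusion level, the fixed Gaussian bandwidth `sigma`, and the function names are assumptions for illustration; the coefficients here are supplied by hand, whereas m-SNE learns them.

```python
import math


def view_affinities(points, sigma=1.0):
    """Gaussian pairwise affinities for one view, normalised to sum to 1.

    A single fixed bandwidth is a simplification; SNE-style methods usually
    tune a bandwidth per point.
    """
    n = len(points)
    p = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                p[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
                total += p[i][j]
    return [[v / total for v in row] for row in p]


def combine_views(view_matrices, alpha):
    """Convex combination P = sum_v alpha[v] * P_v of per-view affinities."""
    if any(a < 0 for a in alpha) or abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("alpha must lie on the probability simplex")
    n = len(view_matrices[0])
    return [[sum(a * m[i][j] for a, m in zip(alpha, view_matrices))
             for j in range(n)] for i in range(n)]


# Two hypothetical views of the same three samples (for instance, a color
# feature and an edge feature), with different dimensionality per view.
view_a = [(0.0,), (1.0,), (5.0,)]
view_b = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0)]

p_a = view_affinities(view_a)
p_b = view_affinities(view_b)
p_joint = combine_views([p_a, p_b], alpha=[0.7, 0.3])
print(p_joint)
```

Because each per-view matrix sums to one and the coefficients lie on the simplex, the combined matrix is again a valid joint distribution over sample pairs, so it can be passed to a standard SNE-style embedding step.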

Bo Xie, Yang Mu, Mingli Song, Dacheng Tao: Random Projection Tree and Multiview Embedding for Large-Scale Image Retrieval. ICONIP (2) 2010: 641-649

Random projection tree for multiview image retrieval.

Image retrieval on large-scale datasets is challenging. Current indexing schemes, such as the k-d tree, suffer from the "curse of dimensionality". In addition, there is no principled approach to integrating the various features that measure multiple views of images, such as the color histogram and the edge directional histogram. We propose a novel retrieval system that tackles these two problems simultaneously. First, we index the data with random projection trees, whose complexity depends only on the low intrinsic dimension of a dataset. Second, we apply a probabilistic multiview embedding algorithm to unify the different features. Experiments on the MSRA large-scale dataset demonstrate the efficiency and effectiveness of the proposed approach.
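
The indexing step can be illustrated with a minimal random projection tree: each internal node projects its points onto a random direction and splits them at the median projection, and a query descends to one leaf and scans only that leaf. This is a simplified sketch under assumptions (plain median split, a single-leaf approximate search, hypothetical function names and parameters), not the retrieval system described above.

```python
import random


def project(direction, point):
    """Scalar projection of a point onto a direction vector."""
    return sum(d * x for d, x in zip(direction, point))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def build_rp_tree(points, leaf_size=8, rng=None):
    """Recursively split points at the median of a random projection."""
    if rng is None:
        rng = random.Random(0)
    if len(points) <= leaf_size:
        return {"leaf": points}
    direction = [rng.gauss(0.0, 1.0) for _ in points[0]]
    threshold = sorted(project(direction, p) for p in points)[len(points) // 2]
    left = [p for p in points if project(direction, p) < threshold]
    right = [p for p in points if project(direction, p) >= threshold]
    if not left:
        # Degenerate split (for example, many identical points): stop here.
        return {"leaf": points}
    return {
        "direction": direction,
        "threshold": threshold,
        "left": build_rp_tree(left, leaf_size, rng),
        "right": build_rp_tree(right, leaf_size, rng),
    }


def approx_nearest(tree, query):
    """Descend to a single leaf and return the closest point stored there."""
    node = tree
    while "leaf" not in node:
        go_left = project(node["direction"], query) < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return min(node["leaf"], key=lambda p: sq_dist(p, query))


# Hypothetical 5-dimensional feature vectors.
rng = random.Random(1)
points = [tuple(rng.random() for _ in range(5)) for _ in range(200)]
tree = build_rp_tree(points)
print(approx_nearest(tree, (0.5, 0.5, 0.5, 0.5, 0.5)))
```

Searching a single leaf is approximate: the true nearest neighbor may sit on the other side of a split, so practical systems often search several leaves or build several trees.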

Yang Mu, Dacheng Tao: Biologically inspired feature manifold for gait recognition. Neurocomputing 73(4-6): 895-902 (2010)
Yang Mu, Dacheng Tao, Xuelong Li, Fionn Murtagh: Biologically Inspired Tensor Features. Cognitive Computation 1(4): 327-341 (2009)


Biologically inspired feature for gait recognition.
Biologically inspired feature for face recognition.

We propose biologically inspired features for gait recognition and face recognition. We extend a manifold algorithm to tensor form in order to handle the biologically inspired tensor features.

Data Mining

Yang Mu, Wei Ding, Melissa Morabito, Dacheng Tao: Empirical Discriminative Tensor Analysis for Crime Forecasting. KSEM 2011: 293-304 (2010)

The residential burglary third-order tensor example.

Police agencies have been collecting an increasing amount of information to better understand patterns in criminal activity. Recently, there has been a new trend of using the collected data to predict where and when crime will occur. Crime prediction is greatly beneficial because, if it is done accurately, police practitioners are able to allocate resources to the geographic areas most at risk for criminal activity and ultimately make communities safer. In this paper, we discuss a new fourth-order tensor representation for crime data. The tensor encodes the longitude, latitude, time, and other relevant incidents. Using the tensor data structure, we propose the Empirical Discriminative Tensor Analysis (EDTA) algorithm, which obtains sufficient discriminative information while simultaneously minimizing empirical risk. We examine the algorithm on crime data collected in one Northeastern city. EDTA demonstrates promising results compared with existing methods in real-world scenarios.
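
A tensor of this kind can be sketched as a sparse count array indexed by (longitude bin, latitude bin, time period, incident type). The grid size, coordinate ranges, monthly time index, incident labels, and sample records below are all hypothetical and chosen only to show the data structure; they are not taken from the paper's data.

```python
from collections import Counter


def bin_index(value, low, high, bins):
    """Map a value in [low, high] to one of `bins` equal-width cells."""
    idx = int((value - low) / (high - low) * bins)
    return min(max(idx, 0), bins - 1)


def build_crime_tensor(records, lon_range, lat_range, grid=(4, 4)):
    """Build a sparse fourth-order count tensor from incident records.

    Each record is (longitude, latitude, month_index, incident_type).
    Keys of the returned Counter are (lon_bin, lat_bin, month, type);
    missing keys read as zero.
    """
    tensor = Counter()
    for lon, lat, month, incident_type in records:
        key = (
            bin_index(lon, lon_range[0], lon_range[1], grid[0]),
            bin_index(lat, lat_range[0], lat_range[1], grid[1]),
            month,
            incident_type,
        )
        tensor[key] += 1
    return tensor


# Hypothetical incident records.
records = [
    (-71.17, 42.23, 0, "burglary"),
    (-71.16, 42.24, 0, "burglary"),
    (-71.07, 42.33, 1, "burglary"),
    (-71.07, 42.33, 1, "street robbery"),
]

tensor = build_crime_tensor(records,
                            lon_range=(-71.2, -71.0),
                            lat_range=(42.2, 42.4))
for key, count in sorted(tensor.items()):
    print(key, count)
```

Storing only the non-zero cells keeps the structure small when most grid cells see no incidents in a given period; a dense array would be the natural alternative when the tensor is fed to a decomposition routine.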