Thesis defense of Irina Nicolae
Friday, December 2, 2016
at 4:00 PM
room F021, Laboratoire Hubert Curien,
Bâtiment F, 18 Rue du Professeur Benoît Lauras,
42000 Saint-Etienne
"Learning Similarities for Linear Classification: Theoretical Foundations and Algorithms"
Abstract
The notion of metric plays a key role in machine learning problems such as classification, clustering and ranking. Learning metrics from training data, so as to adapt them to the task at hand, has attracted growing interest in recent years. This research field, known as metric learning, typically aims to find the best parameters of a given metric under constraints derived from the data; the learned metric is then used in a machine learning algorithm in the hope of improving its performance. Most metric learning algorithms focus on learning the parameters of Mahalanobis distances for feature vectors, and current state-of-the-art methods scale well to datasets of significant size. In contrast, the more complex setting of multivariate time series has received only limited attention, despite the ubiquity of this type of data in applications. An important part of the research on time series relies on dynamic time warping (DTW), which computes the optimal alignment between two time series (a minimal sketch is given after the abstract).

Metric learning in its current state suffers from significant limitations that we aim to address in this thesis. The most important is probably the lack of theoretical guarantees linking the learned metric to its classification performance. The theory of (ε,γ,τ)-good similarity functions, recalled after the abstract, was one of the first results to relate the properties of a similarity to its classification performance. A second limitation comes from the fact that most methods work with metrics that enforce distance properties, constraints that are computationally expensive to satisfy and often not justified.

In this thesis, we address these limitations through two main contributions. The first is a novel general framework for jointly learning a similarity function and a linear classifier. The formulation is inspired by the (ε,γ,τ)-good theory, which provides a link between the similarity and the linear classifier, and is convex for a broad range of similarity functions and regularizers. We derive two equivalent generalization bounds, through the frameworks of algorithmic robustness and of uniform convergence based on the Rademacher complexity, establishing the good theoretical properties of our framework. Our second contribution is a method for learning similarity functions based on DTW for multivariate time series classification. The formulation is convex and makes use of the (ε,γ,τ)-good framework to relate the performance of the metric to that of its associated linear classifier. Using uniform stability arguments, we prove the consistency of the learned similarity, leading to the derivation of a generalization bound.
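For readers unfamiliar with DTW, here is a minimal dynamic-programming sketch of the alignment cost it computes. The Euclidean ground cost and the quadratic-time recursion below are standard textbook choices, not the specific similarity variant studied in the thesis.

    import numpy as np

    def dtw(a, b):
        """Dynamic time warping cost between two multivariate time
        series a (n x d) and b (m x d), with a Euclidean ground cost."""
        n, m = len(a), len(b)
        # cost[i, j]: minimal cumulative cost of aligning a[:i] with b[:j]
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])
                # extend the cheapest of the three admissible alignment moves
                cost[i, j] = d + min(cost[i - 1, j],      # a[i-1] repeats
                                     cost[i, j - 1],      # b[j-1] repeats
                                     cost[i - 1, j - 1])  # one-to-one match
        return cost[n, m]

    # Example: two univariate series of different lengths still align.
    a = np.array([[0.0], [1.0], [2.0], [1.0]])
    b = np.array([[0.0], [2.0], [1.0]])
    print(dtw(a, b))

Unlike the Euclidean distance, DTW tolerates local time shifts and series of different lengths, which is why it underpins much of the time-series work the abstract refers to.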
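For reference, a common statement of the (ε,γ,τ)-good similarity definition that the abstract builds on (after Balcan, Blum and Srebro); the notation below follows the standard formulation and is not quoted from the thesis. A similarity K is (ε,γ,τ)-good if there exists an indicator R marking "reasonable" landmark points such that:

    % At least a (1 - epsilon) mass of examples is, on average,
    % gamma-more similar to reasonable points of its own class:
    \Pr_{(x,y)}\!\left[\,
      \mathbb{E}_{(x',y')}\!\left[\, y\,y'\,K(x,x') \,\middle|\, R(x') \,\right]
      \ge \gamma \,\right] \ge 1 - \epsilon,
    \qquad
    % and reasonable points are not too rare:
    \Pr_{x'}\!\left[ R(x') \right] \ge \tau.

Intuitively, most examples are on average more similar, by margin γ, to reasonable points of their own class than of the other class, and reasonable points have probability mass at least τ; a linear separator in the similarity space spanned by such landmarks then achieves low error, which is the link between similarity and linear classifier exploited in the thesis.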
Jury:
- Maria-Florina Balcan, Assistant Professor at Carnegie Mellon University, Examiner
- Antoine Cornuéjols, Professor at AgroParisTech, Reviewer
- Ludovic Denoyer, Professor at Université Pierre et Marie Curie, Reviewer
- Éric Gaussier, Professor at the University of Grenoble-Alpes, Co-director
- Amaury Habrard, Professor at the University of Saint-Etienne, Examiner
- Liva Ralaivola, Professor at the University of Aix-Marseille, Examiner
- Marc Sebban, Professor at the University of Saint-Etienne, Director
The defense will be held in English.