"U-Statistics in Machine Learning: Large-Scale Minimization and Decentralized Estimation", by Aurélien Bellet
Thursday, May 26, 2016
at 02:00 PM
room F021a,
Building F,
Laboratoire Hubert Curien,
18 Rue Professeur Benoît Lauras,
42000 Saint-Etienne
Seminar by Aurélien Bellet, research scientist (CR) at Inria Lille
Many useful empirical statistics, such as the sample variance and the Area Under the Curve (AUC), are computed by averaging over all d-tuples of observations. These are known as U-statistics, and are also used as risk measures in many machine learning problems such as ranking, metric learning and clustering. I will first describe some contributions on scaling up the minimization of such risk functionals to large datasets using sampling and stochastic optimization. In a second part, I will present a gossip algorithm for estimating such statistics in a decentralized network, where each agent holds a subset of the dataset.
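
To make the "averaging over all d-tuples" idea concrete, here is a minimal Python sketch, not taken from the talk. It computes the sample variance as a U-statistic of degree 2 with kernel h(x, y) = (x - y)^2 / 2, then approximates it by averaging over randomly drawn pairs, which is one simple way to avoid enumerating every tuple on a large dataset. The function names, the toy data and the sampling scheme are illustrative assumptions, not the speaker's method.

```python
import itertools
import random
import statistics


def u_statistic(data, kernel, degree=2):
    """Average a symmetric kernel over all subsets of `degree` observations."""
    tuples = list(itertools.combinations(data, degree))
    return sum(kernel(*t) for t in tuples) / len(tuples)


def incomplete_u_statistic(data, kernel, num_samples, degree=2, seed=0):
    """Approximate the U-statistic by averaging over randomly drawn tuples.

    Illustrative assumption: each tuple is drawn uniformly, without
    replacement inside the tuple and independently across draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        total += kernel(*rng.sample(data, degree))
    return total / num_samples


def variance_kernel(x, y):
    """Degree-2 kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical toy sample
full = u_statistic(data, variance_kernel)
approx = incomplete_u_statistic(data, variance_kernel, num_samples=2000)
reference = statistics.variance(data)  # unbiased sample variance
print(full, reference, approx)
```

The complete version visits all n(n-1)/2 pairs, so its cost grows quadratically with the sample size, while the sampled version costs only `num_samples` kernel evaluations at the price of extra variance in the estimate.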