PhD defense of Sayan Kumar Chaki

Thursday, November 6, 2025

at 9:00 AM

Auditorium J020

Télécom Saint-Etienne

25 rue du Dr Rémy Annino

42000 Saint-Etienne

« Equivariance in Vision for Unsupervised Low Data Regimes »

Abstract

This thesis addresses the key issue of creating robust vision models that learn efficient representations in unsupervised, low-data regimes and are geometrically consistent under spatial transformations. Conventional convolutional neural networks rely only on data augmentation and large parameter spaces to ensure robustness to transformations such as rotation and scaling. Even then, they remain highly susceptible to aliasing. Group convolutions provide a more principled way to preserve equivariance. However, they face severe computational impediments in autoencoding settings for downstream tasks, owing to the dimensional expansion caused by lifting operations, memory limitations, and pooling operations that often break equivariance. In our work, we explore the frequency domain and present a common theoretical framework for autoencoding. This framework extends to equivariant object detection and anomaly localization, formally connecting group equivariance with Riesz representation theory. We establish its superior generalization abilities with tighter PAC-Bayesian bounds. Our novel LeaRN-EqSTN architecture performs sequential estimation of transformations via a learnable Riesz transform and spatial transformer networks, with the aim of learning equivariance. This approach achieves greater computational efficiency without sacrificing theoretical guarantees, and the architecture can be integrated directly into autoencoding models while preserving equivariance properties. The thesis presents several main contributions. We present SPAGMACE, a novel unsupervised glimpse-based object detection system that organizes the latent space with Gaussian mixture priors to foster improved semantic interpretation. We extend LeaRN-EqSTN with SPAGMACE to develop the first glimpse-based equivariant unsupervised object detection model, with enhanced performance on real-world datasets.
To facilitate anomaly localization, we propose an autoencoder framework that combines the strength of our LeaRN-EqSTN model with an efficient postprocessing step able to discriminate anomalies from normal non-rigid distortions. We demonstrate the effectiveness of our approach in all three settings: autoencoding, unsupervised object detection, and anomaly localization. Our architectures show superior generalization compared to state-of-the-art models, particularly in low-data regimes. This work demonstrates that incorporating geometric equivariance based on frequency-level representations into neural architectures provides a principled approach to learning robust visual representations from limited data. The theoretical study reveals the inherent relationships among symmetry, aliasing, and generalization, while empirical results demonstrate improvements over state-of-the-art techniques in computer vision.

This thesis was carried out in partnership with IHRIM within the framework of the Rey Ornaments Image investigation (ROIi) project, supported by ANR.

Committee

Erik J. BEKKERS, Associate Professor, University of Amsterdam, Reviewer

Celine HUDELOT, Professor at Centrale Supélec, Reviewer

Amine NAIT ALI, Professor at Paris-Est Créteil University, Examiner

Iuliia TKACHENKO, Assistant Professor at Lyon 2 University, Examiner

Thierry FOURNEL, Professor at Jean Monnet University, Supervisor

Rémi EMONET, Professor at Jean Monnet University, Co-Supervisor