Machine Learning

The Machine Learning team drives research activities organized around two complementary and tightly interlinked projects:
- Responsible and Trustworthy Machine Learning (RTML) leads research focused on ensuring the ethical and reliable deployment of modern AI. This includes critical work on fairness, bias detection and mitigation, model explainability, and robustness, with particular attention to complex data types like text, sequences, and graphs.
- Fundamentals and Theory of Machine Learning (FTML) drives research into the core principles underpinning modern AI. This covers statistical learning theory, optimization algorithms, and unsupervised learning techniques. Specific interests include PAC-Bayesian theory, generative modeling, and the theoretical understanding of representation learning and operator learning.
The team's ongoing work and collaborations focus on the following topics:
- Deriving tight and computable theoretical guarantees, rooted in statistical machine learning, for properties such as generalization, robustness, and fairness, and turning them into self-bounding learning algorithms with built-in certification.
- Designing criteria for data-quality evaluation and methods for handling imbalanced data with missing and unobserved values.
- Evaluating, certifying, and improving the fairness, ethics, and moral alignment of large language models.
- Improving the understanding of generative models, their generalization capabilities and creativity, and their links with optimal transport and transfer learning.
- Designing optimization approaches and representations for operator learning, especially useful for modeling partial differential equations.
- Designing multimodal machine learning approaches that combine representations such as graphs, text, image and vector data.
- Advancing machine learning for science, in collaboration with the MALICE Inria Team, through knowledge discovery from observational data, explainable and interpretable models, efficient simulator surrogates, and transfer learning, with applications to surface engineering, chemistry, carbon capture, and more.
- Advancing machine learning techniques such as federated learning and graph-based models to enable adaptive, decentralized artificial intelligence pipelines, especially in the context of telecommunication networks.
- Improving the frugality and energy efficiency of machine learning approaches by leveraging theoretical understanding, in terms of both computation (large models) and data requirements (active and reinforcement learning), in challenging regimes (highly imbalanced data, incomplete and weak labeling).
