NeurIPS 2025: Highlights from Our Research
Sébastien Marcel and colleagues present a method that uses generative image models to augment the datasets used to train discriminative models. This approach is particularly relevant when data is scarce or subject to privacy constraints, e.g., in biometrics and face recognition. The authors propose a new self-contained data-augmentation pipeline based on diffusion models, which they use to train better face-recognition models without relying on external datasets or models. It improves performance by 1–12% over baselines and prior methods for synthetic augmentation, offering a privacy-conscious route to boosting recognition performance when data is limited [1].
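The paper's actual diffusion pipeline is not reproduced here; purely as a minimal sketch of the augmentation idea, the hypothetical helper below extends a real training set with synthetic samples drawn from a generator (here a stand-in callable, not the authors' model) so that synthetic data makes up a chosen fraction of the final set:

```python
import random

def augment_with_synthetic(real_samples, synthesize, ratio=0.5, seed=0):
    """Extend a real training set with synthetic samples.

    `synthesize` stands in for a generative sampler (e.g., a diffusion
    model); `ratio` is the target fraction of synthetic data in the
    augmented set. Both names are illustrative, not from the paper.
    """
    rng = random.Random(seed)
    # Number of synthetic samples needed so that they form `ratio`
    # of the combined dataset: n_synth / (n_real + n_synth) = ratio.
    n_synth = int(len(real_samples) * ratio / (1.0 - ratio))
    synthetic = [synthesize(rng) for _ in range(n_synth)]
    return real_samples + synthetic
```

A downstream recognition model would then be trained on the combined set exactly as on real data.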
Ina Kodrasi, Petr Motlicek, and their teams tackle the challenge of fine-tuning large pre-trained models with reduced computational resources. They present FVAE-LoRA, a variant of low-rank adaptation (LoRA) that uses a variational autoencoder to split low-rank updates into task-relevant and residual subspaces. The method improves performance on text, image, and audio tasks and increases robustness under distribution shifts, suggesting that FVAE-LoRA better isolates the signal relevant to the task. This offers a more efficient approach to training and adapting large pre-trained models [2].
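FVAE-LoRA's variational factorization is beyond a short sketch, but the underlying LoRA mechanism it builds on can be shown in a few lines: the frozen base weight W is adapted by a trainable low-rank update B @ A, so the effective weight is W + scale * B @ A without ever materializing a full-rank update. The function below is a generic LoRA forward pass, not the authors' code:

```python
import numpy as np

def lora_forward(x, W, A, B, scale=1.0):
    """LoRA-style forward pass for a linear layer.

    x: (batch, d_in) inputs
    W: (d_out, d_in) frozen base weight
    A: (r, d_in), B: (d_out, r) trainable low-rank factors, r << d_in
    Equivalent to x @ (W + scale * B @ A).T, but computed with two
    cheap low-rank products instead of forming the full update.
    """
    return x @ W.T + scale * (x @ A.T) @ B.T
```

FVAE-LoRA, roughly speaking, further structures such low-rank updates so that one learned subspace carries the task-relevant signal and the other absorbs the residual.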
Damien Teney and collaborators propose a new method for out-of-distribution (OOD) detection that challenges the common practice of relying on last-layer embeddings of pre-trained models. They show that features from various intermediate layers carry a rich signal for detecting distribution shifts. Their method improves detection by up to 10 percentage points on standard benchmarks, a significant benefit for the robust deployment of machine learning models in real-world applications [3].
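The paper's scoring rule is not reproduced here; as a minimal sketch of the general idea of pooling evidence across layers, the hypothetical scorer below compares a sample's features at each intermediate layer against per-layer statistics of the in-distribution training data and averages the resulting distances:

```python
import numpy as np

def layerwise_ood_score(feats_per_layer, train_means):
    """Toy multi-layer OOD score (illustrative, not the paper's method).

    feats_per_layer: list of per-layer feature vectors for one sample
    train_means: per-layer means of in-distribution training features
    Returns the mean Euclidean distance to the training mean across
    layers; larger values suggest the sample is out of distribution.
    """
    scores = [np.linalg.norm(f - m) for f, m in zip(feats_per_layer, train_means)]
    return float(np.mean(scores))
```

A real detector would use richer per-layer statistics (e.g., covariances) and a learned or validated aggregation, but the principle is the same: no single layer has to carry all the signal.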
In a workshop paper, Mathew Magimai-Doss and PhD graduate Eklavya Sarkar demonstrate that animal vocalizations contain temporal patterns often overlooked by traditional embeddings. They convert HuBERT audio embeddings into discrete token sequences, which preserves sequence information and enables better differentiation of call types and individual animals. This approach opens new possibilities for structured bioacoustics research [4].
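One standard way to turn frame-level embeddings into discrete tokens (the authors' exact procedure is not shown here; this is an assumed, generic quantization step) is to assign each frame to its nearest centroid in a codebook, e.g., one learned by k-means over training frames. The temporal order of frames is preserved, yielding a token sequence per vocalization:

```python
import numpy as np

def discretize(frames, centroids):
    """Map frame-level embeddings to a discrete token sequence.

    frames: (T, D) per-frame embeddings (e.g., from a model like HuBERT)
    centroids: (K, D) codebook, e.g., fitted by k-means on training frames
    Returns a length-T list of token ids, one per frame, in time order.
    """
    # Pairwise Euclidean distances (T, K), then nearest centroid per frame.
    d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=-1)
    return d.argmin(axis=1).tolist()
```

Sequence models or string-matching techniques can then operate on these token sequences, which is what makes the temporal structure exploitable.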
Together, these publications showcase the breadth of expertise in our research groups and the diversity of the research conducted at Idiap to drive innovation in AI.
--
References:
[1] Rahimi, P., Teney, D., & Marcel, S. AugGen: Synthetic augmentation using diffusion models can improve recognition.
[2] Kumar, S., Kaloga, Y., Mitros, J., Motlicek, P., & Kodrasi, I. Latent space factorization in LoRA.
[3] Imezadelajara, C., Rodriguez-Opazo, C., Teney, D., Ranasinghe, D., & Abbasnejad, E. Mysteries of the deep: Role of intermediate representations in out of distribution detection.
[4] Sarkar, E., & Magimai-Doss, M. Towards leveraging sequential structure in animal vocalizations.