AugGen: Synthetic Augmentation using Diffusion Models Can Improve Recognition
Jan 1, 2025·,,·
0 min read
Parsa Rahimi
Damien Teney
Sébastien Marcel

Abstract
The increasing reliance on large-scale datasets in machine learning poses significant privacy and ethical challenges, particularly in sensitive domains such as face recognition. Synthetic data generation offers a promising alternative; however, most existing methods depend heavily on external datasets or pre-trained models, increasing complexity and resource demands. In this paper, we introduce AugGen, a self-contained synthetic augmentation technique. AugGen strategically samples from a class-conditional generative model trained exclusively on the target FR dataset, eliminating the need for external resources. Evaluated across 8 FR bench- marks, including IJB-C and IJB-B, our method achieves 1-12% performance improvements, outperforming models trained solely on real data and surpass- ing state-of-the-art synthetic data generation approaches, while using less real data. Notably, these gains often exceed those from architectural enhancements, underscoring the value of synthetic augmentation in data-limited scenarios. Our findings demonstrate that carefully integrated synthetic data can both mitigate pri- vacy constraints and substantially enhance recognition performance.
Type
Publication
Conference on Neural Information Processing Systems