AugGen: Synthetic Augmentation using Diffusion Models Can Improve Recognition

Jan 1, 2025

Parsa Rahimi

Damien Teney

Sébastien Marcel



Abstract

The increasing reliance on large-scale datasets in machine learning poses significant privacy and ethical challenges, particularly in sensitive domains such as face recognition. Synthetic data generation offers a promising alternative; however, most existing methods depend heavily on external datasets or pre-trained models, increasing complexity and resource demands. In this paper, we introduce AugGen, a self-contained synthetic augmentation technique. AugGen strategically samples from a class-conditional generative model trained exclusively on the target FR dataset, eliminating the need for external resources. Evaluated across 8 FR benchmarks, including IJB-C and IJB-B, our method achieves 1-12% performance improvements, outperforming models trained solely on real data and surpassing state-of-the-art synthetic data generation approaches, while using less real data. Notably, these gains often exceed those from architectural enhancements, underscoring the value of synthetic augmentation in data-limited scenarios. Our findings demonstrate that carefully integrated synthetic data can both mitigate privacy constraints and substantially enhance recognition performance.

Type

Conference paper

Publication

Conference on Neural Information Processing Systems

Last updated on Jan 1, 2025

