xEdgeFace: Efficient Cross-Spectral Face Recognition for Edge Devices

1Idiap Research Institute, 2UNIL

Accepted in IJCB 2025.

Model architecture of the xEdgeFace models. The highlighted modules (LN: LayerNorm, ST: convolutional stem, Stages: S0, S1, S2) are adapted, while the other network components remain frozen. The two loss components align the modalities while preserving the source FR performance.

Summary

Heterogeneous Face Recognition (HFR) addresses the challenge of matching face images across different sensing modalities, such as thermal to visible or near-infrared to visible, expanding the applicability of face recognition systems in real-world, unconstrained environments. While recent HFR methods have shown promising results, many rely on computation-intensive architectures, limiting their practicality for deployment on resource-constrained edge devices. In this work, we present a lightweight yet effective HFR framework by adapting a hybrid CNN-Transformer architecture originally designed for face recognition. Our approach enables efficient end-to-end training with minimal paired heterogeneous data while preserving strong performance on standard RGB face recognition tasks. This makes it a compelling solution for both homogeneous and heterogeneous scenarios. Extensive experiments across multiple challenging HFR and face recognition benchmarks demonstrate that our method consistently outperforms state-of-the-art approaches while maintaining a low computational overhead.

Proposed Pipeline

xEdgeFace tackles the long-standing challenge of heterogeneous (cross-spectral) face recognition, where visible-light mugshots must be matched to faces captured in other spectra such as thermal or NIR. This capability is essential for surveillance and low-light authentication, but it is difficult for conventional RGB-trained models and hard to deploy on resource-constrained devices.

The core insight in this paper is that most modality discrepancies can be mitigated by adapting only a minimal set of low-level convolutional layers and the LayerNorm statistics of a lightweight CNN-Transformer backbone (EdgeFace), while a contrastive self-distillation loss preserves the original RGB discriminative power.

This yields a single compact network (as small as 0.09 GFLOPs / 1.24M parameters) that:

  • needs very little paired cross-modal data,
  • avoids catastrophic forgetting, and
  • outperforms or matches heavier state-of-the-art systems on multiple challenging HFR benchmarks (e.g., +361% Rank-1 gain for the XXS variant on Tufts VIS-Thermal and 99.86% Rank-1 on SCFace), while maintaining near-parity on standard RGB benchmarks.

We show that selective LayerNorm tuning plus self-distillation suffices to extend an off-the-shelf tiny FR network to robust, cross-spectral recognition, enabling real-time edge deployment without additional synthesis pipelines or modality-specific branches.
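The selective adaptation described above can be pictured as a filter over parameter names: a few low-level modules and all normalization layers are unfrozen, and everything else stays fixed. The following is only a minimal sketch. The parameter names, sizes, and the `select_trainable` helper are hypothetical and do not reflect the actual EdgeFace module layout or the released code.

```python
from typing import Dict, Iterable, List

# Hypothetical parameter names and element counts (assumption for illustration;
# the real EdgeFace backbone uses its own module names and shapes).
PARAMS: Dict[str, int] = {
    "stem.conv.weight": 1536,
    "stem.norm.weight": 32,
    "stages.0.block.conv.weight": 9216,
    "stages.0.block.norm.weight": 32,
    "stages.3.block.attn.qkv.weight": 49152,
    "stages.3.block.norm.weight": 128,
    "head.fc.weight": 65536,
}


def select_trainable(names: Iterable[str],
                     prefixes: Iterable[str],
                     norm_tag: str = ".norm.") -> List[str]:
    """Return the names to adapt: anything under an unfrozen prefix,
    plus every normalization layer regardless of depth."""
    prefixes = tuple(prefixes)
    return [n for n in names if n.startswith(prefixes) or norm_tag in n]


# Unfreeze the stem and the first stage; norm layers are picked up everywhere.
trainable = select_trainable(PARAMS, prefixes=("stem.", "stages.0."))
frozen = [n for n in PARAMS if n not in trainable]

trainable_count = sum(PARAMS[n] for n in trainable)
total_count = sum(PARAMS.values())
print(f"adapting {trainable_count}/{total_count} parameters")
```

In a PyTorch model, the same selection would typically be applied by iterating over `model.named_parameters()` and setting `requires_grad` to `False` for every name that the filter does not return.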

We evaluate the computational efficiency of our approach by reporting two key metrics: the number of floating-point operations (GFLOPs) and the total number of parameters (in millions, denoted as MPARAMs). As shown in the figure below, the proposed xEdgeFace variants operate with significantly reduced computational overhead and parameter count, highlighting their suitability for deployment in resource-constrained environments.

Model size vs compute

Comparison of model size (in millions of parameters) and compute (in GFLOPs) of state-of-the-art HFR models and the xEdgeFace variants.
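To make the two efficiency metrics concrete, the sketch below counts parameters and multiply-accumulate operations (MACs) for a single convolution layer. The layer shape is a made-up example, not a layer taken from xEdgeFace, and the convention of counting one MAC as two FLOPs is an assumption: some profiling tools report MACs directly as FLOPs, so reported numbers can differ by a factor of two.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Learnable parameters of a k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def conv2d_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates: one k x k x c_in dot product per output element."""
    return k * k * c_in * c_out * h_out * w_out


# Hypothetical stem: 3 -> 32 channels, 4x4 kernel with stride 4 on a 112x112 input,
# which gives a 28x28 output map.
params = conv2d_params(c_in=3, c_out=32, k=4)
macs = conv2d_macs(c_in=3, c_out=32, k=4, h_out=28, w_out=28)

mparams = params / 1e6          # millions of parameters (MPARAMs)
gflops = 2 * macs / 1e9         # assumes 1 MAC = 2 FLOPs

print(f"{params} params ({mparams:.6f} MPARAMs), {macs} MACs ({gflops:.6f} GFLOPs)")
```

Summing these quantities over all layers of a network gives totals comparable to the model-level figures quoted on this page.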


Face Recognition Performance of xEdgeFace

xEdgeFace improves performance across new modalities without degrading accuracy on the original RGB benchmarks, all within a single unified network. The self-distillation regularization prevents catastrophic forgetting, resulting in a compact model that excels in both homogeneous and cross-spectral face recognition.

xEdgeFace performance chart

xEdgeFace achieves high accuracy across RGB and cross-spectral benchmarks using a single compact model.
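The interplay between modality alignment and self-distillation can be sketched as a two-term objective. This page does not give the exact loss, so the version below substitutes plain cosine distances for both terms. The function names, the `distill_weight` parameter, and the choice of cosine distance are assumptions for illustration, not the paper's formulation.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def adaptation_loss(student_vis: Sequence[float],
                    student_tgt: Sequence[float],
                    teacher_vis: Sequence[float],
                    distill_weight: float = 1.0) -> float:
    """Two-term sketch (assumed form):
    - alignment: pull the adapted network's VIS and target-modality embeddings
      of the same identity together;
    - distillation: keep the adapted network's VIS embedding close to the
      frozen original network's VIS embedding."""
    align = cosine_distance(student_vis, student_tgt)
    distill = cosine_distance(student_vis, teacher_vis)
    return align + distill_weight * distill


# Perfectly aligned pair that also matches the frozen teacher: zero loss.
loss_ok = adaptation_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
# Thermal embedding orthogonal to the VIS embedding: alignment term is 1.
loss_misaligned = adaptation_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(loss_ok, loss_misaligned)
```

The distillation term is what discourages drift on RGB inputs: any update that improves cross-modal alignment but moves the visible-light embeddings away from the original network is penalized.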


Performance in HFR

The figure below shows the performance of xEdgeFace on the VIS-Thermal protocol of the MCXFace dataset. xEdgeFace outperforms all compared methods, achieving the highest average Rank-1 accuracy of 91.68%.

VIS-Thermal performance in MCXFace

Performance in MCXFace dataset (VIS-Thermal protocol).
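For readers unfamiliar with the metric, Rank-1 accuracy is the fraction of probe images whose most similar gallery image belongs to the correct identity. The sketch below computes it with cosine similarity on toy two-dimensional embeddings. The data and helper names are illustrative assumptions and do not reproduce the MCXFace evaluation protocol.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[str, Sequence[float]]  # (identity label, embedding)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank1_accuracy(probes: List[Sample], gallery: List[Sample]) -> float:
    """Fraction of probes whose nearest gallery entry has the same identity."""
    hits = 0
    for probe_id, probe_emb in probes:
        best_id, _ = max(gallery, key=lambda g: cosine_similarity(probe_emb, g[1]))
        hits += int(best_id == probe_id)
    return hits / len(probes)


# Toy data: gallery plays the role of VIS enrollments, probes of thermal queries.
gallery = [("a", [1.0, 0.0]), ("b", [0.0, 1.0])]
probes = [
    ("a", [0.9, 0.1]),   # nearest to "a": correct
    ("b", [0.2, 0.8]),   # nearest to "b": correct
    ("b", [0.7, 0.3]),   # nearest to "a": wrong
]

accuracy = rank1_accuracy(probes, gallery)
print(f"Rank-1 accuracy: {accuracy:.2%}")
```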


Dataset Availability

The MCXFace dataset used in this paper is publicly available at: https://www.idiap.ch/en/scientific-research/data/mcxface

BibTeX

@inproceedings{xedgeface,
  title     = {xEdgeFace: Efficient Cross-Spectral Face Recognition for Edge Devices},
  author    = {George, Anjith and Marcel, Sebastien},
  booktitle = {2025 IEEE International Joint Conference on Biometrics (IJCB)},
  pages     = {1--10},
  year      = {2025},
  organization = {IEEE}
}