Our proposed architecture, EdgeDoc, is built on the XXS variant of the EdgeNeXt backbone. It extracts multi-scale feature maps from several stages of the network and feeds them into a custom U-Net-style decoder. The architecture of EdgeDoc is shown in the figure. The decoder is composed of upsampling blocks, each consisting of depthwise separable 2D convolutions followed by 2D Layer Normalization and ReLU activation. For classification, we use a bottleneck head comprising global average pooling and fully connected layers. The final segmentation mask is generated by a pointwise (1×1) convolution applied to the decoder output.
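The following minimal NumPy sketch traces the data flow through one decoder block and the two output heads described above. It is an illustration, not the authors' implementation, and several details are assumptions: nearest-neighbour 2x upsampling, concatenation of the encoder skip feature before the convolution, a 3x3 depthwise kernel, channel-wise LayerNorm without learnable affine parameters, omitted biases, a classification head attached to the deepest encoder feature, and all channel sizes and weight names (dw_w, pw_w, w1, w2) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map (assumed mode)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def depthwise_conv3x3(x, w):
    """Per-channel 3x3 convolution with zero padding; w has shape (C, 3, 3)."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Each channel is filtered only by its own kernel (no channel mixing).
            out += w[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out

def pointwise_conv(x, w):
    """1x1 convolution that mixes channels; w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def layer_norm_2d(x, eps=1e-6):
    """LayerNorm over the channel axis at every spatial position (assumed, no affine)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def decoder_block(x, skip, dw_w, pw_w):
    """Hypothetical U-Net-style block: upsample, concat skip, DW-separable conv, LN, ReLU."""
    x = np.concatenate([upsample2x(x), skip], axis=0)
    x = pointwise_conv(depthwise_conv3x3(x, dw_w), pw_w)  # depthwise separable conv
    return np.maximum(layer_norm_2d(x), 0.0)               # LayerNorm then ReLU

def classification_head(x, w1, w2):
    """Bottleneck head: global average pooling, then two fully connected layers."""
    pooled = x.mean(axis=(1, 2))           # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)  # bottleneck (C_hidden,)
    return w2 @ hidden                     # class logits

def segmentation_head(x, w):
    """Final 1x1 convolution producing a single-channel mask logit map."""
    return pointwise_conv(x, w)[0]

# Hypothetical shapes: deepest encoder feature and one higher-resolution skip.
deep = rng.standard_normal((16, 8, 8))
skip = rng.standard_normal((8, 16, 16))
dec = decoder_block(deep, skip, rng.standard_normal((24, 3, 3)),
                    rng.standard_normal((12, 24)))
mask_logits = segmentation_head(dec, rng.standard_normal((1, 12)))
cls_logits = classification_head(deep, rng.standard_normal((4, 16)),
                                 rng.standard_normal((2, 4)))
print(dec.shape, mask_logits.shape, cls_logits.shape)  # (12, 16, 16) (16, 16) (2,)
```

In a trained model these weights would be learned parameters in a deep learning framework; NumPy is used here only so that each step of the forward pass is explicit.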
The FantasyID dataset contains multiple types of attacks.
EdgeDoc achieves high performance on the validation set of the FantasyID dataset. Fusion with TruFor improves generalization to other datasets.
We trained the proposed EdgeDoc model on the training set of the FantasyID dataset and evaluated its performance on the corresponding validation set. For comparison, we also evaluated several off-the-shelf baseline methods, including TruFor. The results are summarized in the table below, where EdgeDoc outperforms all other evaluated methods. Furthermore, we explored fusing EdgeDoc and TruFor through a weighted combination, which also yielded competitive results.
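A weighted combination of two detectors can be sketched as a convex combination of their outputs. This is a minimal illustration under stated assumptions: the text does not specify whether fusion happens on per-image scores or per-pixel maps, nor how the weight is chosen, so the function name weighted_fusion and the parameter alpha are hypothetical, and both models are assumed to output scores on the same scale (for example, forgery probabilities in [0, 1]).

```python
import numpy as np

def weighted_fusion(edgedoc_scores, trufor_scores, alpha=0.5):
    """Hypothetical score-level fusion: alpha weights EdgeDoc, (1 - alpha) weights TruFor.

    Works for per-image scores or per-pixel maps, provided both inputs have the
    same shape and are on a comparable scale (an assumption, not from the source).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    a = np.asarray(edgedoc_scores, dtype=float)
    b = np.asarray(trufor_scores, dtype=float)
    if a.shape != b.shape:
        raise ValueError("score arrays must have the same shape")
    return alpha * a + (1.0 - alpha) * b

# Illustrative per-image scores for three documents (made-up numbers).
fused = weighted_fusion([0.9, 0.2, 0.6], [0.7, 0.4, 0.1], alpha=0.75)
print(fused)
```

In practice the weight would typically be tuned on held-out data; the value 0.75 above is arbitrary.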
@article{george2025edgedoc,
title={EdgeDoc: Hybrid CNN-Transformer Model for Accurate Forgery Detection and Localization in ID Documents},
author={George, Anjith and Marcel, Sebastien},
journal={Idiap Research Report},
year={2025}
}