FaceRecBench: Benchmarking Multimodal Large Language Models for Face Recognition

Idiap Research Institute
Multimodal large language models (MLLMs) have achieved remarkable performance across diverse vision-and-language tasks. However, their potential in face recognition remains underexplored. In particular, the performance of open-source MLLMs needs to be evaluated and compared with existing face recognition models on standard benchmarks under a similar protocol. In this work, we present a systematic benchmark of state-of-the-art MLLMs for face recognition on several face recognition datasets, including LFW, CALFW, CPLFW, CFP, AgeDB, and RFW. Experimental results reveal that while MLLMs capture rich semantic cues useful for face-related tasks, they lag behind specialized models in high-precision recognition scenarios when applied in a zero-shot setting. This benchmark provides a foundation for advancing MLLM-based face recognition, offering insights for the design of next-generation models with higher accuracy and generalization.

FaceRecBench Protocol

In FaceRecBench, we evaluate MLLMs on popular face recognition datasets using a protocol similar to that of standard face recognition models. As in the face verification scenario, the MLLM's task is to decide whether two face images belong to the same person. To this end, we design a prompt template that feeds the MLLM two face images and asks whether they show the same person:

[Figure: FaceRecBench protocol (prompt template for querying the MLLM with two face images)]
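
For illustration, a minimal sketch of one such pairwise query is given below. The query_mllm helper and the prompt wording are placeholders standing in for the model-specific call and the exact template used in FaceRecBench.

  # Minimal sketch of one verification query (assumptions: a caller-supplied
  # query_mllm(prompt, images) wrapper around the MLLM under test, and an
  # illustrative prompt rather than the exact FaceRecBench template).
  from PIL import Image

  PROMPT = (
      "You are given two face images. "
      "Do they belong to the same person? Answer with 'yes' or 'no'."
  )

  def parse_decision(answer: str) -> bool:
      """Map the MLLM's free-form answer to a binary same/different decision."""
      return answer.strip().lower().startswith("yes")

  def verify_pair(query_mllm, path_a: str, path_b: str) -> bool:
      """Ask the MLLM whether two face images show the same person."""
      images = [Image.open(path_a).convert("RGB"),
                Image.open(path_b).convert("RGB")]
      answer = query_mllm(PROMPT, images)  # model-specific call, injected by the caller
      return parse_decision(answer)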

Evaluation

The following table shows the benchmark results of several MLLMs on face recognition datasets, as reported in our paper:

[Table: FaceRecBench results on face recognition datasets]
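
Since each of these datasets is evaluated as a list of genuine and impostor pairs, verification accuracy can be computed directly from the MLLM's yes/no decisions. Below is a minimal sketch of this scoring step, for illustration only and not the benchmark's own evaluation script.

  # Minimal sketch of scoring the pairwise decisions against ground-truth
  # same/different labels (illustrative only).

  def verification_accuracy(predictions: list[bool], labels: list[bool]) -> float:
      """Fraction of pairs where the MLLM's same/different decision matches the label."""
      assert len(predictions) == len(labels) and len(labels) > 0
      correct = sum(p == l for p, l in zip(predictions, labels))
      return correct / len(labels)

  # Toy example: decisions for four pairs, two genuine and two impostor.
  preds = [True, False, True, False]
  labels = [True, True, False, False]
  print(f"accuracy = {verification_accuracy(preds, labels):.2f}")  # accuracy = 0.50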

We also compare the performance of different MLLMs on the Racial Faces in-the-Wild (RFW) dataset in the following table:

[Table: FaceRecBench results on RFW]

FaceRecBench Source Code

The source code of FaceRecBench is publicly available at: https://github.com/idiap/facerecbench

BibTeX


  @article{facerecbench2025,
    author    = {Hatef Otroshi Shahreza and S{\'e}bastien Marcel},
    title     = {Benchmarking Multimodal Large Language Models for Face Recognition},
    journal   = {arXiv preprint arXiv:2510.14866},
    year      = {2025}
  }