AI & Gaze: Enabling Enhanced Societal Applications
Eye tracking is a key tool across numerous fields, from the study of attention to the design of medical and assistive technologies. However, most existing AI-based eye-tracking systems require users to face the camera and perform best under ideal lighting or controlled laboratory conditions. Researchers Jean-Marc Odobez and Pierre Vuillecard have developed a method called ST-WSGE (Self-Training Weakly-Supervised Gaze Estimation) to address these limitations. The approach is designed to work well in real-world situations, even when people turn their heads, parts of their faces are hidden, or the background changes. This is made possible by leveraging diverse training data and self-training techniques.
What sets ST-WSGE apart is its ability to learn from both 3D gaze data and simpler 2D annotations (such as labels indicating where someone is looking in a picture), a significant advantage given that 3D gaze data is far harder to collect than 2D annotations. The process involves two main steps: first, the model is trained on available 3D gaze datasets; then, it is refined using pseudo 3D labels obtained by combining the 2D annotations with the model's own 3D predictions. This two-step learning process greatly expands the usable training data, resulting in a model that performs well in new, real-world conditions.
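To make the idea concrete, here is a minimal PyTorch sketch of such a two-step self-training loop. The model, the toy data, and the particular way a 2D in-image direction is fused with the model's predicted depth to form a pseudo 3D label are illustrative assumptions, not the authors' implementation; the released code linked below is the reference.

```python
# Illustrative sketch only: GazeModel and the pseudo-label construction are
# hypothetical stand-ins, not the ST-WSGE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeModel(nn.Module):
    """Toy 3D gaze estimator: face crop -> unit 3D gaze direction."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
        self.head = nn.Linear(256, 3)

    def forward(self, face):
        return F.normalize(self.head(self.backbone(face)), dim=-1)

def angular_loss(pred, target):
    # 1 - cosine similarity between predicted and target gaze directions
    return (1.0 - F.cosine_similarity(pred, target, dim=-1)).mean()

model = GazeModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Step 1: supervised training on a 3D-labelled gaze dataset (synthetic tensors here).
faces_3d = torch.randn(32, 3, 64, 64)
gaze_3d = F.normalize(torch.randn(32, 3), dim=-1)
for _ in range(10):
    opt.zero_grad()
    angular_loss(model(faces_3d), gaze_3d).backward()
    opt.step()

# Step 2: self-training on 2D gaze-following data. The 2D annotation gives the
# in-image direction from the eyes toward the gaze target; the model's own
# prediction supplies the missing depth, yielding a unit-norm pseudo 3D label.
faces_2d = torch.randn(32, 3, 64, 64)
dir_2d = F.normalize(torch.randn(32, 2), dim=-1)  # annotated eye-to-target direction in the image plane
with torch.no_grad():
    pred = model(faces_2d)                                              # current 3D prediction
    xy_scale = torch.sqrt(torch.clamp(1.0 - pred[:, 2:3] ** 2, min=1e-6))
    pseudo_3d = torch.cat([dir_2d * xy_scale, pred[:, 2:3]], dim=-1)    # pseudo 3D label
for _ in range(10):
    opt.zero_grad()
    angular_loss(model(faces_2d), pseudo_3d).backward()
    opt.step()
```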
Another key innovation is the Gaze Transformer, an architecture built on the same transformer technology that underpins recent language and vision models. Because it handles both images and videos, it can be trained on a much broader pool of data. The new method outperformed previous approaches on key benchmarks, demonstrating reliable performance across a variety of conditions, and it remained accurate even on datasets it had never encountered during training, a common failure point for gaze models. These strengths make it well suited to practical, dynamic gaze-aware applications such as driver monitoring, human-robot collaboration, accessibility tools, and virtual reality experiences.
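The sketch below illustrates how a transformer can treat an image simply as a one-frame video, so a single model covers both kinds of input. The module names, sizes, and tokenisation are illustrative assumptions and not the published Gaze Transformer architecture.

```python
# Illustrative sketch only: "GazeTransformer" here is a hypothetical layout,
# not the architecture described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeTransformer(nn.Module):
    def __init__(self, dim=256, n_heads=4, n_layers=2, max_len=32):
        super().__init__()
        # Per-frame encoder: face crop -> one token per frame.
        self.frame_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, dim), nn.ReLU())
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(dim, 3)

    def forward(self, frames):
        # frames: (B, T, 3, H, W); a single image is just the T = 1 case.
        b, t = frames.shape[:2]
        tokens = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)
        tokens = self.temporal(tokens + self.pos_embed[:, :t])
        # One unit-norm 3D gaze direction per frame.
        return F.normalize(self.head(tokens), dim=-1)

model = GazeTransformer()
video_clip = torch.randn(2, 8, 3, 64, 64)    # batch of 8-frame clips
single_image = torch.randn(2, 1, 3, 64, 64)  # same model, one-frame "clips"
print(model(video_clip).shape, model(single_image).shape)  # (2, 8, 3) and (2, 1, 3)
```

Treating images as one-frame videos is one natural way to let a single training pipeline mix image-only and video datasets, which is the kind of flexibility the article describes.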
In support of open science, the researchers have publicly shared their code and models. Their goal is to foster collaborative progress in gaze-based AI and facilitate the creation of technologies that use eye tracking to gain deeper insights into human intent.
The study was presented by PhD student Pierre Vuillecard at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025 in Nashville.
Around the same time, Jean-Marc Odobez, senior research scientist at Idiap and head of the Perception & Activity Understanding group, gave a keynote at the Eye Tracking Research and Applications (ETRA) 2025 conference. He discussed how gaze analysis can reveal where people are looking and what captures their attention. His talk showcased his lab's progress in 3D gaze estimation, which now uses personalized models and takes social context into account. Odobez also presented new ways to identify gaze targets and social cues such as eye contact and shared attention, all of which help us better understand how people pay attention and interact in real-world situations.
Paper:
Vuillecard, P., & Odobez, J.-M. (2025). Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) (pp. 13508-13518).
Code: https://github.com/idiap/gaze3d
ETRA Keynote: https://etra.acm.org/2025/keynotes.html