Summary

Geometric Generative Gaze Estimation model from RGB-D sensors

As a display of attention and interest, gaze is a fundamental cue for understanding people's activities, behaviors, and state of mind, and plays an important role in many research fields such as psychology, Human-Robot Interaction (HRI), and Human-Computer Interfaces (HCI). For these reasons, many computer-vision-based gaze estimation methods have been proposed, but a solution based on consumer hardware is still needed. To minimize intrusion and accommodate user movement, remote cameras with a sufficiently wide field of view are preferred, but they lead to the challenge of low-resolution imaging. From a methodological viewpoint, two main approaches exist. Geometric methods, based on explicit eye geometry models, can be very accurate but rely on high-resolution images to fit and track the local features (glints, pupil center, etc.) used to estimate the geometric parameters. Appearance-based methods, on the other hand, learn a direct mapping between the eye image and gaze parameters and thus avoid feature tracking, but they often need test data close to the training set in terms of user, gaze space, and illumination conditions.

Building on our previous work, we propose the G3E project to investigate a novel head-pose-independent gaze estimation method that combines the advantages of appearance-based and geometric methods. It relies on an appearance-based probabilistic generative process that models the generation of head-pose-independent eye images, which are recovered using consumer RGB-D cameras. By using an explicit geometric gaze model, we will handle head pose and gaze direction in a unified framework, allowing 3D space reasoning and extrapolation to gaze directions not seen in the training data. In addition, by modeling the generation of semantic regions (eyelids, cornea, sclera), we will decouple the gazing process and user geometry from the ambient conditions (color appearance), while avoiding the critical local feature (cornea/iris) fitting and tracking problems of standard geometric methods.
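
To illustrate what handling head pose and gaze direction in a unified framework can mean geometrically, the following is a minimal sketch and not the project's actual model. It assumes a hypothetical convention in which the head frame has x to the right, y up, and z forward, the eye-in-head gaze is given by yaw and pitch angles, and the head pose is a 3x3 rotation matrix. All function names are illustrative.

```python
import math

def gaze_vector(yaw, pitch):
    """Unit gaze direction in the head frame (assumed axes: x right, y up, z forward)."""
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))

def rotation_y(angle):
    """Rotation matrix about the vertical (y) axis, i.e. a pure head yaw."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def rotate(R, v):
    """Apply a 3x3 rotation matrix (list of rows) to a 3-vector."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def world_gaze(head_rotation, yaw, pitch):
    """Gaze direction in the world frame: head pose composed with eye-in-head gaze."""
    return rotate(head_rotation, gaze_vector(yaw, pitch))

# Example: head turned 30 degrees one way, eyes turned 30 degrees the other way.
head = rotation_y(math.radians(30.0))
direction = world_gaze(head, math.radians(-30.0), 0.0)
```

In this sketch the head rotation and the eye rotation cancel, so the resulting world-frame gaze points along the forward axis. Because the composition is explicit, the same eye-in-head parameters can be reused under any head pose, which is the kind of 3D reasoning an explicit geometric model permits.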

The G3E project will study different modeling options to address the problem, including several inference schemes to achieve the difficult task of learning from low-resolution images. In a second thread, we will investigate the Bayesian properties of the model to address unsupervised learning and adaptation to factors such as session lighting, by exploiting relevant priors (e.g. eye color palettes). The project will thus address a fundamental component of the perception of human-human and human-computer/robot communication, and its improvements can be further exploited in those domains for better interaction modeling and understanding.

This project is supported by the Swiss National Science Foundation (www.snf.ch), project FNS-200020_153085. It is a follow-up of the SNSF Tracome project.