VSGaze

Annotations across multiple video-based datasets for gaze following and social gaze

Description

VSGaze unifies and extends annotations across multiple video-based datasets for gaze following and social gaze. The result is the largest and most diverse dataset of its kind, with annotations for gaze following, looking at heads, looking at each other, and shared attention. We additionally extend GazeFollow with annotations for looking at heads.

Downloading Images and Videos

To obtain the raw images and videos associated with each dataset, please refer to the original dataset release pages.

For VideoCoAtt, we recommend re-extracting the frames from the videos yourself, as the provided frames are highly compressed.
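
As a reference, here is a minimal sketch of one way to re-extract frames with OpenCV. The `videos/` and `frames/` directory names are placeholders, not part of the release; adapt them to your local layout.

```python
import os
import cv2  # pip install opencv-python

def extract_frames(video_path: str, out_dir: str) -> None:
    """Dump every frame of a video, preserving frame order."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video (or read error)
            break
        # PNG is lossless, so no further compression artifacts are introduced
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.png"), frame)
        idx += 1
    cap.release()

for name in os.listdir("videos"):
    extract_frames(os.path.join("videos", name),
                   os.path.join("frames", os.path.splitext(name)[0]))
```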


Annotation File Format

Each annotation entry includes the following fields:

  • path: Path to the image or video frame (append this to the dataset's root directory).
  • head_bboxes: List of normalized bounding box coordinates for all heads in the scene. The first entry is always a zero box (placeholder).
  • person_ids: List of person IDs corresponding to the `head_bboxes`.
  • gaze_points: List of normalized gaze coordinates in `[0, 1]`, corresponding to each person.
  • inout: List of labels indicating whether the gaze target for each person is within the frame:
    • 1: Inside the frame  
    • 0: Outside the frame  
    • -1: Unknown  
  • pairs: List of person ID pairs for social gaze analysis.
  • lah_pairs: List of labels corresponding to the person pairs for "looking at head" (LAH) interactions:
    • 1: Second person is looking at the first  
    • 0: Not looking  
    • -1: Unknown  
  • laeo_pairs: List of labels corresponding to the person pairs for "looking at each other" (LAEO) interactions:
    • 1: Yes  
    • 0: No  
    • -1: Unknown  
  • coatt_pairs: List of labels corresponding to the person pairs for "shared attention" interactions:
    • 1: Yes
    • 0: No  
    • -1: Unknown  

Important: Please ignore annotations corresponding to the zero box in `head_bboxes`.

For GazeFollow, only the applicable fields are populated. For the test set, we include multiple gaze annotations for the last person ID under the `gaze_points_p1` field; padding values are given by `[-1, -1]`.
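
To make the field layout concrete, below is a minimal parsing sketch. The on-disk container format is not specified above, so the sketch assumes a JSON file holding a list of entry dicts; the `annotations.json` filename and `DATASET_ROOT` value are placeholders. Adapt the loading step to the actual release files.

```python
import json
import os

DATASET_ROOT = "/path/to/dataset"      # placeholder root directory

with open("annotations.json") as f:    # placeholder filename; adapt to the release
    entries = json.load(f)             # assumed: a list of entry dicts

for entry in entries:
    img_path = os.path.join(DATASET_ROOT, entry["path"])

    # Index 0 of the per-person lists is the zero-box placeholder: skip it.
    people = list(zip(entry["head_bboxes"], entry["person_ids"],
                      entry["gaze_points"], entry["inout"]))[1:]
    for bbox, pid, gaze, inout in people:
        if inout == 1:                 # gaze target is inside the frame
            print(f"{img_path}: person {pid} looks at {gaze} (normalized)")

    # The pairwise label lists are aligned element-wise with `pairs`.
    for (p1, p2), lah, laeo, coatt in zip(entry["pairs"], entry["lah_pairs"],
                                          entry["laeo_pairs"], entry["coatt_pairs"]):
        if laeo == 1:
            print(f"{img_path}: persons {p1} and {p2} look at each other")

    # GazeFollow test entries carry extra annotations for the last person;
    # drop the [-1, -1] padding before use.
    extra = [p for p in entry.get("gaze_points_p1", []) if p != [-1, -1]]
```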


Reference

If you use this dataset, please cite our papers:

@article{gupta2024mtgs,
  title={MTGS: A novel framework for multi-person temporal gaze following and social gaze prediction},
  author={Gupta, Anshul and Tafasca, Samy and Farkhondeh, Arya and Vuillecard, Pierre and Odobez, Jean-Marc},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={15646--15673},
  year={2024}
}

@inproceedings{gupta2024unified,
  title={A unified model for gaze following and social gaze prediction},
  author={Gupta, Anshul and Tafasca, Samy and Chutisilp, Naravich and Odobez, Jean-Marc},
  booktitle={2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition (FG)},
  pages={1--9},
  year={2024},
  organization={IEEE}
}

Please also cite the constituent datasets:

@article{recasens2015they,
  title={Where are they looking?},
  author={Recasens, Adria and Khosla, Aditya and Vondrick, Carl and Torralba, Antonio},
  journal={Advances in Neural Information Processing Systems},
  volume={28},
  year={2015}
}

@inproceedings{chong2020dvisualtargetattention,
  title={Detecting Attended Visual Targets in Video},
  author={Chong, Eunji and Wang, Yongxin and Ruiz, Nataniel and Rehg, James M},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5396--5406},
  year={2020}
}

@inproceedings{tafasca2023childplay,
  title={ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour},
  author={Tafasca, Samy and Gupta, Anshul and Odobez, Jean-Marc},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={20935--20946},
  year={2023}
}

@inproceedings{fan2018inferring_videocoatt,
  title={Inferring shared attention in social scene videos},
  author={Fan, Lifeng and Chen, Yixin and Wei, Ping and Wang, Wenguan and Zhu, Song-Chun},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={6460--6468},
  year={2018}
}
@inproceedings{Marin-Jimenez_2019_CVPR,
  title={LAEO-Net: Revisiting People Looking at Each Other in Videos},
  author={Marin-Jimenez, Manuel J. and Kalogeiton, Vicky and Medina-Suarez, Pablo and Zisserman, Andrew},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month={June},
  year={2019}
}