
Database Building


Recording Set Up

Two common indoor environments were selected for our database: a meeting room (left image) and an office (right image). The two setups were as follows.

  • in the meeting case, four persons were engaged in a debate about statements displayed on a screen (at the left in the image example). However, due to sensor limitations (range and recording reliability), the head poses of only two of them (visible in the camera view, left image) were recorded. The meeting scenario was the following: first, the two persons whose pose was recorded had to look straight at the camera to define their frontal view (see below); then they had to write their name on a sheet of paper; finally, for the remainder of the recording, they had to discuss the statements displayed on the projection screen with the other participants.
  • for the office recording, only the pose of the person nearest to the camera (see right image) was recorded. The recording scenario was the following: the person had to look straight at the camera to define his frontal head pose and perform alignment gestures (sudden pan and tilt head rotations); then he had to look at specific points in the office and follow the instructions of the experimenter.

In the following, we describe the main steps required to obtain the pose annotation. More details can be found in the following report.

Head Pose Definition and Annotation

In our data, the annotated head pose is defined relative to the camera 3D basis and a reference head pose called the frontal pose. First, a 3D reference coordinate system is rigidly attached to the head, with the following basis axes: the x axis is the line through the two eyes, the y axis is the vertical line through the nose in a frontal view of the face, and the z axis is orthogonal to both. A head in an image is said to be in a frontal pose when its head reference basis is aligned with the 3D camera reference basis at the head's image position. Given these definitions, the annotated head pose of a viewed head is defined by the Euler angles parameterizing the rigid rotation that maps the virtual frontal head basis to the actual head basis observed in the current image.
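As a concrete illustration, the parameterization above can be sketched as elementary rotations about the head axes composed into a single rotation from the frontal basis to the observed basis. This is only a sketch with hypothetical function names; the decomposition order shown is one of several possible conventions, not necessarily the one used for the annotation.

```python
import numpy as np

def rot_x(a):
    """Elementary rotation about the x axis (the eye line)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    """Elementary rotation about the y axis (the vertical nose line)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    """Elementary rotation about the z axis (orthogonal to x and y)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def head_rotation(pan, tilt, roll):
    """Rotation mapping the frontal head basis to the observed head basis,
    using the pan/tilt/roll naming introduced in the next paragraph.
    The composition order (y, then x, then z) is an assumption."""
    return rot_y(pan) @ rot_x(tilt) @ rot_z(roll)
```

With all three angles at zero, `head_rotation` returns the identity, i.e., the head is exactly in the frontal pose.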

Several Euler decompositions can be used. In the most common one, the rotation angle around the y axis is called the head pan, the rotation angle around the x axis is called the head tilt, and the rotation angle around the z axis is called the head roll. The initial head pose labeling was done using a magnetic sensor, the Flock of Birds (FOB) from Ascension Technology. The FOB comprises two components: a reference base (usually fixed on the table) and the birds, which we rigidly attached to the head. The FOB device outputs the Euler angles of each bird relative to the reference base. To obtain the head pose annotation described above, two transformations involving calibration were necessary:

  • the first one transformed the bird pose, measured in the FOB reference frame, into a bird pose expressed in the camera reference coordinate frame; this required knowing the rigid transform from the FOB reference frame to the camera reference frame.
  • the second transformation aligned the bird coordinate frame axes with those of the head reference frame. This was done by exploiting the bird measurements of a person looking straight at the camera, i.e., being in the frontal configuration.
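The two calibration steps above amount to a chain of rotations. A minimal sketch, with hypothetical argument names, assuming all quantities are 3x3 rotation matrices:

```python
import numpy as np

def head_pose_in_camera(R_bird, R_bird_frontal, R_fob_to_cam):
    """Chain the two calibration transformations described above (a sketch).
    - R_bird:         bird orientation measured in the FOB reference frame.
    - R_bird_frontal: the same measurement taken while the subject looked
                      straight at the camera (the frontal snapshot).
    - R_fob_to_cam:   rigid rotation from the FOB reference frame to the
                      camera frame, obtained by calibration (first step).
    Returns the head orientation relative to the frontal pose, expressed
    in the camera frame."""
    # Step 1: express both bird measurements in the camera frame.
    bird_in_cam = R_fob_to_cam @ R_bird
    frontal_in_cam = R_fob_to_cam @ R_bird_frontal
    # Step 2: at the frontal snapshot the head basis coincides with the
    # camera basis, so the fixed bird-to-head offset is the inverse
    # (transpose, for a rotation) of the frontal bird orientation.
    return bird_in_cam @ frontal_in_cam.T
```

When the current bird measurement equals the frontal snapshot, the function returns the identity, i.e., the frontal pose, as expected.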

Aligning FOB and Video Frames

Because the FOB and video recordings started at different times, we first needed to align the two data streams. This was achieved by identifying, in both modalities, the timestamps of sudden head pose changes. By extracting several pairs of corresponding timestamps, we could precisely estimate the time offset between the two recordings.