Visioner

The Visioner is a library that implements face detection, keypoint localization and pose estimation in still images using boosted classifiers. For the time being, we only provide a limited set of interfaces, covering detection and localization. You can call the Visioner detection system from your script in three ways:

  1. Use simple (single) face detection with bob.visioner.MaxDetector:

    In this mode, the Visioner only detects the most likely face in a given image. It returns a single tuple describing the detection's bounding box: (top-left x, top-left y, width, height, score). Here is a usage example:

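    import bob.io
    import bob.visioner
    detect_max = bob.visioner.MaxDetector()
    image = bob.io.load(...)  # must be a 2D gray-scale array with dtype=uint8
    bbox = detect_max(image)  # (x, y, width, height, score)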

    With this technique you can control:

    • the number of scanning levels;
    • the scale variation in pixels.

    Use help() on the class to read its manual for operational details.
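A common next step is to cut the detected face out of the image. The sketch below assumes the image is a 2D NumPy array indexed as [row, column] and that the box coordinates may be floating point; `crop_face` is a hypothetical helper, not part of bob.visioner, and the synthetic image stands in for the output of `bob.io.load(...)`.

```python
import numpy as np

def crop_face(image, bbox):
    """Crop the region given by a MaxDetector-style tuple
    (top-left x, top-left y, width, height, score) from a 2D image.

    Hypothetical helper: coordinates are rounded to the nearest pixel
    and clipped to the image borders.
    """
    x, y, width, height, _score = bbox
    rows, cols = image.shape
    x0 = max(0, int(round(x)))
    y0 = max(0, int(round(y)))
    x1 = min(cols, int(round(x + width)))
    y1 = min(rows, int(round(y + height)))
    # NumPy indexes rows (y) first, then columns (x)
    return image[y0:y1, x0:x1]

# Synthetic 100x120 gray-scale image and a hand-made detection tuple
image = np.zeros((100, 120), dtype=np.uint8)
face = crop_face(image, (30.0, 20.0, 40.0, 50.0, 3.7))
print(face.shape)  # (50, 40)
```

Clipping keeps the slice valid when a detection touches or crosses the image border.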

  2. Use simple face detection with bob.visioner.Detector:

    In this mode, the Visioner returns all bounding boxes above a given threshold in the image. It returns a tuple of tuples, ordered by descending score, each describing one detection's bounding box: (top-left x, top-left y, width, height, score). Here is a usage example:

    import bob.io
    import bob.visioner
    detect = bob.visioner.Detector()
    image = bob.io.load(...)  # must be a 2D gray-scale array with dtype=uint8
    bboxes = detect(image)  # tuple of (x, y, width, height, score) tuples

    With this technique you can control:

    • the minimum detection threshold;
    • the number of scanning levels;
    • the scale variation in pixels;
    • the overlap threshold used for non-maximum suppression (NMS) clustering.

    Use help() on the class to read its manual for operational details.
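Because the detections come back sorted by descending score, a caller can keep only the strong ones by stopping at the first box below a cutoff. This is a minimal sketch assuming each entry has the (x, y, width, height, score) layout described above; `strong_detections` is a hypothetical helper and the tuple of boxes is hand-made, standing in for the result of `detect(image)`.

```python
from itertools import takewhile

def strong_detections(bboxes, min_score):
    """Return the detections whose score is at least min_score.

    Assumes entries of the form (x, y, width, height, score) sorted by
    descending score, so iteration can stop at the first weak detection.
    """
    return tuple(takewhile(lambda b: b[4] >= min_score, bboxes))

# Hand-made stand-in for the output of detect(image)
bboxes = ((10, 12, 40, 40, 5.2), (80, 15, 38, 38, 2.9), (50, 60, 30, 30, 0.4))
for x, y, w, h, score in strong_detections(bboxes, 1.0):
    print("face at (%d, %d), size %dx%d, score %.1f" % (x, y, w, h, score))
```

Raising the detector's own minimum threshold has a similar effect at detection time; post-filtering is handy when you want several cutoffs on a single run.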

  3. Use key-point localization with bob.visioner.Localizer:

    In this mode, the Visioner returns a single bounding box followed by the x and y coordinates of every detected landmark in the image. The number of landmarks following the bounding box is determined by the loaded model. Bob ships with two basic models:

    • bob.visioner.DEFAULT_LMODEL_EC: this is the default model used for keypoint localization if you don’t provide anything to the bob.visioner.Localizer constructor. A call to the function operator (__call__()) will return the bounding box followed by the coordinates of the left and right eyes respectively. The format is (top-left b.box x, top-left b.box y, b.box width, b.box height, left-eye x, left-eye y, right-eye x, right-eye y).
    • bob.visioner.DEFAULT_LMODEL_MP: this is an alternative model that can be used for keypoint localization. A call to the function operator with a Localizer equipped with this model will return the bounding box followed by the coordinates of the eye centers, eye corners, nose tip, nostrils and mouth corners (always left and then right coordinates, with the x value coming first followed by the y value of the keypoint).

    Note

    No scores are returned in this mode.

    Example usage:

    import bob.io
    import bob.visioner
    locate = bob.visioner.Localizer()
    image = bob.io.load(...)  # must be a 2D gray-scale array with dtype=uint8
    bbx_points = locate(image)  # (x, y, width, height, x1, y1, x2, y2, ...)

    With this technique you can control:

    • the number of scanning levels;
    • the scale variation in pixels.

    Use help() on the class to read its manual for operational details.
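Since the localizer returns one flat tuple, it is often convenient to split it into the box and a list of (x, y) points. The sketch assumes the layout (x, y, width, height, x1, y1, x2, y2, ...) with no score, as documented above; `split_localization` is a hypothetical helper, and the sample tuple stands in for the output of the default eye-center model (left eye, then right eye).

```python
def split_localization(result):
    """Split a flat Localizer-style tuple into a bounding box and keypoints.

    Assumes 4 bounding-box values followed by (x, y) pairs and no score.
    """
    if len(result) < 4 or (len(result) - 4) % 2:
        raise ValueError("expected 4 box values followed by (x, y) pairs")
    bbox = tuple(result[:4])
    coords = result[4:]
    # Pair up consecutive values: (x1, y1), (x2, y2), ...
    points = [(coords[i], coords[i + 1]) for i in range(0, len(coords), 2)]
    return bbox, points

# Hand-made stand-in for locate(image) with the eye-center model
result = (30, 20, 60, 60, 45.5, 40.0, 74.0, 41.5)
bbox, (left_eye, right_eye) = split_localization(result)
print(bbox, left_eye, right_eye)
```

The same helper works for models that return more keypoints, because the number of pairs is derived from the tuple length.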

Applications

Bob ships with two applications:

  • visioner_facebox.py: This application takes a video or image file as input and outputs bounding boxes for the faces detected in it. It uses bob.visioner.MaxDetector for this purpose. Command-line parameters let you configure the number of scanning levels or supply your own classification model for face localization.
  • visioner_facepoints.py: Similar to the facebox script, but detects both the face and its keypoints in the given video or image. You can configure the number of scanning levels or provide external classification and localization models. By default, this program uses the localization model provided by Bob, which detects eye centers.

The face detection and keypoint localization programs can, optionally, create an output video or image with the face bounding box and localized keypoints drawn, for debugging purposes. Look at their help message for more instructions and examples.

Reference Manual

bob.visioner.DEFAULT_DETECTION_MODEL = '/idiap/group/torch5spro/scratch/buildbot-slave/ekhoury-x86_64/idiap-12.10-x86_64-release/build/build/lib/python2.7/site-packages/bob/visioner/detection.gz'

Default classification model for basic face detection

bob.visioner.DEFAULT_LOCALIZATION_MODEL = '/idiap/group/torch5spro/scratch/buildbot-slave/ekhoury-x86_64/idiap-12.10-x86_64-release/build/build/lib/python2.7/site-packages/bob/visioner/localization.gz'

Default keypoint localization model.

class bob.visioner.MaxDetector(model_file=None, threshold=0.0, scanning_levels=0, scale_variation=2, clustering=0.05, method=bob.visioner._visioner.DetectionMethod.Scanning)

Bases: bob.visioner._visioner.CVDetector

A class that bridges the Visioner to Bob so as to detect the most face-like object in still images or video frames.

Creates a new face detection object by loading an object classification model from a Visioner model file.

Keyword Parameters:

model_file
file containing the model to be loaded. Files whose names end in ‘.gz’ are decompressed automatically. If the filename ends in ‘.vbin’ or ‘.vbgz’, the native binary format is used; otherwise, the native text format is used.
threshold
object classification threshold
scanning_levels
scanning levels (the more, the faster)
scale_variation
scale variation in pixels
clustering
overlapping threshold for clustering detections
method
Scanning (default) or GroundTruth (note: GroundTruth does not work for the time being)
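The file-format rule for model_file depends only on the filename suffix, and the order of the checks matters: ‘.vbgz’ does not end in ‘.gz’, so it is never treated as compressed text. The helper below is a hypothetical sketch that only mirrors the documented naming convention; it does not open or parse any file.

```python
def model_format(filename):
    """Classify a Visioner model filename by the documented suffix rules.

    Hypothetical helper: it inspects the name only, not the file contents.
    """
    if filename.endswith((".vbin", ".vbgz")):
        return "binary"
    if filename.endswith(".gz"):
        return "gzip-compressed text"
    return "text"

for name in ("detection.gz", "model.vbgz", "model.vbin", "model.txt"):
    print(name, "->", model_format(name))
```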
__call__(image)

Runs the detection machinery and returns a single bounding box.

Keyword parameters:

image
A gray-scaled image (2D array) with dtype=uint8.

Returns a single (highest scored) detection as a bounding box.
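The detector only accepts a 2D uint8 array, so color images must be converted first. This is a minimal NumPy sketch that assumes color images are stored planes-first as (3, height, width) and uses the ITU-R BT.601 luma weights; both the layout and the weights are assumptions, and `to_gray_uint8` is a hypothetical helper rather than a Bob function.

```python
import numpy as np

def to_gray_uint8(image):
    """Return a 2D uint8 array suitable as detector input.

    Assumptions: a 2D array is already gray-scale; a 3D array is a color
    image stored planes-first as (3, height, width). RGB weights follow
    ITU-R BT.601.
    """
    image = np.asarray(image)
    if image.ndim == 2:
        gray = image.astype(np.float64)
    elif image.ndim == 3 and image.shape[0] == 3:
        r, g, b = image.astype(np.float64)  # unpack the three color planes
        gray = 0.299 * r + 0.587 * g + 0.114 * b
    else:
        raise ValueError("expected (H, W) or (3, H, W), got %r" % (image.shape,))
    # Round, clamp to the valid byte range and cast
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

color = np.zeros((3, 4, 5), dtype=np.uint8)
color[0] = 255  # a pure red image
gray = to_gray_uint8(color)
print(gray.shape, gray.dtype, gray[0, 0])
```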

clustering

Overlapping threshold for clustering detections

detect((CVDetector)self, (object)image) → object :

Detects faces in the input (gray-scaled) image according to the current settings. The input image format should be a 2D array of dtype=uint8.

C++ signature :
boost::python::api::object detect(bob::visioner::CVDetector {lvalue},bob::python::const_ndarray)
detect_max((CVDetector)self, (object)image) → object :

Detects the most probable face in the input (gray-scaled) image according to the current settings.

C++ signature :
boost::python::api::object detect_max(bob::visioner::CVDetector {lvalue},bob::python::const_ndarray)
method

Scanning or GroundTruth (the constructor default is Scanning)

save((CVDetector)self, (str)filename) → None :

Saves the model and parameters to a given file.

Note: Serialization uses a native text format by default. Files whose names end in ‘.gz’ are compressed automatically. If the filename ends in ‘.vbin’ or ‘.vbgz’, the native binary format is used.

C++ signature :
void save(bob::visioner::CVDetector {lvalue},std::string)
scale_variation

Scale variation in pixels

scanning_levels

Scanning levels (the more, the faster)

threshold

Object classification threshold

class bob.visioner.Detector(model_file=None, threshold=0.0, scanning_levels=0, scale_variation=2, clustering=0.05, method=bob.visioner._visioner.DetectionMethod.Scanning)

Bases: bob.visioner._visioner.CVDetector

A class that bridges the Visioner to Bob so as to detect faces in still images or video frames.

Creates a new face detection object by loading an object classification model from a Visioner model file.

Keyword Parameters:

model_file
file containing the model to be loaded. Files whose names end in ‘.gz’ are decompressed automatically. If the filename ends in ‘.vbin’ or ‘.vbgz’, the native binary format is used; otherwise, the native text format is used.
threshold
object classification threshold
scanning_levels
scanning levels (the more, the faster)
scale_variation
scale variation in pixels
clustering
overlapping threshold for clustering detections
method
Scanning (default) or GroundTruth (note: GroundTruth does not work for the time being)
__call__(image)

Runs the detection machinery and returns all bounding boxes above the threshold. Detections are already clustered according to the clustering parameter and are sorted in descending score order, the first one having the highest score.

Keyword parameters:

image
A gray-scaled image (2D array) with dtype=uint8.

Returns an iterable with all detected bounding boxes in descending score order (the first one has the highest score).

clustering

Overlapping threshold for clustering detections
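To give an intuition for what an overlap threshold does when clustering detections, here is a generic greedy non-maximum suppression sketch using intersection-over-union (IoU). This is a swapped-in, textbook technique: the Visioner's internal clustering may measure overlap differently or merge boxes instead of discarding them, so the numbers here do not necessarily match its behavior. `iou` and `greedy_nms` are hypothetical helpers.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, width, height, ...) boxes."""
    ax, ay, aw, ah = a[:4]
    bx, by, bw, bh = b[:4]
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(detections, overlap):
    """Keep the best-scoring box of each group whose IoU exceeds `overlap`.

    Boxes are (x, y, width, height, score); output is in descending score.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        # Keep a box only if it does not overlap too much with a better one
        if all(iou(det, k) <= overlap for k in kept):
            kept.append(det)
    return kept

dets = [(10, 10, 40, 40, 2.0), (12, 11, 40, 40, 3.0), (100, 100, 30, 30, 1.0)]
print(greedy_nms(dets, 0.05))
```

The two nearly coincident boxes collapse into the higher-scoring one, while the distant box survives; a lower threshold suppresses more aggressively.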

detect((CVDetector)self, (object)image) → object :

Detects faces in the input (gray-scaled) image according to the current settings. The input image format should be a 2D array of dtype=uint8.

C++ signature :
boost::python::api::object detect(bob::visioner::CVDetector {lvalue},bob::python::const_ndarray)
detect_max((CVDetector)self, (object)image) → object :

Detects the most probable face in the input (gray-scaled) image according to the current settings.

C++ signature :
boost::python::api::object detect_max(bob::visioner::CVDetector {lvalue},bob::python::const_ndarray)
method

Scanning or GroundTruth (the constructor default is Scanning)

save((CVDetector)self, (str)filename) → None :

Saves the model and parameters to a given file.

Note: Serialization uses a native text format by default. Files whose names end in ‘.gz’ are compressed automatically. If the filename ends in ‘.vbin’ or ‘.vbgz’, the native binary format is used.

C++ signature :
void save(bob::visioner::CVDetector {lvalue},std::string)
scale_variation

Scale variation in pixels

scanning_levels

Scanning levels (the more, the faster)

threshold

Object classification threshold

class bob.visioner.Localizer(model_file=None, method=bob.visioner._visioner.LocalizationMethod.MultipleShots_Median, detector=None)

Bases: bob.visioner._visioner.CVLocalizer

A class that bridges the Visioner to Bob to localize keypoints in still images or video frames.

Creates a new face localization object by loading object classification and keypoint localization models from visioner model files.

Keyword Parameters:

model_file
Path to a file containing the keypoint localization model. If None is given, use the default localizer.
method
SingleShot, MultipleShots_Average or MultipleShots_Median (default)
detector
Path to a file or a CVDetector (or Max/Detector) object to be used as the basis for the localization procedure. If None is given (the default), use the default detector.
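The method names suggest that the MultipleShots variants produce several keypoint estimates and combine them coordinate by coordinate with an average or a median; this is an assumption, since the page does not describe the procedure. The sketch below only illustrates that combination step, on hand-made estimates, and shows why a median is more robust to a single bad estimate. `combine_shots` is a hypothetical helper.

```python
from statistics import mean, median

def combine_shots(shots, how="median"):
    """Combine several keypoint estimates coordinate by coordinate.

    `shots` is a list of point lists [(x1, y1), (x2, y2), ...], all with
    the same number of points. Illustrative only.
    """
    aggregate = median if how == "median" else mean
    combined = []
    for per_point in zip(*shots):  # all estimates of one keypoint
        xs = [p[0] for p in per_point]
        ys = [p[1] for p in per_point]
        combined.append((aggregate(xs), aggregate(ys)))
    return combined

# Three estimates of two keypoints; the third shot misplaces the first point
shots = [[(10, 20), (50, 21)], [(11, 19), (52, 20)], [(30, 40), (51, 22)]]
print(combine_shots(shots, "median"))
print(combine_shots(shots, "mean"))
```

With the median, the outlier (30, 40) does not move the first keypoint, whereas the average is pulled toward it.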
__call__(image)

Runs the localization machinery and returns the bounding box and points.

Keyword parameters:

image
A gray-scaled image (2D array) with dtype=uint8.

Returns a bounding box and a set of keypoints.

locate((CVLocalizer)self, (CVDetector)detector, (object)image) → object :

Runs the keypoint localization on the first (highest scored) face location determined by the detector. The input image format should be a 2D array of dtype=uint8.

C++ signature :
boost::python::api::object locate(bob::visioner::CVLocalizer {lvalue},bob::visioner::CVDetector {lvalue},bob::python::const_ndarray)
method

SingleShot, MultipleShots_Average or MultipleShots_Median (default)

save((CVLocalizer)self, (str)filename) → None :

Saves the model and parameters to a given file.

Note: Serialization uses a native text format by default. Files whose names end in ‘.gz’ are compressed automatically. If the filename ends in ‘.vbin’ or ‘.vbgz’, the native binary format is used.

C++ signature :
void save(bob::visioner::CVLocalizer {lvalue},std::string)
