Visioner
The Visioner is a library that implements face detection, key point
localization and pose estimation in still images using Boosted Classifiers.
For the time being, we only provide a limited set of interfaces allowing
detection and localization. You can incorporate a call to the Visioner
detection system in your script in three ways:
Use simple (single) face detection with
bob.visioner.MaxDetector:
In this mode, the Visioner will only detect the most likely face object in
a given image. It returns a tuple containing the detection bounding box
(top-left x, top-left y, width, height, score). Here is a usage example:
import bob
detect_max = bob.visioner.MaxDetector()
image = bob.io.load(...)  # must be a gray-scaled 2D uint8 array
bbox = detect_max(image)  # (x, y, width, height, score)
With this technique you can control:
- the number of scanning levels;
- the scale variation in pixels.
Look at the user manual using help() for operational details.
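The returned tuple can be unpacked directly. The following is a minimal sketch, assuming the (top-left x, top-left y, width, height, score) layout described above. The helper name bbox_corners and the sample values are hypothetical and only illustrate the arithmetic; they are not part of bob.visioner.

```python
def bbox_corners(bbox):
    """Return (left, top, right, bottom) for a MaxDetector-style tuple."""
    x, y, width, height, score = bbox  # score is not needed for the corners
    return (x, y, x + width, y + height)


# Made-up detection, standing in for the output of detect_max(image).
sample_bbox = (40, 30, 100, 120, 17.5)
left, top, right, bottom = bbox_corners(sample_bbox)
print("face spans x=%d..%d, y=%d..%d" % (left, right, top, bottom))
```

In a real script, sample_bbox would be replaced by the value returned from the detector call.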
Use simple face detection with bob.visioner.Detector:
In this mode, the Visioner will return all bounding boxes above a given
threshold in the image. It returns a tuple of tuples (ordered by descending
score) containing the detection bounding boxes (top-left x, top-left y,
width, height, score). Here is a usage example:
import bob
detect = bob.visioner.Detector()
image = bob.io.load(...)  # must be a gray-scaled 2D uint8 array
bboxes = detect(image)  # note: this is a tuple of tuples
With this technique you can control:
- the minimum detection threshold;
- the number of scanning levels;
- the scale variation in pixels;
- the NMS clustering overlapping threshold.
Look at the user manual using help() for operational details.
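Because every entry carries its score as the last element, the result can be post-filtered in plain Python. This is a minimal sketch under that assumption; keep_confident and the sample detections are hypothetical, and the filtering happens after detection rather than through the detector's own threshold parameter.

```python
def keep_confident(bboxes, min_score):
    """Keep detections whose score (last element) is at least min_score."""
    return tuple(b for b in bboxes if b[4] >= min_score)


# Made-up detections in descending score order, standing in for detect(image).
sample_bboxes = (
    (40, 30, 100, 120, 17.5),
    (210, 45, 90, 110, 6.2),
    (15, 200, 60, 70, 0.4),
)
confident = keep_confident(sample_bboxes, min_score=5.0)
best = confident[0] if confident else None  # first entry has the highest score
print(len(confident), best)
```

Since the input is already ordered by descending score, the filtered tuple keeps that order and its first entry is still the best detection.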
Use key-point localization with bob.visioner.Localizer:
In this mode, the Visioner will return a single bounding box and the x and y
coordinates of every detected landmark in the image. The number of
landmarks following the bounding box is determined by the loaded model. In
Bob, we ship with two basic models:
- bob.visioner.DEFAULT_LMODEL_EC: this is the default model
used for keypoint localization if you don’t provide anything to the
bob.visioner.Localizer constructor. A call to the function
operator (__call__()) will return the bounding box followed by
the coordinates of the left and right eyes respectively. The format is
(top-left b.box x, top-left b.box y, b.box width, b.box height, left-eye
x, left-eye y, right-eye x, right-eye y).
- bob.visioner.DEFAULT_LMODEL_MP: this is an alternative model
that can be used for keypoint localization. A call to the function
operator with a Localizer equipped with this model will return the
bounding box followed by the coordinates of the eye centers, eye corners,
nose tip, nostrils and mouth corners (always left and then right
coordinates, with the x value coming first followed by the y value of the
keypoint).
Note
No scores are returned in this mode.
Example usage:
import bob
locate = bob.visioner.Localizer()
image = bob.io.load(...)  # must be a gray-scaled 2D uint8 array
bbx_points = locate(image)  # note: (x, y, width, height, x1, y1, x2, y2, ...)
With this technique you can control:
- the number of scanning levels;
- the scale variation in pixels.
Look at the user manual using help() for operational details.
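The flat tuple returned in this mode can be split into a bounding box and a list of (x, y) keypoints. The sketch below assumes the layout described above (four bounding-box values followed by x, y pairs); split_localization and the sample values are hypothetical.

```python
def split_localization(result):
    """Split a flat Localizer-style tuple into (bbox, keypoints)."""
    bbox = tuple(result[:4])  # (x, y, width, height); no score in this mode
    coords = result[4:]
    if len(coords) % 2 != 0:
        raise ValueError("keypoint coordinates must come in (x, y) pairs")
    keypoints = [(coords[i], coords[i + 1]) for i in range(0, len(coords), 2)]
    return bbox, keypoints


# Made-up output for the default eye model: bounding box, left eye, right eye.
sample = (40, 30, 100, 120, 70, 75, 110, 76)
bbox, (left_eye, right_eye) = split_localization(sample)
print(bbox, left_eye, right_eye)
```

With a model that returns more landmarks, the keypoints list simply grows; the unpacking into exactly two eyes on the last lines applies only to the two-point case.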
Applications
Bob ships with two applications:
- visioner_facebox.py: This application takes as input either a video or image
file and can output bounding boxes for the faces detected in it. It uses
bob.visioner.MaxDetector for this purpose. You can configure,
via command-line parameters, the number of scanning levels or the use of a
user-provided classification model for face localization.
- visioner_facepoints.py: This application is similar to the facebox script, but
detects both the face and keypoints on the given video or image. You can
configure the number of scanning levels, or provide external classification and
localization models. By default, this program will use the default
localization model provided by Bob, which can detect eye centers.
The face detection and keypoint localization programs can, optionally, create
an output video or image with the face bounding box and localized keypoints
drawn, for debugging purposes. Look at their help message for more instructions
and examples.
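To illustrate what drawing a bounding box for debugging amounts to, here is a minimal, dependency-free sketch. It is not the code used by the shipped applications: draw_box is a hypothetical helper, plain nested lists stand in for the 2D uint8 array, and integer pixel coordinates are assumed.

```python
def draw_box(image, bbox, value=255):
    """Draw the outline of bbox = (x, y, width, height) into a copy of image.

    image is a list of rows of integer pixel values (a stand-in for a 2D
    uint8 array). Parts of the box that fall outside the image are skipped.
    """
    out = [list(row) for row in image]
    rows = len(out)
    cols = len(out[0]) if rows else 0
    x, y, width, height = bbox[:4]
    right, bottom = x + width - 1, y + height - 1
    # Top and bottom edges.
    for col in range(max(x, 0), min(right, cols - 1) + 1):
        for row in (y, bottom):
            if 0 <= row < rows:
                out[row][col] = value
    # Left and right edges.
    for row in range(max(y, 0), min(bottom, rows - 1) + 1):
        for col in (x, right):
            if 0 <= col < cols:
                out[row][col] = value
    return out


blank = [[0] * 6 for _ in range(5)]
boxed = draw_box(blank, (1, 1, 4, 3))
for row in boxed:
    print(" ".join("%3d" % pixel for pixel in row))
```

The helper works on a copy, so the input image is left untouched and can still be passed to the detector afterwards.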
Reference Manual
-
bob.visioner.DEFAULT_DETECTION_MODEL = '/idiap/group/torch5spro/scratch/buildbot-slave/ekhoury-x86_64/idiap-12.10-x86_64-release/build/build/lib/python2.7/site-packages/bob/visioner/detection.gz'
Default classification model for basic face detection
-
bob.visioner.DEFAULT_LOCALIZATION_MODEL = '/idiap/group/torch5spro/scratch/buildbot-slave/ekhoury-x86_64/idiap-12.10-x86_64-release/build/build/lib/python2.7/site-packages/bob/visioner/localization.gz'
Default keypoint localization model. TODO: How many points?
-
class bob.visioner.MaxDetector(model_file=None, threshold=0.0, scanning_levels=0, scale_variation=2, clustering=0.05, method=bob.visioner._visioner.DetectionMethod.Scanning)
Bases: bob.visioner._visioner.CVDetector
A class that bridges the Visioner to bob so as to detect the most
face-like object in still images or video frames
Creates a new face localization object by loading object classification
and keypoint localization models from visioner model files.
Keyword Parameters:
- model
- file containing the model to be loaded; note: Serialization will use a native text format by default. Files that have their names suffixed with ‘.gz’ will be automatically decompressed. If the filename ends in ‘.vbin’ or ‘.vbgz’ the format used will be the native binary format.
- threshold
- object classification threshold
- scanning_levels
- scanning levels (the more, the faster)
- scale_variation
- scale variation in pixels
- clustering
- overlapping threshold for clustering detections
- method
- Scanning (default) or GroundTruth (note: this option does not work for
the time being)
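The suffix rules given for the model file can be summarized in a small helper. This is only a sketch of the rules as stated above, not the library's own logic: model_file_format is a hypothetical name, and the returned labels are descriptive strings chosen for this example.

```python
def model_file_format(filename):
    """Describe how a Visioner model file would be treated, by suffix."""
    if filename.endswith((".vbin", ".vbgz")):
        return "native binary"
    if filename.endswith(".gz"):
        return "native text, compressed"
    return "native text"


for name in ("detection.gz", "model.vbin", "model.vbgz", "model.txt"):
    print(name, "->", model_file_format(name))
```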
-
__call__(image)
Runs the detection machinery, returns a single bounding box
Keyword parameters:
- image
- A gray-scaled image (2D array) with dtype=uint8.
Returns a single (highest scored) detection as a bounding box.
-
clustering
Overlapping threshold for clustering detections
-
detect((CVDetector)self, (object)image) → object :
Detects faces in the input (gray-scaled) image according to the current settings. The input image format should be a 2D array of dtype=uint8.
- C++ signature :
- boost::python::api::object detect(bob::visioner::CVDetector {lvalue},bob::python::const_ndarray)
-
detect_max((CVDetector)self, (object)image) → object :
Detects the most probable face in the input (gray-scaled) image according to the current settings
- C++ signature :
- boost::python::api::object detect_max(bob::visioner::CVDetector {lvalue},bob::python::const_ndarray)
-
method
Scanning (default) or GroundTruth
-
save((CVDetector)self, (str)filename) → None :
Saves the model and parameters to a given file.
Note: Serialization will use a native text format by default. Files that have their name suffixed with ‘.gz’ will be automatically decompressed. If the filename ends in ‘.vbin’ or ‘.vbgz’ the format used will be the native binary format.
- C++ signature :
- void save(bob::visioner::CVDetector {lvalue},std::string)
-
scale_variation
Scale variation in pixels
-
scanning_levels
Levels (the more, the faster)
-
threshold
Object classification threshold
-
class bob.visioner.Detector(model_file=None, threshold=0.0, scanning_levels=0, scale_variation=2, clustering=0.05, method=bob.visioner._visioner.DetectionMethod.Scanning)
Bases: bob.visioner._visioner.CVDetector
A class that bridges the Visioner to bob so as to detect faces in
still images or video frames
Creates a new face localization object by loading object classification
and keypoint localization models from visioner model files.
Keyword Parameters:
- model
- file containing the model to be loaded; note: Serialization will use a native text format by default. Files that have their names suffixed with ‘.gz’ will be automatically decompressed. If the filename ends in ‘.vbin’ or ‘.vbgz’ the format used will be the native binary format.
- threshold
- object classification threshold
- scanning_levels
- scanning levels (the more, the faster)
- scale_variation
- scale variation in pixels
- clustering
- overlapping threshold for clustering detections
- method
- Scanning (default) or GroundTruth (note: this option does not work for
the time being)
-
__call__(image)
Runs the detection machinery, returns all bounding boxes above
threshold. Detections are already clustered following the clustering
parameter. The iterable contains detections in descending order with the
first being the one with the highest score.
Keyword parameters:
- image
- A gray-scaled image (2D array) with dtype=uint8.
Returns an iterable with all detected bounding boxes in descending score
order (the first one has the highest score).
-
clustering
Overlapping threshold for clustering detections
-
detect((CVDetector)self, (object)image) → object :
Detects faces in the input (gray-scaled) image according to the current settings. The input image format should be a 2D array of dtype=uint8.
- C++ signature :
- boost::python::api::object detect(bob::visioner::CVDetector {lvalue},bob::python::const_ndarray)
-
detect_max((CVDetector)self, (object)image) → object :
Detects the most probable face in the input (gray-scaled) image according to the current settings
- C++ signature :
- boost::python::api::object detect_max(bob::visioner::CVDetector {lvalue},bob::python::const_ndarray)
-
method
Scanning (default) or GroundTruth
-
save((CVDetector)self, (str)filename) → None :
Saves the model and parameters to a given file.
Note: Serialization will use a native text format by default. Files that have their name suffixed with ‘.gz’ will be automatically decompressed. If the filename ends in ‘.vbin’ or ‘.vbgz’ the format used will be the native binary format.
- C++ signature :
- void save(bob::visioner::CVDetector {lvalue},std::string)
-
scale_variation
Scale variation in pixels
-
scanning_levels
Levels (the more, the faster)
-
threshold
Object classification threshold
-
class bob.visioner.Localizer(model_file=None, method=bob.visioner._visioner.LocalizationMethod.MultipleShots_Median, detector=None)
Bases: bob.visioner._visioner.CVLocalizer
A class that bridges the Visioner to bob to localize keypoints in
still images or video frames
Creates a new face localization object by loading object classification
and keypoint localization models from visioner model files.
Keyword Parameters:
- model_file
- Path to a file containing the keypoint localization model. If None is
given, use the default localizer.
- method
- SingleShot, MultipleShots_Average or MultipleShots_Median (default)
- detector
- Path to a file or a CVDetector (or Max/Detector) object to be used as the
basis for the localization procedure. If None is given (the default), use
the default detector.
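The detector parameter makes it possible to drive the localizer with an explicitly configured detector. The sketch below uses only the constructor keywords listed in this manual; build_localizer is a hypothetical wrapper, and the import guard is an assumption added so that the snippet loads on systems where bob.visioner is not installed.

```python
try:
    import bob.visioner  # only present in Bob installations shipping the Visioner
except ImportError:
    bob = None


def build_localizer(scanning_levels=0):
    """Return a Localizer driven by an explicitly configured MaxDetector."""
    if bob is None:
        raise RuntimeError("bob.visioner is not installed")
    detector = bob.visioner.MaxDetector(scanning_levels=scanning_levels)
    return bob.visioner.Localizer(detector=detector)
```

The returned object is then called on a gray-scaled 2D uint8 image, exactly like a Localizer built with the default detector.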
-
__call__(image)
Runs the localization machinery, returns the bounding box and points
Keyword parameters:
- image
- A gray-scaled image (2D array) with dtype=uint8.
Returns a bounding box and a set of keypoints.
-
locate((CVLocalizer)self, (CVDetector)detector, (object)image) → object :
Runs the keypoint localization on the first (highest scored) face location determined by the detector. The input image format should be a 2D array of dtype=uint8.
- C++ signature :
- boost::python::api::object locate(bob::visioner::CVLocalizer {lvalue},bob::visioner::CVDetector {lvalue},bob::python::const_ndarray)
-
method
SingleShot, MultipleShots_Average or MultipleShots_Median (default)
-
save((CVLocalizer)self, (str)filename) → None :
Saves the model and parameters to a given file.
Note: Serialization will use a native text format by default. Files that have their name suffixed with ‘.gz’ will be automatically decompressed. If the filename ends in ‘.vbin’ or ‘.vbgz’ the format used will be the native binary format.
- C++ signature :
- void save(bob::visioner::CVLocalizer {lvalue},std::string)