Python API¶
This section includes information for using the pure Python API of bob.learn.libsvm.

class bob.learn.libsvm.File(path)¶ Bases: object
Loads a given LIBSVM data file. The data file format, as defined in the library README, is like this:
<label> <index1>:<value1> <index2>:<value2> ...
<label> <index1>:<value1> <index2>:<value2> ...
<label> <index1>:<value1> <index2>:<value2> ...
...
The labels are integer values, and so are the indexes, starting from 1 (and not from zero as a C programmer would expect). The values are floating point. Zero values are suppressed, since LIBSVM uses a sparse format.
Upon construction, objects of this class will inspect the input file so that the maximum sample size is computed. Once that job is performed, you can read the data at your own pace using the read() method.
This class is made available to you so you can input original LIBSVM files and convert them to another, better supported representation. You cannot, from this object, save data or extend the current set.
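The sparse format above is easy to parse by hand if you need to inspect a file without this class. A minimal sketch (the helper name parse_libsvm_line is illustrative, not part of this API):

```python
import numpy as np

def parse_libsvm_line(line, n_features):
    # Parse one "<label> <index>:<value> ..." line into a label and a
    # dense float64 vector. Indexes are 1-based; entries absent from
    # the line stay at 0.0, since LIBSVM suppresses zero values.
    parts = line.split()
    label = int(parts[0])
    values = np.zeros(n_features, dtype=np.float64)
    for item in parts[1:]:
        index, value = item.split(':')
        values[int(index) - 1] = float(value)
    return label, values

label, values = parse_libsvm_line('+1 1:0.5 3:-1.25', 4)
# label is 1; values is [0.5, 0.0, -1.25, 0.0]
```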

eof() → bool¶ Returns True if the file has reached its end. To start reading from the file again, you must call reset() before another read operation may succeed.

fail() → bool¶ Returns True if the file has its fail or bad bit set. It means the read operation has found a critical condition and you can no longer proceed in reading from the file. Note this is not the same as eof(), which informs whether the file has ended, with no errors found during the read operations.

filename¶ The name of the file being read

good() → bool¶ Returns whether the file is in a good state for readout. It is True if the current file has neither the eof, fail nor bad bits set, which means that the next read() operation may succeed.

read([values]) → (int, array)¶ Reads a single line from the file and returns a tuple containing the label and a numpy array of float64 elements. The numpy.ndarray has a shape as defined by the shape attribute of the current file. If the file has finished, this method returns None instead.
If the output array values is provided, it must be a 64-bit float array with a shape matching the file shape as defined by shape. Providing an output array avoids constant memory reallocation.

read_all([labels[, values]]) → (array, array)¶ Reads all contents of the file into the output arrays labels (used for storing each entry's label) and values (used to store each entry's features). The array labels, if provided, must be a 1D numpy.ndarray with data type int64, containing as many positions as entries in the file, as returned by the attribute samples. The array values, if provided, must be a 2D array with data type float64, with as many rows as entries in the file and as many columns as features in each entry, as defined by the attribute shape.
If the output arrays labels and/or values are not provided, they will be allocated internally and returned.
Note
This method is intended for reading the whole contents of the input file. The file will be reset, as by calling reset(), before the readout starts.

samples¶ The number of samples in the file

shape¶ The size of each sample in the file, as a tuple with a single entry


class bob.learn.libsvm.Machine(path)¶ Bases: object
Machine(hdf5file)
This class can load and run an SVM generated by libsvm. Libsvm is a simple, easy-to-use, and efficient software package for SVM classification and regression. It solves C-SVM classification, nu-SVM classification, one-class SVM, epsilon-SVM regression, and nu-SVM regression. It also provides an automatic model selection tool for C-SVM classification. More information about libsvm can be found on its website. In particular, this class covers most of the functionality provided by the command-line utility svm-predict.
Input and output is always performed on 1D or 2D arrays with 64-bit floating point numbers.
This machine can be initialized in two ways: the first is using an original SVM text file as produced by libsvm. The second option is to pass a pre-opened HDF5 file pointing to the machine information to be loaded in memory.
Using the first constructor, we build a new machine from a libsvm model file. When you load using the libsvm model loader, note that the scaling parameters will be set to defaults (subtraction of 0.0 and division by 1.0). If you need scaling to be applied, set it individually using the appropriate methods on the returned object.
Using the second constructor, we build a new machine from an HDF5 file containing not only the machine support vectors, but also the scaling factors. Using this constructor assures a 100% state recovery from previous sessions.

coef0¶ The coefficient 0 for 'POLY' (polynomial) or 'SIGMOID' (sigmoidal) kernels

degree¶ The polynomial degree, only valid if the kernel is 'POLY' (polynomial)

forward(input[, output]) → array¶
o.predict_class(input[, output]) → array
o(input[, output]) → array
Calculates the predicted class using this Machine, given one single feature vector or multiple ones.
The input array can be either a 1D or 2D 64-bit float array. The output array, if provided, must be of type int64 and always unidimensional. The output corresponds to the predicted classes for each of the input rows.
Note
This method only accepts 64-bit float arrays as input and 64-bit integer arrays as output.

gamma¶ The \(\gamma\) parameter for 'POLY' (polynomial), 'RBF' (gaussian) or 'SIGMOID' (sigmoidal) kernels

input_divide¶ Input division factor, applied before feeding data through the weight matrix W. The division is applied just after the subtraction. By default, it is set to 1.0.

input_subtract¶ Input subtraction factor, applied before feeding data through the weight matrix W. The subtraction is the first operation applied in the processing chain. By default, it is set to 0.0.

kernel_type¶ The type of kernel used by the support vectors in this machine

labels¶ The class labels this machine will output

machine_type¶ The type of SVM machine contained

n_support_vectors¶ The number of support vectors per class

predict_class(input[, output]) → array¶
o.forward(input[, output]) → array
o(input[, output]) → array
Calculates the predicted class using this Machine, given one single feature vector or multiple ones.
The input array can be either a 1D or 2D 64-bit float array. The output array, if provided, must be of type int64 and always unidimensional. The output corresponds to the predicted classes for each of the input rows.
Note
This method only accepts 64-bit float arrays as input and 64-bit integer arrays as output.

predict_class_and_probabilities(input[, cls[, prob]]) → (array, array)¶ Calculates the predicted class and output probabilities for the SVM using this Machine, given one single feature vector or multiple ones.
The input array can be either a 1D or 2D 64-bit float array. The cls array, if provided, must be of type int64 and always unidimensional. The cls output corresponds to the predicted classes for each of the input rows. The prob array, if provided, must be of type float64 (like input) and have as many rows as input and len(o.labels) columns, matching the number of classes for this SVM.
This method always returns a tuple composed of the predicted classes for each row in the input array, with data type int64, and of probabilities for each output of the SVM in a 1D or 2D float64 array. If you don't provide the arrays upon calling this method, we will allocate new ones internally and return them. If you are calling this method in a tight loop, it is recommended you pass the cls and prob arrays to avoid constant reallocation.

predict_class_and_scores(input[, cls[, score]]) → (array, array)¶ Calculates the predicted class and output scores for the SVM using this Machine, given one single feature vector or multiple ones.
The input array can be either a 1D or 2D 64-bit float array. The cls array, if provided, must be of type int64 and always unidimensional. The cls output corresponds to the predicted classes for each of the input rows. The score array, if provided, must be of type float64 (like input) and have as many rows as input and C columns, matching the number of 2-by-2 output combinations. To score, LIBSVM compares the SV outputs for each pair of classes in the machine and outputs one score. If there is only one output, then the problem is binary and only one score is produced (C = 1). If the SVM is multi-class, then the number of combinations C is the total number of possible output combinations. If N is the number of classes in this SVM, then \(C = N\cdot(N-1)/2\). If N = 3, then C = 3. If N = 5, then C = 10.
This method always returns a tuple composed of the predicted classes for each row in the input array, with data type int64, and of scores for each output of the SVM in a 1D or 2D float64 array. If you don't provide the arrays upon calling this method, we will allocate new ones internally and return them. If you are calling this method in a tight loop, it is recommended you pass the cls and score arrays to avoid constant reallocation.
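The relationship \(C = N\cdot(N-1)/2\) can be checked with a couple of lines of plain Python (the helper name is illustrative only):

```python
def n_pairwise_scores(n_classes):
    # One score per unordered pair of classes (LIBSVM's one-vs-one
    # scheme); for a binary problem this reduces to a single score.
    return n_classes * (n_classes - 1) // 2

print(n_pairwise_scores(2))  # 1 (binary problem)
print(n_pairwise_scores(3))  # 3
print(n_pairwise_scores(5))  # 10
```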

probability¶ Set to True if this machine supports probability outputs

save(path) → None¶
o.save(hdf5file) → None
Saves itself to a LIBSVM model file or into a bob.io.base.HDF5File. Saving the SVM into a bob.io.base.HDF5File object has the advantage of saving the input normalization options together with the machine, which are automatically reloaded when you re-initialize it from the same bob.io.base.HDF5File.

shape¶ A tuple that represents the size of the input vector followed by the size of the output vector, in the format (input, output).


class bob.learn.libsvm.Trainer([machine_type='C_SVC'[, kernel_type='RBF'[, cache_size=100[, stop_epsilon=1e-3[, shrinking=True[, probability=False]]]]]]) → new Trainer¶ Bases: object
This class emulates the behavior of the command-line utility called svm-train, from LIBSVM. It allows you to create a parameterized LIBSVM trainer to fulfill a variety of needs and configurations. The constructor includes parameters which are global to all machine and kernel types. Specific parameters for specific machines or kernel types can be fine-tuned using object attributes (see help documentation).
Parameters:
- machine_type, str
The type of SVM to be trained. Valid options are:
'C_SVC' (the default)
'NU_SVC'
'ONE_CLASS'
'EPSILON_SVR' (unsupported regression)
'NU_SVR' (unsupported regression)
- kernel_type, str
The type of kernel to deploy on this machine. Valid options are:
'LINEAR', for a linear kernel
'POLY', for a polynomial kernel
'RBF', for a radial-basis function kernel
'SIGMOID', for a sigmoidal kernel
'PRECOMPUTED', for a precomputed, user-provided kernel (please note this option is currently unsupported)
- cache_size, float
The size of LIBSVM's internal cache, in megabytes
- stop_epsilon, float
The epsilon value for the training stopping criterion
- shrinking, bool
If set to True (the default), applies LIBSVM's shrinking heuristic.
- probability, bool
If set to True, allows the machine produced by this trainer to output probabilities besides scores and class estimates. The default for this option is False.
Note
These bindings do not support:
- Precomputed kernels
- Regression problems
- Different weights for every label (the -wi option in svm-train)
Feel free to implement those and remove these remarks.

cache_size¶ Internal cache size to be used by LIBSVM (in megabytes)

coef0¶ The coefficient 0 for 'POLY' (polynomial) or 'SIGMOID' (sigmoidal) kernels

cost¶ The cost value for C_SVC, EPSILON_SVR or NU_SVR. This parameter is normally referred to simply as \(C\) in the literature. It should be a non-negative floating-point number.

degree¶ The polynomial degree, only used if the kernel is 'POLY' (polynomial)

gamma¶ The \(\gamma\) parameter for 'POLY' (polynomial), 'RBF' (gaussian) or 'SIGMOID' (sigmoidal) kernels

kernel_type¶ The type of kernel used by the support vectors in this machine

loss_epsilon_svr¶ For EPSILON_SVR, this is the \(\epsilon\) value in the loss function

machine_type¶ The type of SVM machine that will be trained

nu¶ The nu value for NU_SVC, ONE_CLASS or NU_SVR. This parameter should live in the range [0, 1].

probability¶ If set to True, output Machines will support outputting probability estimates

shrinking¶ If set to True, then use LIBSVM's shrinking heuristics

stop_epsilon¶ The epsilon used as the stopping criterion for training

train(data[, subtract, divide]) → array¶ Trains a new machine for multi-class classification. If the number of classes in data is 2, then the assigned labels will be +1 and -1, in that order. If the number of classes is greater than 2, labels are picked starting from 1 (i.e., 1, 2, 3, 4, etc.). This convention follows what is done at the command line for LIBSVM.
The input object data must be an iterable object (such as a Python list or tuple) containing 2D 64-bit float arrays, each representing data for one single class. The data in each array should be organized row-wise (i.e. one row represents one sample). All rows for all arrays should have exactly the same number of columns; this will be checked.
Optionally, you may also provide both input arrays subtract and divide, which will be used to normalize the input data before it is fed into the training code. If provided, both arrays should be 1D and contain 64-bit floats with the same width as all data in the input array data. The normalization is applied in the following way:
\[d' = \frac{d - \text{subtract}}{\text{divide}}\]
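The normalization step can be reproduced with plain NumPy; the arrays below are illustrative values, not defaults of this library:

```python
import numpy as np

data = np.array([[2.0, 10.0],
                 [4.0, 20.0]])        # two samples, two features
subtract = np.array([2.0, 10.0])      # per-feature subtraction factor
divide = np.array([2.0, 5.0])         # per-feature division factor

# d' = (d - subtract) / divide, applied row by row via broadcasting
normalized = (data - subtract) / divide
print(normalized)  # [[0. 0.]
                   #  [1. 2.]]
```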