.. vim: set fileencoding=utf-8 : .. consolidated by Andre Anjos .. Wed 26 Mar 2014 10:38:10 CET .. .. Copyright (C) 2011-2014 Idiap Research Institute, Martigny, Switzerland .. testsetup:: import os import numpy import bob.learn.libsvm def F(m, f): from pkg_resources import resource_filename return resource_filename(m, os.path.join('data', f)) heart_model = F('bob.learn.libsvm', 'heart.svmmodel') svm = bob.learn.libsvm.Machine(heart_model) heart_data = F('bob.learn.libsvm', 'heart.svmdata') f = bob.learn.libsvm.File(heart_data) ====================================== Support Vector Machines and Trainers ====================================== A **Support vector machine** (SVM) [1]_ is a very popular `supervised` learning technique. |project| provides a bridge to `LIBSVM`_ which allows you to `train` such a `machine` and use it for classification. This section contains a tutorial on how to use |project|'s Pythonic bindings to LIBSVM. It starts by introducing the support vector :py:class:`bob.learn.libsvm.Machine` followed by the trainer usage. Machines -------- The functionality of this bridge includes loading and saving SVM data files and machine models, which you can produce or download following the instructions found on `LIBSVM`_'s home page. |project| bindings to `LIBSVM`_ **do not** allow you to explicitly set the machine's internal values. You must use the a trainer to create a machine first, as explained further down. Once you have followed those instructions, you can come back to this point and follow the remaining examples here. .. note:: Our current ``svm`` object was trained with the file called ``heart_scale``, distributed with `LIBSVM`_ and `available here `_. This dataset proposes a binary classification problem (i.e., 2 classes of features to be discriminated). The number of features is 13. Our extensions to `LIBSVM`_ also allows you to feed data through a :py:class:`bob.learn.libsvm.Machine` using :py:class:`numpy.ndarray` objects and collect results in that format. For the following lines, we assume you have available a :py:class:`bob.learn.libsvm.Machine` named ``svm``. (For this example, the variable ``svm`` was generated from the ``heart_scale`` dataset using the application ``svm-train`` with default parameters). The ``shape`` attribute, indicates how many features a machine from this module can input and how many it outputs (typically, just 1): .. doctest:: >>> svm.shape (13, 1) To run a single example through the SVM, just use the ``()`` operator: .. doctest:: >> svm(numpy.ones((13,), 'float64')) 1 >> svm(numpy.ones((10,13), 'float64')) (1, 1, 1, 1, 1, 1, 1, 1, 1, 1) Visit the documentation for :py:class:`bob.learn.libsvm.Machine` to find more information about these bindings and methods you can call on such a machine. Visit the documentation for :py:class:`bob.learn.libsvm.File` for information on loading `LIBSVM`_ data files directly into python and producing :py:class:`numpy.ndarray` objects. Below is a quick example: Suppose the variable ``f`` contains an object of type :py:class:`bob.learn.libsvm.File`. Then, you could read data (and labels) from the file like this: .. doctest:: :options: +NORMALIZE_WHITESPACE >>> labels, data = f.read_all() >>> data = numpy.vstack(data) #creates a single 2D array Then you can throw the data into the ``svm`` machine you trained earlier like this: .. doctest:: :options: +NORMALIZE_WHITESPACE >>> predicted_labels = svm(data) Training -------- The training set for SVM's consists of a list of 2D `NumPy` arrays, one for each class. The first dimension of each 2D `NumPy` array is the number of training samples for the given class and the second dimension is the dimensionality of the feature. For instance, let's consider the following training set for a two class problem: .. doctest:: :options: +NORMALIZE_WHITESPACE >>> pos = numpy.array([[1,-1,1], [0.5,-0.5,0.5], [0.75,-0.75,0.8]], 'float64') >>> neg = numpy.array([[-1,1,-0.75], [-0.25,0.5,-0.8]], 'float64') >>> data = [pos,neg] >>> print(data) # doctest: +SKIP .. note:: Please note that in the above training set, the data is pre-scaled so features remain in the range between -1 and +1. libsvm, apparently, suggests you do that for all features. Our bindings to libsvm do not include scaling. If you want to implement that generically, please do it. Then, an SVM [1]_ can be trained easily using the :py:class:`bob.learn.libsvm.Trainer` class. .. doctest:: :options: +NORMALIZE_WHITESPACE >>> trainer = bob.learn.libsvm.Trainer() >>> machine = trainer.train(data) #ordering only affects labels This returns a :py:class:`bob.learn.libsvm.Machine` which can later be used for classification, as explained before. .. doctest:: :options: +NORMALIZE_WHITESPACE >>> predicted_label = machine(numpy.array([1.,-1.,1.])) >>> print(predicted_label) [1] The `training` procedure allows setting several different options. For instance, the default `kernel` is an `RBF`. If we would like a `linear SVM` instead, this can be set before calling the :py:meth:`bob.learn.libsvm.Trainer.train` method. .. doctest:: :options: +NORMALIZE_WHITESPACE >>> trainer.kernel_type = 'LINEAR' One Class SVM ============= On the other hand, the package allows you to train a One Class Support Vector Machine. For training this kind of classifier take into account the following example. .. doctest:: :options: +NORMALIZE_WHITESPACE >>> oc_pos = 0.4 * numpy.random.randn(100, 2).astype(numpy.float64) >>> oc_data = [oc_pos] >>> print(oc_data) # doctest: +SKIP As the above example, an SVM [1]_ for one class problem can be trained easily using the :py:class:`bob.learn.libsvm.Trainer` class and selecting the appropiete machine_type (ONE_CLASS). .. doctest:: :options: +NORMALIZE_WHITESPACE >>> oc_trainer = bob.learn.libsvm.Trainer(machine_type='ONE_CLASS') >>> oc_machine = oc_trainer.train(oc_data) Then, as explained before, a :py:class:`bob.learn.libsvm.Machine` can be used for classify the new entries. .. doctest:: :options: +NORMALIZE_WHITESPACE >>> oc_test = 0.4 * numpy.random.randn(20, 2).astype(numpy.float64) >>> oc_outliers = numpy.random.uniform(low=-4, high=4, size=(20, 2)).astype(numpy.float64) >>> predicted_label_oc_test = oc_machine(oc_test) >>> predicted_label_oc_outliers = oc_machine(oc_outliers) >>> print(predicted_label_oc_test) # doctest: +SKIP >>> print(predicted_label_oc_outliers) # doctest: +SKIP Acknowledgements ---------------- As a final note, if you decide to use our `LIBSVM`_ bindings for your publication, be sure to also cite: .. code-block:: latex @article{CC01a, author = {Chang, Chih-Chung and Lin, Chih-Jen}, title = {{LIBSVM}: A library for support vector machines}, journal = {ACM Transactions on Intelligent Systems and Technology}, volume = {2}, issue = {3}, year = {2011}, pages = {27:1--27:27}, note = {Software available at \url{http://www.csie.ntu.edu.tw/~cjlin/libsvm}} } .. Place here your external references .. include:: links.rst .. [1] http://en.wikipedia.org/wiki/Support_vector_machine