Passive-Aggressive Model for Image Retrieval
PAMIR is a machine learning algorithm for learning a ranking function, i.e. a function which orders documents given a query. It was primarily designed for multimodal retrieval, such as the retrieval of images from text queries. Its main advantages are scalability (it relies on online learning, which allows training on large datasets) and discriminative training (its training procedure optimizes a loss related to the final retrieval quality). Pamir is also a mountain range in Central Asia, but that's a different story...
PAMIR is described in the following papers:
- A Discriminative Kernel-based Model to Rank Images from Text Queries,
D. Grangier and S. Bengio, IEEE Transactions on Pattern Analysis and Machine Intelligence (in press), 2008.
- A Discriminative Approach for the Retrieval of Images from Text Queries,
D. Grangier, F. Monay and S. Bengio, European Conference on Machine Learning (ECML), 2006.
- Learning to Retrieve Images from Text Queries,
D. Grangier, F. Monay and S. Bengio, Workshop on Adaptive Multimedia Retrieval (AMR), 2006.
The source code of PAMIR is free and distributed under the BSD license. It is plain C++, built upon the Torch machine learning library. Hence, your first step is to install Torch3, as instructed on the Torch3 website. Then, simply add the PAMIR package to Torch, and that's it! The package comes with a README file that describes the class hierarchy. The two main example files, trainImg2.cc and testImg2.cc, can be compiled in the same way as the examples provided with Torch.
All data files should be provided as sparse matrices (see SparseMatrix.h), which are binary files containing:
- the number of rows (int)
- the number of columns (int)
- for each matrix row,
  - the number of non-zero components in the row (int)
  - for each non-zero component,
    - the component index (int)
    - the component value (float)
Note that the components of each row should be sorted by ascending index.
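The layout above can be written and read back with standard C++ streams. The following is a minimal sketch, not code taken from SparseMatrix.h: it assumes native byte order, a 32-bit int, and no header or padding between fields, and it leaves the index base (0 or 1) to the caller, since the description above does not settle these points. SparseData, writeSparseMatrix and readSparseMatrix are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form: one vector of (index, value) pairs per row.
using SparseRow = std::vector<std::pair<int32_t, float>>;

struct SparseData {
  int32_t n_cols = 0;
  std::vector<SparseRow> rows;  // the number of rows is rows.size()
};

// Writes the layout described above. Assumption: native byte order, no header.
void writeSparseMatrix(const std::string& path, const SparseData& m) {
  std::ofstream out(path, std::ios::binary);
  if (!out) throw std::runtime_error("cannot open " + path);
  auto put = [&out](const auto& x) {
    out.write(reinterpret_cast<const char*>(&x), sizeof(x));
  };
  put(static_cast<int32_t>(m.rows.size()));
  put(m.n_cols);
  for (SparseRow row : m.rows) {  // copy, so the caller's data is not reordered
    // The format requires components sorted by ascending index.
    std::sort(row.begin(), row.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    put(static_cast<int32_t>(row.size()));
    for (const auto& [index, value] : row) {
      assert(index >= 0);  // index base (0 or 1) is left to the caller
      put(index);
      put(value);
    }
  }
  if (!out) throw std::runtime_error("write failed: " + path);
}

// Reads the same layout back; throws on a truncated or malformed file.
SparseData readSparseMatrix(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  auto get = [&in](auto& x) {
    if (!in.read(reinterpret_cast<char*>(&x), sizeof(x)))
      throw std::runtime_error("truncated file");
  };
  int32_t n_rows = 0;
  SparseData m;
  get(n_rows);
  get(m.n_cols);
  if (n_rows < 0) throw std::runtime_error("negative row count");
  m.rows.resize(n_rows);
  for (SparseRow& row : m.rows) {
    int32_t nnz = 0;
    get(nnz);
    if (nnz < 0) throw std::runtime_error("negative component count");
    row.resize(nnz);
    for (auto& [index, value] : row) {
      get(index);
      get(value);
    }
  }
  return m;
}
```

Sorting inside the writer means callers can build rows in any order and still produce files that respect the ascending-index requirement.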
Training and Testing
Two example programs are provided with the package: trainImg2 and testImg2.
trainImg2 trains a model. It takes the following arguments:
- train_query_f is the file describing the training queries; the dimension of this matrix is (number of queries) x (textual vocabulary size)
- train_image_f is the training file for pictures; the dimension of this matrix is (number of pictures) x (visual vocabulary size)
- train_relevance_f is the relevance matrix; it contains only 0/1 values and its dimension is (number of queries) x (number of pictures)
- C is a hyper-parameter setting the trade-off between maximizing the margin and minimizing training errors
- n_iter is the number of training iterations
- model_file is the file where the model is saved
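The role of C is easiest to see on a single training step. The sketch below shows one passive-aggressive (PA-I) update, assuming a linear form of the model that scores a picture p for a query q as F(q, p) = q^T W p and penalizes a relevant picture p+ that does not outscore a non-relevant picture p- by a margin of 1. It is an illustrative reconstruction, not code from the PAMIR package: it uses dense vectors for readability, the names LinearModel and paStep are hypothetical, and it assumes (this README does not say so) that each of the n_iter iterations performs one such step on a sampled triplet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical linear model: F(q, p) = q^T W p, where W is
// (textual vocabulary size) x (visual vocabulary size), stored row-major.
struct LinearModel {
  std::size_t n_text, n_visual;
  std::vector<float> W;
  LinearModel(std::size_t t, std::size_t v)
      : n_text(t), n_visual(v), W(t * v, 0.0f) {}

  float score(const std::vector<float>& q, const std::vector<float>& p) const {
    float s = 0.0f;
    for (std::size_t t = 0; t < n_text; ++t) {
      if (q[t] == 0.0f) continue;  // text queries are usually very sparse
      float row = 0.0f;
      for (std::size_t v = 0; v < n_visual; ++v) row += W[t * n_visual + v] * p[v];
      s += q[t] * row;
    }
    return s;
  }
};

// One PA-I step on a triplet (q, p_pos, p_neg), where p_pos is relevant to q
// and p_neg is not. Returns the hinge loss measured before the update.
float paStep(LinearModel& m, const std::vector<float>& q,
             const std::vector<float>& p_pos, const std::vector<float>& p_neg,
             float C) {
  assert(q.size() == m.n_text && p_pos.size() == m.n_visual &&
         p_neg.size() == m.n_visual);
  const float loss =
      std::max(0.0f, 1.0f - m.score(q, p_pos) + m.score(q, p_neg));
  if (loss == 0.0f) return 0.0f;  // passive: the margin is already met

  // Update direction V = q (p_pos - p_neg)^T, so ||V||^2 = ||q||^2 ||p_pos - p_neg||^2.
  float q_norm2 = 0.0f, d_norm2 = 0.0f;
  for (float x : q) q_norm2 += x * x;
  for (std::size_t v = 0; v < m.n_visual; ++v) {
    const float d = p_pos[v] - p_neg[v];
    d_norm2 += d * d;
  }
  const float v_norm2 = q_norm2 * d_norm2;
  if (v_norm2 == 0.0f) return loss;  // no usable update direction

  // Aggressive step that would zero the loss, capped by C (PA-I).
  const float tau = std::min(C, loss / v_norm2);
  for (std::size_t t = 0; t < m.n_text; ++t) {
    if (q[t] == 0.0f) continue;
    for (std::size_t v = 0; v < m.n_visual; ++v)
      m.W[t * m.n_visual + v] += tau * q[t] * (p_pos[v] - p_neg[v]);
  }
  return loss;
}
```

With a large C, each step moves W just far enough to satisfy the margin on the current triplet; a small C caps the step size, which trades slower fitting for more robustness to noisy relevance labels.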
The following options can be provided to measure performance during training:
- valid_query_f is a file describing a second set of queries, used for validation
- valid_image_f is the validation file for pictures
- valid_relevance_f is the validation relevance matrix
- measure_file is the file containing various measurements on the validation set
- measure_freq sets the frequency (in number of iterations) of measurements on the validation set
testImg2 tests a model. It takes the following arguments:
- test_query_f is the file describing the test queries
- test_image_f is the test file for pictures
- test_relevance_f is the test relevance matrix
- measure_file is the file containing various measurements on the test set
- model_file is the model to load
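The list above does not specify which measurements testImg2 writes to measure_file. To illustrate how a ranking can be scored against one row of a 0/1 relevance matrix, the sketch below ranks the test pictures for a single query by decreasing score and computes the average precision of that ranking, a standard retrieval measure. rankPictures and averagePrecision are hypothetical helpers, not part of the package, and ties are broken by picture index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Picture indices sorted by decreasing score; ties keep ascending index order.
std::vector<std::size_t> rankPictures(const std::vector<float>& scores) {
  std::vector<std::size_t> order(scores.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&scores](std::size_t a, std::size_t b) {
                     return scores[a] > scores[b];
                   });
  return order;
}

// Average precision for one query: the mean, over relevant pictures, of the
// precision measured at the rank where each relevant picture appears.
double averagePrecision(const std::vector<float>& scores,
                        const std::vector<int>& relevant) {
  assert(relevant.size() == scores.size());
  const std::vector<std::size_t> order = rankPictures(scores);
  std::size_t hits = 0;
  double sum_precision = 0.0;
  for (std::size_t rank = 0; rank < order.size(); ++rank) {
    if (relevant[order[rank]] != 0) {
      ++hits;
      sum_precision += static_cast<double>(hits) / static_cast<double>(rank + 1);
    }
  }
  return hits == 0 ? 0.0 : sum_precision / static_cast<double>(hits);
}
```

Averaging this value over all test queries gives the mean average precision; the per-picture scores would come from the trained model.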
We provide examples comparing PAMIR to alternative solutions, such as Support Vector Machines (SVM) and Probabilistic Latent Semantic Analysis (PLSA), on the Corel dataset.
Details on these experiments can be found in [Grangier and Bengio, 2008] (see the references above).
This work has been supported by the Swiss NSF through the MULTI project and by the Swiss OFES through the PASCAL European Network of Excellence. Part of this research has been performed while Samy Bengio was at the IDIAP Research Institute.