VLG extractor

Overview

This software extracts the image descriptors described in:

  1. Efficient Object Category Recognition Using Classemes, Lorenzo Torresani, Martin Szummer and
    Andrew Fitzgibbon, ECCV 2010.

    The software supports the real-valued version (11 KB/image) as well as the more compact
    binary version (333 bytes/image); the size sketch after this list shows how these figures arise.

  2. PiCoDes: Learning a Compact Code for Novel-Category Recognition, Alessandro Bergamo, Lorenzo Torresani and Andrew Fitzgibbon, NIPS 2011.

    The software supports the 128-bit, 1024-bit and 2048-bit versions.

  3. Meta-Class Features for Large-Scale Object Categorization on a Budget, Alessandro Bergamo and Lorenzo Torresani, CVPR 2012.

    The software supports the mc and mc-bit versions.
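
The per-image sizes quoted in item 1 follow directly from the descriptor dimensionality. Below is a minimal sketch of the arithmetic and of the bit packing, assuming the 2659-dimensional classemes vocabulary of [1] and 32-bit floats; the packed layout shown here is purely illustrative and is not the tool's actual output format.

    import numpy as np

    CLASSEMES_DIM = 2659  # number of classeme classifiers used in [1]

    # Real-valued classemes: one 32-bit float per dimension.
    print(CLASSEMES_DIM * 4, "bytes")         # 10636 bytes, i.e. roughly 11 KB/image

    # Binary classemes: one bit per dimension, packed 8 per byte.
    print((CLASSEMES_DIM + 7) // 8, "bytes")  # 333 bytes/image

    # Illustrative packing/unpacking of a binary descriptor with NumPy.
    bits = (np.random.rand(CLASSEMES_DIM) > 0.5).astype(np.uint8)  # stand-in descriptor
    packed = np.packbits(bits)                                     # 333 bytes
    restored = np.unpackbits(packed)[:CLASSEMES_DIM]
    assert np.array_equal(bits, restored)

The 128-bit, 1024-bit and 2048-bit PiCoDes pack to 16, 128 and 256 bytes per image in the same way.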


The software supports several image types (JPEG, PNG, TIFF, and others) and is available for Microsoft Windows, GNU/Linux and Mac OS X.

We exploit explicit feature maps approximating the intersection kernel [2] to efficiently evaluate the non-linear kernels used by classemes, resulting in a descriptor extraction time of about 2 seconds per image.
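
For readers curious about how this works, here is a minimal sketch of the explicit feature map of [2] for the intersection kernel min(a, b), using the truncated construction (sampled spectrum, rectangular window). The order n and sampling period L below are illustrative choices, not the parameters used by this software; larger n gives a closer approximation.

    import numpy as np

    def intersection_feature_map(x, n=10, L=0.5):
        """Approximate explicit feature map for k(a, b) = min(a, b), after [2].

        Each non-negative component of x is mapped to 2n+1 values so that the
        dot product of two mapped vectors approximates the intersection kernel.
        """
        x = np.asarray(x, dtype=np.float64)
        kappa = lambda w: 2.0 / (np.pi * (1.0 + 4.0 * w ** 2))  # spectrum of the min kernel
        logx = np.log(np.maximum(x, 1e-12))
        parts = [np.sqrt(x * L * kappa(0.0))]
        for j in range(1, n + 1):
            scale = np.sqrt(2.0 * x * L * kappa(j * L))
            parts.append(scale * np.cos(j * L * logx))
            parts.append(scale * np.sin(j * L * logx))
        return np.concatenate(parts, axis=-1)

    # The dot product of mapped vectors approximates sum_i min(a_i, b_i), so a
    # linear classifier on mapped features mimics an intersection-kernel SVM.
    a, b = np.random.rand(5), np.random.rand(5)
    approx = intersection_feature_map(a) @ intersection_feature_map(b)
    exact = np.minimum(a, b).sum()
    print(approx, exact)  # close, with a small systematic under-estimate from truncation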


If you use this code for a publication, we ask that you cite the paper above that corresponds to the descriptor you use.

Software

Check out the source code on GitHub!

Acknowledgment

This material is based upon work supported by the National Science Foundation under CAREER award IIS-0952943. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).


For information, bug reports, or suggestions, please contact Alessandro Bergamo, aleb@cs.dartmouth.edu

References

[1] Lorenzo Torresani, Martin Szummer and Andrew W. Fitzgibbon, "Efficient Object Category Recognition Using Classemes", ECCV 2010.

[2] Andrea Vedaldi and Andrew Zisserman, "Efficient additive kernels via explicit feature maps", CVPR 2010.

[3] I. Tsochantaridis, T. Hofmann, T. Joachims and Y. Altun, "Support Vector Learning for Interdependent and Structured Output Spaces", ICML 2004.

[4] Chih-Chung Chang and Chih-Jen Lin, "LIBSVM: a library for support vector machines", ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27, 2011.

[5] Li-Jia Li, Hao Su, Eric P. Xing and Li Fei-Fei, "Object Bank: A High-Level Image Representation for Scene Classification and Semantic Feature Sparsification", NIPS 2010.

[6] Peter V. Gehler and Sebastian Nowozin, "On feature combination for multiclass object classification", ICCV 2009.

[7] Alessandro Bergamo, Lorenzo Torresani and Andrew Fitzgibbon, "PiCoDes: Learning a Compact Code for Novel-Category Recognition", NIPS 2011.

[8] P. Indyk and R. Motwani, "Approximate nearest neighbors: towards removing the curse of dimensionality", STOC '98: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, New York, NY, USA, 1998. ACM Press.

[9] Y. Weiss, A. Torralba and R. Fergus, "Spectral hashing", NIPS 2009.

[10] Y. Gong and S. Lazebnik, "Iterative Quantization: A Procrustean Approach to Learning Binary Codes", CVPR 2011.

[11] Alessandro Bergamo and Lorenzo Torresani, "Meta-Class Features for Large-Scale Object Categorization on a Budget", CVPR 2012.

[12] A. Berg, J. Deng and L. Fei-Fei, "Large Scale Visual Recognition Challenge", 2010. http://www.image-net.org/challenges/LSVRC/2010/


Results on Caltech 256

The following plot shows the multiclass categorization accuracy on Caltech 256 for different binary codes, as a function of descriptor size. We use 10 training examples and 25 test examples per class.
The classification model is a 1-vs-all linear SVM [4] for all methods, with the exception of LP-beta [6].

The 2048-bit PiCoDes match the accuracy of the state-of-the-art LP-beta classifier [6] while enabling orders-of-magnitude faster training and testing.
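
As an illustration of this protocol, here is a sketch of 1-vs-all linear SVM training and evaluation on fixed-length descriptors. It uses scikit-learn's one-vs-rest LinearSVC and random stand-in features purely for illustration; the published results use the LIBSVM package [4] on the actual descriptors, and the regularization constant below is an arbitrary choice.

    import numpy as np
    from sklearn.svm import LinearSVC

    # Stand-in data; in practice X_* would hold descriptors from the extractor.
    # Shapes follow the Caltech 256 protocol: 10 train / 25 test images per class.
    n_classes, dim = 256, 2048
    X_train = np.random.rand(10 * n_classes, dim)
    y_train = np.repeat(np.arange(n_classes), 10)
    X_test = np.random.rand(25 * n_classes, dim)
    y_test = np.repeat(np.arange(n_classes), 25)

    # LinearSVC trains one linear SVM per class in a one-vs-rest fashion.
    clf = LinearSVC(C=1.0)
    clf.fit(X_train, y_train)
    accuracy = (clf.predict(X_test) == y_test).mean()
    print("multiclass accuracy:", accuracy)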

Results on ILSVRC2010

Top-1 multiclass recognition accuracy on the large-scale ILSVRC2010 dataset [12] for different descriptors. The classification model is a fast 1-vs-all linear SVM [4] for all descriptors. Despite the large amount of training data (1000 classes, 1.2 million examples), the compactness of our descriptors keeps storage requirements very low: for classemes_binary, PiCoDes, and mc-bit, the entire database needs at most 2 GB of memory, allowing training on low-budget computers.
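
As a back-of-the-envelope check of the storage claim, the sketch below computes the memory needed to hold packed binary descriptors for all training images; the 2659-bit size of binary classemes follows from [1] (333 bytes/image, as quoted above), and mc-bit is omitted here.

    # Rough storage for packed binary descriptors of the ILSVRC2010 training set.
    N_IMAGES = 1_200_000  # 1000 classes, 1.2 million training images

    def total_gb(bits_per_image):
        bytes_per_image = (bits_per_image + 7) // 8  # packed, 8 bits per byte
        return N_IMAGES * bytes_per_image / 1e9

    print(total_gb(2659))  # binary classemes: about 0.40 GB
    print(total_gb(2048))  # 2048-bit PiCoDes: about 0.31 GB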