Abstract

In this paper we introduce a novel image descriptor enabling accurate object categorization even with linear models. Akin to popular attribute descriptors, our feature vector comprises the outputs of a set of classifiers evaluated on the image. However, unlike traditional attributes, which represent hand-selected object classes and predefined visual properties, our features are learned automatically and correspond to "abstract" categories, which we name meta-classes. Each meta-class is a super-category obtained by grouping a set of object classes such that, collectively, they are easy to distinguish from other sets of categories. By using the "learnability" of the meta-classes as the criterion for feature generation, we obtain a set of attributes that encode general visual properties shared by multiple object classes and that are effective in describing and recognizing even novel categories, i.e., classes not present in the training set. We demonstrate that simple linear SVMs trained on our meta-class descriptor significantly outperform the best known classifier on the Caltech-256 benchmark. We also present results on the 2010 ImageNet Challenge database, where our system produces results approaching those of the best systems, but at a much lower computational cost.
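The descriptor described above can be pictured as the score vector of a bank of pre-trained classifiers applied to the image's low-level features, optionally thresholded into bits. The following is a minimal illustrative sketch, not the released code: the classifier bank (W, b) is random here, whereas in the paper it is learned from meta-classes, and all names and dimensions are ours.

```python
import numpy as np

# Illustrative sketch only: a meta-class descriptor is the vector of
# scores of a bank of classifiers applied to a low-level feature
# vector x. The bank (W, b) is random here for demonstration; in the
# paper it is learned from automatically discovered meta-classes.
rng = np.random.default_rng(0)
n_features, n_metaclasses = 128, 64
W = rng.standard_normal((n_metaclasses, n_features))
b = rng.standard_normal(n_metaclasses)

def mc_descriptor(x, binarize=False):
    """Concatenate classifier scores; threshold at 0 for a binary code."""
    scores = W @ x + b  # real-valued descriptor ("mc"-style)
    return (scores > 0).astype(np.uint8) if binarize else scores

x = rng.standard_normal(n_features)
mc = mc_descriptor(x)                    # real-valued, length 64
mc_bit = mc_descriptor(x, binarize=True) # binarized, length 64
```

The binarized form is what makes the compact storage figures below possible: each feature costs one bit instead of one float.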

Paper

Alessandro Bergamo, Lorenzo Torresani. Meta-Class Features for Large-Scale Object Categorization on a Budget. Computer Vision and Pattern Recognition (CVPR), 2012   [pdf]  [bibtex]  [supplementary material]

Results on Caltech-256

Software

Software to extract the meta-class descriptor is available here

References

[4] A. Bergamo, L. Torresani, and A. Fitzgibbon. PiCoDes: Learning a compact code for novel-category recognition. In NIPS, 2011.

[12] P. Gehler and S. Nowozin. On feature combination for multiclass object classification. In ICCV, 2009.

[19] L. Li, H. Su, E. Xing, and L. Fei-Fei. Object Bank: A high-level image representation for scene classification & semantic feature sparsification. In NIPS, 2010.

[20] Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour, K. Yu, L. Cao, and T. S. Huang. Large-scale image classification: Fast feature extraction and SVM training. In CVPR, 2011.

[27] J. Sánchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In CVPR, 2011.

[29] L. Torresani, M. Szummer, and A. Fitzgibbon. Efficient object category recognition using classemes. In ECCV, 2010.

Acknowledgment

This material is based upon work supported by the National Science Foundation under CAREER award IIS-0952943. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Multiclass object categorization accuracy on Caltech-256 using different descriptors, as a function of the number of training examples per class. We use 25 examples per class for testing. The classification model is a one-vs-all linear SVM for all methods except LP-beta.

A simple linear SVM applied to our descriptors (mc and mc-bit) outperforms the state-of-the-art LP-beta classifier and is also orders of magnitude faster to both train and test.

Object-class search on ILSVRC2010: accuracy in retrieving images of a novel class from a dataset of 150,000 photos. For each query class, the true positives are only 0.1% of the database.

Our descriptor significantly outperforms classemes, which in [4] were also tested on such task.
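Object-class search of this kind reduces to scoring every database descriptor with a linear model for the query class and ranking by score, which is what makes a linear scan over 150,000 images cheap. A minimal sketch, with made-up data, dimensions, and names (not taken from the released software):

```python
import numpy as np

# Minimal sketch of linear object-class search: score each binary
# descriptor in the database with a linear model for the query class
# and rank by score. All data and dimensions here are illustrative.
rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(1000, 64)).astype(np.float32)  # binary descriptors
w = rng.standard_normal(64)                                  # query-class model

scores = db @ w                 # one dot product per image
ranking = np.argsort(-scores)   # indices of most class-like images first
```

With binary descriptors this scan can be implemented with bitwise operations, which is why search over millions of images remains fast in the timing figure below.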

Storage requirements for 10M images with different feature representations (note the log scale). Our descriptors outperform the top method for ILSVRC2010 in terms of scalability to large databases.

Object-class search time as a function of the number of images in the database, using a machine with 20 GB of memory.


Our approach is significantly faster than the competing approaches and is the only one that allows large databases to be kept in the memory of standard computers (see next figure).

Top-1 multiclass recognition accuracy on ILSVRC2010 (dataset used in the ImageNet challenge 2010) for different descriptors. From left to right:

  1. mc-bit-classemes: the part of our binarized descriptor containing the features generated by the one-vs-the-rest classeme classifiers;

  2. mc-bit-tree: the part of our binarized descriptor containing the features produced by the meta-class classifiers of the label tree;

  3. mc-bit: our full binarized descriptor;

  4. Psi(f)+LSH: LSH applied to the concatenation of the low-level features;

  5. mc+LSH: LSH applied to our real-valued descriptor.
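The two LSH baselines compress real-valued descriptors into binary codes. A standard variant of this idea is random-hyperplane LSH, where each bit is the sign of a random projection; the sketch below is a generic illustration of that family, with names and parameters chosen by us rather than taken from the paper.

```python
import numpy as np

# Random-hyperplane LSH: each output bit is the sign of a random
# projection, so nearby vectors (in angle) receive similar bit
# strings. Generic illustration of the Psi(f)+LSH / mc+LSH baselines;
# function names and bit counts are illustrative assumptions.
def lsh_code(x, n_bits=256, seed=0):
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((n_bits, x.shape[0]))  # random hyperplanes
    return (H @ x > 0).astype(np.uint8)

x = np.random.default_rng(1).standard_normal(512)
code = lsh_code(x)  # 256-bit binary code
```

Because a small perturbation of the input flips few hyperplane signs, Hamming distance between codes approximates angular distance between the original descriptors.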

Poster

Results on ILSVRC2010

Storage/Speed comparison