Abstract

We introduce PiCoDes: a very compact image descriptor which nevertheless allows high performance on object category recognition. In particular, we address novel-category recognition: the task of defining indexing structures and image representations which enable a large collection of images to be searched for an object category that was not known when the index was built. Instead, the training images defining the category are supplied at query time.

We explicitly learn descriptors of a given length (as small as 16 bytes per image) which have good object-recognition performance. In contrast to previous work in the domain of object recognition, we do not choose an arbitrary intermediate representation, but explicitly learn short codes. In contrast to previous approaches to learning compact codes, we optimize explicitly for (an upper bound on) classification performance. Direct optimization of binary features is difficult and non-convex, but we present an alternation scheme and a convex upper bound which demonstrate excellent performance in practice.

PiCoDes of 256 bytes match the accuracy of the current best known classifier for the Caltech256 benchmark, but they decrease the database storage size by a factor of 100 and speed up the training and testing of novel classes by orders of magnitude.

Paper

Alessandro Bergamo, Lorenzo Torresani, Andrew Fitzgibbon. PiCoDes: Learning a Compact Code for Novel-Category Recognition. Neural Information Processing Systems (NIPS), 2011. [pdf] [bib]

Results

Multiclass categorization accuracy on Caltech256 using different binary codes, as a function of the descriptor size. We use 10 training examples and 25 test examples per class.
The classification model is a 1-vs-all linear SVM for all methods.

The magenta line shows the performance obtained with classemes [27] trained with the same data used to learn our PiCoDes.

PiCoDes outperform all the other compact codes. PiCoDes of 2048 bits match the accuracy of the state-of-the-art LP-beta classifier [11] while enabling orders of magnitude faster training and testing.

Precision of object-class search on Caltech256 using codes of 2048 bits with linear SVMs: for a varying number of training examples per class, we report the percentage of true positives among the top 25 images retrieved from a dataset containing 25 relevant images and 6375 distractors.
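
The retrieval metric described above (true positives among the top 25 results) can be sketched in a few lines. This is only an illustration: the function name `precision_at_k` and the toy scores are assumptions made for this example, not part of the released software or the paper's evaluation code.

```python
def precision_at_k(scores, relevant, k=25):
    """Fraction of the k highest-scoring items that are relevant.

    scores   -- one classifier score per database image
    relevant -- one boolean per database image (True = correct class)
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if relevant[i]) / k


# Toy database with the same composition as the caption:
# 25 relevant images followed by 6375 distractors.
relevant = [True] * 25 + [False] * 6375

# Hypothetical scores: 20 relevant images score above every distractor,
# 5 relevant images score below every distractor.
scores = [2.0] * 20 + [-2.0] * 5 + [i / 6375 for i in range(6375)]

print(precision_at_k(scores, relevant, k=25))  # 0.8
```

With 25 relevant images in the database, a precision of 1.0 at k = 25 means that every relevant image was ranked ahead of all 6375 distractors.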

Finding pictures of an object class in the ILSVRC2010 dataset, which includes 150K images for 1000 different classes, using 2048-bit codes.

PiCoDes, in combination with an efficient linear SVM, enable accurate class retrieval from this large collection in less than a second.
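
To give a feel for why such compact codes make large-scale search cheap, the sketch below works out the storage needed for the collection and shows that a linear classifier applied to a binary code reduces to summing the weights of the set bits. The helper names `code_bytes` and `linear_score`, the bit ordering, and the use of a Python integer to hold a code are all assumptions made for illustration; 150,000 is the rounded image count quoted above, and the actual timings depend on the implementation.

```python
def code_bytes(n_images, bits_per_code):
    """Total storage in bytes for n_images binary codes."""
    return n_images * bits_per_code // 8


def linear_score(code, weights, bias=0.0):
    """Linear classifier score w^T b + bias for a binary code b.

    code is a Python int whose bit j (least significant first) holds
    feature j, so the dot product is a sum over the set bits only.
    """
    score = bias
    j = 0
    while code:
        if code & 1:
            score += weights[j]
        code >>= 1
        j += 1
    return score


# 150K images with 2048-bit codes fit in roughly 38.4 MB.
print(code_bytes(150_000, 2048))  # 38400000

# Toy 3-bit code with bits 0 and 2 set: 0.5 + 1.0 + 4.0
print(linear_score(0b101, [1.0, 2.0, 4.0], bias=0.5))  # 5.5
```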

Software

Software to extract PiCoDes is available here

References

[11] P. Gehler and S. Nowozin. On feature combination for multiclass object classification. In ICCV, 2009.
[13] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC ’98: Proceedings of the thirtieth annual ACM symposium on Theory of computing, New York, NY, USA, 1998. ACM Press.
[15] B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. In Advances in Neural Information Processing Systems (NIPS), 2009.
[19] L. Li, H. Su, E. Xing, and L. Fei-Fei. Object Bank: A high-level image representation for scene classification & semantic feature sparsification. In NIPS, 2010.
[22] M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In Advances in Neural Information Processing Systems (NIPS), 2010.
[24] R. Salakhutdinov and G. Hinton. Semantic hashing. Int. J. Approx. Reasoning, 50:969–978, 2009.
[27] L. Torresani, M. Szummer, and A. Fitzgibbon. Efficient object category recognition using classemes. In ECCV, 2010.
[31] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, 2009.
[32] Y. Gong and S. Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes. In CVPR, 2011.

PiCoDes visualization

The 128-bit PiCoDes is applied to the test data of ILSVRC2010. Six of the 128 bits are illustrated as follows: for bit c, all images are sorted by the non-binarized classifier output a_c^T x, and the 10 images with the smallest outputs and the 10 with the largest outputs are shown in each row.

Note that a_c is defined only up to sign, so the patterns to which the bits are specialized may appear in either the "positive" or "negative" columns.
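
The relation between the non-binarized outputs and the stored bits can be sketched as follows. This is a minimal illustration that assumes each bit is obtained by thresholding the projection a_c^T x at zero and that bits are packed eight per byte; the function names, the bit order within a byte, and the toy vectors are hypothetical and do not describe the released extraction software.

```python
def project(a, x):
    """Non-binarized output a^T x for one projection vector a."""
    return sum(ai * xi for ai, xi in zip(a, x))


def binarize(A, x):
    """One bit per row a_c of A: 1 if a_c^T x > 0, else 0 (assumed threshold)."""
    return [1 if project(a, x) > 0 else 0 for a in A]


def pack_bits(bits):
    """Pack bits into bytes, 8 per byte, first bit most significant."""
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


# Toy example: three projection vectors applied to a 2-D feature vector.
A = [[1.0, -1.0], [0.5, 0.5], [-1.0, 0.0]]
x = [2.0, 1.0]

bits = binarize(A, x)
print(bits)             # [1, 1, 0]
print(pack_bits(bits))  # b'\xc0'

# Negating every a_c flips every bit (for non-zero outputs), which is why
# a pattern may show up on either the "positive" or the "negative" side.
A_neg = [[-v for v in a] for a in A]
print(binarize(A_neg, x))  # [0, 0, 1]

# A 128-bit code occupies 16 bytes.
print(len(pack_bits([0] * 128)))  # 16
```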

Poster

*  PiCoDes stands for “Pictures Codes” but also “Pico-Descriptor”.

Acknowledgment

This material is based upon work supported by the National Science Foundation under CAREER award IIS-0952943. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).