The Visual Learning Group researches methods to learn models of the real world from visual data, such as images, video, and motion sequences. Our most recent work leverages the framework of deep learning to address challenging problems at the boundary between computer vision and machine learning. Projects include image categorization, action recognition, depth estimation from a single photo, and 3D reconstruction of human movement from monocular video.


  1. New paper to appear at WACV 2018:

    BranchConnect: Large-Scale Visual Recognition with Learned Branch Connections,
    with Karim Ahmed.

  1. Three new papers on arXiv:

     - A Closer Look at Spatiotemporal Convolutions for Action Recognition,
       with Du Tran, Heng Wang, Jamie Ray, Yann LeCun, and Manohar Paluri.

     - Connectivity Learning in Multi-Branch Networks,
       with Karim Ahmed.

     - VideoMCC: a New Benchmark for Video Comprehension,
       with Du Tran, Maksim Bolonkin, and Manohar Paluri.

  1. New paper at NIPS 2017:

    Learning to Inpaint for Image Compression,
    with Haris Baig and Vladlen Koltun.

  1. New paper at CVPR 2017:

     Convolutional Random Walk Networks for Semantic Image Segmentation,
     with Gedas Bertasius, Stella Yu and Jianbo Shi.

  1. New paper at ICLR 2017:

     Recurrent Mixture Density Network for Spatiotemporal Visual Attention,
     with Loris Bazzani and Hugo Larochelle.

  1. New paper at AISTATS 2017:

    Local Perturb-and-MAP for Structured Prediction,
    with Gedas Bertasius, Qiang Liu and Jianbo Shi.

  1. New paper to appear in Computer Vision and Image Understanding:

    Multiple Hypothesis Colorization and Its Application to Image Compression,
    with Haris Baig.

  1. We released our new dataset for video comprehension, named VideoMCC. It includes
    over 600 hours of video. Give it a try!

  2. Together with collaborators at Facebook and Google we co-organized the
      1st Workshop on Large Scale Computer Vision Systems (LSCVS), at NIPS 2016.

  1. New paper at ECCV 2016:

     Network of Experts for Large-Scale Image Categorization,
     K. Ahmed, H. Baig, and L. Torresani.

  1. New paper at CVPR 2016:

    Semantic Segmentation with Boundary Neural Fields,
    G. Bertasius, J. Shi, and L. Torresani.

  1. New paper at the 3rd Workshop on Deep Learning in Computer Vision, 2016:

    Deep End2End Voxel2Voxel Prediction,
    D. Tran, L. Bourdev, R. Fergus, L. Torresani and M. Paluri.

  1. New article in IJCV:

     EXMOVES: Mid-level Features for Efficient Action Recognition and Video Analysis,
     D. Tran and L. Torresani.


We thank the following sources for supporting our research.

Logo design by Christine Claudino