The Visual Learning Group researches methods to learn models of the real world from visual data, such as images, video, and motion sequences. Our most recent work leverages the framework of deep learning to address challenging problems at the boundary between computer vision and machine learning. Projects include image categorization, action recognition, depth estimation from a single photo, and 3D reconstruction of human movement from monocular video.


  1. Three papers on video models to appear at CVPR 2018:

     1. A Closer Look at Spatiotemporal Convolutions for Action Recognition,
        with Du Tran, Heng Wang, Jamie Ray, Yann LeCun, and Manohar Paluri.

     2. Detect-and-Track: Efficient Pose Estimation in Videos,
        with Rohit Girdhar, Georgia Gkioxari, Manohar Paluri, and Du Tran.

     3. What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets,
        with De-An Huang, Vignesh Ramanathan, Dhruv Mahajan, Juan Carlos Niebles,
        Fei-Fei Li, and Manohar Paluri.

  1. Together with collaborators, we are organizing two workshops at CVPR 2018:

     1. Brave New Ideas for Video Understanding

     2. DeepGlobe: A Challenge for Parsing the Earth through Satellite Images

     We invite paper submissions to both of these workshops!

  1. New paper to appear at WACV 2018:

    BranchConnect: Large-Scale Visual Recognition with Learned Branch Connections,
    with Karim Ahmed.

  1. Two new papers on arXiv:

     1. Connectivity Learning in Multi-Branch Networks,
        with Karim Ahmed.

     2. VideoMCC: a New Benchmark for Video Comprehension,
        with Du Tran, Maksim Bolonkin, and Manohar Paluri.

  1. New paper at NIPS 2017:

    Learning to Inpaint for Image Compression,
    with Haris Baig and Vladlen Koltun.

  1. New paper at CVPR 2017:

     Convolutional Random Walk Networks for Semantic Image Segmentation,
     with Gedas Bertasius, Stella Yu and Jianbo Shi.

  1. New paper at ICLR 2017:

     Recurrent Mixture Density Network for Spatiotemporal Visual Attention,
     with Loris Bazzani and Hugo Larochelle.

  1. New paper at AISTATS 2017:

    Local Perturb-and-MAP for Structured Prediction,
    with Gedas Bertasius, Qiang Liu and Jianbo Shi.

  1. New paper to appear in Computer Vision and Image Understanding:

    Multiple Hypothesis Colorization and Its Application to Image Compression,
    with Haris Baig.

  1. We released our new dataset for video comprehension, named VideoMCC. It includes
    over 600 hours of video. Give it a try!

  1. Together with collaborators at Facebook and Google, we co-organized the
    1st Workshop on Large Scale Computer Vision Systems (LSCVS) at NIPS 2016.

  1. New paper at ECCV 2016:

     Network of Experts for Large-Scale Image Categorization,
     K. Ahmed, H. Baig, and L. Torresani.

  1. New paper at CVPR 2016:

    Semantic Segmentation with Boundary Neural Fields,
    G. Bertasius, J. Shi, and L. Torresani.

  1. New paper at the 3rd Workshop on Deep Learning in Computer Vision, 2016:

    Deep End2End Voxel2Voxel Prediction,
    D. Tran, L. Bourdev, R. Fergus, L. Torresani and M. Paluri.

  1. New article in IJCV:

     EXMOVES: Mid-level Features for Efficient Action Recognition and Video Analysis,
     D. Tran and L. Torresani.


We thank the following sources for supporting our research.

Logo design by Christine Claudino