Ishan Misra

Research Scientist at FAIR
[Google Scholar] [GitHub] [CV]

I am a Research Scientist at Facebook AI Research (FAIR), where I work on computer vision and machine learning. My research interest is in reducing the need for supervision in visual learning. I finished my PhD at the Robotics Institute at Carnegie Mellon University, where I worked with Martial Hebert and Abhinav Gupta. My PhD thesis, titled “Visual Learning with Minimal Human Supervision”, received the 2018 SCS Distinguished Dissertation Award (Runner-Up).


  • [2022] Keynote talk at the Ghost Day ML Conference, 2022
  • [2022] 2 papers accepted at CVPR 2022
  • [2022] Omnivore: a single model for image, video, and 3D classification that performs better than modality-specific models.
  • [2022] 1 paper accepted at ICLR 2022
  • [2021] 1 paper accepted at NeurIPS 2021 (oral)
  • [2021] 6 papers accepted at ICCV 2021 (3 as oral)
  • [2021] Our CVPR 2021 paper on audio-visual self-supervised learning (AVID) is a Best Paper Candidate.
  • [2021] 3 papers accepted at CVPR 2021, 1 paper at ICML 2021.
  • [2021] Co-wrote a blog post on self-supervised learning with Yann LeCun [link].
  • [2021] SEER scales self-supervised learning to billions of images.
  • [2020] Our self-supervised technique called SwAV outperforms supervised pre-training on ALL considered transfer tasks and is the first method to do so.

Collaborators and Interns

  • Xingyi Zhou (University of Texas at Austin). Hosted at FAIR with Rohit Girdhar and Armand Joulin.
  • Bowen Cheng (University of Illinois Urbana-Champaign). Hosted at FAIR with Rohit Girdhar and Alex Kirillov.
  • Zaiwei Zhang (University of Texas at Austin). Hosted at FAIR with Rohit Girdhar and Armand Joulin.
  • Zhongzheng (Jason) Ren (University of Illinois Urbana-Champaign). Hosted at FAIR with Rohit Girdhar.
  • Yuki Asano (University of Oxford). Hosted at FAIR with Armand Joulin, Piotr Bojanowski, and Andrea Vedaldi.
  • Pedro Morgado (University of California, San Diego).
  • Huaizu Jiang (University of Massachusetts, Amherst). Hosted at FAIR with Xinlei Chen and Marcus Rohrbach.
  • Jyh-Jing Hwang (University of California, Berkeley). Hosted at FAIR with Laurens van der Maaten.
  • Yan Wang (Cornell University). Hosted at FAIR with Laurens van der Maaten.
  • Terrance DeVries (University of Guelph). Hosted at FAIR with Laurens van der Maaten.


Masked Siamese Networks for Label-Efficient Learning
Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar*, Mannat Singh*, Nikhila Ravi*, Laurens van der Maaten, Armand Joulin, Ishan Misra*
CVPR 2022 (Oral)
Detecting Twenty-thousand Classes using Image-level Supervision
Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar
CVPR 2022
An End-to-End Transformer Model for 3D Object Detection
Ishan Misra, Rohit Girdhar, Armand Joulin
ICCV 2021 (Oral)
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin
ICCV 2021
Self-Supervised Pretraining of 3D Features on any Point-Cloud
Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra
ICCV 2021
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath, Mannat Singh, Yann LeCun, Ishan Misra, Gabriel Synnaeve, Nicolas Carion
ICCV 2021 (Oral)
Audio-Visual Instance Discrimination with Cross-Modal Agreement
Pedro Morgado, Nuno Vasconcelos, Ishan Misra
CVPR 2021 (Best Paper Candidate)
Robust Audio-Visual Instance Discrimination
Pedro Morgado, Ishan Misra, Nuno Vasconcelos
CVPR 2021 (Oral)
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
Jure Zbontar*, Li Jing*, Ishan Misra, Yann LeCun, Stéphane Deny
ICML 2021
3D Spatial Recognition without Spatially Labeled 3D
Zhongzheng Ren, Ishan Misra, Alexander G. Schwing, Rohit Girdhar
CVPR 2021
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
NeurIPS 2020
Self-Supervised Learning of Pretext-Invariant Representations
Ishan Misra, Laurens van der Maaten
CVPR 2020
ClusterFit: Improving Generalization of Visual Representations
Xueting Yan*, Ishan Misra*, Abhinav Gupta, Deepti Ghadiyaram**, Dhruv Mahajan**
CVPR 2020
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang, Ishan Misra, Marcus Rohrbach, Erik Learned-Miller, Xinlei Chen
CVPR 2020
3D-RelNet: Joint Object and Relational Network for 3D Prediction
Nilesh Kulkarni, Ishan Misra, Shubham Tulsiani, Abhinav Gupta
ICCV 2019
Scaling and Benchmarking Self-Supervised Visual Representation Learning
Priya Goyal, Dhruv Mahajan, Abhinav Gupta*, Ishan Misra*
ICCV 2019
Binary Image Selection (BISON): Interpretable Evaluation of Visual Grounding
Hexiang Hu, Ishan Misra, Laurens van der Maaten
ICCV Workshop on Vision and Language, 2019
Does Object Recognition Work for Everyone?
Terrance DeVries*, Ishan Misra*, Changhan Wang*, Laurens van der Maaten
CVPR 2019 Workshop on Computer Vision for Global Challenges (CV4GC)
Mainstream: Dynamic Stem-Sharing for Multi-Tenant Video Processing
Angela Jiang, Daniel L.-K. Wong, Christopher Canel, Ishan Misra, Michael Kaminsky, Michael Kozuch, Padmanabhan Pillai, David G. Andersen and Gregory Ganger
USENIX Annual Technical Conference 2018
Learning by Asking Questions
Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta and Laurens van der Maaten
CVPR 2018 (Oral)
Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection
Debidatta Dwibedi, Ishan Misra and Martial Hebert
ICCV 2017
From Red Wine to Red Tomato: Composition with Context
Ishan Misra, Abhinav Gupta and Martial Hebert
CVPR 2017 (Oral) [Acceptance Rate 2.65%]
Shuffle and Learn: Unsupervised Learning using Temporal Order Verification
Ishan Misra, C. Lawrence Zitnick and Martial Hebert
ECCV 2016
Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels
Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell and Ross Girshick
CVPR 2016
Cross-stitch Networks for Multi-Task Learning
Ishan Misra, Abhinav Shrivastava, Abhinav Gupta and Martial Hebert
CVPR 2016 (Spotlight) [Acceptance Rate 9.7%]
Generating Natural Questions About an Image
Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He and Lucy Vanderwende
ACL 2016 (Oral) (Long Paper)
Visual Storytelling
Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Aishwarya Agrawal, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley and Margaret Mitchell
NAACL 2016
Watch and Learn: Semi-Supervised Learning of Object Detectors from Video
Ishan Misra, Abhinav Shrivastava and Martial Hebert
CVPR 2015
Applying artificial vision models to human scene understanding
E. Aminoff, M. Toneva, A. Shrivastava, X. Chen, I. Misra, A. Gupta, M. Tarr.
Frontiers in Computational Neuroscience, 2015
Data-driven Exemplar Model Selection
Ishan Misra, Abhinav Shrivastava and Martial Hebert
WACV 2014 (Oral) (Best Student Paper)
Hybrid Implementation of Error Diffusion Dithering
Aditya Deshpande, Ishan Misra and P J Narayanan
High Performance Computing (HiPC) 2011


Optimizing multi-class multimedia data classification using negative data
Xian-Sheng Hua, Jin Li and Ishan Misra
Optimizing multi-class image classification using patch features
Ishan Misra, Jin Li and Xian-Sheng Hua