Ishan Misra

Director, Research Scientist @ GenAI (Meta)
I work on computer vision and machine learning research, specifically in generative AI and self-supervised learning. I am a Director, Research Scientist in the GenAI group at Meta, where I lead the research efforts on video generation models. I was the tech lead for Meta's Movie Gen project, which built foundation models for video generation, video editing, video personalization, and audio generation.
Previously, I was part of the FAIR team at Meta, where I worked on self-supervised learning in computer vision and multimodal learning.
For my work in self-supervised learning, I was featured in MIT Technology Review's 35 Innovators Under 35 list (compiled globally across technological disciplines). You can hear me on Lex Fridman's podcast for an overview of my work.
I got my PhD at Carnegie Mellon University. I received CMU's Recent Alumni Achievement Award in 2024 for my research contributions to computer vision and machine learning.

News

2024 October
Research on the Movie Gen series of media foundation models announced (I was the Tech Lead for the full project). Covered in the NY Times, Financial Times, and Forbes.
2024 October
Giving four talks at ECCV 2024 Workshops and Tutorials on Generative Video Models
2024 September
Awarded Carnegie Mellon University's Recent Alumni Achievement Award
2024 July
Mark Zuckerberg announces the release of Llama3 (with our efforts on video recognition).
2024 July
Talk at the ELLIS Workshop on Open Problems in Computer Vision & Generative Modelling in Munich, Germany
2024 July
Talk at the Oxford Machine Learning Summer School at the University of Oxford, UK
2024 March
Talk at the ELLIS Winter School on Foundation Models in Amsterdam, Netherlands
2024 June
4 papers accepted at CVPR
2024 June
Emu Video now powers "animate" on meta.ai that converts images to videos!
2024 June
Llama3 is released!
2023 Nov
Mark Zuckerberg announced our recent project Emu Video
2023 May
Mark Zuckerberg announced our recent foundational multimodal model ImageBind
2023 April
Mark Zuckerberg announced our recent foundational self-supervised model DINO-v2
2022 April
Keynote talk at the Ghost Day ML Conference
2021 March
Blog post with Yann LeCun: "Self-supervised learning: The dark matter of intelligence"

Publications

I mainly publish on video and image recognition, video and image generation, object detection/segmentation, multimodal learning, and self-supervised learning.

Movie Gen: A Cast of Media Foundation Models
The Movie Gen Team (Overall Tech Lead; Core Contributor)
Meta Research 2024
PDF Blog
Generative AI Video Generation Foundation Models
The Llama 3 Herd of Models
The Llama 3 Team (Core Contributor for video recognition)
arXiv 2024
PDF Code
Generative AI LLM Foundation Models Image Recognition Video Recognition
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
Rohit Girdhar^*, Mannat Singh^*, Andrew Brown*, Quentin Duval*, Samaneh Azadi*, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra*
ECCV 2024
PDF BibTeX Powers Meta's /animate and Emu Reels products *Authors contributed equally
Generative AI Diffusion Models Video Generation Foundation Models
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu
CVPR 2024
PDF BibTeX Highlight
Generative AI Diffusion Models Video Generation
InstanceDiffusion: Instance-level Control for Image Generation
Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra
CVPR 2024
PDF Code BibTeX
Generative AI Diffusion Models Image Generation
Generating Illustrated Instructions
Sachit Menon, Ishan Misra, Rohit Girdhar
CVPR 2024
PDF BibTeX
Generative AI Diffusion Models LLM
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
Xudong Wang, Ishan Misra, Ziyun Zheng, Rohit Girdhar, Trevor Darrell
CVPR 2024
PDF BibTeX
Self-Supervised Learning Video Recognition Object Discovery
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh*, Quentin Duval*, Kalyan Vasudev Alwala*, Haoqi Fan, Vaibhav Aggarwal, Aaron Adcock, Armand Joulin, Piotr Dollár, Christoph Feichtenhofer, Ross Girshick, Rohit Girdhar, Ishan Misra
ICCV 2023
PDF Code Colab BibTeX *Authors contributed equally
Self-Supervised Learning Weakly-Supervised Learning Large Scale Foundation Models
MOST: Multiple Object localization with Self-supervised Transformers for object discovery
Sai Saketh Rambhatla, Ishan Misra, Rama Chellappa, Abhinav Shrivastava
ICCV 2023
PDF Code BibTeX Oral
Self-Supervised Learning Object Discovery
MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses
Yang Fu, Ishan Misra, Xiaolong Wang
ICML 2023
PDF BibTeX
NeRF 3D generation
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar*, Alaaeldin El-Nouby*, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra*
CVPR 2023
PDF Code Demo BibTeX Highlighted paper *Authors contributed equally
Multimodal Learning Self-Supervised Learning Foundation Models
Cut and Learn for Unsupervised Object Detection and Instance Segmentation
Xudong Wang, Rohit Girdhar, Stella X. Yu, Ishan Misra
CVPR 2023
PDF Code BibTeX
Self-Supervised Learning Object Discovery
Learning Video Representations from Large Language Models
Yue Zhao, Ishan Misra, Philipp Krahenbuhl, Rohit Girdhar
CVPR 2023
PDF Code Colab BibTeX Highlighted paper
Video Recognition LLM Foundation Models
The Hidden Uniform Cluster Prior in Self-Supervised Learning
Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas
ICLR 2023
PDF BibTeX
Self-supervised Learning Representation Learning
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar*, Alaaeldin El-Nouby*, Mannat Singh*, Kalyan Vasudev Alwala*, Armand Joulin, Ishan Misra*
CVPR 2023
PDF Code BibTeX *Authors contributed equally
Self-supervised Learning Representation Learning Video Recognition
Masked Siamese Networks for Label-Efficient Learning
Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas
ECCV 2022
PDF Code BibTeX
Self-supervised Learning Representation Learning Image Recognition
Detecting Twenty-thousand Classes using Image-level Supervision
Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krahenbuhl, Ishan Misra
ECCV 2022
PDF Code BibTeX
Object Detection Open World Recognition Instance Recognition
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski
arXiv 2022
PDF
Self-supervised Learning Image Recognition Foundation Models
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar*, Mannat Singh*, Nikhila Ravi*, Laurens van der Maaten, Armand Joulin, Ishan Misra*
CVPR 2022
PDF Code BibTeX Oral *Authors contributed equally
Self-supervised Learning Image Recognition Video Recognition Multimodal Learning Foundation Models
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar
CVPR 2022
PDF Code BibTeX
Semantic Segmentation Panoptic Segmentation Instance Recognition
An End-to-End Transformer Model for 3D Object Detection
Ishan Misra, Rohit Girdhar, Armand Joulin
ICCV 2021
PDF Code BibTeX Oral
Object Detection 3D Recognition
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin
ICCV 2021
PDF Code
Self-supervised Learning Image Recognition Foundation Models
Self-Supervised Pretraining of 3D Features on any Point-Cloud
Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra
ICCV 2021
PDF Code BibTeX
Self-supervised Learning Image Recognition Foundation Models Representation Learning
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath, Mannat Singh, Yann LeCun, Ishan Misra, Gabriel Synnaeve, Nicolas Carion
ICCV 2021
PDF Code Oral
Multimodal learning Instance Recognition Foundation Models
Audio-Visual Instance Discrimination with Cross-Modal Agreement
Pedro Morgado, Nuno Vasconcelos, Ishan Misra
CVPR 2021
PDF Code BibTeX Best Paper Candidate
Multimodal learning Self-supervised Learning Audio Recognition
Robust Audio-Visual Instance Discrimination
Pedro Morgado, Ishan Misra, Nuno Vasconcelos
CVPR 2021
PDF BibTeX Oral
Multimodal learning Self-supervised Learning Audio Recognition
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
Jure Zbontar*, Li Jing*, Ishan Misra, Yann LeCun, Stéphane Deny
ICML 2021
PDF Code BibTeX *Authors contributed equally
Self-supervised Learning Image Recognition Representation Learning
3D Spatial Recognition without Spatially Labeled 3D
Zhongzheng Ren, Ishan Misra, Alexander G. Schwing, Rohit Girdhar
CVPR 2021
PDF
3D Recognition Instance Recognition
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
NeurIPS 2020
PDF Code BibTeX
Self-supervised Learning Image Recognition Representation Learning
Self-Supervised Learning of Pretext-Invariant Representations
Ishan Misra, Laurens van der Maaten
CVPR 2020
PDF Code BibTeX
Self-supervised Learning Image Recognition Representation Learning
ClusterFit: Improving Generalization of Visual Representations
Xueting Yan*, Ishan Misra*, Abhinav Gupta, Deepti Ghadiyaram*, Dhruv Mahajan*
CVPR 2020
PDF Code BibTeX *Authors contributed equally
Image Recognition Representation Learning
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang, Ishan Misra, Marcus Rohrbach, Erik Learned-Miller, Xinlei Chen
CVPR 2020
PDF Code BibTeX
Image Recognition Multimodal Learning Visual Question Answering
3D-RelNet: Joint Object and Relational Network for 3D Prediction
Nilesh Kulkarni, Ishan Misra, Shubham Tulsiani, Abhinav Gupta
ICCV 2019
PDF Code BibTeX
3D Recognition Object detection Visual Question Answering
Scaling and Benchmarking Self-Supervised Visual Representation Learning
Priya Goyal, Dhruv Mahajan, Abhinav Gupta*, Ishan Misra*
ICCV 2019
PDF Code BibTeX *Authors contributed equally
Self-supervised Learning Image Recognition Representation Learning
Binary Image Selection (BISON): Interpretable Evaluation of Visual Grounding
Hexiang Hu, Ishan Misra, Laurens van der Maaten
ICCV Workshop on Vision and Language 2019
PDF Code BibTeX
Multimodal Learning
Does Object Recognition Work for Everyone?
Terrance DeVries*, Ishan Misra*, Changhan Wang*, Laurens van der Maaten
CVPR 2019
PDF BibTeX *Authors contributed equally
Fairness Image Recognition
Mainstream: Dynamic Stem-Sharing for Multi-Tenant Video Processing
Angela Jiang, Daniel L.-K. Wong, Christopher Canel, Ishan Misra, Michael Kaminsky, Michael Kozuch, Padmanabhan Pillai, David G. Andersen, Gregory Ganger
USENIX Annual Technical Conference 2018
PDF BibTeX
Video Recognition
Learning by Asking Questions
Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten
CVPR 2018
PDF BibTeX Oral
Multimodal Learning Visual Question Answering
Cut Paste and Learn: Surprisingly Easy Synthesis for Instance Detection
Debidatta Dwibedi, Ishan Misra, Martial Hebert
ICCV 2017
PDF Code BibTeX
Object detection Instance Recognition
From Red Wine to Red Tomato: Composition with Context
Ishan Misra, Abhinav Gupta, Martial Hebert
CVPR 2017
PDF Code BibTeX Oral
Zero-shot Recognition Compositional Learning
Shuffle and Learn: Unsupervised Learning using Temporal Order Verification
Ishan Misra, C. Lawrence Zitnick, Martial Hebert
ECCV 2016
PDF Code BibTeX
Self-supervised Learning Representation Learning Video Recognition
Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels
Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, Ross Girshick
CVPR 2016
PDF Code BibTeX
Image Recognition
Cross-stitch Networks for Multi-Task Learning
Ishan Misra*, Abhinav Shrivastava*, Abhinav Gupta, Martial Hebert
CVPR 2016
PDF BibTeX Spotlight *Authors contributed equally
Multi-task Learning Image Recognition 3D Recognition
Generating Natural Questions About an Image
Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende
ACL 2016
PDF Code BibTeX Oral Long Paper
Visual Question Generation
Visual Storytelling
Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Aishwarya Agrawal, Ross Girshick, Xiaodong He, Pushmeet Kohli, et al.
NAACL 2016
PDF BibTeX
Storytelling
Watch and Learn: Semi-Supervised Learning of Object Detectors from Video
Ishan Misra, Abhinav Shrivastava, Martial Hebert
CVPR 2015
PDF BibTeX
Semi-supervised Learning Video Recognition Instance Recognition
Applying artificial vision models to human scene understanding
Elissa Aminoff, M. Toneva, Abhinav Shrivastava, Xinlei Chen, Ishan Misra, et al.
Journal of Frontiers in Computational Neuroscience 2015
Data-driven Exemplar Model Selection
Ishan Misra, Abhinav Shrivastava, Martial Hebert
WACV 2014
PDF BibTeX Best Student Paper
Image Recognition Instance Recognition